Addressing Silent Data Corruption (SDC) with In-System Embedded Deterministic Testing
TL;DR
Silent Data Corruption (SDC) is a critical issue in semiconductor design, especially in high-performance computing environments such as AI data centers, where hardware errors can go undetected and lead to significant failures. In a joint presentation, Broadcom Inc. and Siemens EDA explained that SDC incidents can disrupt mission-critical operations such as AI model training. To combat this, they recommend in-system test capabilities that run periodic checks without system downtime, with Siemens’ In-System Test (IST) solution as a key enabler. Broadcom’s methodology addressed design implementation challenges such as functional isolation and clock splitting, and the presentation showed the solution being successfully implemented and verified. The collaboration highlights the importance of embedded deterministic testing in improving reliability and reducing field failures in next-generation silicon.