[Seminar] Ensuring Reliability of Computer Hardware Systems
More than ever, the electronic devices that are critical to everyday personal life, to social infrastructure, and to national defense depend on computer systems built around increasingly sophisticated System-on-Chips (SoCs). As SoC designs for digital computer systems grow more complex and semiconductor process geometries shrink to nanometers, an emerging reliability issue is that run-time defects, exposed during user system operation after production test, will occur with increasing frequency. Run-time defects in the hardware modules of SoCs cause losses of user data, operational contexts, and/or system services. The International Technology Roadmap for Semiconductors (ITRS) forecasts that a number of physical and circuit phenomena in future generations of semiconductors will increase the occurrence of run-time defects. In response, system functionality must be extended to provide in-system defect detection and recovery. However, conventional online test methods for detecting run-time defects have critical limitations in testing time, software/hardware cooperation, and/or unjustifiable additional hardware costs that lower their feasibility for a broad range of applications.
My research on SoC architectures enables flexible in-system testing and recovery during normal system operation. The proposed approach provides a comprehensive SoC architectural method that performs testing to detect run-time defects, mitigates them, quarantines the defective hardware functional module, and regenerates the lost function so that system operation continues after detection. The proposed architectures have been implemented in two experimental SoC designs based on the ARM processor and the AMBA SoC bus architecture. The additional hardware costs are 3.3 and 1.2 percent of the total SoC hardware resources for in-system testing and recovery, respectively. The proposed methods have been emulated, demonstrated, and verified experimentally on an SoC test-bed.
Dr. Lok-Won Kim earned his Ph.D. from UCLA in 2011, where he conducted research on the security and reliability of hardware systems, digital designs (optimized for performance, area, power, timing, etc.), automated design optimization, hardware-based accelerators for machine learning algorithms, and reconfigurable computing. He has authored or co-authored 6 journal and 7 conference papers and holds 2 US and 4 Korean patents. He has over eight years of professional experience at Apple, Cisco Systems, the Korea Electronics Technology Institute (KETI), and Hynix Semiconductor, where he played a key role in the design and verification of SoCs. He received the CAP award (Cisco Achievement Program award, in recognition of outstanding employee effort and achievement) at Cisco Systems. During his Ph.D. studies, Dr. Kim held a research internship at the IBM T. J. Watson Research Center in Yorktown Heights, NY, where he performed research on a highly scalable, high-performance hardware architecture for an artificial neural network algorithm. He also held a research internship at Broadcom in Irvine, CA, where he conducted research on design methodologies using automated RTL generators. He is currently with Apple Inc. in Cupertino, CA, where he leads the design and verification of the application processors (SoCs) used in the iPhone, iPad, and other new products.