[Seminar] Addressing Challenges and Opportunities in Memory Subsystem

Sungjoo Yoo
Friday, May 16th 2014, 10:30am

Contact: Integrated Design and Parallel Processing Laboratory (880-7292)


The memory hierarchy requires new architectures and technologies to meet growing demands for higher bandwidth and lower power consumption. High memory bandwidth demands call for a large on-chip last-level cache (LLC), but a conventional SRAM-based LLC suffers from large leakage power consumption as its capacity increases. Spin-transfer torque RAM (STT-RAM), a fast non-volatile memory, is a strong LLC candidate because of its low leakage and fast read access. However, its high write power consumption prevents STT-RAM from being applied, especially to programs with frequent data updates. In our work, we exploit dead writes to minimize LLC writes. We classify dead writes into three types (dead-on-arrival fills, dead-value fills, and closing writes) and present an LLC architecture that bypasses dead writes, thereby significantly reducing the power consumption of an STT-RAM LLC.

DRAM-based main memory is evolving towards 3D stacking in order to meet bandwidth demands. The Hybrid Memory Cube (HMC) provides orders of magnitude higher bandwidth in a single chip package. However, it suffers from significant idle power consumption, especially due to the standby power consumed by its high-speed links. We propose a dynamic power management scheme that minimizes turning the links on, with negligible performance loss. Link activity is often dominated by prefetch traffic. To further reduce link activity, we propose a two-level prefetcher scheme in which the CPU die is equipped with a conservative prefetcher to reduce link traffic and the HMC has an aggressive prefetcher to reduce memory access latency.

New packaging technology, namely 3D die stacking, and new applications such as graph computation are reviving the concept of processor-in-memory (PIM). In this talk, we introduce a message-passing PIM architecture and its application to graph computation. Our preliminary experimental results show that PIM can improve the performance of graph computation by orders of magnitude by exploiting the internal bandwidth of HMC-based main memory.

Speaker Bio

Sungjoo Yoo received his Ph.D. from Seoul National University in 2000. He worked as a researcher at the TIMA laboratory in Grenoble, France, from 2000 to 2004. He was then a principal engineer at Samsung System LSI from 2004 to 2008, where he led the system-on-chip architecture design team and was involved in memory and bus architecture design for mobile application processors and in performance modeling and optimization of solid state disks. He joined POSTECH in 2008 and is now an associate professor. His research interests include software, architecture, RTL, and circuit design for low-power SoCs, and cache/memory and storage hierarchies based on emerging memory technologies such as phase-change RAM, spin-transfer torque RAM (including racetrack memory), resistive RAM, and NAND Flash memory. He received the Best Paper Award at the International SoC Conference (ISOCC) in 2006 and Best Paper Award nominations at the Design Automation Conference (DAC) in 2011 and at Design, Automation and Test in Europe (DATE) in 2002 and 2009.