Title: [seminar] Exploiting PCIe- and CXL-based Accelerators to Reduce Datacenter Memory Tax
Memory optimization kernel features, such as memory deduplication, are designed to improve the overall efficiency of systems like datacenter servers, and they have proven to be effective. However, when invoked, these kernel features notably disrupt the execution of applications, intensively consuming the server CPU's cycles and polluting its caches. To minimize such disruption, we propose to offload the intensive operations of these kernel features to a PCIe-based SmartNIC (SNIC) and a CXL-based FPGA. With the SNIC, we first RDMA-copy the server's memory regions, on which these kernel features intend to operate, to the SNIC's memory region, exploiting the SNIC's RDMA capability. Subsequently, leveraging the SNIC's compute capability, we make the SNIC CPU perform the intensive operations of these kernel features. Lastly, we RDMA-copy the results back to the server's memory region, and the server uses them to perform the remaining operations of the kernel features. To demonstrate the efficacy of our proposal, we re-implement two memory optimization kernel features in Linux: (1) memory deduplication (ksm) and (2) compressed cache for swap pages (zswap). We then show that a system with our proposal provides a 55-89% decrease in the 99th-percentile latency of co-running applications, compared to a conventional system, while preserving the benefits of deploying these kernel features. Furthermore, we take a state-of-the-art CXL-based FPGA that provides a unified memory space and cache coherence between the FPGA and the CPU, and demonstrate that the CXL-based FPGA can offer considerably lower 99th-percentile latency than the PCIe-based SNIC, practically eliminating the increase in 99th-percentile latency caused by deploying ksm and zswap.
I am the W.J. ‘Jerry’ Sanders III – Advanced Micro Devices, Inc. Endowed Chair Professor at the University of Illinois, Urbana-Champaign and a fellow of ACM, IEEE, and NAI. From 2018 to 2020, I took a leave of absence and, as a Sr. Vice President at a major memory manufacturing company, led the development of next-generation DRAM products, including the industry's first HBM-PIM, which will play a significant role in shaping the future computing landscape. I have published more than 250 refereed articles in highly selective conferences and journals in the fields of digital circuits, processor architecture, and computer-aided design. My three most frequently cited papers have more than 5000 citations, and the total number of citations of all my papers approaches 16000. I have received many internationally recognized awards, including the ACM/IEEE Most Influential ISCA Paper Award in 2017 and the SIGMICRO Test of Time Award in 2021. I am a hall of fame member of all three major computer architecture conferences: IEEE HPCA, MICRO, and ISCA. Lastly, I am the first Korean to achieve the titles, awards, and recognitions mentioned above in the computer architecture field.