[Seminar] Accelerating Data-intensive Applications on Fast Storage
Contact: Prof. Jaejin Lee (x1863, 880-1863)
As we enter the Big Data era, more and more data are collected, stored, and analyzed to extract useful business insights. Along with this trend, the advent of the high-performance NVMe interface and new memory technologies demands innovation in the storage software stack to accelerate various data-intensive applications. In this talk, I will first share my experiences in developing data-intensive applications, including ForestDB. ForestDB is a fast, persistent key-value store developed in collaboration with Couchbase Inc. It uses a new hybrid indexing scheme called HB+trie (Hierarchical B+tree-based Trie), which is optimized for variable-length string keys. I will also discuss our work on accelerating data-intensive applications on fast storage using a user-space I/O framework called NVMeDirect. NVMeDirect enables application-specific optimizations by allowing user-space applications to access NVMe SSDs directly. Our evaluation results show that NVMeDirect improves the performance of Redis and ForestDB by up to 15% with small code changes. Finally, I will present the overall architecture and preliminary results of NVMeDirect 2.0, which enhances application portability by providing a simple user-level file system called ForestFS and file system call wrappers.
Jin-Soo Kim received his B.S., M.S., and Ph.D. degrees in Computer Engineering from Seoul National University, Korea, in 1991, 1993, and 1999, respectively. He is currently a Professor at Sungkyunkwan University (SKKU). Before joining SKKU, he was an Associate Professor at the Korea Advanced Institute of Science and Technology (KAIST) from 2002 to 2008. He was also with the Electronics and Telecommunications Research Institute (ETRI) from 1999 to 2002 as a Senior Researcher, and with the IBM T. J. Watson Research Center from 1998 to 1999 as an Academic Visitor. His research interests include operating systems, storage systems, and parallel and distributed computing.