[DLS] How to Make Sparse Fast
호스트: 이재욱 교수(x1834)
Achieving high performance is no easy task. When it comes to program operating on sparse data, where there is very little hardware, language or compiler support, getting high performance is nearly impossible. As important modern applications such as data analytics and simulations operate on sparse data, lack of performance is becoming a critical issue. Achieving high performance was so important from the early days of computing, many researchers have spent their lifetime trying to extract more FLOPS out of critical codes. Hardcore performance engineers try to get to this performance nirvana single handedly without any help from languages, compilers or tools. In this talk, using two examples, TACO and GraphIt, I’ll argue that domain specific languages and compiler technology can reduce most of the performance optimization burden even in a very difficult domain such as sparse computations.
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and science. Increasingly often the tensors are sparse, which means most components are zeros. To get the best performance, currently programmers are left to write kernels for every operation, with different mixes of sparse and dense tensors in different formats. There are countless combinations, which makes it impossible to manually implement and optimize them all. The Tensor Algebra Compiler (TACO) is the first system to automatically generate kernels for any tensor algebra operation on tensors in any of the commonly used formats. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
GraphIt is a DSL and compiler for high-performance graph computing. GraphIt separates algorithm, schedule and physical data layout, providing the programmer with the ultimate control over optimization. GraphIt outperforms the state-of-the-art libraries and DLSs up to 2.4× on scale-free graphs and 4.7× on road graphs.
Saman P. Amarasinghe is a Professor and the Associate Department Head in the Department of Electrical Engineering and Computer Science at Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) where he leads the Commit compiler group. Under Saman's guidance, the Commit group developed the StreamIt, PetaBricks, StreamJIT, Halide, Simit, GraphIt, MILK, and Cimple programming languages and compilers, Tiramisu compiler framework, DynamoRIO dynamic instrumentation system, Superword level parallelism for SIMD vectorization, Program Shepherding to protect programs against external attacks, the OpenTuner extendable autotuner, and the Kendo deterministic execution system. He was the co-leader of the Raw architecture project. His research interests are in discovering novel approaches to improve the performance of modern computer systems without unduly increasing the complexity faced by the end users, application developers, compiler writers, or computer architects. Saman received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D from Stanford University in 1990 and 1997, respectively. (Homepage: https://people.csail.mit.edu/saman)