[Seminar] Taming Subgraph Isomorphism for RDF Query Processing
Pohang University of Science and Technology (POSTECH)
Host: Prof. Kunsoo Park (x1828, 880-1828)
RDF data are used to model knowledge in various areas such as the life sciences, the Semantic Web, bioinformatics, and social graphs. Real RDF datasets reach billions of triples, which calls for a framework that processes RDF data efficiently. The core operation in RDF data processing is subgraph pattern matching, and two entirely separate lines of work have pursued efficient subgraph pattern matching. One has spent the last decade developing specialized RDF query processing engines that exploit the properties of RDF data; the other has spent over 30 years developing efficient subgraph isomorphism algorithms for general labeled graphs. Although both directions share a similar goal (i.e., finding the subgraphs of a data graph that match a given query graph), they have been researched independently for no clear reason. We argue that a subgraph isomorphism algorithm can easily be modified to handle graph homomorphism, which is the semantics of RDF pattern matching, simply by removing the injectivity constraint. In this talk, we propose TurboHOM++, an in-memory solution that builds on the state-of-the-art subgraph isomorphism algorithm and is tamed for RDF processing. We compare it with representative RDF processing engines on several RDF benchmarks, using a server machine that can load billions of triples into memory. To speed up TurboHOM++ further, we also provide a simple yet effective transformation and a series of optimization techniques. Extensive experiments on several RDF benchmarks show that TurboHOM++ consistently and significantly outperforms the representative RDF engines.
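To illustrate the claim that homomorphism is isomorphism minus the injectivity check, here is a minimal sketch of a naive backtracking matcher over RDF-style triples, in which a single `injective` flag switches between the two semantics. It is not TurboHOM++ or its underlying algorithm: it uses a fixed matching order, tries every data vertex as a candidate, and performs no filtering or optimization. The names `find_matches` and `Triple` are hypothetical, and every query vertex is treated as a variable (no constants), which simplifies real SPARQL basic graph patterns.

```python
from typing import Dict, List, Set, Tuple

# (subject, predicate, object): one labeled, directed edge of an RDF graph.
Triple = Tuple[str, str, str]


def find_matches(query: List[Triple], data: List[Triple], injective: bool) -> List[Dict[str, str]]:
    """Enumerate mappings from query vertices to data vertices that preserve labeled edges.

    injective=True  -> subgraph isomorphism (distinct query vertices map to distinct data vertices)
    injective=False -> graph homomorphism (the injectivity check is simply skipped)
    """
    edges: Set[Triple] = set(data)
    data_vertices = sorted({v for s, _, o in data for v in (s, o)})

    # Fixed matching order: query vertices in order of first appearance.
    order: List[str] = []
    for s, _, o in query:
        for v in (s, o):
            if v not in order:
                order.append(v)

    results: List[Dict[str, str]] = []
    mapping: Dict[str, str] = {}

    def consistent(u: str) -> bool:
        # Every query edge touching u whose endpoints are both mapped must exist in the data.
        for s, p, o in query:
            if u in (s, o) and s in mapping and o in mapping:
                if (mapping[s], p, mapping[o]) not in edges:
                    return False
        return True

    def backtrack(i: int) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in data_vertices:
            # The only difference between the two semantics is this check.
            if injective and v in mapping.values():
                continue
            mapping[u] = v
            if consistent(u):
                backtrack(i + 1)
            del mapping[u]

    backtrack(0)
    return results


if __name__ == "__main__":
    data = [("alice", "knows", "bob"), ("bob", "knows", "alice")]
    query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
    print(len(find_matches(query, data, injective=True)))   # 0
    print(len(find_matches(query, data, injective=False)))  # 2
```

With only two data vertices, no injective mapping of the three query variables exists. Under homomorphism semantics, however, `?x` and `?z` may bind to the same vertex, which yields the two answers a SPARQL engine would return for this pattern.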
Professor Wook-Shin Han is currently a Full Professor in the Department of Creative IT Engineering and the Department of Computer Science and Engineering at POSTECH. Before that, he was an Associate Professor in the Department of Computer Science and Engineering at Kyungpook National University. He received his Ph.D. from KAIST in 2001. His primary research efforts have been devoted to developing new techniques in DBMS "engine research." He has developed an object-relational DBMS supporting multiple language bindings, as well as technology for tightly coupling a DBMS with IR features. As a postdoc at the IBM Almaden Research Center, he developed progressive query optimization in parallel DB2. He also invented the concept of "parallelizing query optimization," which exploits multi-core architectures for faster query compilation. Recently, he developed iGraph, a framework for comparing subgraph isomorphism indexing and query processing algorithms. He has published extensively in major international journals and conferences, including SIGMOD, VLDB, SIGKDD, ICDE, WWW, IEEE Transactions on Knowledge and Data Engineering (TKDE), and the VLDB Journal. He regularly serves as a PC member for VLDB, SIGMOD, and ICDE. He has served, or will serve, as an associate editor of several international journals, including the VLDB Journal, IEEE TKDE, and Information Sciences. He served as an industrial co-chair for ICDE 2015.