[Seminar] SociaLite: A Datalog-based query language for large-scale graph analysis

Jiwon Seo
PhD student
EE at Stanford
Monday, September 1st 2014, 1:15pm - Monday, September 1st 2014, 2:30pm
Bldg. 302, Room 308

◾호스트: 전병곤 교수(x1928, 02-880-1928)


Large-scale graph analysis is becoming important with the rise of world-wide social network services. Existing distributed systems such as MapReduce or Pregel fail to provide high-level programming model for efficient graph analysis.

For greater ease of use and efficiency, we propose SociaLite, a high-level graph query language based on Datalog. While Datalog is known to succinctly express graph algorithms, its performance has not been competitive when compared to that of imperative programming languages. This performance gap is greatly reduced in SociaLite with its extensions.

With SociaLite, users can provide high-level annotations on the data layout for optimized data structures. Also, they can define recursive aggregate functions which, as long as they are meet operations, can be evaluated incrementally and efficiently. Furthermore, SociaLite supports high-level abstractions for distributed computations. With SociaLite, programmers to simply annotate how data is to be distributed, then the necessary communication is automatically inferred to generate parallel code for cluster of multi-core machines. The evaluation of recursive aggregate functions is optimized with a delta stepping technique for distributed clusters. In addition, approximate computation is supported in SociaLite, allowing programmers to trade off accuracy for less time and space.

We evaluated SociaLite with six core graph algorithms used in many social network analyses. Our experiment with 64 Amazon EC2 8-core instances shows that SociaLite programs performed within a factor of two with respect to ideal weak scaling. Furthermore, with Intel research group, we compared SociaLite with other graph frameworks including GraphLab, Giraph (an open-source version of Pregel), and Combinatorial Blas. We demonstrate that among the compared frameworks, SociaLite gives the best programmability, and competitive performance that is faster than Giraph or GraphLab.

Speaker Bio

Jiwon Seo is a PhD student in EE at Stanford, working with Professor Monica Lam. His research interest includes distributed systems, query language processing, large-scale graph analytics, and big data mining. His thesis is on SociaLite, a high-level query language for large-scale graph analysis. SociaLite is presented in academic conferences including ICDE, VLDB, and SIGMOD; his ICDE paper is selected as one of best papers and was invited to TKDE journal. His work also attracted interest from industry, and was presented in Hadoop Summit and Python Conference as well as tech talks in Google, Facebook, and LinkedIn.