[Seminar] Data Processing and Analysis at Google

Sunghwan Ihm
Software Engineer
Wednesday, February 4th 2015, 11:00am

■ Host: Prof. Byung-Gon Chun (전병곤) (x1928, 02-880-1928)


Google has a strong track record of innovation in large-scale data processing and analysis technologies. From simple batch jobs to complex multi-stage pipelines to real-time streaming analytics, Google has developed a variety of in-house tools and used them in production for many years. However, having multiple tools for different purposes poses new challenges for both application writers and tool developers. Application writers must learn different programming models and systems to adapt to rapidly changing data and business requirements, which hinders their productivity. Tool developers must maintain many different systems and support their users, which requires a huge and often duplicated engineering effort. To address these challenges, we have developed the Google Cloud Dataflow service, which provides a unified programming model and runtime system for application writers and tool developers. In this talk, we first introduce Google's in-house big data processing and analysis technologies that influenced the design and implementation of Cloud Dataflow. We then present the programming model and API of Cloud Dataflow, which we recently open-sourced. Lastly, we briefly describe the architecture and features of Cloud Dataflow's runtime system.

Speaker Bio

Sunghwan Ihm is a Senior Software Engineer in the Systems Infrastructure group at Google, where he works on the next-generation big data processing and analysis platform. Before joining Google, he completed his Ph.D. (2011) in Computer Science at Princeton University. His dissertation focused on understanding and improving the caching of modern Web traffic. During his Ph.D. studies, he interned at Alcatel-Lucent Bell Labs and Intel Labs Berkeley. He received his B.S. (2004) and M.S. (2006) in Computer Science from KAIST.