Towards Interactive Big Data Processing Through Flash Burst Parallel Systems


Name: Seo Jin Park

Title: Systems Research Engineer

Affiliation: Google Systems Research Group

Host: Prof. Byung-Gon Chun (전병곤)

Date: December 6, 2022, 11:00 AM - 12:00 PM

Location: Building 302, Room 311-1 (Software Lab)

Abstract

Today, many organizations store big data on the cloud and lease relatively small clusters of instances to run analytics queries, train machine learning models, and more. However, exponential data growth, combined with the slowdown of Moore's law, makes it challenging (if not impossible) to run such big data processing tasks in real time. Most applications run big data workloads on timescales of several minutes or hours and resort to complex, application-specific optimizations to reduce the amount of data processing required for interactive queries. This design pattern hurts developer productivity and restricts the scope of applications that can use big data. Since a cloud datacenter has many servers, a natural question is: "Can we borrow thousands of servers briefly to accelerate big data processing enough to be interactive?"

In this talk, I'll share my vision to enable massively parallel data processing even for very short durations (1-10 ms), which I call "flash bursts." This will empower interactive, real-time applications (e.g., cybersecurity attack defense, self-driving cars or drones, etc.) to utilize much larger data than before. For this moonshot, I take a two-pronged approach. First, I restructure important big data applications (analytics and DNN training) so that they can run efficiently in a flash burst fashion. On this prong, the talk will focus on how I efficiently scaled distributed sorting to 100+ servers even for a 1-millisecond time budget. Second, I rebuild various layers in distributed systems to reduce the overheads of flash burst scaling. On this prong, I will focus on how I removed the overheads of consistent replication.

Speaker Bio

Seo Jin Park is joining the USC Computer Science Department as an Assistant Professor in Fall 2023. He is currently spending a year at the Google Systems Research Group as a Systems Research Engineer. Seo Jin did his postdoc at MIT CSAIL with Mohammad Alizadeh. He received a Ph.D. in Computer Science from Stanford University in 2019, advised by John Ousterhout. His research interests are broadly in distributed systems: bringing consistency to low-latency systems, improving the robustness of a blockchain protocol, optimizing consensus protocols, suppressing tail latencies, and building efficient performance debugging tools.
