Title: Systems Research Engineer
Google Systems Research Group
Today, many organizations store big data in the cloud and lease relatively small clusters of instances to run analytics queries, train machine learning models, and more. However, exponential data growth, combined with the slowdown of Moore's law, makes it challenging (if not impossible) to run such big data processing tasks in real time. Most applications run big data workloads on timescales of several minutes or hours and resort to complex, application-specific optimizations to reduce the amount of data processing required for interactive queries. This design pattern hurts developer productivity and restricts the scope of applications that can use big data. Yet a cloud datacenter contains many servers, which raises a natural question: "Can we borrow thousands of servers briefly to accelerate big data processing enough to be interactive?"
In this talk, I'll share my vision for enabling massively parallel data processing even for very short durations (1-10 ms), which I call "flash bursts." This will empower interactive, real-time applications (e.g., cybersecurity attack defense, self-driving cars, or drones) to use much larger datasets than before. For this moonshot, I take a two-pronged approach. First, I restructure important big data applications (analytics and DNN training) so that they can run efficiently as flash bursts. On this prong, the talk will focus on how I efficiently scaled distributed sorting to 100+ servers even within a 1-millisecond time budget. Second, I rebuild various layers of distributed systems to reduce the overheads of flash burst scaling. On this prong, I will focus on how I removed the overheads of consistent replication.
Seo Jin Park is joining the USC Computer Science Department as an Assistant Professor in Fall 2023. He is currently spending a year at the Google Systems Research Group as a Systems Research Engineer. Seo Jin did his postdoc at MIT CSAIL with Mohammad Alizadeh. He received a Ph.D. in Computer Science from Stanford University in 2019, advised by John Ousterhout. His research interests are broadly in distributed systems: bringing consistency to low-latency systems, improving the robustness of a blockchain protocol, optimizing consensus protocols, suppressing tail latencies, and building efficient performance debugging tools.