ML jobs are the new killer apps of datacenters. Large cloud providers were almost ready to declare that they could handle the challenges posed by nearly any application. Then ML jobs arrived and showed that such a declaration would have been premature. In my talk, I will first quickly recap how datacenter networks have evolved to realize the illusion of a single gigantic switch. Then I will show how ML jobs are reshaping the performance and availability requirements for datacenter networking yet again. I will also share a few high-level approaches that providers are introducing to address these new challenges.
Chang Kim is a Principal Engineer at Google and works in the Net Infra group at GCP.
He was VP of Engineering at Moloco, a startup that provides cutting-edge ML and big-data processing solutions and services to the mobile and e-commerce industries. He was an adjunct professor in the CS department of Stanford University. Up until early 2021, he worked as CTO of Applications at the Barefoot Division of Intel and was an Intel Fellow. He also worked actively for P4.org, where he led various engineering and research projects regarding fully programmable high-speed networking devices and their applications. Before getting involved with P4.org and Barefoot Networks, he worked at Windows Azure, Microsoft’s cloud-service division, and led engineering and research projects on the architecture, performance, and management of datacenter networks.
He has a knack for taking an interest in and working on a variety of topics, including large-scale ML and data-processing systems, applications of DNNs, ML infrastructure, programmable networking, domain-specific machine architectures, application acceleration, and debugging and diagnosis of large-scale distributed systems. Many of his engineering and research contributions, including In-band Network Telemetry, Tiny Packet Programs, VL2, Seawall, EyeQ, Ananta, and SEATTLE, have been adopted in large production systems and services.
With his collaborators, he has received a few awards, including best paper awards from top-notch conferences such as SIGCOMM, NSDI, and FAST. He was the recipient of the Microsoft Rockstar Award 2013, an annual recognition for the strongest networking contributions Microsoft-wide. He received a PhD in Computer Science from Princeton University, and MS and BS degrees in Computer Engineering from Seoul National University.