[Seminar] Automated Load Testing at Facebook's Scale

김원호 (Wonho Kim)
Software engineer
Tuesday, December 16th, 2014, 11:00am

■ Host: Prof. Byung-Gon Chun (x1928, 02-880-1928)


Facebook's infrastructure serves millions of requests per second, providing a reliable, personalized experience to more than a billion people all over the world. It comprises hundreds of distributed, interconnected internal services that evolve rapidly and change in a decentralized manner. This infrastructure is deployed across many geographically distributed datacenters.

This presents non-trivial capacity management challenges: we need a reliable tool for understanding capacity bottlenecks and analyzing the performance of individual services so that we can allocate resources appropriately among them. Understanding the capacity of each datacenter is also critical to guaranteeing an optimal user experience during planned and unplanned datacenter outages.

To meet these challenges, we developed Keanu, a family of automated continuous load testing tools running at different levels of the infrastructure hierarchy. Keanu provides essential information for optimizing resource allocation, identifies capacity regressions, and serves as an A/B testing framework for improving the performance of individual services. In this talk I’ll describe and motivate our approach to continuous load testing of large-scale systems. I’ll also present the common patterns that cause problems in our infrastructure and explain why load testing is essential for uncovering them.

Speaker Bio

Wonho Kim is a software engineer in the infrastructure group at Facebook, where he works on datacenter capacity management and disaster recovery. He received his Ph.D. in Computer Science from Princeton University (2012), an M.A. from Princeton University (2010), and a B.S. from Seoul National University (2006). He is a recipient of a Doctoral Study Fellowship from the Korea Foundation for Advanced Studies (KFAS).