As LLMs become the core of modern AI applications, inference efficiency has become critical, not just for speed but also for sustainability.
An old lesson of systems design is that efficiency arises from understanding the workload. Yet today's LLM serving systems are largely application-agnostic: they are optimized for generic text completion, while real applications now perform far richer tasks such as invoking tools, retrieving data, executing code, and coordinating with other agents.
This raises a question: How should we rethink LLM serving, not from the system's perspective, but from the application's?
In this talk, I will explore that question and show how an application-centered approach leads to serving systems that are more programmable, flexible, and application-aware.
In Gim is a fourth-year Ph.D. student in Computer Science at Yale University. His research focuses on systems for machine learning, specifically on programmable systems for AI. His first-author work has appeared at top venues including SOSP, MLSys, MobiSys, HotOS, EMNLP, and AAAI.