[SNU CSE Industry Seminar Series] Accelerating inference efficiency at scale

Name: 백준호
Host: Prof. 유승주
Date: 2025/06/05, 2:00 PM - 3:00 PM
Location: Building 301, Room 203
Summary

AI agents are eating the world. As AI inference workloads surge across datacenters, modern architectures must efficiently handle diverse tensor contraction patterns. Traditional designs—relying heavily on fixed-size matrix multiplication engines—struggle to deliver the scalability and flexibility needed for today’s models.

RNGD (pronounced "Renegade"), FuriosaAI's second-generation tensor contraction processor, introduces a novel architecture built to meet these challenges. Its coarse-grained processing elements (PEs) can be dynamically configured as a single large compute unit or as multiple independent units, adapting to a wide range of tensor shapes and sizes. This flexibility sustains high utilization across varying inference workloads.
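As a rough illustration of this configurability, the toy Python sketch below shows how a scheduler might choose between one fused wide unit and several independent units based on contraction size. All names (PE_COUNT, PEConfig, choose_pe_config) and thresholds here are hypothetical, not RNGD's actual interface.

```python
# Toy sketch only: illustrates the *idea* of coarse-grained PEs that fuse
# into one wide unit or run independently. PE_COUNT, PEConfig, and the
# thresholds below are hypothetical, not RNGD's actual interface.
from dataclasses import dataclass

PE_COUNT = 8  # hypothetical number of coarse-grained PEs


@dataclass
class PEConfig:
    groups: int          # independent compute units running in parallel
    pes_per_group: int   # PEs fused into each unit


def choose_pe_config(m: int, n: int, k: int) -> PEConfig:
    """Pick a configuration for an (m x k) @ (k x n) contraction.

    Large problems fuse all PEs into one wide unit; small problems
    split the PEs so several contractions can run side by side.
    """
    work = m * n * k
    if work >= 1 << 24:   # big GEMM: fuse everything into one unit
        return PEConfig(groups=1, pes_per_group=PE_COUNT)
    if work >= 1 << 18:   # medium: a couple of mid-sized units
        return PEConfig(groups=2, pes_per_group=PE_COUNT // 2)
    return PEConfig(groups=PE_COUNT, pes_per_group=1)  # many small units


# e.g. a small decode-step GEMM vs. a large prefill GEMM
print(choose_pe_config(8, 128, 128))        # -> many independent units
print(choose_pe_config(4096, 4096, 4096))   # -> one fused wide unit
```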

RNGD incorporates several key architectural innovations to maximize performance and efficiency, including a circuit-switch-based fetch network, input broadcasting, and buffer-based reuse mechanisms that reduce memory bandwidth pressure and improve data locality. These features collectively enable high-throughput, energy-efficient computation, making RNGD a compelling solution for sustainable AI inference at scale.
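To see why input broadcasting and buffer-based reuse ease bandwidth pressure, consider a back-of-the-envelope model (our assumptions, not RNGD-specific figures): when a matmul's output columns are split across several units, every unit needs the same A operand, so broadcasting a single fetch of A removes a factor proportional to the number of units from external-memory traffic.

```python
# Back-of-the-envelope traffic model (assumed shapes and unit count, not
# RNGD data): output columns of an (M x K) @ (K x N) matmul are split
# across `units` compute units, so each unit needs all of A but only its
# own slice of B.

def operand_traffic_bytes(M, N, K, units, broadcast_a, dtype_bytes=2):
    a = M * K * dtype_bytes   # shared operand, needed by every unit
    b = K * N * dtype_bytes   # disjoint slices, fetched once either way
    # Without broadcasting, each unit fetches its own copy of A from
    # external memory; with a broadcast fetch plus on-chip buffer reuse,
    # A crosses the external-memory interface only once.
    return (a if broadcast_a else units * a) + b


M = N = K = 4096
naive = operand_traffic_bytes(M, N, K, units=8, broadcast_a=False)
bcast = operand_traffic_bytes(M, N, K, units=8, broadcast_a=True)
print(f"traffic reduction: {naive / bcast:.2f}x")  # 4.50x for these shapes
```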

Speaker Introduction

CEO of FuriosaAI