AI agents are eating the world. As AI inference workloads surge across datacenters, modern architectures must efficiently handle diverse tensor contraction patterns. Traditional designs—relying heavily on fixed-size matrix multiplication engines—struggle to deliver the scalability and flexibility needed for today’s models.
RNGD (pronounced "Renegade"), FuriosaAI's second-generation tensor contraction processor, introduces a novel architecture built to meet these challenges. Its coarse-grained processing elements (PEs) can be dynamically configured as a single large compute unit or as multiple independent units, adapting to a wide range of tensor shapes and sizes. This flexibility enables efficient utilization across varying inference workloads.
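The idea of grouping PEs to match the workload can be sketched in software. The following is a hypothetical illustration only: the function name, the PE count, and the fusion threshold are assumptions for the sake of the example, not RNGD's actual configuration interface.

```python
# Hypothetical sketch of shape-adaptive PE grouping (illustrative
# assumptions throughout; not the real hardware interface).
from dataclasses import dataclass

@dataclass
class PEConfig:
    fused: bool      # True: all PEs act as one large compute unit
    num_units: int   # number of independent units when not fused

def configure_pes(m: int, n: int, k: int, total_pes: int = 8) -> PEConfig:
    """Pick a PE grouping for an (m, k) x (k, n) contraction.

    A large contraction can keep a single fused compute array busy;
    many small contractions map better onto independent PEs in parallel.
    """
    flops = 2 * m * n * k
    if flops >= 1 << 24:  # assumed threshold for saturating a fused array
        return PEConfig(fused=True, num_units=1)
    return PEConfig(fused=False, num_units=total_pes)

# A large GEMM fuses the PEs; a small one splits them into parallel units.
print(configure_pes(4096, 4096, 4096))  # PEConfig(fused=True, num_units=1)
print(configure_pes(32, 32, 64))        # PEConfig(fused=False, num_units=8)
```

The point of the sketch is the dispatch decision itself: one knob (here, an arithmetic-intensity threshold) selects between a fused configuration and independent units, which is the flexibility the paragraph above describes.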
RNGD incorporates several key architectural innovations to maximize performance and efficiency, including a circuit switch-based fetch network, input broadcasting, and buffer-based reuse mechanisms that reduce memory bandwidth pressure and improve data locality. These features collectively enable high throughput and energy-efficient computation, making RNGD a compelling solution for sustainable AI inference at scale.
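A small counting exercise shows why buffer-based reuse and input broadcasting reduce memory bandwidth pressure. The tile size and matrix dimensions below are arbitrary assumptions chosen for illustration; the sketch models generic tiled matrix multiplication, not RNGD's specific fetch network.

```python
# Illustrative comparison (assumed sizes, not RNGD specifics): off-chip
# operand fetches for C = A @ B, with and without on-chip reuse.

def fetches_naive(m: int, n: int, k: int) -> int:
    # Every multiply-accumulate re-reads its A and B operands from DRAM.
    return 2 * m * n * k

def fetches_with_reuse(m: int, n: int, k: int, tile: int) -> int:
    # Each A tile and B tile is fetched once into an on-chip buffer and
    # broadcast/reused across all partial products within the tile.
    tile_steps = (m // tile) * (n // tile) * (k // tile)
    return tile_steps * 2 * tile * tile  # one A tile + one B tile per step

m = n = k = 1024
tile = 64
print(fetches_naive(m, n, k) / fetches_with_reuse(m, n, k, tile))  # prints 64.0
```

With square tiles, the fetch reduction equals the tile edge length, which is why larger on-chip buffers translate directly into lower DRAM bandwidth demand and better data locality.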