
Nvidia Rubin CPX forms one half of new, "disaggregated" AI inference architecture — approach splits work between compute- and bandwidth-optimized chips for best performance

Source: Tom's Hardware

TL;DR (AI Generated)

Nvidia has introduced the Rubin CPX GPU, one half of a new "disaggregated" AI inference architecture that improves performance by splitting work between compute-optimized and bandwidth-optimized chips. The Rubin CPX is designed for the compute-intensive tasks in AI inference, while the standard Rubin GPU handles the memory-bandwidth-limited tasks. The Rubin CPX offers 30 petaFLOPs of raw compute and 128 GB of GDDR7 memory; the Vera Rubin NVL144 CPX rack, which combines Rubin CPX GPUs, standard Rubin GPUs, Vera CPUs, and high-speed memory, is expected to deliver 8 exaFLOPs of NVFP4 compute. Nvidia projects significant revenue potential from AI systems built on the Rubin CPX and plans to showcase the technology at GTC 2026.
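The split described above can be sketched in code. The following is an illustrative toy scheduler only, not an Nvidia API: all names (`Request`, `route_phase`, `schedule`, the pool labels) are hypothetical. It captures the core idea of disaggregated inference, where the compute-bound phase of a request is routed to compute-optimized hardware (analogous to Rubin CPX) and the bandwidth-bound phase to bandwidth-optimized hardware (analogous to the standard Rubin GPU).

```python
# Toy sketch of "disaggregated" inference scheduling. All names are
# hypothetical illustrations, not Nvidia software.
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int   # long prompts dominate the compute-bound phase
    output_tokens: int   # token generation is memory-bandwidth-bound


def route_phase(phase: str) -> str:
    """Pick a device pool for one inference phase."""
    if phase == "prefill":
        # Compute-bound: send to high-FLOPs GPUs (Rubin CPX analogue).
        return "compute_pool"
    if phase == "decode":
        # Bandwidth-bound: send to high-bandwidth GPUs (Rubin analogue).
        return "bandwidth_pool"
    raise ValueError(f"unknown phase: {phase}")


def schedule(req: Request) -> list[tuple[str, str]]:
    """Return (phase, pool) assignments for a single request."""
    return [("prefill", route_phase("prefill")),
            ("decode", route_phase("decode"))]


if __name__ == "__main__":
    plan = schedule(Request(prompt_tokens=100_000, output_tokens=512))
    print(plan)
```

The point of the design, per the article, is that each phase runs on silicon tuned for its bottleneck instead of one GPU type handling both.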