Nvidia Rubin CPX forms one half of a new "disaggregated" AI inference architecture that splits work between compute- and bandwidth-optimized chips for best performance
TL;DR
Nvidia has introduced the Rubin CPX GPU, part of a new "disaggregated" AI inference architecture that improves performance by splitting work between compute- and bandwidth-optimized chips. The Rubin CPX is designed for the compute-intensive portions of AI inference, while the standard Rubin GPU handles the memory-bandwidth-limited portions. The Rubin CPX offers 30 petaFLOPs of raw compute and 128 GB of GDDR7 memory; a full Vera Rubin NVL144 CPX rack, combining both Rubin GPU variants with Vera CPUs and high-speed memory, is expected to deliver 8 exaFLOPs of NVFP4 compute. Nvidia projects significant revenue potential for AI systems built on the Rubin CPX and plans to showcase the technology at GTC 2026.
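The split described above maps onto the two phases of LLM inference: prompt prefill tends to be compute-bound, while token-by-token decode tends to be memory-bandwidth-bound. A minimal sketch of that routing idea, assuming a hypothetical scheduler with made-up pool names (this is illustrative, not an Nvidia API):

```python
from dataclasses import dataclass

# Illustrative sketch of "disaggregated" inference scheduling: the
# compute-bound prefill phase is routed to compute-optimized GPUs
# (e.g. Rubin CPX), while the bandwidth-bound decode phase, which
# re-reads the KV cache for every generated token, is routed to
# bandwidth-optimized GPUs (e.g. standard Rubin). All names here
# are hypothetical.

@dataclass
class InferenceRequest:
    phase: str    # "prefill" or "decode"
    tokens: int   # prompt length (prefill) or tokens generated so far (decode)

def route(request: InferenceRequest) -> str:
    """Pick a GPU pool based on the phase's dominant bottleneck."""
    if request.phase == "prefill":
        # Prefill processes the whole prompt at once, so raw FLOPs dominate.
        return "compute_pool"
    # Decode streams the KV cache per token, so memory bandwidth dominates.
    return "bandwidth_pool"

print(route(InferenceRequest("prefill", 32_000)))  # compute_pool
print(route(InferenceRequest("decode", 1)))        # bandwidth_pool
```

In a real disaggregated deployment the prefill pool would also hand the populated KV cache over to the decode pool, which is why the rack pairs the two GPU types with high-speed interconnect.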