Nvidia Rubin CPX forms one half of a new "disaggregated" AI inference architecture that splits work between compute- and bandwidth-optimized chips for best performance
TL;DR
Nvidia has introduced the Rubin CPX GPU, part of a new "disaggregated" AI inference architecture that improves performance by splitting work between compute- and bandwidth-optimized chips. The Rubin CPX is designed for the compute-intensive portions of AI inference, while the standard Rubin GPU handles the memory-bandwidth-limited portions. The Rubin CPX offers 30 petaFLOPs of raw compute and 128 GB of GDDR7 memory; a full Vera Rubin NVL144 CPX rack, combining both Rubin GPU variants with Vera CPUs and high-speed memory, is expected to deliver 8 exaFLOPs of NVFP4 compute. Nvidia projects significant revenue potential for AI systems built on the Rubin CPX and plans to showcase the technology at GTC 2026.
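The split described above maps onto the two phases of LLM inference: prompt prefill tends to be compute-bound, while token-by-token decode tends to be memory-bandwidth-bound. A minimal sketch of that routing idea, assuming a hypothetical scheduler with made-up pool names (this is illustrative, not an Nvidia API):

```python
from dataclasses import dataclass

# Illustrative sketch of "disaggregated" inference scheduling: the
# compute-bound prefill phase is routed to compute-optimized GPUs
# (e.g. Rubin CPX), while the bandwidth-bound decode phase, which
# re-reads the KV cache for every generated token, is routed to
# bandwidth-optimized GPUs (e.g. standard Rubin). All names here
# are hypothetical.

@dataclass
class InferenceRequest:
    phase: str    # "prefill" or "decode"
    tokens: int   # prompt length (prefill) or tokens generated so far (decode)

def route(request: InferenceRequest) -> str:
    """Pick a GPU pool based on the phase's dominant bottleneck."""
    if request.phase == "prefill":
        # Prefill processes the whole prompt at once, so raw FLOPs dominate.
        return "compute_pool"
    # Decode streams the KV cache per token, so memory bandwidth dominates.
    return "bandwidth_pool"

print(route(InferenceRequest("prefill", 32_000)))  # compute_pool
print(route(InferenceRequest("decode", 1)))        # bandwidth_pool
```

In a real disaggregated deployment the prefill pool would also hand the populated KV cache over to the decode pool, which is why the rack pairs the two GPU types with high-speed interconnect.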