Nvidia's new CPX GPU aims to change the game in AI inference — how the debut of cheaper and cooler GDDR7 memory could redefine AI inference infrastructure

Source: Tom's Hardware

TL;DR (AI Generated)

Nvidia has introduced the Rubin CPX GPU, designed to accelerate the context (prefill) phase of AI inference with specialized hardware and 128GB of GDDR7 memory. The Rubin CPX targets long-context inference, enabling more efficient and cost-effective AI infrastructure. Nvidia's Dynamo software orchestration layer manages inference workloads across the different GPUs in a disaggregated system, keeping the split largely transparent to developers. Companies such as Cursor, Runway, and Magic already plan to integrate Rubin CPX into their AI workflows. The result is an approach to AI infrastructure in which hardware is matched to each phase of inference for better efficiency and scalability.
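The disaggregated routing idea described above can be sketched in a few lines: an orchestrator (playing the role Dynamo plays in Nvidia's stack) sends compute-bound long-context prefill work to context-optimized GDDR7 parts and bandwidth-bound token generation to HBM GPUs. The `Request` shape, pool names, and threshold below are illustrative assumptions, not Dynamo's actual API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One inference request (hypothetical shape, for illustration)."""
    prompt_tokens: int  # length of the input context
    phase: str          # "context" (prefill) or "generation" (decode)


def route(request: Request, long_context_threshold: int = 32_000) -> str:
    """Pick a GPU pool for a request. The threshold is an assumed knob,
    not a documented Dynamo parameter."""
    if request.phase == "context" and request.prompt_tokens >= long_context_threshold:
        # Long prefill is compute-bound, so cheaper, cooler GDDR7
        # parts like Rubin CPX can handle it cost-effectively.
        return "context-pool (GDDR7, e.g. Rubin CPX)"
    # Token-by-token decode is memory-bandwidth-bound; keep it on HBM.
    return "generation-pool (HBM)"


# Example: a 100k-token prompt lands on the context-optimized pool,
# while its decode phase stays on the HBM pool.
print(route(Request(prompt_tokens=100_000, phase="context")))
print(route(Request(prompt_tokens=100_000, phase="generation")))
```

The point of the split is that each pool only needs the memory technology its phase actually stresses, which is what makes GDDR7-based context processing attractive.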

Similar Articles

Zuckerberg's Meta will beam sunlight from space to power AI data centers, solar-collecting satellites will orbit 22,000 miles above Earth — firm reserves 1 Gigawatt of orbital solar energy and 100 Gigawatt-hours of long-duration storage

Meta, led by Zuckerberg, plans to power its AI data centers with sunlight beamed from space using solar-collecting satellites in geosynchronous orbit 22,000 miles above Earth. The company has reserved 1 Gigawatt of orbital solar energy and 100 Gigawatt-hours of long-duration storage to address the increasing energy demands of its AI infrastructure. This move is part of Meta's strategy to secure long-term energy supplies for its expanding AI operations, with a first orbital demonstration planned for 2028 and potential commercial delivery by 2030. The partnerships with Overview Energy and Noon Energy aim to tackle the challenges of intermittency and long-duration energy storage in renewable energy systems.

Tom's Hardware

Global memory shortage expected to get worse before it gets better

The global memory shortage is expected to worsen, with reports indicating that DRAM shortages may persist until the end of the decade. Major manufacturers like Samsung, SK Hynix, and Micron are investing in expanding production facilities, but the additional capacity won't be fully operational until 2027 or later, leading to a multi-year supply gap. The rise in AI infrastructure demand for high-bandwidth memory is prioritizing production over traditional DRAM used in consumer devices, causing further supply constraints. Analysts predict a shortfall in production growth compared to demand, potentially extending the memory shortage until 2030, resulting in continued high prices for consumers.

TweakTown

Disaggregating LLM Inference: Inside the SambaNova Intel Heterogeneous Compute Blueprint

SambaNova Systems and Intel have introduced a blueprint for heterogeneous inference that optimizes modern large language model (LLM) workloads by utilizing specialized hardware for different phases of inference: GPUs for prefill, SambaNova RDUs for decode, and Intel Xeon 6 CPUs for agentic tools and orchestration. This approach addresses the complexity of agentic AI systems with varying compute demands. By isolating tasks onto specific hardware, the architecture improves efficiency, scalability, and cost-effectiveness. The design reflects a shift towards specialized compute fabrics and better supports the evolving landscape of AI reasoning systems.

SemiWiki

Analytics group signals possible delays at 40% of AI data center construction sites — companies deny schedule holdups, but satellite imagery indicates otherwise

Several U.S. data center projects, including those involving Microsoft, OpenAI, and Oracle, are facing potential delays due to regulatory challenges, supply chain issues, and utility availability. Satellite imagery analysis by SynMax suggests that construction progress is slower than expected, with some projects possibly missing deadlines by over three months. Despite denials from companies involved, reports indicate a shortage of specialist workers and delays in construction. The increased demand for electricity to power AI data centers is also straining local utility providers, leading to further complications in project timelines.

Tom's Hardware
