
Nvidia's CUDA Tile examined: AI giant releases programming model for Rubin, Feynman, and beyond — tensor-native execution model lays the foundation for Blackwell and beyond

Source: Tom's Hardware

TL;DR (AI Generated)

Nvidia's CUDA 13.1 release introduces the CUDA Tile programming path, which shifts the programming model from SIMT toward tensor-native execution and lays the groundwork for Blackwell and later architectures. The change lets developers describe operations on tiles of data rather than manage low-level hardware details, so performance scales more cleanly across GPU generations. CUDA Tile IR acts as a virtual instruction set for tile workloads, while cuTile Python lets developers write array- and tile-oriented kernels directly in Python. Nvidia is targeting AI-centric algorithms first, with scientific simulations and other HPC workloads to follow.
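To make the SIMT-versus-tile distinction concrete, here is a library-agnostic sketch in plain NumPy. This is not the cuTile Python API (the article does not show its function names); it only illustrates the programming-model idea: a tile-oriented kernel expresses the computation as operations on whole sub-blocks of an array, leaving element-level scheduling to the compiler and hardware, whereas SIMT code spells out what each individual thread does per element.

```python
import numpy as np

def scaled_add_tilewise(a, b, alpha, tile=4):
    """Conceptual tile-oriented kernel: operate on whole tiles, not elements.

    Illustrative NumPy sketch only, NOT the real cuTile Python API.
    A tile compiler would map each tile operation to tensor-core-friendly
    hardware instructions instead of running this Python loop.
    """
    out = np.empty_like(a)
    n = a.shape[0]
    for i in range(0, n, tile):            # iterate over tiles, not elements
        ta = a[i:i + tile]                 # load one whole tile
        tb = b[i:i + tile]
        out[i:i + tile] = alpha * ta + tb  # a single array op per tile
    return out

a = np.arange(8, dtype=np.float32)
b = np.ones(8, dtype=np.float32)
result = scaled_add_tilewise(a, b, 2.0)
```

In a SIMT kernel, the body would instead compute one element from a thread index; here the per-element work is implicit in the tile-wide array operation, which is the property that lets a tile-level IR retarget the same kernel to different GPU generations.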