A bug that taught me more about PyTorch than years of using it

Source

Hacker News

TL;DR

AI Generated

A loss plateau during training exposed a niche bug in the PyTorch backend for Apple Silicon GPUs. The encoder weights stayed frozen throughout training because of a GPU kernel bug: the in-place addcmul_ and addcdiv_ operations, which the Adam optimizer relies on, failed on tensors with non-contiguous memory layouts. The fix was to make the weights contiguous at initialization and to upgrade to PyTorch ≥2.4. Tracking the problem down gave the author a closer look at PyTorch internals, memory layouts, and kernel implementations. After fixing the bug locally, the author submitted a PR addressing similar issues in other operations.
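The summary turns on what a "non-contiguous memory layout" is, so a small illustration may help. The following is a toy model in plain Python, not PyTorch code and not a reproduction of the bug: the class `StridedView` and all of its methods are hypothetical names invented for this sketch. It assumes only the general idea that a tensor is a flat buffer plus a shape and strides, and that a transpose swaps strides without moving data. The `addcmul_` method here follows the documented semantics of PyTorch's in-place `addcmul_` (self += value * t1 * t2), written so that it respects strides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Toy 2-D view over a flat buffer (hypothetical, not a PyTorch class)."""
    storage: List[float]
    shape: Tuple[int, int]
    strides: Tuple[int, int]

    def offset(self, i: int, j: int) -> int:
        # Position of element (i, j) inside the flat buffer.
        return i * self.strides[0] + j * self.strides[1]

    def get(self, i: int, j: int) -> float:
        return self.storage[self.offset(i, j)]

    def set(self, i: int, j: int, value: float) -> None:
        self.storage[self.offset(i, j)] = value

    def transpose(self) -> "StridedView":
        # Same buffer, swapped shape and strides: no data is copied.
        return StridedView(self.storage, self.shape[::-1], self.strides[::-1])

    def is_contiguous(self) -> bool:
        # Row-major contiguous means strides == (number of columns, 1).
        return self.strides == (self.shape[1], 1)

    def contiguous(self) -> "StridedView":
        # Copy into a fresh row-major buffer if the layout is not already one.
        if self.is_contiguous():
            return self
        rows, cols = self.shape
        data = [self.get(i, j) for i in range(rows) for j in range(cols)]
        return StridedView(data, self.shape, (cols, 1))

    def addcmul_(self, t1: "StridedView", t2: "StridedView",
                 value: float = 1.0) -> "StridedView":
        # In-place self += value * t1 * t2, written through the strides so the
        # update reaches the underlying buffer whatever the layout is.
        rows, cols = self.shape
        for i in range(rows):
            for j in range(cols):
                self.set(i, j, self.get(i, j) + value * t1.get(i, j) * t2.get(i, j))
        return self


base = StridedView([0.0] * 6, (2, 3), (3, 1))   # contiguous 2x3 buffer
weights = base.transpose()                      # 3x2 view, strides (1, 3)
ones = StridedView([1.0] * 6, (3, 2), (2, 1))

print(base.is_contiguous(), weights.is_contiguous())
weights.addcmul_(ones, ones, value=0.5)         # update through the view
print(base.storage)                             # the shared buffer changed
```

In PyTorch itself the corresponding checks are `Tensor.is_contiguous()`, `Tensor.stride()`, and `Tensor.contiguous()`. A kernel that mishandles a layout like `weights` above could leave the underlying buffer unchanged, which would be consistent with the frozen weights the summary describes.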

Similar Articles

MIT Technology Review

The Download: a new Christian phone network, and debugging LLMs

A new US phone network for Christians is launching, blocking porn and gender-related content with network-level controls. Goodfire, a San Francisco startup, released Silico, a tool for debugging AI models by allowing users to adjust parameters during training. The National Science Foundation faced mass firings, impacting US science funding and governance. China's AI labs are releasing open-source models, challenging the traditional Silicon Valley approach. Elon Musk admitted using OpenAI models for xAI training, sparking debate on AI ethics and practices.

MIT Technology Review

Intel has reportedly cancelled discrete gaming GPUs for the upcoming Xe3P Arc "Celestial" family — gaming GPU remains uncertain even for the next-gen Xe4 "Druid" lineup that lands in 2027

Intel has reportedly scrapped plans for discrete gaming GPUs in the upcoming Xe3P Arc "Celestial" family, leaving the fate of gaming GPUs uncertain even for the Xe4 "Druid" lineup expected in 2027. The Celestial GPU was originally intended for a 2025 launch but was replaced by Battlemage, with Xe3P now serving other purposes. Intel's focus seems to be shifting towards AI applications, with leaks suggesting a potential late-2027 release for the Druid architecture. The future of dedicated gaming GPUs from Intel remains speculative, with the possibility of a revival with the Druid lineup.

Tom's Hardware

SpaceX says it is going to begin manufacturing GPUs — $1.75 trillion IPO listing reportedly includes in-house GPU production

SpaceX's confidential $1.75 trillion IPO filing reveals plans to manufacture its own GPUs, investing billions in internal processor production due to a lack of long-term supply agreements with silicon suppliers. The company's intention to build GPUs, not specialized AI accelerators, is highlighted, with the naming convention still uncertain. While SpaceX's CEO confirmed plans for high-volume semiconductor manufacturing, the specifics of the GPUs remain unclear, raising questions about potential competition with existing AI GPU manufacturers like AMD and Nvidia. The S-1 form's confidential nature prevents verification of its content, leaving room for speculation on SpaceX's semiconductor endeavors.

Tom's Hardware

Testing DirectStorage with GPU decompression — do Blackwell GPUs have the upper hand?

The article discusses the testing of DirectStorage with GPU decompression, focusing on whether Blackwell GPUs have an advantage in handling this technology. DirectStorage aims to optimize storage technology for faster asset streaming and reduced CPU overhead, with support for GPU decompression added in version 1.1. While Nvidia GPUs initially struggled with DirectStorage, Blackwell GPUs, like the 5090, showed improved performance with GPU decompression enabled. Tests on various Blackwell GPUs, including the 5070 and 5060, demonstrated consistent performance gains with DirectStorage. The article explores the potential reasons behind Blackwell GPUs handling GPU decompression more effectively, pointing to advancements in architecture and scheduling capabilities.

Tom's Hardware
