
Google's TurboQuant reduces LLM KV cache memory requirements by at least six times — up to 8x performance boost on Nvidia H100 GPUs, compresses KV caches to 3 bits with no accuracy loss

Source

Tom's Hardware

TL;DR

AI Generated

Google's TurboQuant, a compression algorithm, reduces the memory footprint of LLM KV caches by at least 6x, delivering up to an 8x performance boost on Nvidia H100 GPUs. It compresses KV caches to 3 bits per value without any loss in model accuracy, and it eliminates quantization overhead through a two-stage process combining PolarQuant with a Quantized Johnson-Lindenstrauss (QJL) transform. TurboQuant achieved perfect downstream scores on various benchmarks and showed strong results in vector search, outperforming baselines. The training-free algorithm is suitable for production inference and large-scale vector search systems, and will be presented at ICLR 2026.
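The article names the two stages (PolarQuant and QJL) but gives no implementation details. As a rough intuition for how low-bit KV-cache quantization of this flavor can work, here is a minimal, hypothetical sketch: a random orthogonal (Johnson-Lindenstrauss-style) rotation to spread outliers across dimensions, followed by uniform 3-bit quantization with a per-vector scale. All function names and the scaling scheme are assumptions for illustration, not TurboQuant's actual method.

```python
import numpy as np

def quantize_kv_3bit(kv, rng):
    """Illustrative sketch: rotate, then quantize to 3 bits (8 levels).

    This is NOT TurboQuant; it only demonstrates the general
    rotate-then-quantize idea the article alludes to.
    """
    d = kv.shape[-1]
    # Stage 1: random orthogonal rotation (JL-style) via QR decomposition.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    rotated = kv @ q
    # Stage 2: per-vector symmetric scale, then round to 3-bit codes
    # in [-4, 3]. Codes are stored in int8 here for simplicity; a real
    # implementation would pack them at 3 bits each.
    scale = np.abs(rotated).max(axis=-1, keepdims=True) / 3.5
    codes = np.clip(np.round(rotated / scale), -4, 3).astype(np.int8)
    return codes, scale, q

def dequantize_kv(codes, scale, q):
    # Undo the quantization and the orthogonal rotation (q.T = q^-1).
    return (codes.astype(np.float32) * scale) @ q.T
```

With 3-bit codes plus a small per-vector scale, storage drops by roughly 5x versus fp16, in the ballpark of the article's "at least six times" figure; the rotation is what keeps the crude uniform quantizer from being wrecked by outlier channels.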