New DeepSeek model drastically reduces resource usage by converting text and documents into images — 'vision-text compression' uses up to 20 times fewer tokens

Source: Tom's Hardware

TL;DR (AI Generated)

A new DeepSeek AI model, DeepSeek-OCR, converts text and documents into images using 'vision-text compression,' cutting token usage by up to 20 times. The model pairs the DeepEncoder with the DeepSeek3B-MoE-A570M decoder, which together recover textual content from images using far fewer tokens than the raw text would require. The approach is particularly useful for tabulated data, graphs, and other visual information in fields like finance, science, and medicine. Accuracy holds at 97% at compression ratios below 10x but drops to 60% at 20x, indicating diminishing returns, though even modest ratios promise real cost savings for AI models. The model is available for exploration on Hugging Face and GitHub.
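
As a rough illustration of that trade-off (this is not DeepSeek's code, and the document size is a made-up example), the token arithmetic looks like this:

```python
# Toy illustration (not DeepSeek's code): back-of-the-envelope token savings
# from vision-text compression at the ratios cited in the article.
# The document size below is a hypothetical example number.

def vision_tokens(text_tokens: int, compression_ratio: float) -> int:
    """Tokens needed to represent the same text as a rendered image."""
    return max(1, round(text_tokens / compression_ratio))

doc_text_tokens = 5_000  # hypothetical document size
for ratio, accuracy in [(10, 0.97), (20, 0.60)]:
    vt = vision_tokens(doc_text_tokens, ratio)
    print(f"{ratio:>2}x compression: {doc_text_tokens} text tokens -> "
          f"{vt} vision tokens (~{accuracy:.0%} OCR accuracy)")
```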

Similar Articles

MIT Technology Review

Three reasons why DeepSeek’s new model matters

DeepSeek's new V4 model is significant for three reasons. First, it offers high performance at a fraction of the cost of comparable models, making cutting-edge AI capabilities more accessible. Second, V4 takes a new approach to memory efficiency, handling a context window of 1 million tokens while significantly reducing compute and memory usage. Finally, V4 marks a shift toward optimization for Chinese chips, specifically Huawei's Ascend line, challenging the dominance of US chip giant Nvidia and signaling China's progress toward a parallel AI infrastructure.
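
To see why a 1-million-token context window strains memory, here is a minimal sketch of how KV-cache size scales with context length; the layer and head counts are hypothetical placeholders, not V4's published architecture:

```python
# Rough sketch of why long contexts are a memory problem: the KV cache grows
# linearly with sequence length. Model dimensions below are hypothetical
# placeholders, not DeepSeek V4's actual architecture.

def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """Bytes for keys + values across all layers, in GiB (fp16 by default)."""
    total = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 2**30

for seq_len in (8_192, 131_072, 1_000_000):
    print(f"{seq_len:>9} tokens -> {kv_cache_gib(seq_len, 60, 8, 128):6.1f} GiB")
```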

Tom's Hardware

Intel introduces its own Neural Compression technology with a fallback mode that works on GPUs without dedicated AI cores — early performance is on the level of Nvidia NTC

Intel has unveiled its Neural Compression technology, which, like Nvidia's NTC, aims to shrink video game textures in VRAM and on disk. It offers two modes: a quality mode achieving a 9x compression ratio and a more aggressive mode reaching 18x. Intel's solution combines BC1 texture compression with linear algebra for its XMX-accelerated path, plus a fallback mode that runs on GPUs without dedicated AI cores. The technology is designed to shorten install times, save disk space, and trim VRAM usage, with Intel showing compression ratios competitive with Nvidia's NTC.
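
Intel has not published the full scheme, so as a loose illustration of linear-algebra-style texture compression, here is a toy truncated-SVD sketch (explicitly not Intel's algorithm):

```python
# Toy low-rank texture compression with truncated SVD. This is NOT Intel's
# algorithm (which combines BC1 with its own linear-algebra scheme); it only
# illustrates how linear algebra can trade rank for a compression ratio.
import numpy as np

rng = np.random.default_rng(0)
texture = rng.random((256, 256)).astype(np.float32)   # stand-in grayscale texture

U, s, Vt = np.linalg.svd(texture, full_matrices=False)
rank = 16                                             # keep top-16 singular values
approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

orig_vals = texture.size
kept_vals = rank * (U.shape[0] + Vt.shape[1] + 1)     # U cols + V rows + singular values
print(f"compression ratio: {orig_vals / kept_vals:.1f}x, "
      f"mean reconstruction error: {np.abs(texture - approx).mean():.4f}")
```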

Tom's Hardware

Testing DirectStorage with GPU decompression — do Blackwell GPUs have the upper hand?

The article tests DirectStorage with GPU decompression to determine whether Blackwell GPUs hold an advantage. DirectStorage is designed to speed up asset streaming and reduce CPU overhead, with GPU decompression support added in version 1.1. While earlier Nvidia GPUs struggled with DirectStorage, Blackwell cards such as the RTX 5090 performed better with GPU decompression enabled, and tests across other Blackwell GPUs, including the 5070 and 5060, showed consistent gains. The article attributes Blackwell's stronger handling of GPU decompression to advances in architecture and scheduling.
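
For a rough sense of the CPU overhead DirectStorage's GPU path removes, here is a toy timing sketch using zlib as a stand-in for GDeflate (which has no standard Python binding):

```python
# Toy measurement of CPU decompression cost, the overhead DirectStorage's
# GPU-decompression path is designed to remove. Uses zlib as a stand-in for
# GDeflate; the 32 MiB synthetic asset is a made-up example.
import os
import time
import zlib

asset = os.urandom(16 * 2**20) + bytes(16 * 2**20)    # 32 MiB mixed-entropy "asset"
blob = zlib.compress(asset, level=6)

t0 = time.perf_counter()
for _ in range(5):
    zlib.decompress(blob)
elapsed = (time.perf_counter() - t0) / 5
print(f"CPU decompress: {len(asset) / 2**20 / elapsed:.0f} MiB/s on one core "
      f"(work the GPU path offloads)")
```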

Tom's Hardware

Google's TurboQuant reduces AI LLM cache memory capacity requirements by at least six times — up to 8x performance boost on Nvidia H100 GPUs, compresses KV caches to 3 bits with no accuracy loss

Google's TurboQuant is a compression algorithm that reduces LLM KV-cache memory requirements by at least six times, delivering up to an 8x performance boost on Nvidia H100 GPUs. It compresses KV caches to 3 bits with no loss in model accuracy, eliminating quantization overhead through a two-stage process combining PolarQuant with a Quantized Johnson-Lindenstrauss (QJL) transform. TurboQuant achieved perfect downstream scores on various benchmarks and outperformed baselines in vector search. The training-free algorithm is suited to production inference and large-scale vector-search systems and will be presented at ICLR 2026.
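
TurboQuant's actual PolarQuant/QJL pipeline is more involved, but a naive 3-bit quantization sketch shows the basic memory arithmetic behind a 16-bit-to-3-bit cache:

```python
# Toy 3-bit quantization of a KV-cache tensor. TurboQuant's actual pipeline
# (PolarQuant + Quantized Johnson-Lindenstrauss) is more sophisticated; the
# article's >=6x figure reflects that full pipeline, not this naive scheme.
import numpy as np

rng = np.random.default_rng(0)
kv = rng.standard_normal((1024, 128)).astype(np.float16)  # one layer's keys

levels = 2**3                                             # 3 bits -> 8 levels
lo, hi = kv.min(), kv.max()
codes = np.clip(np.round((kv - lo) / (hi - lo) * (levels - 1)), 0, levels - 1)
dequant = codes / (levels - 1) * (hi - lo) + lo

print(f"memory: 16 bits -> 3 bits per value ({16 / 3:.1f}x smaller)")
print(f"mean abs error of naive 3-bit quantization: {np.abs(kv - dequant).mean():.3f}")
```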
