Technology

DeepSeek tests “sparse attention” to slash AI processing costs

Source

Ars Technica

TL;DR

AI Generated

DeepSeek, a Chinese AI company operating under US export restrictions on advanced AI chips, has developed "DeepSeek Sparse Attention" (DSA) to improve processing efficiency in its latest language model, DeepSeek-V3.2-Exp. The technique is similar in spirit to the sparse transformers explored by OpenAI and Google Research, and aims to reduce the computational cost of the attention step. DeepSeek claims its implementation achieves "fine-grained sparse attention" and has cut its API prices by 50%. The company's focus on squeezing performance from limited hardware reflects a broader industry effort to improve AI models while keeping processing costs in check.
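DeepSeek's actual DSA implementation is not described here, but the general idea behind sparse attention can be sketched: each query attends to only a small subset of keys instead of all of them, so cost scales with n·k rather than n². The top-k selection below is a hypothetical illustration in NumPy, not DeepSeek's method.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(Q, K, V, k=4):
    """Toy sparse attention: each query attends only to its top-k
    highest-scoring keys (illustrative sketch; DSA's "fine-grained"
    selection mechanism is more involved)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n, n) raw scores
    # kth-largest score per query row, via a partial sort.
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    # Mask everything below the top-k to -inf so softmax zeroes it out.
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ V

rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = rng.normal(size=(3, n, d))
out = sparse_attention(Q, K, V, k=4)                 # (16, 8) output
```

In a real implementation the savings come from never materializing the full n×n score matrix; this dense sketch only shows the selection logic.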


Similar Articles

SemiEngineering

Silent Data Corruption: A Major Reliability Challenge in Large-Scale LLM Training (TU Berlin)

Researchers at Technische Universität Berlin published a technical paper on the challenges of Silent Data Corruption (SDC) in Large Language Model (LLM) training. As LLMs grow in size, hardware-induced faults like SDC can bypass detection mechanisms, with severe consequences during training. The study examines how intermittent SDC affects LLM pretraining, showing that sensitivity varies with factors such as bit position and kernel function. The researchers propose a lightweight detection method that flags harmful parameter updates, and demonstrate that recomputing a training step upon detection effectively mitigates the corruption.
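The paper's exact detector is not described in this summary, but the recompute-on-detection idea can be illustrated with a hypothetical sketch: treat a gradient whose norm is a statistical outlier relative to recent steps as possibly corrupted, and redo the backward pass before applying the update. The function names, z-score threshold, and window size here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def detect_and_recompute(params, grad_fn, lr=0.1, z_thresh=6.0, history=None):
    """Hypothetical lightweight SDC mitigation: flag a gradient whose
    norm is a statistical outlier vs. recent steps, recompute it once,
    then apply the update."""
    if history is None:
        history = []
    grad = grad_fn(params)
    norm = np.linalg.norm(grad)
    if len(history) >= 8:
        mean, std = np.mean(history), np.std(history) + 1e-12
        if abs(norm - mean) / std > z_thresh:
            # Suspected silent corruption: redo the backward pass.
            grad = grad_fn(params)
            norm = np.linalg.norm(grad)
    history.append(norm)
    return params - lr * grad, history

# Toy usage: gradient descent on ||p||^2, so grad_fn(p) = 2p.
p, hist = np.ones(4), []
for _ in range(20):
    p, hist = detect_and_recompute(p, lambda x: 2 * x, lr=0.1, history=hist)
# p shrinks toward zero; a corrupted (outlier) gradient would be recomputed.
```

The key property is that detection costs only a norm comparison per step, while the expensive recomputation is triggered only on suspected faults.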

SemiEngineering
U.S. Commerce Sec. Lutnick says American AI dominates DeepSeek, thanks Trump for AI Action Plan — OpenAI and Anthropic beat Chinese models across 19 different benchmarks

U.S. Commerce Secretary Howard Lutnick praises American AI models from OpenAI and Anthropic for outperforming Chinese DeepSeek models across 19 benchmarks in a recent NIST study. Lutnick credits President Trump's AI Action Plan with boosting American AI innovation and infrastructure. The study highlights the American models' lead in software engineering and cyber tasks, along with better cost efficiency and security. Despite DeepSeek releasing new models, concerns persist that their adoption could pose national security risks.

Tom's Hardware
DeepSeek’s new AI model debuts with support for China-native chips and CANN, a replacement for Nvidia's CUDA — Chinese chipmakers Huawei, Cambricon, and Hygon get first-class support

DeepSeek has unveiled its latest AI model, DeepSeek-V3.2-Exp, optimized for Chinese chips and CANN, a CUDA replacement. The model aims to reduce costs for long-context inference with a sparse attention mechanism. Chinese chipmakers like Huawei, Cambricon, and Hygon are actively supporting the model for immediate deployment on their hardware. This move signals China's commitment to AI sovereignty by prioritizing domestic platforms over Nvidia's CUDA ecosystem. The model's compatibility with both Chinese and Nvidia accelerators highlights the country's readiness for a future less reliant on Nvidia hardware.

Tom's Hardware
Anthropic says its new AI model “maintained focus” for 30 hours on multistep tasks

Anthropic has unveiled its latest AI model, Claude Sonnet 4.5, which the company touts as its most advanced model yet, featuring enhanced coding and computer usage capabilities. The company also introduced Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK for building custom AI coding agents. Notably, Anthropic claims that Sonnet 4.5 demonstrated sustained focus on complex, multistep tasks for over 30 hours, a significant improvement over previous models that tended to lose coherence over time. The Claude family includes models of varying sizes – Haiku, Sonnet, and Opus – with Sonnet striking a balance between contextual depth and operational efficiency.
