Technology

LLM Benchmarking Shows Capabilities Doubling Every 7 Months

Source

IEEE Spectrum

Published

Jul 2, 2025

Silent Data Corruption: A Major Reliability Challenge in Large-Scale LLM Training (TU Berlin)

Researchers at Technische Universität Berlin published a technical paper on Silent Data Corruption (SDC) in Large Language Model (LLM) training. As LLMs grow in size, hardware-induced faults such as SDC can evade detection mechanisms and seriously disrupt training. The study examines how intermittent SDC affects LLM pretraining and how sensitive the outcome is to factors such as bit position and kernel function. The researchers propose a lightweight method for detecting harmful parameter updates and show that recomputing the affected training step upon detection effectively mitigates the corruption.

SemiEngineering•

2 weeks ago

Anthropic says its new AI model “maintained focus” for 30 hours on multistep tasks

Anthropic has unveiled its latest AI model, Claude Sonnet 4.5, which the company describes as its most advanced model yet, with improved coding and computer-use capabilities. The company also introduced Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK for building custom AI coding agents. Notably, Anthropic claims that Sonnet 4.5 maintained focus on complex, multistep tasks for more than 30 hours, a significant improvement over previous models, which tended to lose coherence over time. The Claude family includes models of varying sizes (Haiku, Sonnet, and Opus), with Sonnet striking a balance between contextual depth and operational efficiency.

Ars Technica•

8 months ago

Data Centers May House AI—But Operators Don’t Trust AI (Yet)

IEEE Spectrum•

9 months ago

OpenAI has finally released open-weight language models

OpenAI has released new open-weight large language models called "gpt-oss," available in two sizes and comparable to existing models on benchmarks. Unlike the models offered through OpenAI's web interface, these can be downloaded, run, and modified locally. The release is aimed at users frustrated by the lack of open models from the company and at competing with Chinese models that are gaining popularity. The move aligns with the US government's emphasis on open models and could help OpenAI reestablish its position in the AI landscape. The models are released under a permissive license, which facilitates customization and research.

MIT Technology Review•