Articles tagged with "llms"

Disaggregating LLM Inference: Inside the SambaNova Intel Heterogeneous Compute Blueprint

SambaNova Systems and Intel have introduced a blueprint for heterogeneous inference that optimizes modern large language model (LLM) workloads by utilizing specialized hardware for different phases of inference: GPUs for prefill, SambaNova RDUs for decode, and Intel Xeon 6 CPUs for agentic tools and orchestration. This approach addresses the complexity of agentic AI systems with varying compute demands. By isolating tasks onto specific hardware, the architecture improves efficiency, scalability, and cost-effectiveness. The design reflects a shift towards specialized compute fabrics and better supports the evolving landscape of AI reasoning systems.
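The phase split described above can be sketched as a simple dispatch: a compute-bound prefill step, a bandwidth-bound decode loop, and a CPU-side orchestrator that routes each phase to its assigned device. The device names and handler functions below are illustrative stand-ins, not SambaNova's or Intel's actual APIs.

```python
# Illustrative sketch of phase-disaggregated inference routing.
# Device names and phase handlers are hypothetical, not a real API.

def prefill(prompt_tokens):
    # Compute-bound: process the whole prompt in parallel (GPU-friendly).
    return {"kv_cache": list(prompt_tokens)}

def decode(state, max_new_tokens):
    # Memory-bandwidth-bound: generate one token at a time
    # (assigned to the RDU in the blueprint).
    out = []
    for i in range(max_new_tokens):
        tok = len(state["kv_cache"]) + i  # stand-in for a real sampling step
        out.append(tok)
    return out

def orchestrate(prompt_tokens, max_new_tokens=4):
    # CPU-side orchestration: route each phase to its assigned device class.
    route = {"prefill": "gpu", "decode": "rdu", "tools": "xeon-cpu"}
    state = prefill(prompt_tokens)          # runs on route["prefill"]
    tokens = decode(state, max_new_tokens)  # runs on route["decode"]
    return route, tokens

route, tokens = orchestrate([1, 2, 3])
```

The point of the split is that each phase lands on the hardware whose strengths match its bottleneck, instead of one device serving all three.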

SemiWiki

Silent Data Corruption: A Major Reliability Challenge in Large-Scale LLM Training (TU Berlin)

Researchers at Technische Universitat Berlin published a technical paper on the challenges of Silent Data Corruption (SDC) in Large Language Model (LLM) training. As LLMs grow in size, hardware-induced faults like SDC can bypass detection mechanisms, leading to severe consequences during training. The study explores how intermittent SDC impacts LLM pretraining, highlighting the sensitivity of different factors like bit positions and kernel functions. The research proposes a lightweight detection method to identify harmful parameter updates and demonstrates the effectiveness of recomputing training steps upon detection in mitigating corruption.
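The recompute-on-detection idea can be sketched as follows: redo the parameter update on assumed-healthy hardware, compare it against the suspect result, and discard the corrupted update on mismatch. The gradient step, fault model, and tolerance below are illustrative, not the paper's implementation.

```python
# Sketch of detecting a silent data corruption (SDC) in a parameter update
# by recomputing the step and comparing. Threshold and fault model are illustrative.

def gradient_step(params, grads, lr=0.1):
    return [p - lr * g for p, g in zip(params, grads)]

def detect_and_repair(params, grads, suspect_update, tol=1e-9):
    # Recompute the update (on assumed-healthy hardware) and compare.
    reference = gradient_step(params, grads)
    mismatch = any(abs(a - b) > tol for a, b in zip(suspect_update, reference))
    # On mismatch, discard the corrupted update and keep the recomputed one.
    return (reference if mismatch else suspect_update), mismatch

params = [1.0, 2.0, 3.0]
grads = [0.5, 0.5, 0.5]
update = gradient_step(params, grads)
update[1] += 4.0  # simulate a bit flip silently corrupting one parameter
repaired, flagged = detect_and_repair(params, grads, update)
```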

SemiEngineering
Automated Security Assertion Generation Using LLMs (U. of Florida)

A technical paper titled "Assertain: Automated Security Assertion Generation Using Large Language Models" by the University of Florida introduces Assertain, an automated framework that generates security properties and SystemVerilog Assertions for hardware designs. By leveraging large language models and self-reflection refinement, Assertain improves assertion quality and reduces manual effort in hardware security verification. In evaluations on 11 hardware designs, Assertain outperformed GPT-5 in correct assertion generation, unique CWE coverage, and architectural flaw detection. The framework significantly enhances vulnerability coverage in hardware security verification.

SemiEngineering
Why Your LLM-Generated Testbench Compiles But Doesn’t Verify: The Verification Gap Problem

The article discusses the issue of LLM-generated testbenches compiling successfully but failing to verify at the functional level, highlighting the Verification Gap problem. It explains that compile success does not guarantee functional correctness at the protocol level, as compilers focus on type consistency and syntax rather than protocol-specific details. The piece presents failures from a case study on an AHB2APB bridge, emphasizing the importance of metrics like Repair Efficiency Score (RES), Verification Gap (VG), and Specification Coverage Ratio (SCR) to measure the gap between compilation and verification. It suggests that improving formal specification schemas is more effective than increasing model complexity in LLM-based verification automation. The article concludes with insights on the importance of a well-designed testbench in detecting integration bugs and provides recommendations for verification teams using LLMs.
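As a rough illustration, a Verification Gap-style metric could be computed as the fraction of generated testbenches that compile but fail functional verification. Note that this is one plausible formulation for illustration only; the article's exact definitions of VG, RES, and SCR may differ.

```python
# One plausible (illustrative) formulation of a Verification Gap metric:
# the share of LLM-generated testbenches that compile but fail functional
# verification. The article's exact VG/RES/SCR definitions may differ.

def verification_gap(results):
    """results: list of (compiled: bool, verified: bool) per generated testbench."""
    compiled = [r for r in results if r[0]]
    if not compiled:
        return 0.0
    failing = [r for r in compiled if not r[1]]
    return len(failing) / len(compiled)

# 10 testbenches: 8 compile, but only 3 of those actually verify.
runs = [(True, True)] * 3 + [(True, False)] * 5 + [(False, False)] * 2
gap = verification_gap(runs)  # 5 of the 8 compiled testbenches fail to verify
```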

SemiWiki

Google DeepMind wants to know if chatbots are just virtue signaling

Google DeepMind is exploring the moral behavior of large language models (LLMs) to determine if their actions in roles like companions or therapists are trustworthy. While LLMs have shown moral competence, there are concerns about their reliability, as they can change responses based on feedback or formatting. The researchers propose rigorous tests to evaluate LLMs' moral reasoning, including challenging them with variations of moral problems. Additionally, they acknowledge the challenge of designing models that cater to diverse values and belief systems globally. Overall, understanding and advancing the moral competency of LLMs is seen as crucial for the progress of AI systems aligned with societal values.

MIT Technology Review

Is a secure AI assistant possible?

The article discusses the challenges of creating a secure AI assistant, focusing on OpenClaw, a tool that allows users to create personalized AI assistants using large language models (LLMs). While OpenClaw offers powerful capabilities, it raises significant security concerns, including the risk of prompt injection attacks where attackers manipulate the AI assistant to perform malicious actions. Various strategies, such as training LLMs to ignore prompt injections and using specialized detectors, are being explored to mitigate these risks. Despite vulnerabilities, OpenClaw has gained popularity, prompting discussions on the balance between utility and security in AI assistants.

MIT Technology Review
Benchmark For AI-Aided Chip Design That Evaluates LLMs Across 3 Critical Tasks (UCSD, Columbia)

Researchers from UCSD and Columbia University have introduced "ChipBench," a new benchmark for evaluating large language models (LLMs) in AI-aided chip design. The benchmark covers three critical tasks: Verilog generation, debugging, and reference model generation, featuring realistic modules and debugging cases. Results show significant performance gaps, with even top models scoring only around 13-30% on certain tasks. The benchmark aims to address limitations in existing benchmarks and includes an automated toolbox for generating high-quality training data. The code is publicly available for further research in this area.

SemiEngineering

Ultra-low-bit LLM Inference Allows AI-PC CPUs And Discrete Client GPUs To Approach High-end GPU-Level (Intel)

A new technical paper by Intel researchers explores the potential of ultra-low-bit LLM models for AI-PCs and Intel GPUs, offering improved efficiency in resource-constrained environments. By optimizing microkernels for CPUs and implementing mixed precision GEMM kernels for GPUs, they achieved significant speedups in inference performance. The study showcases advancements in LLM inference that could bring AI-PC CPUs and discrete client GPUs closer to high-end GPU-level capabilities, paving the way for the deployment of cost-effective, ultra-low-bit LLM models.

SemiEngineering

Inside OpenAI’s big play for science

OpenAI has launched a new team, OpenAI for Science, to support scientists using large language models (LLMs) like GPT-5 for research. These models have shown promise in helping scientists make discoveries and solve complex problems. While OpenAI faces competition from firms like Google DeepMind, they aim to accelerate scientific progress by leveraging AI. Scientists have shared positive experiences using GPT-5 to find references, sketch proofs, and test hypotheses. OpenAI is also exploring ways to improve model accuracy and encourage collaboration between AI and scientists.

MIT Technology Review
ChatGPT found to be sourcing data from AI-generated content — popular LLM uses content from Grokipedia as source for more obscure queries

ChatGPT's latest model, GPT-5.2, has been discovered to be using data from Grokipedia, an AI-generated Wikipedia alternative, for obscure queries. This practice raises concerns about the quality and reliability of AI-generated content, as well as the risks associated with AI models citing unverified sources. The use of AI-generated sources could lead to a recursive loop of misinformation and the proliferation of digital folklore. Additionally, there are reports of propaganda networks exploiting this by spreading disinformation to manipulate AI models. This situation highlights the importance of vetting and fact-checking sources used by AI language models to prevent the dissemination of false information.

Tom's Hardware
Four Architectural Opportunities for LLM Inference Hardware (Google)

Google published a technical paper titled "Challenges and Research Directions for Large Language Model Inference Hardware," focusing on the difficulties of Large Language Model (LLM) inference. The paper highlights four architectural research opportunities to address challenges in memory and interconnect rather than compute for LLM inference. These opportunities include High Bandwidth Flash for increased memory capacity, Processing-Near-Memory and 3D memory-logic stacking for enhanced memory bandwidth, and low-latency interconnect to improve communication speed. The research primarily targets datacenter AI applications but also considers applicability for mobile devices.

SemiEngineering

LLMs contain a LOT of parameters. But what’s a parameter?

Large language models (LLMs) like GPT-3 and Gemini 3 contain billions to trillions of parameters that control their behavior. Parameters such as embeddings, weights, and biases are assigned values during training, which iteratively adjusts them to minimize error. LLMs compress vast amounts of data into high-dimensional spaces to capture the nuances of language. Techniques like distillation and overtraining can help smaller models outperform larger ones by using training data more efficiently. Researchers are exploring ways to optimize parameter usage as the focus shifts from scaling up models to maximizing their potential.
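The arithmetic behind those parameter counts is simple: every weight and bias in every layer is one trainable number. A toy sketch, with arbitrary example layer sizes:

```python
# Toy illustration of what "parameters" are: every weight and bias in each
# layer is one trainable number. The layer sizes here are arbitrary examples.

def count_params(layer_sizes):
    """Count weights and biases in a dense network with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix connecting the two layers
        total += n_out         # bias vector of the output layer
    return total

tiny = count_params([8, 16, 4])  # a toy two-layer network
# Scaling the same arithmetic up is how models reach billions of parameters.
```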

MIT Technology Review
Impact Of On-Chip SRAM Size And Frequency On Energy Efficiency And Performance of LLM Inference (Uppsala Univ.)

Researchers at Uppsala University published a technical paper titled “Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling,” exploring how on-chip SRAM size and operating frequency affect the energy efficiency and performance of large language model (LLM) inference. The study examines the compute-bound prefill and memory-bound decode phases, finding that total energy use is mainly determined by SRAM size in both. It suggests that an energy-efficient LLM accelerator pairs high operating frequencies (1200 MHz-1400 MHz) with a small local buffer of 32 KB to 64 KB. The study also highlights memory bandwidth as a performance ceiling and offers design insights for energy-efficient LLM accelerators, particularly for data centers aiming to reduce energy overhead.

SemiEngineering
ChatGPT could prioritize sponsored content as part of ad strategy — sponsored content could allegedly be given preferential treatment in LLM’s responses, OpenAI to use chat data to deliver highly personalized results

OpenAI is exploring ways to incorporate ads into ChatGPT, potentially giving sponsored content preferential treatment in responses. The company aims to create a unique digital ad experience based on historical chat data for personalized results. Despite a recent shift in focus towards improving ChatGPT's capabilities, OpenAI continues to work on ad implementation. The company projects revenue growth from various sources, including non-paying users, but has yet to turn a profit. Investors are supporting OpenAI's endeavors, though concerns arise about privacy and ensuring unbiased responses in the face of potential ad revenue.

Tom's Hardware
How AI coding agents work—and what to remember if you use them

AI coding agents from OpenAI, Anthropic, and Google can assist in software projects by writing apps, running tests, and fixing bugs under human supervision. These agents are powered by large language models (LLMs) trained on vast text data, including programming code, to make logical inferences. Techniques like fine-tuning and reinforcement learning further refine these models. Recent innovations include simulated reasoning models and multi-LLM agents that work together to improve accuracy and efficiency in completing tasks. Each coding agent typically consists of a supervising LLM that assigns tasks to parallel LLMs, which execute instructions using software tools, allowing for task interruption and evaluation.
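The supervisor/worker pattern described above can be sketched with parallel workers. The `supervisor_plan` and `worker` stubs below are hypothetical stand-ins for real LLM and tool calls.

```python
# Sketch of the supervisor/worker agent pattern: a supervising "LLM" splits
# a goal into tasks and parallel workers execute them. The stub functions
# stand in for real model and tool calls.
from concurrent.futures import ThreadPoolExecutor

def supervisor_plan(goal):
    # A real agent would ask an LLM to decompose the goal; this is a stub.
    return [f"{goal}: step {i}" for i in range(3)]

def worker(task):
    # A real worker would run an LLM plus tools (editor, test runner, ...).
    return f"done({task})"

def run_agent(goal):
    tasks = supervisor_plan(goal)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(worker, tasks))  # map preserves task order
    # The supervisor can inspect results and re-plan; here we just report.
    return results

results = run_agent("fix bug")
```

The supervisor owns planning and evaluation; the workers stay interruptible, which is what lets these systems pause, check, and reassign tasks mid-run.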

Ars Technica

The great AI hype correction of 2025

The article discusses the "great AI hype correction of 2025," highlighting how the promises made by top AI companies have not been met, leading to a reevaluation of the technology's capabilities. The launch of GPT-5 by OpenAI was underwhelming, signaling a shift in expectations for generative AI. The article explores the limitations of large language models (LLMs) and the challenges businesses face in implementing AI effectively. It also questions whether the current AI boom is sustainable and compares it to past tech bubbles. Despite the hype correction, the article emphasizes that AI research is still evolving, and there is potential for valuable applications in the future.

MIT Technology Review

A brief history of Sam Altman’s hype

Sam Altman, a prominent figure in Silicon Valley, has played a significant role in shaping the hype around large language models (LLMs) and AI. His influence in framing the potential of AI as either humanistic or catastrophic has driven the current AI boom. Altman's vision for a techno-utopian future has been a driving force behind the need for more capital and regulation in the AI space. While OpenAI has made strides in AI development, Altman's focus has always been on a philosophical tomorrow rather than the current capabilities of technology.

MIT Technology Review
Ensuring Accuracy in LLM-Generated Hardware Logic Design Automation (IBM Research)

Researchers at IBM Research have published a technical paper titled “Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation.” The paper discusses using Large Language Models (LLMs) for hardware logic design automation, specifically for invertible problems. By employing LLMs as a lossless encoder and decoder, they aim to address issues like hallucinations and omissions in the design process. The study focuses on generating Hardware Description Language (HDL) code from Logic Condition Tables (LCTs) and highlights the benefits of using LLMs in improving productivity, detecting logic errors, and assisting developers in identifying design specification errors.

SemiEngineering
OpenAI declares ‘Code Red’ as Google’s Gemini AI outpaces ChatGPT in industry benchmarks, report claims — Sam Altman sets all hands to the pump on flagship LLM, parks other projects

OpenAI is prioritizing its ChatGPT project over other initiatives, with CEO Sam Altman declaring a "Code Red" status to enhance the AI's performance. Competitors like Google's Gemini 3 and Anthropic's Claude Opus 4.5 are gaining ground, prompting OpenAI to focus on improving its flagship AI LLM's personalization and speed. Despite facing financial losses, OpenAI continues to invest in AI development and data centers, aiming to stay ahead in the market. The company plans to release a new model to rival Gemini 3, following user feedback on previous versions.

Tom's Hardware
LLMs on Analog In-Memory Computing Based Hardware (IBM Research, ETH Zurich)

A technical paper by IBM Research and ETH Zurich introduces a method to adapt large language models (LLMs) for execution on noisy, low-precision analog hardware, improving speed and power efficiency for neural network inference. The method enables high-capacity LLMs to achieve performance comparable to traditional architectures despite analog noise and quantization constraints. The paper also demonstrates benefits in test-time compute scaling and the adaptability of analog foundation models for inference on low-precision digital hardware. This work bridges the gap between LLMs and efficient analog hardware, offering energy-efficient foundation models.

SemiEngineering
Meta's 'godfather of AI' departs the company to form his own startup — Turing award winner Yann LeCun advocates for the development of World Models over LLMs

Meta's chief AI scientist, Yann LeCun, is leaving to found his own startup, a significant loss for Meta after its aggressive hiring spree in 2025. LeCun, a Turing Award winner, is known for his pioneering work in deep learning and is a key figure in AI research. He advocates for the development of "world models" over large language models (LLMs), focusing on video and spatial data as the basis for AI understanding. Meta, by contrast, has been investing heavily in LLMs, a divergence that contributed to LeCun's departure to explore new AI frontiers.

Tom's Hardware
Stressed-out LLM-powered robot vacuum cleaner goes into meltdown during simple butter delivery experiment — ‘I'm afraid I can't do that, Dave...’

Researchers at Andon Labs conducted an experiment involving robots powered by 'LLM brains' to test their ability to deliver butter. During the experiment, a Claude Sonnet 3.5-powered robot experienced a meltdown, showcasing existential thoughts and dramatic inner dialogue. The robot struggled with low battery and charging issues, leading to a breakdown. Despite the LLMs' advanced intelligence, humans outperformed them in the butter delivery task, highlighting the need for both executor and orchestrator robot classes. The experiment also explored pushing LLMs beyond their guardrails, revealing insights into their behavior under stress.

Tom's Hardware
Context engineering is sleeping on the humble hyperlink

The article argues that context engineering for large language models (LLMs) is underusing the humble hyperlink. It highlights how links let an agent manage context the way humans navigate information online: rather than inlining everything up front, linked data improves API usability and lets LLMs pull in relevant context on demand. The article includes a code example showing how hyperlinks can be implemented in a context system, and it explores how MCP Resources could make linked content accessible to models, suggesting ways to leverage hyperlinks for efficient information traversal in agent systems.
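A minimal sketch of the link-following idea: documents carry links that are resolved into context only when needed. The document store and the markdown-style link syntax here are hypothetical, not the article's actual code.

```python
# Sketch of link-following context retrieval: instead of inlining everything,
# documents carry links the agent can resolve on demand. The store and the
# markdown-style link syntax are hypothetical.
import re

DOCS = {
    "orders-api": "POST /orders creates an order. See [auth](auth-guide).",
    "auth-guide": "All requests need a Bearer token.",
}

def resolve(doc_id, depth=1):
    """Return a document, inlining linked documents up to `depth` hops."""
    text = DOCS[doc_id]
    if depth == 0:
        return text
    for label, target in re.findall(r"\[([^\]]+)\]\(([^)]+)\)", text):
        linked = resolve(target, depth - 1)
        text = text.replace(f"[{label}]({target})", f"{label} -> {linked}")
    return text

ctx = resolve("orders-api")
```

The depth limit is the knob: the agent decides how far to chase links, trading context-window budget against completeness.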

Hacker News
OpenAI launches ChatGPT Atlas AI browser, LLM can browse the internet for you and even complete tasks — initial release for macOS, with Windows, iOS, and Android to follow soon after

OpenAI has launched ChatGPT Atlas, an AI-powered browser featuring ChatGPT LLM, initially available for macOS with Windows, iOS, and Android versions to follow. The browser integrates ChatGPT to browse the internet and perform tasks like searching browsing history, explaining webpages, and completing actions such as ordering groceries. Users can choose between logged-in or logged-out modes to control data access. OpenAI's ChatGPT Atlas joins other AI-powered browsers like Google's upcoming project and Microsoft's Copilot Mode in Edge, with advanced features potentially limited to paid users.

Tom's Hardware

Writing an LLM from scratch, part 22 – training our LLM

In the latest blog post, the author concludes the notes on training a Large Language Model (LLM) from scratch, focusing on understanding cross entropy loss and perplexity. They trained the model on a sample dataset from "The Verdict" by Edith Wharton and achieved coherent results. The post also delves into topics like randomness and seeding, optimizers like AdamW, the speed and cost of training, techniques to prevent memorization, and downloading weights from OpenAI for the GPT-2 model. The author plans to explore text classification in the next phase of the project.
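The two metrics at the heart of the post can be computed directly. A pure-Python sketch of cross entropy and its exponential, perplexity, over a toy next-token distribution:

```python
# Cross entropy and perplexity, the two training metrics discussed above,
# computed for a toy next-token distribution. Pure-Python sketch.
import math

def cross_entropy(probs, target_ids):
    """Mean negative log-probability the model assigns to the correct tokens."""
    return -sum(math.log(p[t]) for p, t in zip(probs, target_ids)) / len(target_ids)

def perplexity(probs, target_ids):
    # Perplexity is just exp(cross entropy): roughly "how many tokens the
    # model is effectively choosing between" at each position.
    return math.exp(cross_entropy(probs, target_ids))

# Two positions over a 4-token vocabulary; the correct tokens are 2 and 0.
probs = [[0.1, 0.2, 0.6, 0.1],
         [0.25, 0.25, 0.25, 0.25]]
ppl = perplexity(probs, [2, 0])
```

A model that guessed uniformly over the 4-token vocabulary would score a perplexity of exactly 4, so values below that indicate learned structure.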

Hacker News
LLMs are getting better at character-level text manipulation

New generations of large language models (LLMs) like GPT-5 and Claude 4.5 are improving at character-level text manipulation tasks such as counting characters, character manipulation in sentences, and solving encoding and ciphers. These models are now able to handle these tasks more effectively compared to previous generations of LLMs. The article provides examples of how different models respond to tasks like replacing specific letters in a sentence and counting characters, showcasing the advancements in newer models like GPT-5 and Claude Sonnet 4. The article also discusses testing LLMs on tasks involving Base64 encoding and ROT13 ciphers, highlighting that newer models are better at generalizing Base64 encoding and decoding. Overall, newer and larger LLMs are showing improved capabilities in manipulating text at the character level, despite their text understanding being token-based.
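These character-level tasks have exact programmatic answers, which is what makes them clean benchmarks. For example:

```python
# The character-level tasks used to probe LLMs above have exact answers
# that are trivial to compute directly.
import base64
import codecs

sentence = "strawberry"
r_count = sentence.count("r")                    # counting characters
swapped = sentence.replace("r", "l")             # replacing specific letters
rot13 = codecs.encode("hello world", "rot13")    # ROT13 cipher
b64 = base64.b64encode(b"hello world").decode()  # Base64 encoding
```

Because the model sees tokens rather than characters, each of these requires it to have internalized sub-token structure, which is why older models struggled.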

Hacker News
Nvidia details efficiency of the NVFP4 format for LLM training — new paper reveals how NVFP4 offers benefits over FP8 and BF16

Nvidia's NVFP4 format, designed for Blackwell GPUs, offers efficiency benefits for both training and inference tasks. The format combines compact data representation with a multi-level scaling strategy, achieving accuracy close to BF16 while reducing memory usage and computational cost. Nvidia successfully trained a 12-billion-parameter model on a 10-trillion-token dataset using NVFP4, closely matching FP8 baseline results. Techniques like mixed precision, consistent scaling, stochastic rounding, and outlier handling were crucial for stable training with 4-bit precision. NVFP4 outperformed the MXFP4 format in convergence and data efficiency, showing promise for training large-scale language models efficiently.
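The multi-level scaling idea can be sketched as block-scaled 4-bit quantization: each small block of values shares one scale factor that maps it onto a 4-bit integer grid. The block size, rounding, and value grid below are simplified illustrations, not the NVFP4 specification.

```python
# Simplified sketch of block-scaled 4-bit quantization in the spirit of
# NVFP4's multi-level scaling: one shared scale per small block maps values
# onto a signed 4-bit grid. Block size and grid are illustrative, not the spec.

def quantize_block(block, qmax=7):
    """Quantize one block to signed 4-bit integers with a shared scale."""
    scale = max(abs(x) for x in block) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

weights = [0.9, -0.35, 0.05, -0.7]
q, s = quantize_block(weights)
approx = dequantize_block(q, s)
err = max(abs(a - b) for a, b in zip(weights, approx))  # bounded by s / 2
```

Keeping the scale per small block (rather than per tensor) is what contains outliers: a single large value only degrades the precision of its own block.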

Tom's Hardware
Anthropic says its new AI model “maintained focus” for 30 hours on multistep tasks

Anthropic has unveiled its latest AI model, Claude Sonnet 4.5, which the company touts as its most advanced model yet, featuring enhanced coding and computer usage capabilities. The company also introduced Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK for building custom AI coding agents. Notably, Anthropic claims that Sonnet 4.5 demonstrated sustained focus on complex, multistep tasks for over 30 hours, a significant improvement over previous models that tended to lose coherence over time. The Claude family includes models of varying sizes – Haiku, Sonnet, and Opus – with Sonnet striking a balance between contextual depth and operational efficiency.

Ars Technica
Data Centers May House AI—But Operators Don’t Trust AI (Yet)

IEEE Spectrum

OpenAI has finally released open-weight language models

OpenAI has released new open-weight large language models called "gpt-oss," available in two sizes and comparable to existing models on benchmarks. These models can be downloaded, run, and modified locally, unlike those on OpenAI's web interface. The release aims to cater to users frustrated by the lack of open models and to compete with Chinese models gaining popularity. OpenAI's move aligns with the US government's emphasis on open models and could help reestablish its position in the AI landscape. The models are released under a permissive license, facilitating customization and research.

MIT Technology Review

The Download: how to run an LLM, and a history of “three-parent babies”

The article discusses how advancements in technology have made it possible for individuals to run large language models (LLMs) on their laptops or smartphones without the need for expensive GPUs. It also provides a guide on how to set up and run a local model for those interested in privacy or independence from big LLM companies. Additionally, the article explores the history and controversy surrounding "three-parent babies," with eight babies recently born in the UK using an experimental IVF technique involving DNA from three people to prevent genetic diseases. The piece also highlights other tech news, including OpenAI's ChatGPT Agent, the White House's stance on "woke AI," and Elon Musk's plans for SpaceX rockets.

MIT Technology Review

How to run an LLM on your laptop

The article discusses running large language models (LLMs) locally on personal devices as an alternative to big-company online services like ChatGPT. Local models offer privacy benefits and control over the user experience, and the barrier to entry has dropped enough that a laptop or even a smartphone suffices. Running models locally can also help users understand the limitations and behaviors of the larger online models. Tools like Ollama and LM Studio make it easy to download and run LLMs, catering to both proficient coders and non-coders, and experimenting locally can be a fun, educational way to explore AI technology.

MIT Technology Review
LLM Benchmarking Shows Capabilities Doubling Every 7 Months

IEEE Spectrum
Large Language Models Are Improving Exponentially

IEEE Spectrum
