Writing an LLM from scratch, part 22 – training our LLM

Source: Hacker News

TL;DR (AI Generated)

In this post, the author concludes their series of notes on training a Large Language Model (LLM) from scratch, focusing on understanding cross-entropy loss and perplexity. They trained the model on a small sample dataset, "The Verdict" by Edith Wharton, and got coherent output. The post also covers randomness and seeding, the AdamW optimizer, the speed and cost of training, techniques to prevent memorization, and loading OpenAI's published GPT-2 weights into the model. The author plans to explore text classification in the next phase of the project.
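As a rough illustration of the relationship the post discusses (a sketch, not code from the article): cross-entropy loss is the average negative log-probability a model assigns to the correct next tokens, and perplexity is simply the exponential of that loss. The probabilities below are made-up values for illustration.

```python
import math

def cross_entropy(predicted_probs):
    """Average negative log-probability assigned to each correct next
    token (natural log, matching PyTorch's convention)."""
    return -sum(math.log(p) for p in predicted_probs) / len(predicted_probs)

# Hypothetical probabilities a model assigned to the *correct* next token
# at four positions in a sequence.
probs = [0.9, 0.5, 0.25, 0.7]

loss = cross_entropy(probs)
perplexity = math.exp(loss)  # perplexity is e raised to the loss
```

A useful sanity check: if the model were merely guessing uniformly over N tokens, the perplexity would come out to exactly N, which is why perplexity is often read as "the effective number of tokens the model is choosing between."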


Similar Articles

MIT Technology Review

Three reasons why DeepSeek’s new model matters

DeepSeek's new V4 model is significant for three key reasons. Firstly, it offers high performance at a fraction of the cost of comparable models, making cutting-edge AI capabilities more accessible. Secondly, V4 introduces a new approach to memory efficiency, handling a 1-million-token context window while significantly reducing compute and memory usage. Lastly, V4 marks a shift towards Chinese chip optimization, specifically for Huawei's Ascend chips, challenging the dominance of US chip giant Nvidia and potentially signaling China's progress in building a parallel AI infrastructure.

MIT Technology Review
Disaggregating LLM Inference: Inside the SambaNova Intel Heterogeneous Compute Blueprint

SambaNova Systems and Intel have introduced a blueprint for heterogeneous inference that optimizes modern large language model (LLM) workloads by utilizing specialized hardware for different phases of inference: GPUs for prefill, SambaNova RDUs for decode, and Intel Xeon 6 CPUs for agentic tools and orchestration. This approach addresses the complexity of agentic AI systems with varying compute demands. By isolating tasks onto specific hardware, the architecture improves efficiency, scalability, and cost-effectiveness. The design reflects a shift towards specialized compute fabrics and better supports the evolving landscape of AI reasoning systems.

SemiWiki
Automated Security Assertion Generation Using LLMs (U. of Florida)

A technical paper titled "Assertain: Automated Security Assertion Generation Using Large Language Models" by the University of Florida introduces Assertain, an automated framework that generates security properties and SystemVerilog Assertions for hardware designs. By leveraging large language models and self-reflection refinement, Assertain improves assertion quality and reduces manual effort in hardware security verification. In evaluations on 11 hardware designs, Assertain outperformed GPT-5 in correct assertion generation, unique CWE coverage, and architectural flaw detection. The framework significantly enhances vulnerability coverage in hardware security verification.

SemiEngineering
How SW and HW Vulnerabilities Can Complement LLM-Specific Algorithmic Attacks (UT Austin, Intel et al.)

A technical paper titled “Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems” by UT Austin, Intel Labs, Symmetry Systems, Microsoft, and Georgia Tech explores how software and hardware vulnerabilities can combine with LLM-specific algorithmic attacks to compromise the integrity of compound AI pipelines. The paper demonstrates two novel attacks that leverage system-level vulnerabilities along with algorithmic weaknesses to breach AI safety and confidentiality. By systematically analyzing attack primitives and mapping vulnerabilities to the stages of an attack lifecycle, the paper argues that robust future defenses must also address traditional software and hardware vulnerabilities.
