Writing an LLM from scratch, part 22 – training our LLM
Source: Hacker News
TL;DR
AI-generated summary: In this blog post, the author concludes their notes on training a large language model (LLM) from scratch, focusing on understanding cross entropy loss and perplexity. They trained the model on a sample dataset from "The Verdict" by Edith Wharton and achieved coherent output. The post also covers randomness and seeding, the AdamW optimizer, the speed and cost of training, techniques to prevent memorization, and downloading OpenAI's pretrained GPT-2 weights. The author plans to explore text classification in the next phase of the project.
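Since the post centers on cross entropy loss and perplexity, here is a minimal sketch of how the two relate: cross entropy is the mean negative log-probability the model assigns to the correct next token, and perplexity is simply the exponential of that loss. The probabilities below are hypothetical, not taken from the post.

```python
import math

# Hypothetical next-token prediction: the probability the model assigned
# to the correct token at each of four positions in a sequence.
target_probs = [0.25, 0.10, 0.50, 0.05]

# Cross entropy loss: mean negative log-probability of the correct tokens.
loss = -sum(math.log(p) for p in target_probs) / len(target_probs)

# Perplexity is exp(loss) -- roughly the number of tokens the model is
# "choosing between" on average, so lower is better.
perplexity = math.exp(loss)

print(f"cross entropy loss: {loss:.4f}")
print(f"perplexity: {perplexity:.4f}")
```

A model that assigned probability 1.0 to every correct token would have loss 0 and perplexity 1; uniform guessing over a vocabulary of size V gives perplexity V.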