Writing an LLM from scratch, part 22 – training our LLM

Source: Hacker News

TL;DR (AI generated)
In this post, the author concludes their notes on training a Large Language Model (LLM) from scratch, focusing on understanding cross entropy loss and perplexity. They train the model on a small sample dataset, "The Verdict" by Edith Wharton, and achieve coherent results. The post also covers randomness and seeding, the AdamW optimizer, the speed and cost of training, techniques to prevent memorization, and downloading OpenAI's pretrained GPT-2 weights. The author plans to explore text classification in the next phase of the project.
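The relationship between cross entropy loss and perplexity that the post examines can be sketched as follows. This is a minimal illustration, not the author's code: the probability values are hypothetical, and a real training loop would compute the loss over model logits rather than hand-picked probabilities.

```python
import math

def cross_entropy(target_probs):
    """Average negative log-likelihood the model assigns to the true next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

def perplexity(target_probs):
    """Perplexity is simply the exponential of the cross entropy loss."""
    return math.exp(cross_entropy(target_probs))

# Hypothetical probabilities a model assigned to each correct next token.
probs = [0.5, 0.25, 0.1]
loss = cross_entropy(probs)
ppl = perplexity(probs)
```

Intuitively, a perplexity of N means the model is, on average, as uncertain as if it were choosing uniformly among N tokens: a model that always assigns probability 0.5 to the correct token has perplexity 2.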