
karpathy / build-nanogpt

Video+code lecture on building nanoGPT from scratch

4,837 stars
770 forks
35 issues
Python · Jupyter Notebook

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing karpathy/build-nanogpt in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/karpathy/build-nanogpt)

Repository Overview (README excerpt)


build nanoGPT

This repo holds the from-scratch reproduction of nanoGPT. The git commits were deliberately kept step-by-step and clean so that one can easily walk through the commit history and watch it get built slowly. Additionally, there is an accompanying video lecture on YouTube where you can see me introduce each commit and explain the pieces along the way.

We basically start from an empty file and work our way to a reproduction of the GPT-2 (124M) model. If you have more patience or money, the code can also reproduce the GPT-3 models. While the GPT-2 (124M) model probably trained for quite some time back in the day (2019, ~5 years ago), today reproducing it is a matter of ~1 hr and ~$10. You'll need a cloud GPU box if you don't have enough compute locally; for that I recommend Lambda.

Note that GPT-2 and GPT-3 are both simple language models, trained on internet documents, and all they do is "dream" internet documents. So this repo/video does not cover Chat finetuning, and you can't talk to it the way you can talk to ChatGPT. The finetuning process (while conceptually quite simple: SFT is just about swapping out the dataset and continuing the training) comes after this part and will be covered at a later time. For now, this is the kind of stuff the 124M model says if you prompt it with "Hello, I'm a language model," after 10B tokens of training: And after 40B tokens of training: Lol.

Anyway, once the video comes out, this will also be a place for FAQ, and a place for fixes and errata, of which I am sure there will be a number :) For discussions and questions, please use the Discussions tab; for faster communication, have a look at my Zero To Hero Discord, channel **#nanoGPT**.

Video

Let's reproduce GPT-2 (124M) YouTube lecture

Errata

Minor cleanup: we forgot to delete of the bias once we switched to flash attention; fixed with a recent PR.

Earlier versions of PyTorch may have difficulty converting from uint16 to long.
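The workaround for that conversion issue can be sketched as follows. This is a minimal illustration, not the repo's actual code (the function name `to_long_tensor` is hypothetical; the README's own helper name is elided in this excerpt): upcast the uint16 array to int32 on the numpy side before handing it to PyTorch.

```python
import numpy as np
import torch

def to_long_tensor(npt: np.ndarray) -> torch.Tensor:
    """Hypothetical helper: older PyTorch versions may fail to convert
    uint16 arrays to long tensors directly, so upcast in numpy first."""
    npt = npt.astype(np.int32)                  # uint16 -> int32 (numpy side)
    return torch.tensor(npt, dtype=torch.long)  # int32 -> int64 (long)
```

The extra `astype` costs one copy but sidesteps the unsupported uint16 path entirely.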
Inside , we added to use numpy to convert uint16 to int32 before converting to a torch tensor and then converting to long.

The function takes an arg , to which I tried to stubbornly just pass , hoping it works okay, but PyTorch actually really wants just the type, and creates errors in some versions of PyTorch. So we want e.g. the device to get stripped to . Currently, device (Apple Silicon) would become CPU; I'm not 100% sure this is the intended PyTorch way.

Confusingly, is actually used by both the forward and the backward pass. Moved up the line so that it also gets applied to the forward pass.

Prod

For more production-grade runs that are very similar to nanoGPT, I recommend looking at the following repos:

• litGPT
• TinyLlama

FAQ

License

MIT
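The device-stripping errata above can be sketched like this. It is a minimal illustration under the assumption that the argument in question is `torch.autocast`'s `device_type` (the actual identifiers are elided in this excerpt); `strip_device` is a hypothetical helper name:

```python
def strip_device(device: str) -> str:
    """Hypothetical helper: torch.autocast wants the bare device *type*
    ("cuda", "cpu"), not a full device string like "cuda:3"."""
    device_type = device.split(":")[0]  # "cuda:3" -> "cuda"
    # Note: whether "mps" should be mapped to "cpu" here is unclear,
    # per the errata above; this sketch leaves it untouched.
    return device_type

# Usage sketch (assumes torch is installed and a model/batch exist):
# with torch.autocast(device_type=strip_device("cuda:3"), dtype=torch.bfloat16):
#     logits, loss = model(x, y)
```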