This repository contains my study implementation and notes for @karpathy's build-nanogpt tutorial. It's a from-scratch reproduction of GPT-2.
- `train.py` - Main training script with the complete GPT implementation (a minimal architectural sketch follows below), including:
  - Multi-head self-attention with Flash Attention optimization
  - MLP blocks with GELU activation
  - Layer normalization and residual connections
  - Data loading and distributed training support
  - HellaSwag evaluation integration
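For orientation, here is a minimal sketch of the block structure listed above: pre-norm layer normalization, residual connections around attention and MLP, GELU activation, and Flash Attention via PyTorch's `scaled_dot_product_attention`. The class and variable names are illustrative, not necessarily the exact ones used in `train.py`.

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Flash Attention kernel with a causal mask
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        self.gelu = nn.GELU(approximate="tanh")  # GPT-2 uses the tanh approximation
        self.c_proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    """Pre-norm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = MLP(n_embd)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # residual connection around attention
        x = x + self.mlp(self.ln_2(x))   # residual connection around MLP
        return x
```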
- `fineweb.py` - FineWeb-Edu dataset downloader and tokenizer (see the tokenization sketch below):
  - Downloads the 10B-token dataset for pretraining
  - GPT-2 tokenization using tiktoken
  - Efficient data sharding for large-scale training
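The tokenize-and-shard idea, roughly: encode each document with the GPT-2 BPE tokenizer, prefix it with the `<|endoftext|>` delimiter, and pack the resulting token stream into fixed-size shards on disk. This is a simplified sketch; the shard size, file naming, and save format here are illustrative assumptions rather than the exact choices in `fineweb.py`.

```python
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # GPT-2 BPE tokenizer
eot = enc._special_tokens["<|endoftext|>"]   # document delimiter token id

def tokenize(doc_text):
    # each document starts with the <|endoftext|> delimiter
    tokens = [eot] + enc.encode_ordinary(doc_text)
    return np.array(tokens, dtype=np.uint16)  # GPT-2 vocab fits in uint16

def write_shards(documents, shard_size=100_000_000):
    """Pack token streams into fixed-size shards saved as .npy files (illustrative)."""
    buf = np.empty(shard_size, dtype=np.uint16)
    filled, shard_index = 0, 0
    for doc in documents:
        toks = tokenize(doc)
        while len(toks) > 0:
            take = min(shard_size - filled, len(toks))
            buf[filled:filled + take] = toks[:take]
            filled += take
            toks = toks[take:]
            if filled == shard_size:            # shard full: flush and start a new one
                np.save(f"fineweb_shard_{shard_index:06d}.npy", buf)
                shard_index += 1
                filled = 0
    if filled > 0:                              # flush the final partial shard
        np.save(f"fineweb_shard_{shard_index:06d}.npy", buf[:filled])
```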
- `input.txt` - Sample text data for quick experiments
- `hellaswag.py` - HellaSwag benchmark evaluation script (scoring sketch below):
  - Common-sense reasoning evaluation
  - Multiple-choice completion task
  - Model performance comparison utilities
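The evaluation idea, in outline: for each HellaSwag context, score the four candidate endings by the model's average loss over the ending tokens only, and predict the lowest-loss ending. A simplified sketch, assuming a `model(tokens)` call that returns logits of shape `(B, T, vocab_size)` (the actual script batches and masks the candidates rather than running them one at a time):

```python
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def pick_ending(model, context, endings, device="cpu"):
    """Return the index of the ending with the lowest average loss on its tokens."""
    ctx_tokens = enc.encode(context)
    avg_losses = []
    for ending in endings:
        end_tokens = enc.encode(" " + ending)
        tokens = torch.tensor([ctx_tokens + end_tokens], device=device)
        logits = model(tokens)                   # (1, T, vocab_size)
        # shift so that position t predicts token t+1
        shift_logits = logits[:, :-1, :]
        shift_targets = tokens[:, 1:]
        losses = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
            reduction="none",
        )
        # average loss over the ending tokens only (they start after the context)
        avg_losses.append(losses[len(ctx_tokens) - 1:].mean())
    return int(torch.stack(avg_losses).argmin())
```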
- `play.ipynb` - Jupyter notebook for interactive experimentation (sampling sketch below):
  - Model testing and inference
  - Training visualization
  - Architecture exploration
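As one example of the kind of experiment the notebook is meant for, here is a bare-bones autoregressive sampling loop with top-k sampling. The `model(tokens)` interface returning logits, the default `top_k`, and the function name are assumptions for illustration, not the notebook's exact code.

```python
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def generate(model, prompt, max_new_tokens=32, top_k=50, device="cpu"):
    """Sample tokens from the model one at a time, appending each to the context."""
    tokens = torch.tensor([enc.encode(prompt)], device=device)
    for _ in range(max_new_tokens):
        logits = model(tokens)                    # (1, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # next-token distribution
        # top-k sampling: keep the k most likely tokens, sample among them
        topk_probs, topk_idx = torch.topk(probs, top_k, dim=-1)
        ix = torch.multinomial(topk_probs, num_samples=1)
        next_token = torch.gather(topk_idx, -1, ix)
        tokens = torch.cat([tokens, next_token], dim=1)
    return enc.decode(tokens[0].tolist())
```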
- build-nanogpt - Original tutorial by Andrej Karpathy
- Attention Is All You Need - Original Transformer paper
- Language Models are Unsupervised Multitask Learners - GPT-2 paper
- HellaSwag - Commonsense reasoning benchmark