A step-by-step implementation of a GPT-like large language model from the ground up — covering everything from tokenization and attention mechanisms to pretraining, finetuning, and alignment. Built entirely in PyTorch with no high-level wrappers, so every component is transparent and hackable.
Beyond the core GPT implementation, this repo also includes from-scratch builds of Llama 3.2, Qwen3, and Gemma 3, plus modern techniques like LoRA, DPO, KV Cache, GQA, MoE, and Sliding Window Attention.
| # | Chapter | Description |
|---|---|---|
| 1 | Understanding Large Language Models | LLM fundamentals, development lifecycle, and landscape overview |
| 2 | Working with Text Data | Tokenization, byte-pair encoding, data loaders, and embedding layers |
| 3 | Coding Attention Mechanisms | Self-attention, causal masking, and multi-head attention from scratch |
| 4 | Implementing a GPT Model | Full GPT architecture — transformer blocks, layer norm, text generation |
| 5 | Pretraining on Unlabeled Data | Training loop, loss computation, weight loading, and text generation |
| 6 | Finetuning for Classification | Adapting the pretrained model for spam classification |
| 7 | Finetuning to Follow Instructions | Instruction tuning with Alpaca-style datasets |
| # | Appendix | Description |
|---|---|---|
| A | Introduction to PyTorch | PyTorch fundamentals: tensors, autograd, datasets, and training |
| B | References and Further Reading | Curated reading list and references |
| C | Exercise Solutions | Solutions for chapter exercises |
| D | Adding Bells and Whistles to the Training Loop | Cosine annealing, warmup, gradient clipping, and more |
| E | Parameter-efficient Finetuning with LoRA | Low-Rank Adaptation implemented from scratch |
Each chapter includes bonus notebooks and scripts that go well beyond the basics:
Chapter 2 — Working with Text Data
- Byte-pair encoder comparison (custom vs tiktoken)
- Embedding layers vs matrix multiplication equivalence
- Data loader intuition deep-dive
- BPE tokenizer from scratch
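The from-scratch BPE tokenizer in the last bullet centers on one loop: repeatedly merge the most frequent adjacent pair of token ids into a new id. A minimal plain-Python sketch of that idea (toy helpers on a toy string, not the repo's implementation):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent id pairs and return the most common one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# "Train" two merges on a toy byte sequence
ids = list("aaabdaaabac".encode("utf-8"))
merges = {}
for new_id in (256, 257):
    pair = most_frequent_pair(ids)
    merges[pair] = new_id
    ids = merge(ids, pair, new_id)
```

Encoding new text then replays the learned `merges` in order; real tokenizers (tiktoken included) add byte-level pretokenization and special-token handling on top of this core.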
Chapter 3 — Coding Attention Mechanisms
- Efficient multi-head attention variants
- Understanding PyTorch buffers
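All of these variants build on the same core operation. As a framework-free refresher, single-head scaled dot-product attention with a causal mask can be sketched in a few lines of plain Python (the chapter's actual implementations use PyTorch tensors):

```python
import math

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v are lists of equal-length vectors, one per position."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Causal mask: position i may only attend to positions 0..i
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Output is the attention-weighted average of the visible values
        out.append([sum(e / total * v[j][c] for j, e in enumerate(exps))
                    for c in range(len(v[0]))])
    return out
```

Multi-head attention runs several of these in parallel on learned projections of the input and concatenates the results.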
Chapter 4 — Implementing a GPT Model
- Performance analysis and FLOPS estimation
- KV Cache implementation
- Grouped-Query Attention (GQA)
- Multi-Head Latent Attention (MLA)
- Sliding Window Attention (SWA)
- Mixture-of-Experts (MoE)
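Several of these bonus builds, the KV cache especially, rest on a simple idea: store past keys and values so each decoding step only computes attention for the newest token. A minimal single-head sketch in plain Python (a hypothetical helper, not the repo's PyTorch version):

```python
import math

def attend_with_cache(new_q, new_k, new_v, cache):
    """One decoding step: append this token's key/value to the cache,
    then attend the new query over all cached positions."""
    cache["k"].append(new_k)
    cache["v"].append(new_v)
    d = len(new_q)
    scores = [sum(a * b for a, b in zip(new_q, k)) / math.sqrt(d)
              for k in cache["k"]]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[c] for e, v in zip(exps, cache["v"]))
            for c in range(len(new_v))]

cache = {"k": [], "v": []}  # grows by one entry per generated token
```

GQA and MLA are then refinements of the same cache: GQA shares each key/value head across several query heads, and MLA stores a compressed latent in place of the full keys and values.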
Chapter 5 — Pretraining on Unlabeled Data
- Alternative weight loading strategies
- Pretraining on Project Gutenberg data
- Learning rate schedulers and gradient clipping
- Hyperparameter tuning
- GPT → Llama 3.2 architecture conversion
- Memory-efficient weight loading
- Extending tokenizers
- LLM training speed optimizations
- Qwen3 from scratch (0.6B dense + 30B-A3B MoE)
- Gemma 3 from scratch (1B)
- Interactive chat UI
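The text-generation loop behind the chat UI boils down to greedy autoregressive decoding: crop the context, score the next token, append the argmax, repeat. A framework-free sketch (the `toy` logits function below is a stand-in for a real model, not anything in the repo):

```python
def greedy_generate(logits_fn, ids, max_new_tokens, context_size):
    """Greedy autoregressive decoding."""
    for _ in range(max_new_tokens):
        context = ids[-context_size:]   # crop to the model's context window
        logits = logits_fn(context)     # one score per vocabulary token
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids

# Toy stand-in for a model: always favors token (last_id + 1) mod 4
toy = lambda ctx: [1.0 if t == (ctx[-1] + 1) % 4 else 0.0 for t in range(4)]
```

Swapping the argmax for temperature-scaled sampling with top-k filtering gives the more interesting generation behavior covered in the chapter.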
Chapter 6 — Finetuning for Classification
- Additional experiments (last vs first token, extended context)
- IMDb movie review classification
- Interactive classification UI
Chapter 7 — Finetuning to Follow Instructions
- Dataset preparation utilities
- Model evaluation (local Llama 3 + GPT-4 API)
- Direct Preference Optimization (DPO) from scratch
- Synthetic dataset generation
- Interactive instruction-following UI
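The DPO bonus material reduces alignment to a single loss on preference pairs, computed from log-probabilities under the trainable policy and a frozen reference model. A plain-Python sketch of that loss (the β value and inputs are illustrative):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Inputs are summed log-probs of the
    chosen/rejected responses under the policy and the frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)
```

When the policy matches the reference, the margin is zero and the loss is log 2; the loss falls as the policy favors the chosen response more than the reference does, which is exactly the "alignment without a reward model" property noted below.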
```
├── ch01/ - ch07/            # Main chapters (notebooks + scripts)
├── appendix-A/ - E/         # Supplementary material
├── pkg/llms_from_scratch/   # Installable Python package
│   ├── ch02.py - ch07.py    # Chapter implementations as modules
│   ├── llama3.py            # Llama 3 architecture
│   ├── qwen3.py             # Qwen3 architecture
│   ├── generate.py          # Text generation utilities
│   ├── kv_cache/            # KV cache implementations
│   └── tests/               # Test suite
├── setup/                   # Environment setup guides
├── reasoning-from-scratch/  # Reasoning model experiments
├── pyproject.toml           # Project config & dependencies
├── pixi.toml                # Conda-based environment manager
└── requirements.txt         # Pip requirements (quick reference)
```
- Python >=3.10, <3.14
- PyTorch >=2.2.2
Option 1 — pip (recommended for most users)

```bash
pip install -r requirements.txt
```

Option 2 — Install as a package

```bash
pip install -e ./pkg
```

Then import anywhere:

```python
from llms_from_scratch.ch04 import GPTModel
from llms_from_scratch.generate import generate
```

Option 3 — pixi (reproducible conda environment)

```bash
pixi install
pixi run jupyter lab
```

Open any chapter notebook (e.g. `ch02/01_main-chapter-code/ch02.ipynb`) and run cells sequentially.
| Category | Stack |
|---|---|
| Deep Learning | PyTorch |
| Tokenization | tiktoken, sentencepiece, Hugging Face tokenizers |
| Model Formats | safetensors, PyTorch state_dict |
| Notebooks | JupyterLab |
| Testing | pytest, nbval |
| APIs | OpenAI, Hugging Face Hub |
| Interactive UI | Chainlit |
| Data & Viz | numpy, pandas, matplotlib, scikit-learn |
| Model | Type | Location |
|---|---|---|
| GPT-2 (124M) | Dense transformer | Ch04 – Ch07 (core) |
| Llama 3.2 | Dense transformer + RoPE + GQA | Ch05 bonus |
| Qwen3 0.6B | Dense transformer | Ch05 bonus |
| Qwen3 30B-A3B | Mixture-of-Experts | Ch05 bonus |
| Gemma 3 1B | Dense transformer + SWA | Ch05 bonus |
- KV Cache — inference-time memory optimization for autoregressive generation
- Grouped-Query Attention (GQA) — reduces KV heads for efficiency
- Multi-Head Latent Attention (MLA) — compressed latent KV projections
- Sliding Window Attention (SWA) — bounded context for long sequences
- Mixture-of-Experts (MoE) — sparse activation with expert routing
- LoRA — parameter-efficient finetuning via low-rank adapters
- DPO — alignment without a reward model
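Of these, LoRA is perhaps the easiest to convey in a few lines: the frozen weight W gets a low-rank additive update B·A scaled by α/r, with B initialized to zero so training starts from the pretrained behavior. A dependency-free sketch (toy matrix helpers, not the repo's implementation):

```python
def lora_linear(x, W, A, B, alpha=16, r=None):
    """y = x·W + (alpha/r)·(x·A)·B, the LoRA forward path.
    W: frozen d_in×d_out weight; A: d_in×r, B: r×d_out (B starts at zero)."""
    r = r if r is not None else len(B)

    def matvec(v, M):  # row vector v times matrix M
        return [sum(vi * M[i][j] for i, vi in enumerate(v))
                for j in range(len(M[0]))]

    base = matvec(x, W)                 # frozen pretrained path
    delta = matvec(matvec(x, A), B)     # trainable low-rank path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

Because only A and B are trained, the parameter count scales with r·(d_in + d_out) instead of d_in·d_out, which is what makes the Appendix E finetuning runs cheap.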
| Environment | Notes |
|---|---|
| Local (CPU) | Works on a laptop — no GPU required |
| Local (GPU) | CUDA or Apple MPS for faster training |
| Google Colab | Free tier works for most chapters |
| Docker | DevContainer config included in setup/ |
| AWS SageMaker | CloudFormation template in setup/ |
This project is licensed under the Apache License 2.0.