Commit 371cea7

Update README.md, add instructions for CUDA support, and optimize the project description
1 parent: 27e1cef, commit: 371cea7

1 file changed

Lines changed: 73 additions & 32 deletions

README.md

@@ -1,61 +1,102 @@
-# microgpt (optimized)
+# microgpt (optimized + CUDA)
 
 ![microgpt logo](assets/microgpt.png)
 
-Optimized version of [Karpathy's microgpt](https://karpathy.ai/microgpt.html), the most atomic way to train and inference a GPT in pure, dependency-free Python.
+A minimal GPT project with two aligned implementations:
 
-**293 lines, 0 dependencies.** All optimizations preserve the original simplicity.
+- `microgpt.py`: pure Python, dependency-free reference implementation.
+- `microgpt_cuda.cu`: CUDA/C++ implementation for Windows (MSVC + CUDA), optimized for speed while keeping the same model/training logic.
 
-## What's Changed
+## What this repo focuses on
 
-| Optimization | Lines | Impact |
-|---|---|---|
-| Direct `__truediv__` implementation | +8 | ~20-30% fewer computation graph nodes per step |
-| Fused `cross_entropy` (log-softmax + NLL) | +5 | Fewer nodes + better numerical stability |
-| Iterative `backward()` topological sort | 0 | Eliminates recursion depth limit |
-| `sum(losses[1:], losses[0])` | 0 | Removes phantom `Value(0)` node |
-| Adam running product | +2 | Numerically stable bias correction at large step counts |
-| `with open()` file handle | +1 | Proper resource cleanup |
-| **Weight tying** (wte = lm_head) | -1 | Standard GPT-2 practice, fewer params |
-| **Cosine LR schedule** | 0 | Smoother decay than linear |
-| **Train/val split** (90/10) | +3 | Basic ML hygiene, detect overfitting |
-| **Periodic validation** (every 100 steps) | +10 | Pure-float NLL eval on held-out docs |
-| **Gradient clipping** (global norm) | +4 | Prevents exploding gradients, stabilizes training |
-| **AdamW weight decay** | +1 | Decoupled regularization |
-| **Top-k sampling** (k=5) | +4 | Higher quality inference, avoids garbage tokens |
-| **Per-step timing** | +3 | Performance observability in ms/step |
+- Keep the project small and readable.
+- Preserve algorithmic parity between Python and CUDA paths.
+- Push performance through GPU residency and kernel fusion where it matters.
 
-**Total: +50 lines** (243 -> 293), no new dependencies.
+Core model/training recipe (both paths):
 
-## Files
+- Character tokenizer with `<BOS>`.
+- GPT-style block with RMSNorm, causal multi-head attention, and ReLU^2 MLP.
+- Weight tying (`wte` reused as LM head).
+- AdamW + cosine LR + global grad clipping.
+- Train/val split, periodic validation, top-k sampling inference.
 
-- **`microgpt.py`** - Complete optimized Python algorithm (runnable)
-- **`microgpt_cuda.cu`** - CUDA/C++ port with full train/val/inference loop
-- **`microgpt_optimized.html`** - Syntax-highlighted 3-column view with change annotations
-- **`CMakeLists.txt`** - CMake entrypoint for CUDA build
+## Repository layout
 
-## Quick Start
+- `microgpt.py`: full Python algorithm (train + val + inference).
+- `microgpt_cuda.cu`: full CUDA/C++ algorithm (train + val + inference).
+- `microgpt_optimized.html`: side-by-side Python/CUDA code converter view.
+- `CMakeLists.txt`: CUDA build entry.
+- `input.txt`: corpus (auto-downloaded if missing on first run).
+
+## Quick start (Python)
 
 ```bash
 python microgpt.py
 ```
 
-It auto-downloads `input.txt` on first run, trains for 500 steps with periodic validation, then generates samples via top-k sampling.
+If `input.txt` is missing, the script downloads the default names dataset automatically.
+
+## Quick start (CUDA / Windows)
+
+Prerequisites:
 
-## CUDA Build
+- NVIDIA GPU + compatible driver
+- CUDA Toolkit (your setup: CUDA 13.1)
+- Visual Studio 2022 (MSVC, x64 toolchain)
+- CMake 3.24+
+
+Build:
 
 ```bash
-cmake -S . -B build -G "Visual Studio 17 2022" -A x64
+cmake -S . -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_CUDA_ARCHITECTURES=86
 cmake --build build --config Release
+```
+
+Run:
+
+```bash
+.\build\Release\microgpt_cuda.exe --help
 .\build\Release\microgpt_cuda.exe
 ```
 
-For quick smoke tests:
+Smoke test:
 
 ```bash
 .\build\Release\microgpt_cuda.exe --steps 5 --samples 3
 ```
 
+## CUDA CLI options
+
+- `--steps <int>`: training steps (default `500`)
+- `--val-every <int>`: validation interval (default `100`)
+- `--val-docs <int>`: max validation docs per eval (default `20`)
+- `--samples <int>`: generated samples after training (default `20`)
+- `--top-k <int>`: top-k for sampling (default `5`)
+- `--temperature <float>`: sampling temperature (default `0.6`)
+- `--seed <int>`: RNG seed (default `42`)
+
+## Important implementation notes
+
+- CUDA path keeps parameters, gradients, and optimizer states on GPU.
+- Training step is fused into one kernel launch (forward + backward + grad clip + AdamW update).
+- Current fused implementation is specialized to `n_layer = 1` (same as current Python config).
+- `kMaxVocab = 256` in `microgpt_cuda.cu`; if your dataset exceeds this, increase it and rebuild.
+- Default `CMAKE_CUDA_ARCHITECTURES` is `86`; set it to your GPU architecture when needed.
+
+## Code converter page
+
+Open `microgpt_optimized.html` in a browser to switch between:
+
+- Python view
+- CUDA view
+- Bilingual side-by-side comparison
+
+This is useful for checking one-to-one conceptual mapping between the two codebases.
+
 ## Credits
 
-Original by [@karpathy](https://github.com/karpathy) - [microgpt](https://karpathy.ai/microgpt.html) | [Gist](https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95)
+Original microgpt idea and baseline by [@karpathy](https://github.com/karpathy):
+
+- https://karpathy.ai/microgpt.html
+- https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95
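
The new README's core recipe lists a character tokenizer with `<BOS>` and a train/val split. Its implementation notes also cap the CUDA vocabulary at `kMaxVocab = 256`. The Python sketch below shows one way these pieces could fit together. It makes three assumptions:

- a single `<BOS>` id sits right after the sorted characters;
- each document is wrapped in `<BOS>` on both sides;
- the split is a seeded 90/10 shuffle (the ratio comes from the removed optimization table).

The helpers `build_tokenizer`, `encode`, `decode`, `train_val_split` and the constant `K_MAX_VOCAB` are hypothetical names, not code from `microgpt.py`. The sample names are made-up test data.

```python
import random

# Hypothetical constant mirroring kMaxVocab = 256 in microgpt_cuda.cu (per the README).
K_MAX_VOCAB = 256


def build_tokenizer(docs):
    """Character vocabulary plus one extra <BOS> id (assumed layout)."""
    chars = sorted(set("".join(docs)))
    bos = len(chars)  # <BOS> takes the first id after the characters
    vocab_size = len(chars) + 1
    if vocab_size > K_MAX_VOCAB:
        raise ValueError(
            f"vocab_size={vocab_size} exceeds kMaxVocab={K_MAX_VOCAB}; "
            "increase kMaxVocab in microgpt_cuda.cu and rebuild"
        )
    stoi = {ch: i for i, ch in enumerate(chars)}
    return chars, stoi, bos, vocab_size


def encode(doc, stoi, bos):
    # Wrap each document in <BOS> so the model sees where it starts and ends.
    return [bos] + [stoi[ch] for ch in doc] + [bos]


def decode(ids, chars, bos):
    # Drop <BOS> markers and map the remaining ids back to characters.
    return "".join(chars[i] for i in ids if i != bos)


def train_val_split(docs, val_frac=0.1, seed=42):
    # Seeded shuffle, then hold out the first val_frac of documents (90/10 by default).
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_val = max(1, int(len(docs) * val_frac))
    return docs[n_val:], docs[:n_val]


docs = ["emma", "olivia", "ava", "isabella", "sophia",
        "mia", "amelia", "harper", "evelyn", "abigail"]
chars, stoi, bos, vocab_size = build_tokenizer(docs)
train_docs, val_docs = train_val_split(docs)
ids = encode("ava", stoi, bos)
print(vocab_size, len(train_docs), len(val_docs))
print(ids, decode(ids, chars, bos))
```

A check like this is one way to spot a corpus that would exceed `kMaxVocab` before building and running the CUDA path.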

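The removed optimization table lists a fused `cross_entropy` (log-softmax + NLL) for numerical stability. It also describes validation as a "pure-float NLL eval on held-out docs". A float-only sketch of that idea computes `-log softmax(logits)[target]` as log-sum-exp minus the target logit, so large logits do not overflow `math.exp`. `mean_nll` and its `logits_fn` callback are hypothetical stand-ins for the script's forward pass. In `microgpt.py` the loss presumably operates on autograd `Value` nodes (the removed table counts graph nodes), so the real code may look different.

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed as logsumexp(logits) - logits[target]."""
    m = max(logits)  # shift by the max so math.exp never overflows
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_nll(sequences, logits_fn):
    """Average next-token NLL over held-out token sequences.

    logits_fn is a hypothetical stand-in for a pure-float forward pass that maps
    a context (list of token ids) to a list of logits over the vocabulary.
    """
    total, count = 0.0, 0
    for ids in sequences:
        for t in range(len(ids) - 1):
            total += cross_entropy(logits_fn(ids[: t + 1]), ids[t + 1])
            count += 1
    return total / max(1, count)


def uniform_logits(context):
    # Toy "model" that ignores its context and scores all 5 tokens equally.
    return [0.0] * 5


print(mean_nll([[0, 1, 2], [3, 4]], uniform_logits))  # log(5) per predicted token
print(cross_entropy([1000.0, 0.0], 0))  # stays finite thanks to the max shift
```

A uniform predictor scores exactly `log(vocab_size)` per token, which makes a handy sanity baseline for validation loss.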

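For the `AdamW + cosine LR + global grad clipping` line, a single parameter update can be sketched in pure Python. The sketch makes these assumptions:

- gradients arrive as a flat list of floats;
- the hyperparameters (`beta1=0.9`, `beta2=0.99`, `eps=1e-8`, `weight_decay=0.01`, `max_norm=1.0`, `base_lr=0.05`) are illustrative, not the repository's values;
- bias correction keeps running products of `beta1` and `beta2`, in the spirit of the removed "Adam running product" row.

Weight decay is applied to the parameter directly (decoupled) rather than added to the gradient, which is what distinguishes AdamW from Adam with an L2 term. `cosine_lr`, `clip_grad_norm`, and this `AdamW` class are hypothetical names.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    # Cosine decay: base_lr at step 0, min_lr at the last step (total_steps - 1).
    progress = step / max(1, total_steps - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_grad_norm(grads, max_norm):
    # Rescale every gradient by one shared factor so the global L2 norm is <= max_norm.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads, total_norm


class AdamW:
    def __init__(self, n_params, beta1=0.9, beta2=0.99, eps=1e-8, weight_decay=0.01):
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        self.beta1, self.beta2, self.eps, self.wd = beta1, beta2, eps, weight_decay
        self.beta1_t = 1.0  # running product beta1**t
        self.beta2_t = 1.0  # running product beta2**t

    def step(self, params, grads, lr):
        self.beta1_t *= self.beta1
        self.beta2_t *= self.beta2
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1_t)  # bias-corrected moments
            v_hat = self.v[i] / (1 - self.beta2_t)
            p -= lr * self.wd * p  # decoupled weight decay (not folded into g)
            p -= lr * m_hat / (math.sqrt(v_hat) + self.eps)
            new_params.append(p)
        return new_params


# Toy run: minimize (x - 3)^2 with weight decay off so the optimum stays at 3.
total_steps = 500
params = [0.0]
opt = AdamW(len(params), weight_decay=0.0)
for step in range(total_steps):
    grads = [2.0 * (params[0] - 3.0)]
    grads, _ = clip_grad_norm(grads, max_norm=1.0)
    params = opt.step(params, grads, cosine_lr(step, total_steps, base_lr=0.05))
print(params[0])
```

The toy loop runs for 500 steps, matching the CUDA path's default `--steps`. The order inside `step` mirrors the fused CUDA step described in the README: clip first, then update.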
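
The CUDA CLI exposes `--top-k` (default `5`) and `--temperature` (default `0.6`). One common way to combine them is shown below as a Python sketch:

1. Keep the `k` largest logits.
2. Divide them by the temperature.
3. Sample from the softmax over that subset.

`sample_top_k` is a hypothetical helper. The README does not say whether temperature is applied before or after the top-k cut. In this formulation the order does not matter, because dividing by a positive temperature preserves the ranking of the logits. The demo uses `top_k=3` only to make the cut easy to see.

```python
import math
import random


def sample_top_k(logits, top_k=5, temperature=0.6, rng=random):
    """Sample one token id from the top_k largest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(top_k, len(logits)))
    # Ids of the k largest logits; dividing by a positive temperature keeps this order.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top_ids]
    shift = max(scaled)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    # random.choices normalizes the weights, so this is a softmax over the top-k subset.
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(42)  # seed mirrors the CUDA CLI's default --seed, for illustration
logits = [2.0, 1.0, 0.5, -1.0, 3.0, -2.0, 0.0]
counts = [0] * len(logits)
for _ in range(2000):
    counts[sample_top_k(logits, top_k=3, temperature=0.6, rng=rng)] += 1
print(counts)
```

Lower temperatures sharpen the distribution toward the largest logit. `top_k=1` reduces to greedy decoding.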