Commit 25c3e2f

Refactor code structure for improved readability and maintainability
1 parent 5aa4296 commit 25c3e2f

3 files changed

Lines changed: 1234 additions & 12 deletions

CMakeLists.txt

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+cmake_minimum_required(VERSION 3.24)
+project(microgpt_cuda LANGUAGES CXX CUDA)
+
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CUDA_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+set(CMAKE_CUDA_STANDARD_REQUIRED ON)
+
+if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
+  set(CMAKE_CUDA_ARCHITECTURES 86)
+endif()
+
+add_executable(microgpt_cuda microgpt_cuda.cu)
+target_compile_options(microgpt_cuda PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--use_fast_math>)

README.md

Lines changed: 28 additions & 12 deletions
@@ -2,7 +2,7 @@
 
 ![microgpt logo](assets/microgpt.png)
 
-Optimized version of [Karpathy's microgpt](https://karpathy.ai/microgpt.html) the most atomic way to train and inference a GPT in pure, dependency-free Python.
+Optimized version of [Karpathy's microgpt](https://karpathy.ai/microgpt.html), the most atomic way to train and inference a GPT in pure, dependency-free Python.
 
 **293 lines, 0 dependencies.** All optimizations preserve the original simplicity.
 
@@ -12,34 +12,50 @@ Optimized version of [Karpathy's microgpt](https://karpathy.ai/microgpt.html)
 |---|---|---|
 | Direct `__truediv__` implementation | +8 | ~20-30% fewer computation graph nodes per step |
 | Fused `cross_entropy` (log-softmax + NLL) | +5 | Fewer nodes + better numerical stability |
-| Iterative `backward()` topological sort | ±0 | Eliminates recursion depth limit |
-| `sum(losses[1:], losses[0])` | ±0 | Removes phantom `Value(0)` node |
-| Adam β running product | +2 | Numerically stable bias correction at large step counts |
+| Iterative `backward()` topological sort | 0 | Eliminates recursion depth limit |
+| `sum(losses[1:], losses[0])` | 0 | Removes phantom `Value(0)` node |
+| Adam running product | +2 | Numerically stable bias correction at large step counts |
 | `with open()` file handle | +1 | Proper resource cleanup |
-| **Weight tying** (wte = lm_head) | -1 | Standard GPT-2 practice, 432 fewer params |
-| **Cosine LR schedule** | ±0 | Smoother decay than linear, matches Karpathy's latest |
+| **Weight tying** (wte = lm_head) | -1 | Standard GPT-2 practice, fewer params |
+| **Cosine LR schedule** | 0 | Smoother decay than linear |
 | **Train/val split** (90/10) | +3 | Basic ML hygiene, detect overfitting |
 | **Periodic validation** (every 100 steps) | +10 | Pure-float NLL eval on held-out docs |
 | **Gradient clipping** (global norm) | +4 | Prevents exploding gradients, stabilizes training |
-| **AdamW weight decay** | +1 | Decoupled regularization (Loshchilov & Hutter 2019) |
+| **AdamW weight decay** | +1 | Decoupled regularization |
 | **Top-k sampling** (k=5) | +4 | Higher quality inference, avoids garbage tokens |
 | **Per-step timing** | +3 | Performance observability in ms/step |
 
-**Total: +50 lines** (243 293), no new dependencies.
+**Total: +50 lines** (243 -> 293), no new dependencies.
 
 ## Files
 
-- **`microgpt.py`** — The complete optimized algorithm (runnable)
-- **`microgpt_optimized.html`** — Syntax-highlighted 3-column view with change annotations
+- **`microgpt.py`** - Complete optimized Python algorithm (runnable)
+- **`microgpt_cuda.cu`** - CUDA/C++ port with full train/val/inference loop
+- **`microgpt_optimized.html`** - Syntax-highlighted 3-column view with change annotations
+- **`CMakeLists.txt`** - CMake entrypoint for CUDA build
 
 ## Quick Start
 
 ```bash
 python microgpt.py
 ```
 
-It will auto-download `input.txt` (names dataset) on first run, train for 500 steps with periodic validation, then generate samples via top-k sampling.
+It auto-downloads `input.txt` on first run, trains for 500 steps with periodic validation, then generates samples via top-k sampling.
+
+## CUDA Build
+
+```bash
+cmake -S . -B build -G "Visual Studio 17 2022" -A x64
+cmake --build build --config Release
+.\build\Release\microgpt_cuda.exe
+```
+
+For quick smoke tests:
+
+```bash
+.\build\Release\microgpt_cuda.exe --steps 5 --samples 3
+```
 
 ## Credits
 
-Original by [@karpathy](https://github.com/karpathy) [microgpt](https://karpathy.ai/microgpt.html) | [Gist](https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95)
+Original by [@karpathy](https://github.com/karpathy) - [microgpt](https://karpathy.ai/microgpt.html) | [Gist](https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95)
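
The README table in this commit names a cosine LR schedule, global-norm gradient clipping, and an Adam running product with AdamW weight decay, but the diff does not show how `microgpt.py` implements them. The following is a minimal sketch of those three ideas in dependency-free Python, not the repository's code: the names `cosine_lr`, `clip_global_norm`, and `AdamW`, and all default hyperparameters, are assumptions made for illustration.

```python
import math

def cosine_lr(step, total_steps, base_lr):
    # Cosine decay: base_lr at step 0, falling smoothly to 0 at total_steps.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

def clip_global_norm(grads, max_norm):
    # Compute one L2 norm over all gradients and rescale them with a single
    # shared factor, so the gradient direction is unchanged.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads, norm

class AdamW:
    # Hypothetical optimizer over a flat list of floats; defaults are assumed.
    def __init__(self, n_params, beta1=0.9, beta2=0.95, eps=1e-8, weight_decay=0.0):
        self.beta1, self.beta2 = beta1, beta2
        self.eps, self.weight_decay = eps, weight_decay
        self.m = [0.0] * n_params  # first-moment estimates
        self.v = [0.0] * n_params  # second-moment estimates
        # Running products equal beta**t (up to rounding) without
        # recomputing a power on every step.
        self.beta1_prod = 1.0
        self.beta2_prod = 1.0

    def step(self, params, grads, lr):
        self.beta1_prod *= self.beta1
        self.beta2_prod *= self.beta2
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / (1.0 - self.beta1_prod)  # bias-corrected mean
            v_hat = self.v[i] / (1.0 - self.beta2_prod)  # bias-corrected variance
            p -= lr * self.weight_decay * p              # decoupled weight decay
            p -= lr * m_hat / (math.sqrt(v_hat) + self.eps)
            updated.append(p)
        return updated
```

In a training loop these would typically be chained once per step: compute `lr = cosine_lr(step, total_steps, base_lr)`, clip the gradients, then call `AdamW.step` with the clipped gradients and that `lr`.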

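The table also lists a fused `cross_entropy` (log-softmax + NLL), a pure-float NLL evaluation, and top-k sampling. As a rough illustration only, the sketch below works on plain floats rather than the autograd `Value` nodes the README mentions, so it is closer to the validation path than to the training path. The function names and the `rng` parameter are assumptions, not identifiers confirmed by this commit.

```python
import math
import random

def cross_entropy(logits, target):
    # Fused log-softmax + NLL: -log(softmax(logits)[target]).
    # Subtracting the max logit before exp() avoids overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def sample_top_k(logits, k, rng):
    # Keep the k highest-scoring token ids, then sample among them
    # in proportion to their softmax weights.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = logits[top[0]]
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.1, 5.0, 4.0, -3.0, 0.0]
    print(cross_entropy(logits, 1))
    print(sample_top_k(logits, 2, rng))
```

Restricting the draw to the top k candidates means low-probability tokens can never be emitted, which is the effect the README describes as avoiding garbage tokens.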