DINOv3 reimplemented in ~800 lines of pure, dependency-free Python. No torch, no numpy — just stdlib. Inspired by Karpathy's microGPT.
- Custom autograd engine (`Raw` + `Tensor` with backward pass)
- Vision Transformer with Rotary Position Embeddings (RoPE)
- DINO + iBOT dual-head self-supervised objectives
- EMA centering (stable) and optional Sinkhorn-Knopp
- KoLeo regularization, register tokens, optional layer scale
- KNN evaluation on MNIST
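
To illustrate the first item, here is a minimal sketch of reverse-mode autograd on scalars in stdlib-only Python. It is not the repository's `Raw`/`Tensor` code: the `Value` class and its attributes are hypothetical names chosen for this example, and only addition and multiplication are supported.

```python
class Value:
    """Scalar that records how it was computed (hypothetical name, not the repo's class)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Product rule: each input receives the other input's value.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the output back.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a  # dc/da = b + 1, dc/db = a
c.backward()
```

Gradients accumulate with `+=` so that a value used more than once (like `a` here) receives the sum of its contributions.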
Run with `python microdinov3.py`. The script downloads MNIST automatically (~60 MB) and runs on CPU.
| Eval | Accuracy | Baseline |
|---|---|---|
| KNN on pre-head CLS (32-dim) | 35.2% | 10% random |
| KNN on DINO head output (32-dim) | 29.2% | 10% random |
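
For readers unfamiliar with KNN evaluation, the sketch below shows the general idea: each test feature is labeled by majority vote among its nearest training features, and accuracy is the fraction labeled correctly. The metric (cosine similarity), `k=3`, and the function names are assumptions made for this example; the script's actual settings are not stated here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def knn_predict(train_feats, train_labels, query, k=3):
    """Label a query by majority vote among its k most similar training features."""
    ranked = sorted(
        range(len(train_feats)),
        key=lambda i: cosine(train_feats[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    """Fraction of test features whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_feats, train_labels, q, k) == y
        for q, y in zip(test_feats, test_labels)
    )
    return correct / len(test_labels)


# Toy 2-D features standing in for the 32-dim embeddings.
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [0, 0, 1, 1]
test_x = [[1.0, 0.2], [0.2, 1.0]]
test_y = [0, 1]
acc = knn_accuracy(train_x, train_y, test_x, test_y)
```

Because no classifier is trained on top of the features, the score reflects how well the frozen embeddings already separate the classes.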
`output.txt` contains a full training log.
DINOv2's Sinkhorn-Knopp centering needs large batches and many prototypes to stay stable; otherwise it collapses to a uniform distribution. At this scale, DINOv1-style EMA centering is more reliable, so both are implemented and EMA is the default. Gram anchoring is scaffolded but unused.
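
As a rough illustration of DINOv1-style EMA centering, the sketch below subtracts a running center from the teacher logits before a sharpened softmax, then moves the center toward the batch mean. The function names, the temperature of 0.04, and the momentum of 0.9 are assumptions for this example and may differ from what `microdinov3.py` uses.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_targets(logits_batch, center, temp=0.04):
    """Center each row of teacher logits, then sharpen with a low temperature."""
    return [
        softmax([(logit - c) / temp for logit, c in zip(row, center)])
        for row in logits_batch
    ]


def update_center(center, logits_batch, momentum=0.9):
    """Exponential moving average of the per-prototype batch mean of teacher logits."""
    n = len(logits_batch)
    batch_mean = [sum(row[j] for row in logits_batch) / n for j in range(len(center))]
    return [momentum * c + (1 - momentum) * m for c, m in zip(center, batch_mean)]


center = [0.0, 0.0, 0.0]
batch = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # 2 samples, 3 prototypes
targets = teacher_targets(batch, center)    # uses the center from before this step
center = update_center(center, batch)       # then the center drifts toward the batch mean
```

The center depends only on a running average, not on a per-batch normalization, which is why it tolerates small batches better than Sinkhorn-Knopp.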