Awesome tiny machine learning projects

A curated collection of GitHub projects with tiny codebases. Most are interesting primarily for educational purposes, but some (e.g. tinygrad) compete with large, complex projects.

Contents

Andrej Karpathy

  • cryptos - Pure Python from-scratch zero-dependency implementation of Bitcoin for educational purposes.
  • llama2.c - Inference Llama 2 in one file of pure C.
  • llm.c - LLM training in simple, raw C/CUDA.
  • micrograd - A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API.
  • minbpe - Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
  • minGPT - A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training.
  • nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPT.
  • nano-llama31 - nanoGPT style version of Llama 3.1.
  • nanochat - The best ChatGPT that $100 can buy.

Diffusion models

  • diffusion-gpt - An annotated implementation of a character-level discrete diffusion model for text generation. Inspired by nanoGPT.
  • micro_diffusion - Micro-budget training of large-scale diffusion models by Sony Research.
  • minimal-text-diffusion - A minimal implementation of a diffusion model for text generation. Also contains a basic list of papers/blogs/videos for a deeper dive into diffusion models.

GPU

  • penny - Hand-written GPU communication library (NCCL).
  • tiny-gpu - A minimal GPU design in Verilog to learn how GPUs work from the ground up.

🤗 Hugging Face

  • nanotron - Minimalistic large language model 3D-parallelism training.
  • nanoVLM - The simplest, fastest repository for training/finetuning small-sized VLMs.
  • picotron - The minimalist & most-hackable repository for pre-training Llama-like models with 4D Parallelism. It is designed with simplicity and educational purposes in mind.
  • smolagents - A barebones library for agents that think in code.
  • smol-course - A course on aligning smol models.
  • smollm - Everything about the SmolLM and SmolVLM family of models.

Inference engines

  • flex-nano-vllm - FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
  • mini-sglang - Mini-Sglang by sgl-project.
  • nano-vllm - A lightweight vLLM implementation built from scratch.
  • tiny-vllm - You're going to build a high performance LLM inference engine with C++ and CUDA.
  • tokasaurus - LLM inference engine optimized for throughput-intensive workloads. On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.

LLMs

  • minimind - Aims to train a super-small language model, MiniMind, completely from scratch in 2 hours at a cost of only 3 RMB.
  • modded-nanogpt - NanoGPT (124M) in 3 minutes on 8xH100.
  • modded-nanogpt-rwkv - Modified variant of nanoGPT for RWKV.
  • nanoMoE - The simplest, fastest repository for training/finetuning medium-sized MoE-based GPTs. Also, an awesome post by the author about MoE.
  • nanoT5 - Fast & Simple repository for pre-training and fine-tuning T5-style models.
  • needle - Distilling Gemini 3.1 into a 26M-parameter model. It works especially well with function calling.
  • parameter-golf - OpenAI Model Craft Challenge: train the best language model that fits in a 16MB artifact and trains in under 10 minutes on 8xH100s.

PyTorch Foundation

  • gpt-fast - Simple and efficient PyTorch-native transformer text generation. LLaMA-like, GPTQ, tensor parallelism, speculative decoding, etc.
  • LeanRL - LeanRL is a fork of CleanRL where hand-picked scripts have been re-written using PyTorch 2 features, mainly torch.compile and cudagraphs.
  • segment-anything-fast - Segment Anything over 8x faster using only pure, native PyTorch.

RecSys

Reinforcement learning

  • Mini-R1 - Minimal reproduction of DeepSeek R1-Zero. Code built upon trl.
  • minimalRL - Implementations of basic RL algorithms with minimal lines of code.
  • nano-aha-moment - Inspired by TinyZero and Mini-R1, but designed to be much simpler, cleaner, and faster, with every line of code visible and understandable.
  • nanoRLHF - This project aims to perform RLHF training from scratch, implementing almost all core components manually except for PyTorch and Triton.
  • TinyZero - Minimal reproduction of DeepSeek R1-Zero. Code built upon verl.

Tabular ML

  • nanoTabPFN - Train your own small TabPFN in less than 500 LOC and a few minutes. The purpose of this repository is to be a good starting point for students and researchers that are interested in learning about how TabPFN works under the hood.

ML

  • mini-swe-agent - The 100 line AI agent that solves GitHub issues or helps you in your command line.
  • nano-graphrag - A simple, easy-to-hack GraphRAG implementation.
  • tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
  • tinyvector - A tiny nearest-neighbor embedding database built with SQLite and PyTorch.

ML & CyberSec

  • subwiz - nanoGPT based model, trained to discover subdomains.

C

  • agent-c - An ultra-lightweight AI agent written in C that communicates with the OpenRouter API and executes shell commands.
  • flux2.c - FLUX.2-klein-4B pure C implementation. Zero external dependencies beyond the C standard library. By the creator of Redis + Claude.
  • miniaudio - Audio playback and capture library written in C, in a single source file.
  • nanoMPI - A minimal MPI Implementation loosely based on OpenMPI. nanoMPI allows beginners to the field of distributed computing to quickly see answers to questions like "how is a ring allreduce implemented?"

Low-level

  • tiny-tpu - A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1.