|
1 | | -# <strong>Welcome to MiniMind!</strong> |
| 1 | +# Welcome to MiniMind! |
2 | 2 |
|
3 | 3 | <figure markdown> |
4 | 4 |  |
|
7 | 7 |
|
8 | 8 | ## 📌 Introduction |
9 | 9 |
|
10 | | -MiniMind is a super-small language model project trained completely from scratch, requiring **only $0.5 + 2 hours** to train a **26M** language model! |
| 10 | +**MiniMind** is a complete, open-source project for training ultra-small language models from scratch at minimal cost. Train a **26M**-parameter ChatBot in just **2 hours** for only **$3** on a single NVIDIA 3090 GPU!
11 | 11 |
|
12 | 12 | - The **MiniMind** series is extremely lightweight; the smallest version is **1/7000** the size of GPT-3
13 | | -- The project open-sources the minimalist structure of large models, including: |
14 | | - - Mixture of Experts (MoE) |
15 | | - - Dataset cleaning |
16 | | - - Pretraining |
17 | | - - Supervised Fine-Tuning (SFT) |
18 | | - - LoRA fine-tuning |
19 | | - - Direct Preference Optimization (DPO) |
20 | | - - Model distillation |
21 | | -- All core algorithm code is reconstructed from scratch using native PyTorch, without relying on third-party abstract interfaces |
22 | | -- This is not only a full-stage open-source reproduction of large language models, but also a tutorial for getting started with LLMs |
23 | | - |
24 | | -!!! note "Training Cost" |
25 | | - "2 hours" is based on NVIDIA 3090 hardware (single card) testing, "$0.5" refers to GPU server rental cost |
26 | | - |
27 | | -## ✨ Key Features |
28 | | - |
29 | | -- **Ultra-low cost**: Single 3090, 2 hours, $0.5 to train a ChatBot from scratch |
30 | | -- **Complete pipeline**: Covers Tokenizer, pretraining, SFT, LoRA, DPO, distillation full process |
31 | | -- **Education-friendly**: Clean code, suitable for learning LLM principles |
32 | | -- **Ecosystem compatible**: Supports `transformers`, `llama.cpp`, `vllm`, `ollama` and other mainstream frameworks |
33 | | - |
34 | | -## 📊 Model List |
35 | | - |
36 | | -| Model (Size) | Inference Memory (Approx.) | Release | |
37 | | -|------------|----------|---------| |
38 | | -| MiniMind2-small (26M) | 0.5 GB | 2025.04.26 | |
39 | | -| MiniMind2-MoE (145M) | 1.0 GB | 2025.04.26 | |
40 | | -| MiniMind2 (104M) | 1.0 GB | 2025.04.26 | |
| 13 | +- Complete implementation covering: |
| 14 | + - **Tokenizer training** with custom vocabulary |
| 15 | + - **Pretraining** (knowledge learning) |
| 16 | + - **Supervised Fine-Tuning (SFT)** (conversation patterns) |
| 17 | + - **LoRA fine-tuning** (parameter-efficient adaptation) |
| 18 | +  - **Direct Preference Optimization (DPO)** (human preference alignment; see the loss sketch at the end of this section)
| 19 | +  - **RLAIF algorithms** (reinforcement learning: PPO/GRPO/SPO)
| 20 | + - **Knowledge distillation** (compress large model knowledge) |
| 21 | + - **Model reasoning distillation** (DeepSeek-R1 style) |
| 22 | + - **YaRN algorithm** (context length extrapolation) |
| 23 | +- **Pure PyTorch implementation**: All core algorithms are implemented from scratch using native PyTorch, without relying on third-party abstract interfaces |
| 24 | +- **Educational value**: This is not only a full-stage open-source reproduction of large language models, but also a comprehensive tutorial for getting started with LLMs |
| 25 | +- **Extended capabilities**: MiniMind has been extended to [MiniMind-V](https://github.com/jingyaogong/minimind-v) for vision-language multimodal tasks
| 26 | + |
| 27 | +!!! note "Training Cost & Time" |
| 28 | + "2 hours" is based on **NVIDIA 3090** hardware (single card) testing |
| 29 | + |
| 30 | + "$3" refers to GPU server rental cost |
| 31 | + |
| 32 | + With 8× RTX 4090 GPUs, training time can be compressed to **under 10 minutes** |
| 33 | + |
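| | +To make the DPO stage concrete, below is a minimal sketch of the DPO objective in native PyTorch. It illustrates the technique only; the function name and the `beta` default are assumptions, not MiniMind's actual training code.
| | +
| | +```python
| | +import torch.nn.functional as F
| | +
| | +def dpo_loss(policy_chosen_logps, policy_rejected_logps,
| | +             ref_chosen_logps, ref_rejected_logps, beta=0.1):
| | +    """DPO loss over per-sequence log-probs, each of shape (batch,)."""
| | +    # Implicit rewards: how far the policy drifts from the frozen
| | +    # reference model on each completion.
| | +    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
| | +    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
| | +    # Maximize the margin between preferred and dispreferred answers.
| | +    return -F.logsigmoid(chosen - rejected).mean()
| | +```
| | +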
| 34 | +## ✨ Key Highlights |
| 35 | + |
| 36 | +- **Ultra-low cost**: Single 3090, 2 hours, $3 to train a fully functional ChatBot from scratch |
| 37 | +- **Complete pipeline**: Tokenizer → Pretraining → SFT → LoRA → DPO/RLAIF → Distillation → Reasoning |
| 38 | +- **Latest algorithms**: Implements cutting-edge techniques including GRPO, SPO, and YaRN |
| 39 | +- **Education-friendly**: Clean, well-documented code suitable for learning LLM principles |
| 40 | +- **Ecosystem compatible**: Seamless support for `transformers`, `trl`, `peft`, `llama.cpp`, `vllm`, `ollama`, and `Llama-Factory` (loading example after this list)
| 41 | +- **Full capabilities**: Supports multi-GPU training (DDP/DeepSpeed), training visualization (WandB/SwanLab), and dynamic checkpoint management
| 42 | +- **Production-ready**: OpenAI API protocol support for easy integration with third-party UIs (FastGPT, Open-WebUI, etc.) |
| 43 | +- **Multimodal extension**: Extended to vision with [MiniMind-V](https://github.com/jingyaogong/minimind-v) |
| 44 | + |
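| | +As a quick taste of the ecosystem compatibility, the snippet below loads a published MiniMind2 checkpoint with `transformers`. The repo id is assumed from the HuggingFace collection linked below, and `trust_remote_code=True` is typically needed for custom architectures; treat both as assumptions.
| | +
| | +```python
| | +from transformers import AutoModelForCausalLM, AutoTokenizer
| | +
| | +repo = "jingyaogong/MiniMind2"  # assumed repo id; see the HF collection
| | +tokenizer = AutoTokenizer.from_pretrained(repo)
| | +model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
| | +
| | +prompt = tokenizer.apply_chat_template(
| | +    [{"role": "user", "content": "Hello!"}],
| | +    tokenize=False, add_generation_prompt=True)
| | +inputs = tokenizer(prompt, return_tensors="pt")
| | +print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
| | +```
| | +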
| 45 | +## 📊 Model Series |
| 46 | + |
| 47 | +### MiniMind2 Series (Latest - 2025.04.26) |
| 48 | + |
| 49 | +| Model | Parameters | Vocabulary | Layers | Hidden Dim | Context | Inference Memory | |
| 50 | +|-------|-----------|------------|--------|-----------|---------|-----------------| |
| 51 | +| MiniMind2-small | 26M | 6,400 | 8 | 512 | 2K | ~0.5 GB | |
| 52 | +| MiniMind2-MoE | 145M | 6,400 | 8 | 640 | 2K | ~1.0 GB | |
| 53 | +| MiniMind2 | 104M | 6,400 | 16 | 768 | 2K | ~1.0 GB | |
| 54 | + |
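| | +The table rows map onto a handful of architecture hyperparameters. A hypothetical config sketch (the dataclass and field names are illustrative, not the project's actual config class):
| | +
| | +```python
| | +from dataclasses import dataclass
| | +
| | +@dataclass
| | +class MiniMindConfig:        # illustrative only
| | +    vocab_size: int = 6400
| | +    num_layers: int = 8
| | +    hidden_dim: int = 512
| | +    max_seq_len: int = 2048  # the "2K" context above
| | +
| | +small = MiniMindConfig()                              # ~26M params
| | +base = MiniMindConfig(num_layers=16, hidden_dim=768)  # ~104M params
| | +# MiniMind2-MoE (~145M) widens to 640 and swaps dense FFNs for experts.
| | +```
| | +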
| 55 | +### MiniMind-V1 Series (Legacy - 2024.09.01) |
| 56 | + |
| 57 | +| Model | Parameters | Vocabulary | Layers | Hidden Dim | Context | |
| 58 | +|-------|-----------|------------|--------|-----------|---------| |
| 59 | +| minimind-v1-small | 26M | 6,400 | 8 | 512 | 2K | |
| 60 | +| minimind-v1-moe | 104M | 6,400 | 8 | 512 | 2K | |
| 61 | +| minimind-v1 | 108M | 6,400 | 16 | 768 | 2K | |
| 62 | + |
| 63 | +## 📅 Latest Updates (2025-10-24) |
| 64 | + |
| 65 | +🔥 **RLAIF Training Algorithms**: Native implementation of PPO, GRPO, and SPO |
| 66 | + |
| 67 | +- **YaRN Algorithm**: RoPE length extrapolation for improved long-sequence handling (sketched after this list)
| 68 | +- **Adaptive Thinking**: Reasoning models support optional thinking chains |
| 69 | +- **Full template support**: Tool calling and reasoning tags (`<tool_call>`, `<think>`, etc.) |
| 70 | +- **Visualization**: Switched from WandB to [SwanLab](https://swanlab.cn/) (China-friendly) |
| 71 | +- **Reasoning models**: Complete MiniMind-Reason series based on DeepSeek-R1 distillation |
| 72 | + |
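| | +Of these, YaRN is the most self-contained to sketch. Its core "NTK-by-parts" idea blends interpolated and original RoPE frequencies per dimension. The defaults below follow the YaRN paper; the function name and the omission of the attention-temperature term are simplifying assumptions, not MiniMind's implementation:
| | +
| | +```python
| | +import math
| | +import torch
| | +
| | +def yarn_inv_freq(dim=512, base=10000.0, scale=4.0, orig_ctx=2048,
| | +                  beta_fast=32.0, beta_slow=1.0):
| | +    """Per-dimension RoPE frequency correction for context extension."""
| | +    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)
| | +    # Rotations each dimension completes over the original context.
| | +    rotations = orig_ctx * inv_freq / (2 * math.pi)
| | +    # ramp -> 1 for high-frequency dims (keep as-is, extrapolate);
| | +    # ramp -> 0 for low-frequency dims (interpolate by `scale`).
| | +    ramp = ((rotations - beta_slow) / (beta_fast - beta_slow)).clamp(0, 1)
| | +    return inv_freq * ramp + (inv_freq / scale) * (1 - ramp)
| | +```
| | +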
| 73 | +## 🎯 Project Contents |
| 74 | + |
| 75 | +- Complete MiniMind-LLM architecture code (Dense + MoE models) |
| 76 | +- Detailed Tokenizer training code |
| 77 | +- Full training pipeline: Pretrain → SFT → LoRA → RLHF/RLAIF → Distillation |
| 78 | +- High-quality, curated, and deduplicated datasets for every training stage
| 79 | +- Native PyTorch implementation of key algorithms, minimal third-party dependencies |
| 80 | +- Multi-GPU training support (single-machine multi-card DDP, DeepSpeed, distributed clusters) |
| 81 | +- Training visualization with WandB/SwanLab
| 82 | +- Model evaluation on third-party benchmarks (C-Eval, C-MMLU, OpenBookQA) |
| 83 | +- YaRN algorithm for RoPE context length extrapolation |
| 84 | +- OpenAI API protocol server for easy integration (client example after this list)
| 85 | +- Streamlit web UI for chat |
| 86 | +- Full compatibility with community tools: llama.cpp, vllm, ollama, Llama-Factory |
| 87 | +- MiniMind-Reason models: Complete open-source data + weights for reasoning distillation |
41 | 88 |
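| | +Regarding the OpenAI API protocol server above: once it is running locally, any OpenAI-compatible client can talk to it. A hedged sketch (the port and model name are assumptions; check the server script for the real values):
| | +
| | +```python
| | +from openai import OpenAI
| | +
| | +# Endpoint and model name are assumptions; adjust to your local server.
| | +client = OpenAI(base_url="http://127.0.0.1:8998/v1", api_key="none")
| | +reply = client.chat.completions.create(
| | +    model="minimind",
| | +    messages=[{"role": "user", "content": "Hello, who are you?"}],
| | +)
| | +print(reply.choices[0].message.content)
| | +```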
|
42 | 89 | ## 🚀 Quick Navigation |
43 | 90 |
|
44 | | -- [Quick Start](quickstart.md) - Environment setup, model download, quick testing |
45 | | -- [Model Training](training.md) - Pretraining, SFT, LoRA, DPO training process |
| 91 | +- **[Quick Start](quickstart.md)** - Environment setup, model download, quick testing |
| 92 | +- **[Model Training](training.md)** - Pretraining, SFT, LoRA, RLHF, RLAIF, and reasoning training |
46 | 93 |
|
47 | | -## 🔗 Related Links |
| 94 | +## 🔗 Links & Resources |
48 | 95 |
|
| 96 | +**Project Repositories**: |
49 | 97 | - **GitHub**: [https://github.com/jingyaogong/minimind](https://github.com/jingyaogong/minimind) |
50 | 98 | - **HuggingFace**: [MiniMind Collection](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5) |
51 | | -- **ModelScope**: [MiniMind Models](https://www.modelscope.cn/profile/gongjy) |
52 | | -- **Online Demo**: [ModelScope Studio](https://www.modelscope.cn/studios/gongjy/MiniMind) |
| 99 | +- **ModelScope**: [MiniMind Profile](https://www.modelscope.cn/profile/gongjy) |
| 100 | + |
| 101 | +**Online Demos**: |
| 102 | +- [ModelScope Studio - Standard Chat](https://www.modelscope.cn/studios/gongjy/MiniMind) |
| 103 | +- [ModelScope Studio - Reasoning Model](https://www.modelscope.cn/studios/gongjy/MiniMind-Reasoning) |
| 104 | +- [Bilibili Video Introduction](https://www.bilibili.com/video/BV12dHPeqE72/) |
| 105 | + |
| 106 | +**Vision Extension**: |
| 107 | +- [MiniMind-V](https://github.com/jingyaogong/minimind-v) - Multimodal vision language models |
| 108 | + |
| 109 | +## 💡 Why MiniMind? |
| 110 | + |
| 111 | +The AI community is flooded with high-cost, complex frameworks that abstract away the fundamentals. MiniMind aims to democratize LLM learning by: |
| 112 | + |
| 113 | +1. **Lowering the barrier**: No need for expensive GPUs or cloud services |
| 114 | +2. **Understanding, not just using**: Learn every detail from tokenization to inference |
| 115 | +3. **End-to-end learning**: Train from scratch, not just fine-tune existing models |
| 116 | +4. **Code clarity**: Pure PyTorch implementations you can read and understand |
| 117 | +5. **Practical results**: Get a working ChatBot with minimal resources |
| 118 | + |
| 119 | +As we say: **"Building a Lego airplane is far more exciting than flying first class!"** |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +Next: [Get Started →](quickstart.md) |
53 | 124 |
|