Open and efficient models for agentic AI — reproducible training pipelines with fully transparent data, techniques, and weights.
```shell
# Install the Nemotron training recipes
$ git clone https://github.com/NVIDIA/nemotron
$ cd nemotron && uv sync

# Run the full Nano3 pipeline
$ uv run nemotron nano3 data prep pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 pretrain --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep sft --run YOUR-CLUSTER
$ uv run nemotron nano3 sft --run YOUR-CLUSTER
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
```

Note: The `--run YOUR-CLUSTER` flag submits jobs to your configured Slurm cluster via NeMo-Run. See Execution through NeMo-Run for setup instructions.
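The six pipeline commands above can also be scripted as a loop, since each training stage is preceded by its matching data-prep step. The sketch below is a non-authoritative illustration: `CLUSTER` is a placeholder for your Slurm cluster name, and the `run` helper only echoes each command so the sketch is safe to dry-run before submitting real jobs.

```shell
# Sketch: drive the full Nano3 pipeline stage by stage.
# CLUSTER is a placeholder; `run` echoes instead of executing,
# so this prints the six commands without submitting anything.
CLUSTER=YOUR-CLUSTER
run() { echo "uv run nemotron nano3 $* --run $CLUSTER"; }

for stage in pretrain sft rl; do
  run data prep "$stage"   # prepare the data blend for this stage
  run "$stage"             # launch the training stage on the cluster
done
```

Dropping the `echo` from the helper turns the dry run into a real submission sequence, one stage at a time.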
::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} Usage Cookbook
:link: usage-cookbook/README
:link-type: doc

Deployment guides for Nemotron models: TensorRT-LLM, vLLM, SGLang, NIM, and Hugging Face.
:::

:::{grid-item-card} Use Case Examples
:link: use-case-examples/README
:link-type: doc

End-to-end applications: RAG agents, ML agents, and multi-agent systems.
:::
::::
::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} Nemotron 3 Nano
:link: train/nano3/README
:link-type: doc

31.6B total / 3.6B active parameters, 25T tokens, up to 1M context. Hybrid Mamba-Transformer with sparse MoE.

Stages: Pretraining → SFT → RL
:::
::::
The Nemotron training pipeline follows a four-stage approach with full artifact lineage tracking:
| Stage | Name | Description |
|---|---|---|
| 0 | Pretraining | Base model training on large text corpus |
| 1 | SFT | Supervised fine-tuning for instruction following |
| 2 | RL | Reinforcement learning for alignment |
| 3 | Evaluation | Benchmark testing with NeMo Evaluator |
| Feature | Description |
|---|---|
| Open Models | Transparent training data, techniques, and weights for community innovation |
| Compute Efficiency | Model pruning enabling higher throughput via TensorRT-LLM |
| High Accuracy | Built on frontier open models with human-aligned reasoning |
| Flexible Deployment | Deploy anywhere — edge, single GPU, or data center with NIM |
- Complete Pipelines — From raw data to deployment-ready models
- Artifact Lineage — Full traceability via W&B from data to model
- Production-Grade — Built on NVIDIA's NeMo stack (Megatron-Bridge, NeMo-RL)
- Reproducible — Versioned configs, data blends, and checkpoints
- Tech Report — Nemotron 3 Nano methodology
- Model Weights — Pre-trained checkpoints on HuggingFace
- Pre-training Datasets — Open pre-training data
- Post-training Datasets — SFT and RL data
- Artifact Lineage — W&B integration guide
```{toctree}
:caption: Usage Cookbook
:hidden:

usage-cookbook/README.md
usage-cookbook/Nemotron-Nano2-VL/README.md
usage-cookbook/Nemotron-Parse-v1.1/README.md
```
```{toctree}
:caption: Use Case Examples
:hidden:

use-case-examples/README.md
use-case-examples/Simple Nemotron-3-Nano Usage Example/README.md
use-case-examples/Data Science ML Agent/README.md
use-case-examples/RAG Agent with Nemotron RAG Models/README.md
```
```{toctree}
:caption: Training Recipes
:hidden:

train/nano3/README.md
train/artifacts.md
```
```{toctree}
:caption: Nano3 Stages
:hidden:

train/nano3/pretrain.md
train/nano3/sft.md
train/nano3/rl.md
train/nano3/eval.md
train/nano3/import.md
```
```{toctree}
:caption: Nemotron Kit
:hidden:

train/kit.md
train/nvidia-stack.md
train/nemo-run.md
train/omegaconf.md
train/wandb.md
train/cli.md
train/data-prep.md
train/evaluator.md
```