
Releases: NVIDIA-NeMo/Automodel

NVIDIA NeMo-Automodel 0.2.0

04 Dec 2025 21:22 · commit 0be83ba


  • Fast Model Implementations
    • LLM
      • GPT-OSS 20B and 120B
      • Qwen3-Next and Qwen3-235B
      • GLM-4.5 (355B-A32B), GLM-4.6, GLM-4.5-Air
    • VLM & OMNI
      • Qwen3-VL
      • Qwen2.5-VL
      • Qwen3-Omni-30B-A3B
      • InternVL-4B (out of the box)
  • Parallelism
    • Improved support for context parallelism (CP) and sequence packing with MoE models
    • Optimized tensor-parallel (TP) plan for LoRA
  • Dataset support for
    • Single-turn tool calling
    • Multi-turn tool calling
    • Streaming datasets
    • Chat datasets in the OpenAI messages format (see the sketch after this list)
    • Improved support for truncation/padding
  • Checkpointing & logging
    • Support for asynchronous checkpointing with PyTorch Distributed Checkpointing (DCP); a sketch follows this list
    • Symbolic links (LATEST, LOWEST_VAL) pointing to the latest checkpoint and the checkpoint with the lowest validation score
    • MLflow support
  • Task support
    • Quantization-aware training (QAT) for supervised fine-tuning (SFT)
    • Sequence classification
  • Known issues
    • Minor performance regression with DeepSeek-V3
    • The sequence-parallel plan is incorrect for Qwen3
    • Support for GPT-OSS 120B with DeepEP will be included in the next patch release
    • Validation is not functional for custom models with Transformer Engine (TE) when using packed sequences and a pipeline-parallel size of 1
  • Limitations
    • PEFT (LoRA) support for MoE models is scheduled for the 26.02 release
    • For non-MoE models, CP support requires the model to use the PyTorch SDPA API (see the sketch below)
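
A rough sketch of an OpenAI-format chat record with a tool call, as referenced in the dataset list above. The field names follow the OpenAI messages convention; the exact schema NeMo-Automodel accepts should be checked against its documentation:

```python
import json

# One OpenAI-style chat record with a single-turn tool call.
# Chat datasets are commonly stored as JSONL: one record per line.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }],
        },
        {"role": "tool", "content": '{"temp_c": 18}'},
        {"role": "assistant", "content": "It is currently 18 °C in Paris."},
    ]
}

with open("chat_data.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```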
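
The asynchronous-checkpointing item builds on PyTorch Distributed Checkpointing. Below is a minimal sketch using `torch.distributed.checkpoint.async_save` (available in recent PyTorch releases), assuming a process group is already initialized (e.g. via torchrun); it illustrates the underlying DCP call, not NeMo-Automodel's wrapper around it:

```python
import torch
import torch.distributed.checkpoint as dcp

# async_save returns a Future and writes the checkpoint in the
# background, so training can continue immediately.
model = torch.nn.Linear(16, 16)
future = dcp.async_save({"model": model.state_dict()}, checkpoint_id="ckpt/step_1000")

# ... training continues here ...

future.result()  # block only when the save must be complete
```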
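
On the CP limitation: context parallelism hooks into PyTorch's scaled dot-product attention, so a model's attention layers must route through `torch.nn.functional.scaled_dot_product_attention` rather than a hand-written softmax(QK^T)V. A sketch of the required call pattern:

```python
import torch
import torch.nn.functional as F

# CP-compatible attention goes through PyTorch's SDPA entry point.
q = torch.randn(2, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```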

NeMo-Automodel 25.11 Container

The 0.2.0 release is also included in the NeMo Automodel 25.11 container on NGC at https://registry.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel.
Here are the major software components included in the container:

Software Component    Version
CUDA                  13.0
cuDNN                 9.13.0.50-1
PyTorch               2.9.0a0
NeMo-Automodel        0.2.0
Transformer Engine    2.8.0
Transformers          4.57.1

NVIDIA NeMo-Automodel 0.1.2

23 Oct 2025 19:24 · commit 45ad729


  • Features
    • Added support for limiting the number of samples with ColumnMappedDataset
  • Bug fixes (step scheduler)
    • Switched to zero-based step indexing
    • Epoch length now accounts for gradient-accumulation steps (see the sketch below)
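
As an illustration of the accumulation fix (the variable names are hypothetical, not NeMo-Automodel's API): with gradient accumulation, one optimizer step consumes several microbatches, so an epoch contains correspondingly fewer scheduler steps:

```python
import math

# With gradient accumulation, one optimizer step consumes
# grad_accum_steps microbatches, so the number of scheduler
# steps per epoch shrinks by that factor.
num_microbatches_per_epoch = 1000
grad_accum_steps = 8
optimizer_steps_per_epoch = math.ceil(num_microbatches_per_epoch / grad_accum_steps)
print(optimizer_steps_per_epoch)  # 125
```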

NVIDIA NeMo-Automodel 0.1.0

08 Oct 2025 14:18 · commit 7146809


New Features

  • Pretraining support for
    • Models under 40B parameters with PyTorch FSDP2
    • Larger models via PyTorch pipeline parallelism (PP)
    • Tensor parallelism (TP) for models that provide a TP plan
    • Large MoE models via custom implementations
  • Knowledge distillation for LLMs (teacher and student must share the same tokenizer; see the sketch after this list)
  • FP8 training with torchao (requires torch.compile; see the sketch after this list)
  • Parallelism
    • HSDP with FSDP2
    • Automatic pipelining support
  • Checkpointing
    • Pipeline support (load and save)
    • Parallel load with meta device
  • Data
    • ColumnMappedDataset for single-turn SFT
    • Pretraining data: Megatron-Core- and nanoGPT-compatible formats
  • Performance (see https://docs.nvidia.com/nemo/automodel/latest/performance-summary.html)
    • Pretraining benchmarks for large user-defined MoE models
    • Fast DeepSeek v3 implementation with DeepEP
  • Megatron FSDP support
  • Packed sequence support
  • Triton kernels for LoRA
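
On the knowledge-distillation item above: the same-tokenizer requirement exists because student and teacher logits are compared position by position over an identical vocabulary. A generic temperature-scaled KL distillation loss, shown as a standard formulation rather than NeMo-Automodel's exact implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL divergence between teacher and student.

    Both tensors must share the same vocabulary dimension, which is
    why teacher and student need the same tokenizer.
    """
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradients comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

student = torch.randn(4, 16, 32000, requires_grad=True)  # (batch, seq, vocab)
teacher = torch.randn(4, 16, 32000)
distillation_loss(student, teacher).backward()
```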
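
And on the FP8 item: torchao swaps eligible linear layers to FP8 training, and torch.compile fuses the extra scaling and casting, which is why compile is required for the speedup to materialize. A minimal sketch, assuming a recent torchao that provides `convert_to_float8_training` and FP8-capable hardware (e.g. H100):

```python
import torch
from torchao.float8 import convert_to_float8_training

# Swap eligible nn.Linear layers to FP8 training, then compile so the
# per-layer scaling/casting overhead gets fused away.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

model = convert_to_float8_training(model)
model = torch.compile(model)

x = torch.randn(8, 1024, device="cuda")
model(x).sum().backward()
```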

NVIDIA NeMo-Automodel 0.1.0rc0

17 Sep 2025 13:59 · commit d36402d


Pre-release

Prerelease: NVIDIA NeMo-Automodel 0.1.0rc0 (2025-09-17)