📝 Blog
NeMo RL: Run High throughput Reinforcement Learning with End to End FP8 Precision
✨ Highlights
Container
Both linux/amd64 and linux/arm64 Docker containers are available on NGC as nvcr.io/nvidia/nemo-rl:v0.6.0.
Here are the major software components included in the container:
| Software Component | Version |
|---|---|
| NeMo-RL | 0.6.0 |
| NeMo-Gym | 0.3.0rc0+1a4912e |
| NeMo-Automodel | 0.3.0rc0+92635e7 |
| Megatron-Bridge | 0.5.0+95e5f38 |
| Megatron-Core | 0.18.0+d30c3ae |
| PyTorch | 2.10.0 |
| vLLM | 0.17.1 |
The NeMo-RL container is built on top of the nvcr.io/nvidia/cuda-dl-base:25.05-cuda12.9-devel-ubuntu24.04 base image.
If you would like to build this container, or nightly containers, yourself, we provide the exact instructions we use at https://docs.nvidia.com/nemo/rl/latest/docker.html#release-image.
LoRA for GRPO and DPO
Building on the LoRA SFT support introduced in v0.5, NeMo RL v0.6 extends LoRA (Low-Rank Adaptation) to GRPO and DPO workflows. This enables parameter-efficient reinforcement learning and preference optimization with minimal modifications to existing recipes. LoRA for GRPO and DPO is supported with both the Megatron backend and the DTensor V2 (Automodel) backend.
Megatron LoRA GRPO:
    policy:
      megatron_cfg:
        enabled: true
        peft:
          enabled: true
          dim: 128
          alpha: 512
          exclude_modules: ['*out_proj*']
DTensor V2 LoRA GRPO:
    policy:
      dtensor_cfg:
        lora_cfg:
          enabled: true
          dim: 128
          alpha: 512
          exclude_modules: ['*out_proj*']
          match_all_linear: false
          use_triton: false
Example recipes:
- GRPO LoRA (Megatron): grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml
- GRPO LoRA (DTensor): grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
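As a rough illustration of what the `dim` and `alpha` settings above control, the sketch below applies a low-rank update to a dense weight and merges it back in. It assumes the standard LoRA convention of scaling the update by `alpha / dim`; the helper names (`matmul`, `lora_merge`) and the toy matrices are hypothetical and are not NeMo RL APIs.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(weight: Matrix, lora_a: Matrix, lora_b: Matrix,
               dim: int, alpha: float) -> Matrix:
    """Return W + (alpha / dim) * B @ A (assumed standard LoRA scaling)."""
    scale = alpha / dim
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy shapes: W is 2x3 (out x in), A is dim x in, B is out x dim, with dim = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.0]]
B = [[1.0], [2.0]]

merged = lora_merge(W, A, B, dim=1, alpha=4.0)
print(merged)

# With the recipe values shown earlier (dim=128, alpha=512) the scale would be 4.0.
print(512 / 128)
```

Only `A` and `B` are trained, so the number of trainable parameters grows with `dim` rather than with the full weight shape.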
GDPO: Multi-Reward RL Training
NeMo RL v0.6 introduces GDPO (Group reward-Decoupled Normalization Policy Optimization), a reinforcement learning method designed for multi-reward training. While existing approaches commonly apply GRPO in multi-reward settings, they can lead to reward advantage collapse, reducing training signal resolution and causing unstable or failed convergence. GDPO resolves this by decoupling reward normalization across individual rewards, preserving their relative differences and enabling more faithful preference optimization.
To enable GDPO:
    grpo:
      adv_estimator:
        name: "gdpo"
        normalize_rewards: true
        use_leave_one_out_baseline: false
Note that this method only has an effect when training involves more than one reward function. GDPO also supports async RL mode. See the GRPO guide for details.
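The advantage collapse described above can be seen in a small, simplified sketch. It compares normalizing the summed reward (GRPO-style) with normalizing each reward separately before summing (the decoupling idea behind GDPO). The function names and reward values are hypothetical, and any further normalization the actual implementation applies is omitted.

```python
import math
from typing import List


def zscore(values: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize values to zero mean and (approximately) unit std within a group."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]


def summed_reward_advantages(rewards: List[List[float]]) -> List[float]:
    """GRPO-style: rewards[k][i] is reward k of rollout i; sum first, then normalize."""
    totals = [sum(col) for col in zip(*rewards)]
    return zscore(totals)


def decoupled_advantages(rewards: List[List[float]]) -> List[float]:
    """GDPO-style sketch: normalize each reward on its own, then sum per rollout."""
    per_reward = [zscore(r) for r in rewards]
    return [sum(col) for col in zip(*per_reward)]


# Four rollouts for one prompt, two reward functions (illustrative values).
correctness = [1.0, 0.0, 0.0, 0.0]  # rare: only rollout 0 is correct
format_ok = [0.0, 1.0, 0.0, 1.0]    # common: rollouts 1 and 3 are well-formatted

summed = summed_reward_advantages([correctness, format_ok])
decoupled = decoupled_advantages([correctness, format_ok])
print(summed)     # rollouts 0 and 1 get the same advantage
print(decoupled)  # rollout 0 (rare correctness reward) gets a larger advantage
```

In this toy group, rollouts 0 and 1 have the same total reward, so the summed variant cannot tell them apart, while the decoupled variant keeps the distinction.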
ProRLv2
NeMo RL v0.6 adds the ProRLv2 configuration pattern (blog), which bundles GRPO with a set of stability and efficiency techniques commonly used for long-horizon RL fine-tuning:
- DAPO dynamic sampling: skip prompt-groups with zero reward variance
- Decoupled (asymmetric) clipping: different lower/upper clip bounds for better exploration
- Token-level policy gradient loss
- Importance sampling correction: ICE-POP / seq-mask-tis for backend-mismatch filtering
- Reinforce++-Baseline: decoupled local/global advantage normalization
- "Stop properly" penalty for truncated responses
    uv run examples/run_grpo_math.py --config examples/configs/prorlv2.yaml
For the full walkthrough, see the ProRLv2 guide.
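Two of the techniques listed above, dynamic sampling and decoupled clipping, are easy to sketch in isolation. The snippet below is a minimal illustration only: the function names are hypothetical, and the clip bounds (0.2 and 0.28) are illustrative values rather than the settings in prorlv2.yaml.

```python
from typing import List


def keep_group(rewards: List[float]) -> bool:
    """Dynamic sampling: keep a prompt group only if its rewards are not all identical.

    A group with zero reward variance yields zero advantages and no gradient signal.
    """
    return max(rewards) - min(rewards) > 0.0


def clipped_objective(ratio: float, advantage: float,
                      clip_low: float = 0.2, clip_high: float = 0.28) -> float:
    """PPO-style clipped surrogate with separate lower and upper clip bounds."""
    clipped_ratio = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
    return min(ratio * advantage, clipped_ratio * advantage)


groups = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
print([keep_group(g) for g in groups])
print(clipped_objective(1.5, 1.0))   # ratio is capped at 1 + clip_high
print(clipped_objective(0.5, -1.0))  # ratio is floored at 1 - clip_low
```

A larger upper bound than lower bound lets the probability of low-likelihood tokens with positive advantage grow more per step, which is the exploration benefit the list refers to.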
Speculative Decoding
NeMo RL now supports speculative decoding for rollout acceleration, including methods such as external draft models, Eagle3, and MTP. A smaller draft model runs in vLLM and proposes tokens that the policy model verifies, speeding up generation. Two modes are available:
- Offline: a fixed draft model is used only for faster generation; the RL loop does not update it.
- Online: NeMo RL currently supports online draft model training only for Eagle3. It attaches an Eagle3 draft model to the Megatron policy worker, trains it alongside the policy, and refits both policy and draft weights into vLLM — keeping the drafter aligned with RL updates.
Generation-only example:
    policy:
      generation:
        backend: "vllm"
        vllm_kwargs:
          speculative_config:
            method: "eagle3"
            model: /path/to/eagle3-draft
            num_speculative_tokens: 3
Online draft training example:
    policy:
      megatron_cfg:
        enabled: true
      draft:
        enabled: true
        model_name: ${policy.generation.vllm_kwargs.speculative_config.model}
        loss_weight: 1.0
      generation:
        backend: "vllm"
        vllm_kwargs:
          speculative_config:
            method: "eagle3"
            model: /path/to/eagle3-draft
            num_speculative_tokens: 3
            draft_tensor_parallel_size: 1
Example recipe: examples/configs/recipes/llm/grpo-qwen3-1.7b-1n8g-megatron-eagle3.yaml. For the full guide, see the Eagle3 Speculative Decoding documentation.
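The propose-and-verify idea behind speculative decoding can be sketched with toy next-token functions. This is a simplified greedy variant written for illustration: it is not how vLLM or Eagle3 implement it (real systems verify all proposed tokens in one batched forward pass and may use probabilistic acceptance), and every name here is hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], num_speculative_tokens: int) -> List[int]:
    """Return the tokens accepted in one draft-then-verify round (greedy decoding)."""
    # 1) The cheap draft model proposes a short continuation autoregressively.
    proposed: List[int] = []
    context = list(prefix)
    for _ in range(num_speculative_tokens):
        token = draft(context)
        proposed.append(token)
        context.append(token)

    # 2) The target model checks each proposal against its own greedy choice.
    accepted: List[int] = []
    context = list(prefix)
    for token in proposed:
        expected = target(context)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)
        context.append(token)

    # 3) All proposals matched: the target contributes one extra token.
    accepted.append(target(context))
    return accepted


def toy_target(context: List[int]) -> int:
    return (context[-1] + 1) % 10


def toy_draft(context: List[int]) -> int:
    # Agrees with the target except right after token 3.
    return 0 if context[-1] == 3 else (context[-1] + 1) % 10


print(speculative_step(toy_target, toy_draft, [1], 3))
print(speculative_step(toy_target, toy_draft, [5], 2))
```

The output always matches what the target model alone would have generated; the speedup comes from accepting several tokens per target pass when the drafter agrees, which is why keeping the drafter aligned with the updated policy matters in the online mode.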
SGLang Inference Backend
NeMo RL now supports SGLang as a generation backend alongside vLLM and Megatron inference. SGLang can be used for GRPO rollouts with a simple config change:
    policy:
      generation:
        backend: "sglang"
        sglang_cfg:
          model_path: ${policy.model_name}
          gpus_per_server: 1
          dtype: ${policy.precision}
          context_length: 512
          mem_fraction_static: 0.7
SGLang is currently supported with the DTensor V2 (Automodel) policy backend only. We are actively working with the SGLang team on improving this integration and adding support for the Megatron backend.
Example recipes: grpo-qwen3-0.6b-1n8g-sglang.yaml, grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1-sglang.yaml.
Muon Optimizer
NeMo RL now supports the Muon (MomentUm Orthogonalized by Newton-Schulz) optimizer for SFT and RL training. Muon achieves higher sample efficiency than AdamW by applying Newton-Schulz orthogonalization to momentum-based updates. Muon is supported with the Megatron backend.
    policy:
      megatron_cfg:
        enabled: true
        optimizer:
          optimizer: "dist_muon"
          muon_momentum: 0.95
          muon_scale_mode: "spectral"
          muon_num_ns_steps: 5
          use_distributed_optimizer: false
          use_precision_aware_optimizer: false
For the full guide, see the Muon Optimizer documentation.
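To make the Newton-Schulz step concrete, here is a minimal pure-Python sketch of the quintic iteration used in the public Muon reference implementation. The coefficients come from that reference and are an assumption here: the Megatron-backed optimizer in NeMo RL may use different coefficients, scaling, or precision. The five steps mirror `muon_num_ns_steps: 5` above.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(update: Matrix, steps: int = 5) -> Matrix:
    """Push the singular values of `update` toward 1 without computing an SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference
    norm = math.sqrt(sum(v * v for row in update for v in row)) + 1e-7
    x = [[v / norm for v in row] for row in update]  # Frobenius-normalize first
    for _ in range(steps):
        gram = matmul(x, transpose(x))           # X X^T
        gram_sq = matmul(gram, gram)             # (X X^T)^2
        mix = [[b * g + c * g2 for g, g2 in zip(g_row, g2_row)]
               for g_row, g2_row in zip(gram, gram_sq)]
        correction = matmul(mix, x)              # (b X X^T + c (X X^T)^2) X
        x = [[a * xv + cv for xv, cv in zip(x_row, c_row)]
             for x_row, c_row in zip(x, correction)]
    return x


# A diagonal matrix makes the effect easy to read: its singular values are 3, 1, 0.5.
momentum = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
orthogonalized = newton_schulz_orthogonalize(momentum)
print([round(orthogonalized[i][i], 3) for i in range(3)])
```

The iteration does not converge to exactly 1; it brings very different singular values into a narrow band around 1, which is what equalizes the update across directions.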
YaRN Long-Context Training
YaRN (Yet another RoPE extensioN) extends a model's usable context window beyond the length it was pretrained on by rescaling RoPE frequencies. NeMo RL supports YaRN RoPE scaling for SFT, GRPO, DPO, RM, and distillation workflows via the Megatron backend.
    policy:
      max_total_sequence_length: 65536
      megatron_cfg:
        enabled: true
      hf_config_overrides:
        rope_scaling:
          rope_type: yarn
          rope_theta: 1000000
          factor: ${div:${policy.max_total_sequence_length},${policy.hf_config_overrides.rope_scaling.original_max_position_embeddings}}
          original_max_position_embeddings: 40960
          truncate: true
          beta_fast: 32
          beta_slow: 1
          mscale: 1
          mscale_all_dim: 0
Example recipes: grpo-qwen2.5-1.5B-4n8g-megatron-yarn-256k.yaml, sft-qwen3-0.6B-1n8g-megatron-yarn-64k.yaml. For the full guide, see the YaRN documentation.
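The sketch below shows roughly how the values in this config interact, following the published YaRN recipe: high-frequency RoPE dimensions are left alone, low-frequency ones are divided by `factor`, and a linear ramp between `beta_fast` and `beta_slow` blends the two. It is an illustration under assumptions: `head_dim=128` is a hypothetical value, the attention scaling controlled by `mscale` is omitted, and the Megatron-Bridge implementation may differ in detail.

```python
import math
from typing import List


def yarn_inv_freq(head_dim: int, rope_theta: float, factor: float,
                  original_max_position_embeddings: int,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> List[float]:
    """Return per-dimension RoPE inverse frequencies after YaRN-style rescaling."""

    def correction_dim(num_rotations: float) -> float:
        # Dimension index whose wavelength completes `num_rotations` turns
        # over the original context length.
        return (head_dim
                * math.log(original_max_position_embeddings / (num_rotations * 2 * math.pi))
                / (2 * math.log(rope_theta)))

    # Rounded to integers, which is what the `truncate: true` setting is assumed to mean.
    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(head_dim // 2):
        base = rope_theta ** (-2 * i / head_dim)
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        # ramp = 0 keeps the original frequency; ramp = 1 fully interpolates.
        inv_freq.append(base * (1.0 - ramp) + (base / factor) * ramp)
    return inv_freq


# The `factor` interpolation in the config resolves to target length / original length.
factor = 65536 / 40960
freqs = yarn_inv_freq(head_dim=128, rope_theta=1_000_000.0, factor=factor,
                      original_max_position_embeddings=40960)
print(factor)
print(freqs[0], freqs[-1])
```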
Chunked Linear Cross-Entropy Fusion Loss
This release adds a memory-efficient cross-entropy loss that works directly from hidden states: it chunks the sequence dimension, projects each chunk to logits on the fly, computes per-token log probabilities, and discards the logits before moving to the next chunk. This significantly extends the maximum trainable sequence length (e.g. from <65K to >100K tokens) while producing numerically equivalent loss values.
It is supported for both SFT and DPO workflows with the Megatron backend:
    policy:
      megatron_cfg:
        use_linear_ce_fusion_loss: true
        linear_ce_fusion_chunk_size: 256
Example recipes: sft-qwen2.5-math7b-1n8g-megatron_chunked_linear_ce_loss.yaml, dpo-qwen2.5-math7b-1n8g-megatron_chunked_linear_ce_loss.yaml.
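The chunking idea can be sketched in a few lines of plain Python. This is a conceptual illustration, not the fused Megatron kernel: the function names and toy tensors are hypothetical, and a real implementation also handles the backward pass and tensor parallelism.

```python
import math
from typing import List


def project(hidden: List[float], lm_head: List[List[float]]) -> List[float]:
    """Logits for one token: dot product of its hidden state with each vocab row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]


def log_prob_of(logits: List[float], target: int) -> float:
    """Numerically stable log-softmax evaluated at the target index."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return logits[target] - log_sum_exp


def chunked_cross_entropy(hidden_states: List[List[float]], lm_head: List[List[float]],
                          targets: List[int], chunk_size: int) -> float:
    """Mean token cross-entropy, materializing logits for one chunk at a time."""
    total = 0.0
    for start in range(0, len(hidden_states), chunk_size):
        chunk = hidden_states[start:start + chunk_size]
        chunk_logits = [project(h, lm_head) for h in chunk]  # chunk_size x vocab only
        for logits, target in zip(chunk_logits, targets[start:start + chunk_size]):
            total -= log_prob_of(logits, target)
        del chunk_logits  # logits are dropped before the next chunk is projected
    return total / len(hidden_states)


# Toy problem: 5 tokens, hidden size 2, vocabulary of 3.
hidden_states = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.4, 0.2], [0.7, 0.7]]
lm_head = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = [0, 2, 1, 1, 2]

full = chunked_cross_entropy(hidden_states, lm_head, targets, chunk_size=len(hidden_states))
chunked = chunked_cross_entropy(hidden_states, lm_head, targets, chunk_size=2)
print(full, chunked)
```

Peak logits memory scales with `chunk_size x vocab` instead of `sequence_length x vocab`, which is where the longer trainable sequence length comes from.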
Model Support
Nemotron
- Nemotron Nano v3 is now supported on main. See this guide for reproducible instructions on how to post-train the Nemotron 3 Nano model with NeMo RL.
- Nemotron Super v3 is supported on the super-v3 branch. See the Nemotron 3 Super guide for details.
Qwen3.5 and GLM-4.7-Flash
NeMo RL adds GRPO training support for Qwen3.5 dense and MoE models (both LLM and VLM), and GLM-4.7-Flash. Example recipes:
- grpo-qwen3.5-9b-1n8g-megatron.yaml (Qwen3.5 9B dense)
- grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml (Qwen3.5 35B MoE)
- grpo-qwen3.5-397ba17b-32n8g-megatron.yaml (Qwen3.5 397B MoE)
- grpo-glm47-flash-4n8g-automodel.yaml (GLM-4.7-Flash)
- grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.yaml (Qwen3.5 VLM)
For the full model support matrix, please refer to our model support documentation.
⚡ Performance Optimizations
- Fused sequence packing for loss: A new fuse_loss option under the sequence_packing config eliminates the overhead of separating packed sequences for individual loss computation (#1904).
- Reduced memory footprint for ChunkedDistributedLogProb: Optimized the chunked distributed log-probability computation to reduce peak GPU memory usage (#1895).
- Shard concat overhead reduction: Reduced overhead in the shard concatenation operation used during distributed training data sharding (#2002).
- MoE alltoall token dispatcher default: Changed the default MoE token dispatcher type to alltoall for improved MoE model performance (#2004).
View the v0.6.0 performance numbers from our published recipes at https://docs.nvidia.com/nemo/rl/latest/about/performance-summary.html .
SWE-RL Benchmark
NeMo RL now includes a SWE RL release benchmark demonstrating a long-context, multi-step RL rollout. See the performance numbers here, with the accompanying recipe and scripts in #2327. SWE support is currently available on the super-v3 branch.
To replicate SWE RL on the Nemotron Super V3 model, see this guide.
Notable Additions
- Top-p and top-k sampling in GRPO: Users can now configure top-p and top-k sampling parameters in GRPO, enabling more controlled sampling during training (#2053).
- Configurable attention backend for Megatron: A new attention_backend config parameter for the Megatron training backend allows users to select different attention implementations (e.g. FlashAttention, TransformerEngine DotProductAttention) (#1628).
- LoRA checkpoint merge and HF export: New tooling to merge LoRA adapter weights back into a base Megatron checkpoint and export as a standalone Hugging Face checkpoint, enabling deployment of LoRA-trained models without the separate adapter at inference time (#2173).
- save_optimizer flag: A new save_optimizer boolean in the checkpoint config (default: true). When set to false, optimizer state is excluded from checkpoints, reducing checkpoint size and save time (#1843).
- Fault tolerance launcher: NeMo RL integrates with nvidia-resiliency-ext for automatic fault tolerance and recovery for distributed training runs. Install via the nvrx optional extra and use the ft_launcher to get heartbeat monitoring, automatic restarts, and recovery from checkpoints. See the Fault Tolerance Launcher Guide.
- Major dependency upgrades: Python ≥3.13.13, PyTorch 2.10.0, Ray 2.54.0, Transformers 5.3.0, vLLM 0.17.1, SGLang 0.5.10. These enable compatibility with the latest ecosystem and unlock new features across all backends.
- System prompt support in math data processor: Added system_prompt support to math_hf_data_processor (#2216).
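For readers unfamiliar with the top-p and top-k sampling parameters mentioned in the list above, the sketch below shows one common way the two filters are combined on a token distribution. It is a generic illustration with hypothetical names, not the NeMo RL or vLLM implementation.

```python
from typing import List


def filter_top_k_top_p(probs: List[float], top_k: int, top_p: float) -> List[float]:
    """Keep the most likely tokens under both limits, then renormalize.

    top_k caps how many tokens survive; top_p stops adding tokens once the
    kept probability mass has reached the threshold.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for rank, index in enumerate(order):
        if rank >= top_k or cumulative >= top_p:
            break
        kept.append(index)
        cumulative += probs[index]

    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


token_probs = [0.5, 0.3, 0.15, 0.05]
filtered = filter_top_k_top_p(token_probs, top_k=3, top_p=0.7)
print(filtered)  # only the two most likely tokens keep non-zero probability
```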
Notable Fixes
- Fixed a checkpoint loading bug in Megatron LoRA GRPO (#2075).
- Fixed FP8 _apply_state_dict_to_model for correct checkpoint restoration (#2233).
- Fixed use_linear_ce_fusion_loss when used with certain configurations (#2232).
- Fixed GPT-OSS export and bumped Megatron-Bridge for compatibility (#2257).
- Fixed Gemma3 model support (#2185).
- Fixed make_sequence_length_divisible_by in config (#2135).
- Fixed async GRPO offload (#2119).
- Fixed Megatron checkpoint loading without optimizer and improved warning detection (#2159).
- Allowed wandb config value changes on resume (#2137).
- Addressed security vulnerabilities and CVEs (#2236, #2214, #2201).
📊 Release Runs
We have provided TensorBoard logs for our release runs to give you a head start on what to expect from our recipes.
To view these TensorBoard logs easily, we've provided a Google Colab notebook that downloads and serves them.
What's Changed
- fix: Handle disabled validation in SFT training by @sahgerlad in #1611
- fix: Fix crash when using cp in dtensor path by @yfw in #1663
- fix: Fix Fp8 sequence padding for PP>1 case by @guyueh1 in #1579
- test: Perf recipe for v0.5 by @guyueh1 in #1667
- fix: Fix fp8 after vllm v0.11.2 bump by @guyueh1 in #1660
- fix: Fix crash when using activation_checkpointing by @yfw in #1676
- feat: add dapo recipe and test by @ZhiyuLi-Nvidia in #1617
- feat: DTensorPolicyV2 GPT-OSS SFT support by @adil-a in #1470
- fix: grad norm calculation for dtensor v2 by @hemildesai in #1693
- feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoRA) by @RayenTian in #1648
- feat: Support prefetching of specific envs by @hemildesai in #1692
- fix: Fix DTensor slice crash after PyTorch 2.9 bump by @zpqiu in #1689
- fix: grad norm check for automodel gpt oss nightly by @hemildesai in #1708
- fix: relax nanov3 nightly test metrics strict by @RayenTian in #1712
- fix: on GB200 use single-thread checkpoint save to avoid Cpu OOM by @guyueh1 in #1703
- perf: [Perf recipe] Change TP 16->32 for deepseek GB200 sync benchmark by @guyueh1 in #1715
- docs: Add doc for nano-v3 by @yfw in #1694
- fix: Disable cudnn sdpa backend when using activation checkpointing by @yfw in #1717
- fix: log metrics that can be coerced to scalars by @terrykong in #1723
- fix: use median instead of mean for logprob error for stability in nightlies by @terrykong in #1722
- fix: gemma3 27b must now have skip_tokenizer_init=False in vllm by @terrykong in #1721
- fix: fix several nightly tests that were flaky by @terrykong in #1724
- fix: apply offloading change from v2 to v1 by @terrykong in #1726
- fix: mcore generation config restored in nightly test by @terrykong in #1720
- feat: Megatron SFT LoRA by @arendu in #1629
- build: Update aiohttp and urlib3 by @chtruong814 in #1746
- fix: patch pytorch aten.alias.default shard strategy by @RayenTian in #1728
- feat: RL support for custom moe models in dtensor v2 by @hemildesai in #1695
- fix: split dtensorv1 vllm dependency by @yuki-97 in #1638
- build: Resolve CVEs for gnupg and aiohttp by @chtruong814 in #1755
- build: Bump mamba to d68d16e and causal-conv1d to 67e0a9d by @chtruong814 in #1759
- ci: Clean up disk space for lint check by @chtruong814 in #1768
- docs: Adding dtensor TP debugging summary by @joyang-nv in #1767
- docs: Update image syntax in dtensor TP accuracy guide for consistency by @RayenTian in #1780
- fix: fix formatting for async docs by @parthchadha in #1783
- ci: Add nightly and release tests for gb200 by @chtruong814 in #1788
- feat: NeMo Gym refresh 20260113 by @bxyu-nvidia in #1773
- perf: DeepEP interface in megatron backend by @guyueh1 in #1794
- feat: refactor init of dtensor policy v2 by @hemildesai in #1709
- build: Update pyasn1 to >= 0.6.2 by @chtruong814 in #1791
- docs: Adding k8 guide by @vinhngx in #1764
- test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite. by @sfawzy-nv in #1623
- docs: v0.5 performance results update by @guyueh1 in #1772
- docs: model support page by @terrykong in #1799
- refactor: split train and val dataset in response dataset by @yuki-97 in #1649
- docs: fix pytorch anchor link: PYTORCH_CUDA_ALLOC_CONF->PYTORCH_ALLOC_CONF by @terrykong in #1806
- fix: log validation data by @parthchadha in #1805
- feat: Add SGLang rollout backend and tests by @RolaoDenthu in #1674
- refactor: reuse setup data by @yuki-97 in #1808
- feat: refactor megatron init by @ashors1 in #1646
- build: Bump setuptools >= 80.10.1 and wheel >= 0.46.2 by @chtruong814 in #1822
- build: Bump setuptools to 80.10.2 by @chtruong814 in #1830
- feat: refactor common data utilities of dtensor policy v2 by @hemildesai in #1710
- feat: add FT launcher config and resiliency dependency [1/4] by @yashaswikarnati in #1824
- fix: move ft_config.yaml outside examples/configs by @yashaswikarnati in #1839
- docs: Add notes for FP8 recipe in docs/fp8.md by @guyueh1 in #1829
- feat: Timer for the data sharding and job submission by @guyueh1 in #1802
- feat: Allow loading of more general data types by @nathan-az in #1834
- chore: add assert for dtensor v2 cpu offload by @yuki-97 in #1817
- build: Bump protobuf to 6.33.5 and python-multipart to 0.0.22 by @chtruong814 in #1850
- feat: refactor megatron data utils by @ashors1 in #1651
- feat: support stateless group and decouple vLLM in train backend by @shuyixiong in #1842
- docs: update readme post 0.5 by @euronymous-aithal in #1856
- docs: fix readme post 0.5 by @euronymous-aithal in #1858
- feat: Support lora in dtensor grpo workflow by merging weight by @RayenTian in #1797
- chore: add nanov3 lora sft recipe to doc by @RayenTian in #1860
- ci: Allow repo to self publish docs by @chtruong814 in #1821
- fix: fix statistic of probs_ratio_clamped_min/max by @yuki-97 in #1818
- feat: support multiple datasets for response dataset by @yuki-97 in #1691
- refactor: unify entrypoint for different envs by @yuki-97 in #1841
- feat: add lora config for dpo dtensor backend by @RayenTian in #1826
- fix: add log_plot to the logger interface by @terrykong in #1862
- refactor: split train and val dataset in preference dataset by @yuki-97 in #1763
- chore: add assert for tp4 batch variant accuracy issue by @yuki-97 in #1861
- fix: prevent crash in rollout metric calculation when just 1 value by @terrykong in #1864
- feat: add val_at_end for all algorithms by @terrykong in #1863
- ci: Add secrets detector by @chtruong814 in #1854
- feat: Add bisecting tooling for nightly test regressions by @terrykong in #1223
- docs: add release runs to front page readme for 0.5 by @terrykong in #1879
- fix: Remove redundant nested loop in move_model by @nathan-az in #1880
- docs: Fix a step time number for deepseek by @guyueh1 in #1890
- feat: refactor train utilities for dtensor policy v2 by @hemildesai in #1757
- feat: add speculative decoding during post-training by @isomap in #1785
- feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) by @RayenTian in #1866
- ci: Fix docs publishing by @chtruong814 in #1898
- feat: Implement ProRLv2 recipe by @hijkzzz in #1809
- feat: add way of excluding generation backends and disable sglang tests in CI by @terrykong in #1855
- feat: Update mlflow to work better with env vars, manual run id, fix tests by @nathan-az in #1874
- feat: unify nemogym dataset by @yuki-97 in #1807
- feat: improve dataset by @yuki-97 in #1893
- fix: fix enable_seq_packing and apply_temperature_scaling in DTensor v2 by @yuki-97 in #1900
- chore: Centralize OmegaConf resolver registration by @RayenTian in #1882
- fix: Fix DCP-to-HF conversion for model-wrapped checkpoints by @RayenTian in #1881
- fix: add missing functional test by @yuki-97 in #1883
- fix: fix and re-enable rm env functional test by @RayenTian in #1905
- feat: start nemo gym and other environments with cached venvs by @terrykong in #1927
- fix: Mxfp8 training fix sequence padding by @guyueh1 in #1884
- fix: use seq_length instead of padded_seq_length for topk output padding by @zpqiu in #1929
- fix: Update sglang source by @RolaoDenthu in #1926
- chore: bump mcore and mbridge by @yfw in #1902
- feat: refactor mcore train/forward utilities by @ashors1 in #1654
- docs: Document Gym + RL integration design by @ananthsub in #1762
- feat: retry rollout if generation_logprobs contains NaN by @guyueh1 in #1885
- feat: Support build custom flashinfer by @guyueh1 in #1886
- fix: async llm engine didnt have get_metrics() by @terrykong in #1943
- feat: Mask sequences with high logprob error by @yfw in #1838
- feat: ProRLv2 - add seq-mask-tis truncated importance sampling type by @hijkzzz in #1899
- ci: Update release-docs workflow to use FW-CI-templates v0.72.0 by @chtruong814 in #1965
- fix: speedup minimize and minimize-check in config_cli by @hemildesai in #1964
- docs: update features.md to reflect v0.5 release and v0.6 roadmap by @seonjinn in #1966
- fix: add mask seq with high logp err to nemo gym config by @cmunley1 in #1980
- chore: upgrade wandb to 0.25+ by @Kipok in #1979
- feat: Remove do_not_average_loss by @yfw in #1988
- chore: remove .swp config file by @zhongbozhu in #1998
- fix: Fix adv estimator configs by @yfw in #1994
- feat: Nano v3 RL Recipe by @yfw in #1989
- build: Add vllm arm precompiled wheel env variable by @ananthsub in #1970
- build: Update dockerfile to support Nsight install on arm platforms by @ananthsub in #1939
- chore: Switch to mcore upstream main by @ahmadki in #1990
- fix: Re-enable tests/functional/test_converters.sh functional test by @RayenTian in #2005
- ci: Enable nightly docs update by @chtruong814 in #2021
- feat: Omni dataloader for HF models by @yuanhangsu1986 in #2016
- feat: support multiple dataloader for grpo by @yuki-97 in #1698
- build: Do not install decord on arm by @chtruong814 in #2034
- ci: add a fast test suite by @terrykong in #2031
- test: fix bug in deselection and make fast tests even faster by @terrykong in #2038
- test: add a diagnostic script for prefix caching naning by @terrykong in #1987
- feat: async grpo + nemo gym by @terrykong in #1985
- refactor: refactor loss function by @yuki-97 in #1920
- fix: remove label name from CI concurrency group by @terrykong in #2044
- feat: Megatron LoRA GRPO w/ Weight Merging by @vadam5 in #1889
- fix: device mismatch when DPO validation at start with CPU offload(Nemotron) by @RayenTian in #1930
- build: Replace decord with decord2 by @chtruong814 in #2040
- ci: Allow cancelling of unit tests by @chtruong814 in #2045
- ci: Fix copy-pr-bot config by @chtruong814 in #2067
- perf: Update moe_token_dispatcher_type default to alltoall by @parthmannan in #2004
- fix: checkpoint loading bug in Megatron LoRA GRPO by @vadam5 in #2075
- ci: Enable GB200 runners by @chtruong814 in #2017
- perf: Reduce memory footprint for ChunkedDistribuedLogProb by @nujoug in #1895
- ci: Switch to merge-commit CI by @ko3n1g in #2077
- docs: Add news item for Nemotron 3 Super by @yfw in #2099
- feat: support top-p top-k in grpo by @yuki-97 in #2053
- ci: skip container build for CI:docs level by @terrykong in #2106
- feat: improve research template by @yuki-97 in #2094
- feat: support GDPO (New) by @nbasyl in #2069
- ci: Ensure fast functional tests are ran if test level is Lfast by @chtruong814 in #2108
- test: add megatron bump suite by @terrykong in #2068
- docs(optimizer): Add Muon post-training support by @ashors1 in #1848
- fix: fix sft-openmathinstruct2 by @yuki-97 in #2120
- fix: fix async grpo offload by @yuki-97 in #2119
- perf: Fuse sequence packing for loss function by @nujoug in #1904
- chore: add entity to wandb config by @ananthsub in #2113
- feat: support async GDPO by @nbasyl in #2118
- ci: Enable claude review by @thomasdhc in #2121
- ci: Fix sso user check by @chtruong814 in #2126
- feat: Add chunked linear ce loss function from hidden states by @pengdurice in #2036
- feat: add dpo lora megatron functional test by @RayenTian in #2125
- docs: Add note about vllm bug priot to 0.17.0 by @yfw in #2128
- ci: Fix nightly tests by @kajalj22 in #2109
- chore: test FW-CI-templates ko3n1g/fix/linkcheck-retry-backoff by @ko3n1g in #2131
- fix: fix make_sequence_length_divisible_by in config by @yuki-97 in #2135
- feat: Added save_optimizer flag to control if saving optimizer in checkpointing by @odedovadia in #1843
- chore: update all to transformers v5 (+torch 2.10, ray 2.54, vllm/sglang tot) by @hemildesai in #1962
- docs: Update nano docs to point to main by @yfw in #2147
- ci: Build RL main on Azure by @chtruong814 in #2145
- Revert "ci: Build RL main on Azure (#2145)" by @chtruong814 in #2155
- fix: add is_async to CheckpointingConfig TypedDict by @dmvevents in #1991
- chore: make vlm config inherit from base config by @yuki-97 in #2154
- perf: shard concat overhead by @pjo256 in #2002
- feat: Add linear CE loss fusion for DPO by @pengdurice in #2139
- fix: Add debug parameter to reduce verbose output by @sahgerlad in #1664
- fix: fix doc test by @yuki-97 in #2160
- chore: remove CodeRabbit configuration by @terrykong in #2161
- ci: remove automodel integration file consistency check by @terrykong in #2162
- fix: fix megatron load w/o optimizer and fix warning detect by @yuki-97 in #2159
- docs: update README by @RayenTian in #2167
- ci: run doc tests for Lfast label by @terrykong in #2164
- feat: add Claude Code skills, CLAUDE.md, and interactive PR review by @terrykong in #2169
- ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #2138
- ci: add broken links false positives for flaky nvidia docs URL by @terrykong in #2179
- chore: bumpup Megatron-Bridge submodule to main by @ZhiyuLi-Nvidia in #2039
- docs: fix SFT tool calling format to match OpenAI spec by @terrykong in #2168
- feat: Add attention_backend config support for Megatron policy by @sahgerlad in #1628
- chore: add tracking_uri configuration for MLflow in multiple YAML files by @RayenTian in #2170
- chore: bump _code_freeze workflow to v0.86.0 by @ko3n1g in #2181
- fix: align sequence_length_pad_multiple in lm_policy by @yuki-97 in #2182
- fix: fix gb200 nightly by @yuki-97 in #2183
- fix: allow wandb config value changes on resume by @gkaplun-nvidia in #2137
- feat: Add Eagle3 online speculative decoding support by @isomap in #2078
- fix: add Gym submodule to git safe.directory in CI by @kajalj22 in #2186
- fix: fix dsv3 by disable mtp by @yuki-97 in #2191
- chore: bump Gym submodule to latest main by @terrykong in #2195
- fix: pin torchaudio to pytorch-cu129 index by @terrykong in #2198
- chore: bump vllm 0.17.0 -> 0.17.1 by @terrykong in #2196
- chore: point dsv3 model_name to convert doc by @yuki-97 in #2204
- fix: revert logprob_batch_size to keep same perf as before by @yuki-97 in #2192
- chore: update versions to address CVEs by @kajalj22 in #2201
- feat: Add YaRN rope scaling support on Magatron-Bridge by @RayenTian in #2188
- feat: support override HF model name in convert_megatron_to_hf by @dhineshkumar-r in #2202
- fix: fix gemma3 by @yuki-97 in #2185
- chore: loosen gen_kl_error to avoid flaky fail in CI by @yuki-97 in #2215
- chore: fix CVEs for v0.6 by @kajalj22 in #2214
- fix: add system_prompt support to math_hf_data_processor by @rishy2 in #2216
- feat: add Qwen3.5 & GLM-4.7-Flash model support by @zpqiu in #2151
- feat: Merge megatron checkpoints with lora adapters and convert to HF format by @pengdurice in #2173
- fix: add nvidia-resiliency-ext to default dependencies by @terrykong in #2228
- fix: fix eagle3 nightly by @yuki-97 in #2231
- fix: fix use_linear_ce_fusion_loss by @yuki-97 in #2232
- fix: fix fp8 _apply_state_dict_to_model by @yuki-97 in #2233
- chore: upgrade Python from 3.12 to 3.13 by @kajalj22 in #2220
- fix: address security vulnerabilities by @kajalj22 in #2236
- fix: fix qwen3.5 nightly by @yuki-97 in #2241
- chore: bump Megatron-Bridge to latest main (7110a96) by @yuki-97 in #2223
- chore: clean up nemo.tron by @yuki-97 in #2240
- fix: fix h100 release/performance by @yuki-97 in #2184
- fix: fix gb200 release/performance by @yuki-97 in #2189
- docs: update ProRL v2 guide and config by @hijkzzz in #2237
- chore: upgrade Python 3.13.13, sglang 0.5.10, mlflow, pytest, flash-infer by @kajalj22 in #2243
- chore: update version and references for r0.6.0 release (#2261) by @kajalj22 in #2262
- cp: fix: fix gpt oss export + bump mbridge (2249) into r0.6.0 by @svcnvidia-nemo-ci in #2257
- build: drop rc0 pre-release tag and add dynamic git versioning (#2235) by @kajalj22 in #2263
- cp: feat: add nvidia-resiliency-ext as nvrx optional extra (2264) into r0.6.0 by @svcnvidia-nemo-ci in #2266
- cp: docs: add yarn doc (2283) into r0.6.0 by @svcnvidia-nemo-ci in #2284
- cp: fix: fix OOM (2285) into r0.6.0 by @svcnvidia-nemo-ci in #2286
- cp: docs: fix typos and errors in training-backends.md (2287) into r0.6.0 by @svcnvidia-nemo-ci in #2288
- docs: cherry-pick #2193 - fix typos, grammar, and table issues from QA by @terrykong in #2290
- cp: ci: lower distillation seqpack accuracy thresh (2306) into r0.6.0 by @svcnvidia-nemo-ci in #2308
- cp: fix: loosen memory threshold for sft-llama3.1-... (2310) into r0.6.0 by @svcnvidia-nemo-ci in #2311
- cp: fix: install logsage transitive deps for ft_launcher (2304) into r0.6.0 by @svcnvidia-nemo-ci in #2307
- cp: docs: Fixing docs for Muon optimizer (2301) into r0.6.0 by @svcnvidia-nemo-ci in #2313
- cp: perf: Perf test scripts update for v0.6 (2300) into r0.6.0 by @svcnvidia-nemo-ci in #2317
- fix: Fix perf regression by @guyueh1 in #2328
- perf: Improve deepseek benchmark perf by @guyueh1 in #2333
- test: Fix oom for grpo-dapomath17k-dsv3-32n4g-megatron by @guyueh1 in #2342
- cp: docs: Perf page update for v0.6 (2346) into r0.6.0 by @svcnvidia-nemo-ci in #2364
New Contributors
- @adil-a made their first contribution in #1470
- @arendu made their first contribution in #1629
- @vinhngx made their first contribution in #1764
- @sfawzy-nv made their first contribution in #1623
- @RolaoDenthu made their first contribution in #1674
- @yashaswikarnati made their first contribution in #1824
- @shuyixiong made their first contribution in #1842
- @isomap made their first contribution in #1785
- @hijkzzz made their first contribution in #1809
- @cmunley1 made their first contribution in #1980
- @Kipok made their first contribution in #1979
- @zhongbozhu made their first contribution in #1998
- @yuanhangsu1986 made their first contribution in #2016
- @vadam5 made their first contribution in #1889
- @parthmannan made their first contribution in #2004
- @nujoug made their first contribution in #1895
- @nbasyl made their first contribution in #2069
- @thomasdhc made their first contribution in #2121
- @pengdurice made their first contribution in #2036
- @odedovadia made their first contribution in #1843
- @dmvevents made their first contribution in #1991
- @pjo256 made their first contribution in #2002
- @gkaplun-nvidia made their first contribution in #2137
- @dhineshkumar-r made their first contribution in #2202
- @rishy2 made their first contribution in #2216
- @svcnvidia-nemo-ci made their first contribution in #2257
Full Changelog: v0.5.0...v0.6.0