Release v0.6.0 · NVIDIA-NeMo/RL

📝 Blog

NeMo RL: Run High throughput Reinforcement Learning with End to End FP8 Precision

✨ Highlights

Container

Both linux/amd64 and linux/arm64 Docker containers are available on NGC as nvcr.io/nvidia/nemo-rl:v0.6.0.

Here are the major software components included in the container:

Software Component	Version
NeMo-RL	0.6.0
NeMo-Gym	0.3.0rc0+1a4912e
NeMo-Automodel	0.3.0rc0+92635e7
Megatron-Bridge	0.5.0+95e5f38
Megatron-Core	0.18.0+d30c3ae
PyTorch	2.10.0
vLLM	0.17.1

The NeMo-RL container is built on top of the nvcr.io/nvidia/cuda-dl-base:25.05-cuda12.9-devel-ubuntu24.04 image.

If you would like to build this container (or a nightly container) yourself, we provide the exact instructions we use at https://docs.nvidia.com/nemo/rl/latest/docker.html#release-image.

LoRA for GRPO and DPO

Building on the LoRA SFT support introduced in v0.5, NeMo RL v0.6 extends LoRA (Low-Rank Adaptation) to GRPO and DPO workflows. This enables parameter-efficient reinforcement learning and preference optimization with minimal modifications to existing recipes. LoRA for GRPO and DPO is supported with both the Megatron backend and the DTensor V2 (Automodel) backend.

Megatron LoRA GRPO:

policy:
  megatron_cfg:
    enabled: true
    peft:
      enabled: true
      dim: 128
      alpha: 512
      exclude_modules: ['*out_proj*']

DTensor V2 LoRA GRPO:

policy:
  dtensor_cfg:
    lora_cfg:
      enabled: true
      dim: 128
      alpha: 512
      exclude_modules: ['*out_proj*']
      match_all_linear: false
      use_triton: false

Example recipes:

GRPO LoRA (Megatron): grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml
GRPO LoRA (DTensor): grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml
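The dim and alpha settings above follow the usual LoRA parameterization. As a minimal NumPy sketch, assuming the standard formulation y = xWᵀ + (alpha/dim)·(xAᵀ)Bᵀ with a frozen base weight W, and assuming exclude_modules uses glob-style matching, an adapted layer and its target selection could look like this. The class, helper, and module names are hypothetical; this is not NeMo RL's or Megatron-Bridge's implementation. Under these assumptions, dim: 128 and alpha: 512 give an adapter scale of 4.

```python
import fnmatch

import numpy as np


class LoRALinear:
    """Hypothetical LoRA-wrapped linear layer: frozen base weight plus a scaled low-rank update."""

    def __init__(self, weight: np.ndarray, dim: int, alpha: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight                                     # frozen base weight W
        self.lora_A = rng.normal(0.0, 0.01, (dim, in_features))  # trainable down-projection A
        self.lora_B = np.zeros((out_features, dim))              # zero init: the adapter starts as a no-op
        self.scale = alpha / dim                                 # e.g. 512 / 128 = 4.0

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scale * update

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into one dense weight so no separate adapter is needed at inference."""
        return self.weight + self.scale * (self.lora_B @ self.lora_A)


def select_lora_targets(module_names, exclude_patterns):
    """Keep the modules that match none of the glob-style exclude patterns (assumed semantics)."""
    return [
        name
        for name in module_names
        if not any(fnmatch.fnmatch(name, pattern) for pattern in exclude_patterns)
    ]
```

Because the low-rank update is linear, folding it into the base weight (merged_weight) reproduces the adapter's outputs exactly, which is what makes merging adapters back into a standalone checkpoint possible.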

GDPO: Multi-Reward RL Training

NeMo RL v0.6 introduces GDPO (Group reward-Decoupled Normalization Policy Optimization), a reinforcement learning method designed for multi-reward training. Existing approaches commonly apply GRPO in multi-reward settings, but doing so can cause reward advantage collapse: distinct combinations of rewards can map to the same advantage values, which reduces the resolution of the training signal and can lead to unstable or failed convergence. GDPO resolves this by normalizing each reward separately before combining them, preserving their relative differences and enabling more faithful preference optimization.

To enable GDPO:

grpo:
  adv_estimator:
    name: "gdpo"
    normalize_rewards: true
    use_leave_one_out_baseline: false

Note that this method only has an effect when training involves more than one reward function. GDPO also supports async RL mode. See the GRPO guide for details.
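To see why decoupling matters, the NumPy sketch below compares two simplified advantage computations for a single prompt group: summing the rewards and then normalizing (GRPO-style), versus normalizing each reward within the group and then summing (GDPO-style). It is an illustration under our own simplifying assumptions, with hypothetical function names; the full GDPO method may apply further normalization to the combined advantage, and NeMo RL's implementation may differ.

```python
import numpy as np


def grpo_style_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: (group_size, num_rewards) for one prompt. Sum the rewards, then normalize within the group."""
    total = rewards.sum(axis=1)
    return (total - total.mean()) / (total.std() + eps)


def gdpo_style_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each reward column within the group first, then sum the normalized rewards."""
    per_reward = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return per_reward.sum(axis=1)


# Two groups of two rollouts each, every rollout scored by two binary rewards.
group_a = np.array([[0.0, 0.0], [0.0, 1.0]])  # best rollout satisfies one reward
group_b = np.array([[0.0, 0.0], [1.0, 1.0]])  # best rollout satisfies both rewards

print(grpo_style_advantages(group_a), grpo_style_advantages(group_b))  # both ~[-1, 1]: collapsed
print(gdpo_style_advantages(group_a), gdpo_style_advantages(group_b))  # ~[-1, 1] vs ~[-2, 2]
```

In group B the best rollout satisfies both rewards, yet the summed-then-normalized advantages are identical to group A's; the decoupled version keeps that difference.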

ProRLv2

NeMo RL v0.6 adds the ProRLv2 configuration pattern (blog), which bundles GRPO with a set of stability and efficiency techniques commonly used for long-horizon RL fine-tuning:

DAPO dynamic sampling: skip prompt-groups with zero reward variance
Decoupled (asymmetric) clipping: different lower/upper clip bounds for better exploration
Token-level policy gradient loss
Importance sampling correction: ICE-POP / seq-mask-tis for backend-mismatch filtering
Reinforce++-Baseline: decoupled local/global advantage normalization
"Stop properly" penalty for truncated responses

uv run examples/run_grpo_math.py --config examples/configs/prorlv2.yaml

For the full walkthrough, see the ProRLv2 guide.
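Two of the techniques listed above are easy to state in code. The NumPy sketch below, with hypothetical function names, shows DAPO-style dynamic sampling (dropping zero-variance prompt groups) and a clipped surrogate loss with decoupled lower/upper bounds averaged at the token level. The eps_low and eps_high values are illustrative, not NeMo RL defaults, and the sketch omits the importance-sampling corrections, the Reinforce++ baseline, and the truncation penalty.

```python
import numpy as np


def dynamic_sampling_filter(group_rewards):
    """Drop prompt groups whose rewards have zero variance.

    When every rollout in a group gets the same reward, mean-subtracted group advantages
    are all zero, so the group contributes no policy-gradient signal.
    """
    return [g for g in group_rewards if np.std(g) > 0]


def asymmetric_clip_token_loss(ratio, advantages, mask, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with decoupled lower/upper clip bounds.

    ratio, advantages, mask: (batch, seq_len) arrays. The loss is averaged over all valid
    tokens in the batch, so each token carries equal weight regardless of response length.
    """
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_token = -np.minimum(ratio * advantages, clipped * advantages)
    return float((per_token * mask).sum() / mask.sum())


kept = dynamic_sampling_filter([[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]])  # keeps only [0, 1, 0, 1]
loss = asymmetric_clip_token_loss(np.array([[1.5, 0.5]]), np.array([[1.0, -1.0]]), np.ones((1, 2)))
```

A wider upper bound (eps_high > eps_low) lets the probability of low-likelihood tokens with positive advantage grow further before clipping, which is the exploration argument for decoupled clipping.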

Speculative Decoding

NeMo RL now supports speculative decoding for rollout acceleration, including methods such as external draft models, Eagle3, and MTP. A smaller draft model runs in vLLM and proposes tokens that the policy model verifies, speeding up generation. Two modes are available:

Offline: a fixed draft model is used only for faster generation; the RL loop does not update it.
Online: NeMo RL currently supports online draft model training only for Eagle3. It attaches an Eagle3 draft model to the Megatron policy worker, trains it alongside the policy, and refits both policy and draft weights into vLLM — keeping the drafter aligned with RL updates.

Generation-only example:

policy:
  generation:
    backend: "vllm"
    vllm_kwargs:
      speculative_config:
        method: "eagle3"
        model: /path/to/eagle3-draft
        num_speculative_tokens: 3

Online draft training example:

policy:
  megatron_cfg:
    enabled: true
  draft:
    enabled: true
    model_name: ${policy.generation.vllm_kwargs.speculative_config.model}
    loss_weight: 1.0
  generation:
    backend: "vllm"
    vllm_kwargs:
      speculative_config:
        method: "eagle3"
        model: /path/to/eagle3-draft
        num_speculative_tokens: 3
        draft_tensor_parallel_size: 1

Example recipe: examples/configs/recipes/llm/grpo-qwen3-1.7b-1n8g-megatron-eagle3.yaml. For the full guide, see the Eagle3 Speculative Decoding documentation.
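The draft-and-verify loop can be illustrated with a toy greedy (temperature 0) sketch, where each model is stood in for by a hypothetical function that returns the next token for a given context. Under greedy decoding the output matches what the target model alone would produce, so the speedup comes from accepting several draft tokens per target pass. Sampled rollouts use a rejection-sampling acceptance rule that preserves the target distribution, which this sketch does not implement, and vLLM's actual implementation differs.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token context to the greedy next token


def speculative_step(prefix: List[int], draft_next: NextToken, target_next: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to the prefix."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    # 2. The target (policy) model checks each proposed token. A real engine scores all k
    #    positions in one batched forward pass; this loop only models the acceptance rule.
    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every draft token matched, so the target's next prediction is a free bonus token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken, k: int, max_new_tokens: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new_tokens]
```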

SGLang Inference Backend

NeMo RL now supports SGLang as a generation backend alongside vLLM and Megatron inference. SGLang can be used for GRPO rollouts with a simple config change:

policy:
  generation:
    backend: "sglang"
    sglang_cfg:
      model_path: ${policy.model_name}
      gpus_per_server: 1
      dtype: ${policy.precision}
      context_length: 512
      mem_fraction_static: 0.7

SGLang is currently supported with the DTensor V2 (Automodel) policy backend only. We are actively working with the SGLang team on improving this integration and adding support for the Megatron backend.

Example recipes: grpo-qwen3-0.6b-1n8g-sglang.yaml, grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1-sglang.yaml.

Muon Optimizer

NeMo RL now supports the Muon (MomentUm Orthogonalized by Newton-Schulz) optimizer for SFT and RL training. Muon achieves higher sample efficiency than AdamW by applying Newton-Schulz orthogonalization to momentum-based updates. Muon is supported with the Megatron backend.

policy:
  megatron_cfg:
    enabled: true
    optimizer:
      optimizer: "dist_muon"
      muon_momentum: 0.95
      muon_scale_mode: "spectral"
      muon_num_ns_steps: 5
      use_distributed_optimizer: false
      use_precision_aware_optimizer: false

For the full guide, see the Muon Optimizer documentation.
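The core of Muon is the Newton-Schulz step that replaces a momentum matrix G = USVᵀ with an approximation of its orthogonal factor UVᵀ. The NumPy sketch below is a simplified illustration, not NeMo RL's or Megatron-Core's dist_muon: it assumes the quintic coefficients from the widely used open-source Muon reference implementation, maps momentum and ns_steps loosely onto muon_momentum and muon_num_ns_steps, and ignores muon_scale_mode, Nesterov momentum, weight decay, and distributed sharding. Function names and the learning rate are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map g = U S V^T to U V^T with a quintic Newton-Schulz iteration.

    Each step maps every singular value s to a*s + b*s^3 + c*s^5; these coefficients push
    singular values into a band around 1 rather than exactly onto 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization keeps every singular value <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(param, grad, momentum_buf, lr=0.02, momentum=0.95, ns_steps=5):
    """One simplified Muon update for a 2D weight matrix."""
    momentum_buf = momentum * momentum_buf + grad
    update = newton_schulz_orthogonalize(momentum_buf, steps=ns_steps)
    return param - lr * update, momentum_buf


rng = np.random.default_rng(0)
grad = rng.normal(size=(8, 4))
update = newton_schulz_orthogonalize(grad)
print(np.linalg.svd(update, compute_uv=False))  # singular values clustered around 1
```

Orthogonalizing the update equalizes its singular values, so no single direction in the momentum dominates the step; this applies to 2D weight matrices only.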

YaRN Long-Context Training

YaRN (Yet another RoPE extensioN) extends a model's usable context window beyond the length it was pretrained on by rescaling RoPE frequencies. NeMo RL supports YaRN RoPE scaling for SFT, GRPO, DPO, RM, and distillation workflows via the Megatron backend.

policy:
  max_total_sequence_length: 65536
  megatron_cfg:
    enabled: true
  hf_config_overrides:
    rope_scaling:
      rope_type: yarn
      rope_theta: 1000000
      factor: ${div:${policy.max_total_sequence_length},${policy.hf_config_overrides.rope_scaling.original_max_position_embeddings}}
      original_max_position_embeddings: 40960
      truncate: true
      beta_fast: 32
      beta_slow: 1
      mscale: 1
      mscale_all_dim: 0

Example recipes: grpo-qwen2.5-1.5B-4n8g-megatron-yarn-256k.yaml, sft-qwen3-0.6B-1n8g-megatron-yarn-64k.yaml. For the full guide, see the YaRN documentation.
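The config above sets factor to max_total_sequence_length / original_max_position_embeddings = 65536 / 40960 = 1.6. The NumPy sketch below shows what such a factor does under YaRN's "NTK-by-parts" scheme: high-frequency RoPE dimensions keep their original frequencies, low-frequency dimensions are divided by factor, and dimensions between the beta_fast and beta_slow boundaries are blended linearly. It is modeled on the YaRN paper and the Hugging Face Transformers reference implementation as we understand it, not on Megatron-Bridge's code; head_dim = 128 and the function names are assumptions, and how mscale and mscale_all_dim map onto the attention scale is not modeled.

```python
import math

import numpy as np


def yarn_inv_freq(head_dim, rope_theta, factor, original_max_pos, beta_fast=32.0, beta_slow=1.0, truncate=True):
    """Per-dimension RoPE inverse frequencies after YaRN interpolation (sketch)."""

    def correction_dim(num_rotations):
        # Dimension index whose wavelength completes `num_rotations` turns over the original context.
        return (head_dim * math.log(original_max_pos / (num_rotations * 2 * math.pi))) / (2 * math.log(rope_theta))

    low, high = correction_dim(beta_fast), correction_dim(beta_slow)
    if truncate:
        low, high = math.floor(low), math.ceil(high)
    low, high = max(low, 0), min(high, head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = 1.0 / (rope_theta ** (np.arange(0, head_dim, 2) / head_dim))
    # ramp = 0: high-frequency dims keep their original frequency (extrapolation);
    # ramp = 1: low-frequency dims are divided by `factor` (interpolation); linear blend in between.
    ramp = np.clip((np.arange(head_dim // 2) - low) / (high - low), 0.0, 1.0)
    return inv_freq * (1.0 - ramp) + (inv_freq / factor) * ramp


def yarn_attention_scale(factor):
    """Attention temperature correction recommended in the YaRN paper: 0.1 * ln(s) + 1 for s > 1."""
    return 0.1 * math.log(factor) + 1.0 if factor > 1 else 1.0


max_total_sequence_length = 65536
original_max_position_embeddings = 40960
factor = max_total_sequence_length / original_max_position_embeddings  # 1.6, as the ${div:...} resolver computes
inv_freq = yarn_inv_freq(128, 1_000_000, factor, original_max_position_embeddings)
```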

Chunked Linear Cross-Entropy Fusion Loss

This is a memory-efficient cross-entropy loss that computes the loss directly from hidden states: it chunks the sequence dimension, projects each chunk to logits on the fly, computes per-token log probabilities, and discards the logits before moving to the next chunk. Because the full logits tensor is never materialized, this significantly extends the maximum trainable sequence length (e.g. from <65K to >100K tokens) while producing numerically equivalent loss values.

Now supported for both SFT and DPO workflows with the Megatron backend:

policy:
  megatron_cfg:
    use_linear_ce_fusion_loss: true
    linear_ce_fusion_chunk_size: 256

Example recipes: sft-qwen2.5-math7b-1n8g-megatron_chunked_linear_ce_loss.yaml, dpo-qwen2.5-math7b-1n8g-megatron_chunked_linear_ce_loss.yaml.
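A minimal NumPy sketch of the forward computation described above, with hypothetical names and toy shapes: per-token log probabilities are computed chunk by chunk from the hidden states and the output projection, so only a chunk_size x vocab block of logits exists at a time, and the result matches the unchunked computation. A production fused loss must also handle the backward pass (for example by recomputing each chunk's logits), which is not shown, and chunk_size here only loosely corresponds to linear_ce_fusion_chunk_size.

```python
import numpy as np


def token_logprobs_full(hidden, weight, targets):
    """Reference: project all positions to (seq_len, vocab) logits at once."""
    logits = hidden @ weight.T
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))  # numerically stable log-sum-exp
    return logits[np.arange(len(targets)), targets] - log_z


def token_logprobs_chunked(hidden, weight, targets, chunk_size):
    """Same result, but only one (chunk_size, vocab) logits block is alive at a time."""
    out = np.empty(hidden.shape[0])
    for start in range(0, hidden.shape[0], chunk_size):
        end = min(start + chunk_size, hidden.shape[0])
        # The chunk's logits are local to token_logprobs_full and freed when it returns.
        out[start:end] = token_logprobs_full(hidden[start:end], weight, targets[start:end])
    return out


rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 8))   # (seq_len, hidden_size)
weight = rng.normal(size=(50, 8))   # (vocab_size, hidden_size) output projection
targets = rng.integers(0, 50, size=10)
loss = -token_logprobs_chunked(hidden, weight, targets, chunk_size=3).mean()
```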

Model Support

Nemotron

Nemotron Nano v3 is now supported on main. See this guide for reproducible instructions on how to post-train the Nemotron 3 Nano model with NeMo RL.
Nemotron Super v3 is supported on the super-v3 branch. See the Nemotron 3 Super guide for details.

Qwen3.5 and GLM-4.7-Flash

NeMo RL adds GRPO training support for Qwen3.5 dense and MoE models (both LLM and VLM), and GLM-4.7-Flash. Example recipes:

grpo-qwen3.5-9b-1n8g-megatron.yaml (Qwen3.5 9B dense)
grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml (Qwen3.5 35B MoE)
grpo-qwen3.5-397ba17b-32n8g-megatron.yaml (Qwen3.5 397B MoE)
grpo-glm47-flash-4n8g-automodel.yaml (GLM-4.7-Flash)
grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.yaml (Qwen3.5 VLM)

For the full model support matrix, please refer to our model support documentation.

⚡ Performance Optimizations

Fused sequence packing for loss: A new fuse_loss option in the sequence_packing config eliminates the overhead of separating packed sequences for individual loss computation (#1904).
Reduced memory footprint for ChunkedDistributedLogProb: Optimized the chunked distributed log-probability computation to reduce peak GPU memory usage (#1895).
Shard concat overhead reduction: Reduced overhead in the shard concatenation operation used during distributed training data sharding (#2002).
MoE alltoall token dispatcher default: Changed the default MoE token dispatcher type to alltoall for improved MoE model performance (#2004).
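As a generic illustration of the idea behind fused sequence packing for loss (not the implementation in #1904), the sketch below reduces per-token losses for several sequences packed into one row directly, using cumulative sequence offsets (cu_seqlens, a hypothetical name here) and a single segment reduction instead of slicing each sequence out. It assumes every packed sequence is non-empty.

```python
import numpy as np


def per_sequence_loss_unpacked(token_losses, cu_seqlens):
    """Baseline: slice the packed row back into individual sequences and reduce each one."""
    return np.array([token_losses[s:e].mean() for s, e in zip(cu_seqlens[:-1], cu_seqlens[1:])])


def per_sequence_loss_fused(token_losses, cu_seqlens):
    """Reduce every sequence in one pass over the packed row with segment sums."""
    seq_sums = np.add.reduceat(token_losses, cu_seqlens[:-1])  # sums over [cu[i], cu[i+1])
    return seq_sums / np.diff(cu_seqlens)


token_losses = np.arange(10.0)          # per-token losses for three packed sequences
cu_seqlens = np.array([0, 3, 7, 10])    # sequence boundaries: lengths 3, 4, 3
print(per_sequence_loss_fused(token_losses, cu_seqlens))  # [1.  4.5 8. ]
```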

View the v0.6.0 performance numbers from our published recipes at https://docs.nvidia.com/nemo/rl/latest/about/performance-summary.html.

SWE-RL Benchmark

NeMo RL now includes a SWE RL release benchmark demonstrating a long-context, multi-step RL rollout. See the performance numbers here, with the accompanying recipe and scripts in #2327. SWE support is currently available on the super-v3 branch.

To replicate SWE RL on the Nemotron Super V3 model, see this guide.

Notable Additions

Top-p and top-k sampling in GRPO: Users can now configure top-p and top-k sampling parameters in GRPO, enabling more controlled sampling during training (#2053).
Configurable attention backend for Megatron: A new attention_backend config parameter for the Megatron training backend allows users to select different attention implementations (e.g. FlashAttention, TransformerEngine DotProductAttention) (#1628).
LoRA checkpoint merge and HF export: New tooling to merge LoRA adapter weights back into a base Megatron checkpoint and export as a standalone Hugging Face checkpoint, enabling deployment of LoRA-trained models without the separate adapter at inference time (#2173).
save_optimizer flag: A new save_optimizer boolean in the checkpoint config (default: true). When set to false, optimizer state is excluded from checkpoints, reducing checkpoint size and save time (#1843).
Fault tolerance launcher: NeMo RL integrates with nvidia-resiliency-ext for automatic fault tolerance and recovery for distributed training runs. Install via the nvrx optional extra and use the ft_launcher to get heartbeat monitoring, automatic restarts, and recovery from checkpoints. See the Fault Tolerance Launcher Guide.
Major dependency upgrades: Python ≥3.13.13, PyTorch 2.10.0, Ray 2.54.0, Transformers 5.3.0, vLLM 0.17.1, SGLang 0.5.10. These enable compatibility with the latest ecosystem and unlock new features across all backends.
System prompt support in math data processor: Added system_prompt support to math_hf_data_processor (#2216).
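For reference, the sketch below shows standard top-k and top-p (nucleus) filtering of a single logits vector: tokens outside the k most likely, or outside the smallest set whose probability mass reaches top_p, are masked to -inf before sampling. It is a generic NumPy illustration with a hypothetical function name, not NeMo RL's or vLLM's code.

```python
import numpy as np


def top_k_top_p_filter(logits, top_k=None, top_p=None):
    """Return a copy of `logits` with tokens outside the top-k / top-p sets masked to -inf."""
    filtered = np.asarray(logits, dtype=float).copy()
    if top_k is not None and 0 < top_k < filtered.size:
        kth_largest = np.sort(filtered)[-top_k]
        filtered[filtered < kth_largest] = -np.inf  # ties with the k-th value are kept
    if top_p is not None and top_p < 1.0:
        order = np.argsort(filtered)[::-1]  # token ids by descending logit
        probs = np.exp(filtered[order] - filtered[order[0]])  # softmax over the (possibly top-k-masked) logits
        probs /= probs.sum()
        cumulative = np.cumsum(probs)
        # Keep the smallest prefix whose cumulative probability reaches top_p (always at least one token).
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        filtered[order[cutoff:]] = -np.inf
    return filtered


logits = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
print(top_k_top_p_filter(logits, top_k=2))    # last two tokens masked
print(top_k_top_p_filter(logits, top_p=0.7))  # 0.5 + 0.3 >= 0.7, so the last two tokens are masked
```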

Notable Fixes

Fixed a checkpoint loading bug in Megatron LoRA GRPO (#2075).
Fixed FP8 _apply_state_dict_to_model for correct checkpoint restoration (#2233).
Fixed use_linear_ce_fusion_loss when used with certain configurations (#2232).
Fixed GPT-OSS export and bumped Megatron-Bridge for compatibility (#2257).
Fixed Gemma3 model support (#2185).
Fixed make_sequence_length_divisible_by in config (#2135).
Fixed async GRPO offload (#2119).
Fixed Megatron checkpoint loading without optimizer and improved warning detection (#2159).
Allowed wandb config value changes on resume (#2137).
Addressed security vulnerabilities and CVEs (#2236, #2214, #2201).

📊 Release Runs

We provide TensorBoard logs from our release runs to give you a head start on what to expect from our recipes.

To view these TensorBoard logs easily, we provide a Google Colab notebook that downloads and serves them.

What's Changed

fix: Handle disabled validation in SFT training by @sahgerlad in #1611
fix: Fix crash when using cp in dtensor path by @yfw in #1663
fix: Fix Fp8 sequence padding for PP>1 case by @guyueh1 in #1579
test: Perf recipe for v0.5 by @guyueh1 in #1667
fix: Fix fp8 after vllm v0.11.2 bump by @guyueh1 in #1660
fix: Fix crash when using activation_checkpointing by @yfw in #1676
feat: add dapo recipe and test by @ZhiyuLi-Nvidia in #1617
feat: DTensorPolicyV2 GPT-OSS SFT support by @adil-a in #1470
fix: grad norm calculation for dtensor v2 by @hemildesai in #1693
feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoRA) by @RayenTian in #1648
feat: Support prefetching of specific envs by @hemildesai in #1692
fix: Fix DTensor slice crash after PyTorch 2.9 bump by @zpqiu in #1689
fix: grad norm check for automodel gpt oss nightly by @hemildesai in #1708
fix: relax nanov3 nightly test metrics strict by @RayenTian in #1712
fix: on GB200 use single-thread checkpoint save to avoid Cpu OOM by @guyueh1 in #1703
perf: [Perf recipe] Change TP 16->32 for deepseek GB200 sync benchmark by @guyueh1 in #1715
docs: Add doc for nano-v3 by @yfw in #1694
fix: Disable cudnn sdpa backend when using activation checkpointing by @yfw in #1717
fix: log metrics that can be coerced to scalars by @terrykong in #1723
fix: use median instead of mean for logprob error for stability in nightlies by @terrykong in #1722
fix: gemma3 27b must now have skip_tokenizer_init=False in vllm by @terrykong in #1721
fix: fix several nightly tests that were flaky by @terrykong in #1724
fix: apply offloading change from v2 to v1 by @terrykong in #1726
fix: mcore generation config restored in nightly test by @terrykong in #1720
feat: Megatron SFT LoRA by @arendu in #1629
build: Update aiohttp and urllib3 by @chtruong814 in #1746
fix: patch pytorch aten.alias.default shard strategy by @RayenTian in #1728
feat: RL support for custom moe models in dtensor v2 by @hemildesai in #1695
fix: split dtensorv1 vllm dependency by @yuki-97 in #1638
build: Resolve CVEs for gnupg and aiohttp by @chtruong814 in #1755
build: Bump mamba to d68d16e and causal-conv1d to 67e0a9d by @chtruong814 in #1759
ci: Clean up disk space for lint check by @chtruong814 in #1768
docs: Adding dtensor TP debugging summary by @joyang-nv in #1767
docs: Update image syntax in dtensor TP accuracy guide for consistency by @RayenTian in #1780
fix: fix formatting for async docs by @parthchadha in #1783
ci: Add nightly and release tests for gb200 by @chtruong814 in #1788
feat: NeMo Gym refresh 20260113 by @bxyu-nvidia in #1773
perf: DeepEP interface in megatron backend by @guyueh1 in #1794
feat: refactor init of dtensor policy v2 by @hemildesai in #1709
build: Update pyasn1 to >= 0.6.2 by @chtruong814 in #1791
docs: Adding k8 guide by @vinhngx in #1764
test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite. by @sfawzy-nv in #1623
docs: v0.5 performance results update by @guyueh1 in #1772
docs: model support page by @terrykong in #1799
refactor: split train and val dataset in response dataset by @yuki-97 in #1649
docs: fix pytorch anchor link: PYTORCH_CUDA_ALLOC_CONF->PYTORCH_ALLOC_CONF by @terrykong in #1806
fix: log validation data by @parthchadha in #1805
feat: Add SGLang rollout backend and tests by @RolaoDenthu in #1674
refactor: reuse setup data by @yuki-97 in #1808
feat: refactor megatron init by @ashors1 in #1646
build: Bump setuptools >= 80.10.1 and wheel >= 0.46.2 by @chtruong814 in #1822
build: Bump setuptools to 80.10.2 by @chtruong814 in #1830
feat: refactor common data utilities of dtensor policy v2 by @hemildesai in #1710
feat: add FT launcher config and resiliency dependency [1/4] by @yashaswikarnati in #1824
fix: move ft_config.yaml outside examples/configs by @yashaswikarnati in #1839
docs: Add notes for FP8 recipe in docs/fp8.md by @guyueh1 in #1829
feat: Timer for the data sharding and job submission by @guyueh1 in #1802
feat: Allow loading of more general data types by @nathan-az in #1834
chore: add assert for dtensor v2 cpu offload by @yuki-97 in #1817
build: Bump protobuf to 6.33.5 and python-multipart to 0.0.22 by @chtruong814 in #1850
feat: refactor megatron data utils by @ashors1 in #1651
feat: support stateless group and decouple vLLM in train backend by @shuyixiong in #1842
docs: update readme post 0.5 by @euronymous-aithal in #1856
docs: fix readme post 0.5 by @euronymous-aithal in #1858
feat: Support lora in dtensor grpo workflow by merging weight by @RayenTian in #1797
chore: add nanov3 lora sft recipe to doc by @RayenTian in #1860
ci: Allow repo to self publish docs by @chtruong814 in #1821
fix: fix statistic of probs_ratio_clamped_min/max by @yuki-97 in #1818
feat: support multiple datasets for response dataset by @yuki-97 in #1691
refactor: unify entrypoint for different envs by @yuki-97 in #1841
feat: add lora config for dpo dtensor backend by @RayenTian in #1826
fix: add log_plot to the logger interface by @terrykong in #1862
refactor: split train and val dataset in preference dataset by @yuki-97 in #1763
chore: add assert for tp4 batch variant accuracy issue by @yuki-97 in #1861
fix: prevent crash in rollout metric calculation when just 1 value by @terrykong in #1864
feat: add val_at_end for all algorithms by @terrykong in #1863
ci: Add secrets detector by @chtruong814 in #1854
feat: Add bisecting tooling for nightly test regressions by @terrykong in #1223
docs: add release runs to front page readme for 0.5 by @terrykong in #1879
fix: Remove redundant nested loop in move_model by @nathan-az in #1880
docs: Fix a step time number for deepseek by @guyueh1 in #1890
feat: refactor train utilities for dtensor policy v2 by @hemildesai in #1757
feat: add speculative decoding during post-training by @isomap in #1785
feat: Add Nemotron‑3 Nano 30B A3B GRPO nightly tests (FSDP2, +LoRA) by @RayenTian in #1866
ci: Fix docs publishing by @chtruong814 in #1898
feat: Implement ProRLv2 recipe by @hijkzzz in #1809
feat: add way of excluding generation backends and disable sglang tests in CI by @terrykong in #1855
feat: Update mlflow to work better with env vars, manual run id, fix tests by @nathan-az in #1874
feat: unify nemogym dataset by @yuki-97 in #1807
feat: improve dataset by @yuki-97 in #1893
fix: fix enable_seq_packing and apply_temperature_scaling in DTensor v2 by @yuki-97 in #1900
chore: Centralize OmegaConf resolver registration by @RayenTian in #1882
fix: Fix DCP-to-HF conversion for model-wrapped checkpoints by @RayenTian in #1881
fix: add missing functional test by @yuki-97 in #1883
fix: fix and re-enable rm env functional test by @RayenTian in #1905
feat: start nemo gym and other environments with cached venvs by @terrykong in #1927
fix: Mxfp8 training fix sequence padding by @guyueh1 in #1884
fix: use seq_length instead of padded_seq_length for topk output padding by @zpqiu in #1929
fix: Update sglang source by @RolaoDenthu in #1926
chore: bump mcore and mbridge by @yfw in #1902
feat: refactor mcore train/forward utilities by @ashors1 in #1654
docs: Document Gym + RL integration design by @ananthsub in #1762
feat: retry rollout if generation_logprobs contains NaN by @guyueh1 in #1885
feat: Support build custom flashinfer by @guyueh1 in #1886
fix: async llm engine didn't have get_metrics() by @terrykong in #1943
feat: Mask sequences with high logprob error by @yfw in #1838
feat: ProRLv2 - add seq-mask-tis truncated importance sampling type by @hijkzzz in #1899
ci: Update release-docs workflow to use FW-CI-templates v0.72.0 by @chtruong814 in #1965
fix: speedup minimize and minimize-check in config_cli by @hemildesai in #1964
docs: update features.md to reflect v0.5 release and v0.6 roadmap by @seonjinn in #1966
fix: add mask seq with high logp err to nemo gym config by @cmunley1 in #1980
chore: upgrade wandb to 0.25+ by @Kipok in #1979
feat: Remove do_not_average_loss by @yfw in #1988
chore: remove .swp config file by @zhongbozhu in #1998
fix: Fix adv estimator configs by @yfw in #1994
feat: Nano v3 RL Recipe by @yfw in #1989
build: Add vllm arm precompiled wheel env variable by @ananthsub in #1970
build: Update dockerfile to support Nsight install on arm platforms by @ananthsub in #1939
chore: Switch to mcore upstream main by @ahmadki in #1990
fix: Re-enable tests/functional/test_converters.sh functional test by @RayenTian in #2005
ci: Enable nightly docs update by @chtruong814 in #2021
feat: Omni dataloader for HF models by @yuanhangsu1986 in #2016
feat: support multiple dataloader for grpo by @yuki-97 in #1698
build: Do not install decord on arm by @chtruong814 in #2034
ci: add a fast test suite by @terrykong in #2031
test: fix bug in deselection and make fast tests even faster by @terrykong in #2038
test: add a diagnostic script for prefix caching naning by @terrykong in #1987
feat: async grpo + nemo gym by @terrykong in #1985
refactor: refactor loss function by @yuki-97 in #1920
fix: remove label name from CI concurrency group by @terrykong in #2044
feat: Megatron LoRA GRPO w/ Weight Merging by @vadam5 in #1889
fix: device mismatch when DPO validation at start with CPU offload(Nemotron) by @RayenTian in #1930
build: Replace decord with decord2 by @chtruong814 in #2040
ci: Allow cancelling of unit tests by @chtruong814 in #2045
ci: Fix copy-pr-bot config by @chtruong814 in #2067
perf: Update moe_token_dispatcher_type default to alltoall by @parthmannan in #2004
fix: checkpoint loading bug in Megatron LoRA GRPO by @vadam5 in #2075
ci: Enable GB200 runners by @chtruong814 in #2017
perf: Reduce memory footprint for ChunkedDistributedLogProb by @nujoug in #1895
ci: Switch to merge-commit CI by @ko3n1g in #2077
docs: Add news item for Nemotron 3 Super by @yfw in #2099
feat: support top-p top-k in grpo by @yuki-97 in #2053
ci: skip container build for CI:docs level by @terrykong in #2106
feat: improve research template by @yuki-97 in #2094
feat: support GDPO (New) by @nbasyl in #2069
ci: Ensure fast functional tests are ran if test level is Lfast by @chtruong814 in #2108
test: add megatron bump suite by @terrykong in #2068
docs(optimizer): Add Muon post-training support by @ashors1 in #1848
fix: fix sft-openmathinstruct2 by @yuki-97 in #2120
fix: fix async grpo offload by @yuki-97 in #2119
perf: Fuse sequence packing for loss function by @nujoug in #1904
chore: add entity to wandb config by @ananthsub in #2113
feat: support async GDPO by @nbasyl in #2118
ci: Enable claude review by @thomasdhc in #2121
ci: Fix sso user check by @chtruong814 in #2126
feat: Add chunked linear ce loss function from hidden states by @pengdurice in #2036
feat: add dpo lora megatron functional test by @RayenTian in #2125
docs: Add note about vllm bug prior to 0.17.0 by @yfw in #2128
ci: Fix nightly tests by @kajalj22 in #2109
chore: test FW-CI-templates ko3n1g/fix/linkcheck-retry-backoff by @ko3n1g in #2131
fix: fix make_sequence_length_divisible_by in config by @yuki-97 in #2135
feat: Added save_optimizer flag to control if saving optimizer in checkpointing by @odedovadia in #1843
chore: update all to transformers v5 (+torch 2.10, ray 2.54, vllm/sglang tot) by @hemildesai in #1962
docs: Update nano docs to point to main by @yfw in #2147
ci: Build RL main on Azure by @chtruong814 in #2145
Revert "ci: Build RL main on Azure (#2145)" by @chtruong814 in #2155
fix: add is_async to CheckpointingConfig TypedDict by @dmvevents in #1991
chore: make vlm config inherit from base config by @yuki-97 in #2154
perf: shard concat overhead by @pjo256 in #2002
feat: Add linear CE loss fusion for DPO by @pengdurice in #2139
fix: Add debug parameter to reduce verbose output by @sahgerlad in #1664
fix: fix doc test by @yuki-97 in #2160
chore: remove CodeRabbit configuration by @terrykong in #2161
ci: remove automodel integration file consistency check by @terrykong in #2162
fix: fix megatron load w/o optimizer and fix warning detect by @yuki-97 in #2159
docs: update README by @RayenTian in #2167
ci: run doc tests for Lfast label by @terrykong in #2164
feat: add Claude Code skills, CLAUDE.md, and interactive PR review by @terrykong in #2169
ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #2138
ci: add broken links false positives for flaky nvidia docs URL by @terrykong in #2179
chore: bumpup Megatron-Bridge submodule to main by @ZhiyuLi-Nvidia in #2039
docs: fix SFT tool calling format to match OpenAI spec by @terrykong in #2168
feat: Add attention_backend config support for Megatron policy by @sahgerlad in #1628
chore: add tracking_uri configuration for MLflow in multiple YAML files by @RayenTian in #2170
chore: bump _code_freeze workflow to v0.86.0 by @ko3n1g in #2181
fix: align sequence_length_pad_multiple in lm_policy by @yuki-97 in #2182
fix: fix gb200 nightly by @yuki-97 in #2183
fix: allow wandb config value changes on resume by @gkaplun-nvidia in #2137
feat: Add Eagle3 online speculative decoding support by @isomap in #2078
fix: add Gym submodule to git safe.directory in CI by @kajalj22 in #2186
fix: fix dsv3 by disable mtp by @yuki-97 in #2191
chore: bump Gym submodule to latest main by @terrykong in #2195
fix: pin torchaudio to pytorch-cu129 index by @terrykong in #2198
chore: bump vllm 0.17.0 -> 0.17.1 by @terrykong in #2196
chore: point dsv3 model_name to convert doc by @yuki-97 in #2204
fix: revert logprob_batch_size to keep same perf as before by @yuki-97 in #2192
chore: update versions to address CVEs by @kajalj22 in #2201
feat: Add YaRN rope scaling support on Megatron-Bridge by @RayenTian in #2188
feat: support override HF model name in convert_megatron_to_hf by @dhineshkumar-r in #2202
fix: fix gemma3 by @yuki-97 in #2185
chore: loosen gen_kl_error to avoid flaky fail in CI by @yuki-97 in #2215
chore: fix CVEs for v0.6 by @kajalj22 in #2214
fix: add system_prompt support to math_hf_data_processor by @rishy2 in #2216
feat: add Qwen3.5 & GLM-4.7-Flash model support by @zpqiu in #2151
feat: Merge megatron checkpoints with lora adapters and convert to HF format by @pengdurice in #2173
fix: add nvidia-resiliency-ext to default dependencies by @terrykong in #2228
fix: fix eagle3 nightly by @yuki-97 in #2231
fix: fix use_linear_ce_fusion_loss by @yuki-97 in #2232
fix: fix fp8 _apply_state_dict_to_model by @yuki-97 in #2233
chore: upgrade Python from 3.12 to 3.13 by @kajalj22 in #2220
fix: address security vulnerabilities by @kajalj22 in #2236
fix: fix qwen3.5 nightly by @yuki-97 in #2241
chore: bump Megatron-Bridge to latest main (7110a96) by @yuki-97 in #2223
chore: clean up nemo.tron by @yuki-97 in #2240
fix: fix h100 release/performance by @yuki-97 in #2184
fix: fix gb200 release/performance by @yuki-97 in #2189
docs: update ProRL v2 guide and config by @hijkzzz in #2237
chore: upgrade Python 3.13.13, sglang 0.5.10, mlflow, pytest, flash-infer by @kajalj22 in #2243
chore: update version and references for r0.6.0 release (#2261) by @kajalj22 in #2262
cp: fix: fix gpt oss export + bump mbridge (2249) into r0.6.0 by @svcnvidia-nemo-ci in #2257
build: drop rc0 pre-release tag and add dynamic git versioning (#2235) by @kajalj22 in #2263
cp: feat: add nvidia-resiliency-ext as nvrx optional extra (2264) into r0.6.0 by @svcnvidia-nemo-ci in #2266
cp: docs: add yarn doc (2283) into r0.6.0 by @svcnvidia-nemo-ci in #2284
cp: fix: fix OOM (2285) into r0.6.0 by @svcnvidia-nemo-ci in #2286
cp: docs: fix typos and errors in training-backends.md (2287) into r0.6.0 by @svcnvidia-nemo-ci in #2288
docs: cherry-pick #2193 — fix typos, grammar, and table issues from QA by @terrykong in #2290
cp: ci: lower distillation seqpack accuracy thresh(2306) into r0.6.0 by @svcnvidia-nemo-ci in #2308
cp: fix: loosen memory threshold for sft-llama3.1-... (2310) into r0.6.0 by @svcnvidia-nemo-ci in #2311
cp: fix: install logsage transitive deps for ft_launcher (2304) into r0.6.0 by @svcnvidia-nemo-ci in #2307
cp: docs: Fixing docs for Muon optimizer (2301) into r0.6.0 by @svcnvidia-nemo-ci in #2313
cp: perf: Perf test scripts update for v0.6 (2300) into r0.6.0 by @svcnvidia-nemo-ci in #2317
fix: Fix perf regression by @guyueh1 in #2328
perf: Improve deepseek benchmark perf by @guyueh1 in #2333
test: Fix oom for grpo-dapomath17k-dsv3-32n4g-megatron by @guyueh1 in #2342
cp: docs: Perf page update for v0.6 (2346) into r0.6.0 by @svcnvidia-nemo-ci in #2364

New Contributors

@adil-a made their first contribution in #1470
@arendu made their first contribution in #1629
@vinhngx made their first contribution in #1764
@sfawzy-nv made their first contribution in #1623
@RolaoDenthu made their first contribution in #1674
@yashaswikarnati made their first contribution in #1824
@shuyixiong made their first contribution in #1842
@isomap made their first contribution in #1785
@hijkzzz made their first contribution in #1809
@cmunley1 made their first contribution in #1980
@Kipok made their first contribution in #1979
@zhongbozhu made their first contribution in #1998
@yuanhangsu1986 made their first contribution in #2016
@vadam5 made their first contribution in #1889
@parthmannan made their first contribution in #2004
@nujoug made their first contribution in #1895
@nbasyl made their first contribution in #2069
@thomasdhc made their first contribution in #2121
@pengdurice made their first contribution in #2036
@odedovadia made their first contribution in #1843
@dmvevents made their first contribution in #1991
@pjo256 made their first contribution in #2002
@gkaplun-nvidia made their first contribution in #2137
@dhineshkumar-r made their first contribution in #2202
@rishy2 made their first contribution in #2216
@svcnvidia-nemo-ci made their first contribution in #2257

Full Changelog: v0.5.0...v0.6.0
