cp: docs: Perf page update for v0.6 (2346) into r0.6.0 (#2364)

svcnvidia-nemo-ci · guyueh1 · parthmannan · web-flow · commit 5fb588932bf8 · 2026-04-29T21:32:56.000Z
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Parth Mannan <pmannan@nvidia.com>
diff --git a/docs/about/performance-summary.md b/docs/about/performance-summary.md
@@ -43,27 +43,25 @@ The performance data includes:
 
 ---
 
-## Nemo RL v0.5
+## Nemo RL v0.6
 
 ### H100 BF16 Benchmarks
-* GRPO Dataset: [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2); DAPO dataset: [DAPOMath17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k)
+* GRPO Dataset: [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2); DAPO dataset: [DAPOMath17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k); SWE dataset: refer to [Nemotron super-v3 documentation - stage 2.2](https://github.com/NVIDIA-NeMo/RL/blob/super-v3/docs/guides/nemotron-3-super.md#stage-22---swe-2-64-nodes)
 * System: DGX-H100
 * Precision: Training BF16, Generation BF16
 * Training Backend: Megatron-core.
 
 | Algorithm | Model     |On/Off policy|T-Max Sequence Length|G-Average Seq len|#-GPUs|G-GBS|T-GBS|Generation [TP,PP]|Training [TP,CP,EP,PP,VPP]|Tokens / sec / GPU|Total Step time(s)|
 |---------  |-------    |--------     |-----                |-----            |------|---- |---- |----              |----                      |---               |---|
-| GRPO      |LLAMA3.1_8B|On policy    |4,096                |1,019            |16    |2,048|512  |[1,1]             |[1,1,1,1,1,2,n/a]         |1,581             | 92.8|
-| GRPO      |LLAMA3.1_8B|1-step Off   |4,096                |1,123            |16    |2,048|512  |[1,1]             |[1,1,1,1,1,1,n/a]         |2,478             | 64.8|
-| GRPO      |DeepSeek V3|On policy    |1,536                |744              |256   |512  |512  |[32,1]            |[1,1,16,16,n/a]           |12.7              | 134|
-| GRPO      |DeepSeek V3|1-step Off   |1,536                |738              |512   |512  |512  |[32,1]            |[1,1,16,16,n/a]           |13.1              | 64.9|
-| DAPO      |DeepSeek V3|On policy    |1,536                |974              |512   |512  |512  |[64,1]            |[8,4,32,8,n/a]            |2.45              | 458|
-| GRPO      |Qwen3-235B |On policy    |8,192                |5,700            |128   |512  |512  |[16,1]            |[2,2,16,8,n/a]            |54.1              | 431|
-| GRPO      |Qwen3-235B |1-step Off   |8,192                |5,707            |256   |512  |512  |[8,1]             |[4,1,16,8,n/a]            |58.7              | 203|
-| GRPO      |Qwen3-30B3A|On policy    |4,096                |3,196            |32    |2,048|512  |[2,1]             |[1,1,8,1,n/a]             |1066               | 198|
-| GRPO      |Qwen3-30B3A|1-step Off   |4,096                |3,201            |32    |2,048|512  |[2,1]             |[1,1,8,2,n/a]             |1391               | 154|
-| GRPO      |Qwen3-32B  |On policy    |4,096                |3,251            |32    |2,048|512  |[4,1]             |[4,1,1,4,n/a]             |571               | 376|
-| GRPO      |Qwen3-32B  |1-step Off   |4,096                |3,252            |64    |2,048|512  |[4,1]             |[4,1,1,4,n/a]             |538               | 200|
+| GRPO      |DeepSeek V3|On policy    |1,536                |701              |256   |512  |512  |[32,1]            |[1,1,16,16,n/a]           |12.1              | 134|
+| GRPO      |DeepSeek V3|On policy    |1,536                |697              |512   |512  |512  |[32,1]            |[1,1,16,16,n/a]           |7.24              | 111|
+| GRPO      |DeepSeek V3|1-step Off   |1,536                |710              |512   |512  |512  |[32,1]            |[1,1,16,16,n/a]           |12.8              | 64.1|
+| GRPO      |Qwen3-235B |On policy    |8,192                |5,698            |128   |512  |512  |[16,1]            |[2,2,16,8,n/a]            |58.9              | 395|
+| GRPO      |Qwen3-235B |On policy    |8,192                |5,713            |256   |512  |512  |[16,1]            |[2,2,16,8,n/a]            |37.4              | 312|
+| GRPO      |Qwen3-235B |1-step Off   |8,192                |5,721            |256   |512  |512  |[8,1]             |[4,1,16,8,n/a]            |58.7              | 231|
+| GRPO      |Qwen3-30B3A|On policy    |4,096                |3,203            |32    |2,048|512  |[2,1]             |[1,1,8,1,n/a]             |1102               | 192|
+| GRPO      |Qwen3-30B3A|1-step Off   |4,096                |3,201            |32    |2,048|512  |[2,1]             |[1,1,8,2,n/a]             |1414               | 152|
+| GRPO      |Qwen3-30B3A|8-step Off   |4,096                |3,206            |192   |2,048|512  |[2,1]             |[1,1,8,1,n/a]             |1025               | 34.5|
 
 ### H100 FP8 Benchmarks
 * GRPO Dataset: [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)
@@ -73,8 +71,7 @@ The performance data includes:
 
 | Algorithm | Model     |On/Off policy|T-Max Sequence Length|G-Average Seq len|#-GPUs|G-GBS|T-GBS|Generation [TP,PP]|Training [TP,CP,EP,PP,VPP]|Tokens / sec / GPU|Total Step time(s)|
 |---------  |-------    |--------     |-----                |-----            |------|---- |---- |----              |----                      |---               |---|
-| GRPO      |LLAMA3.1_8B|1-step Off   |4,096                |1,128            |16    |2,048|512  |[1,1]             |[1,1,1,1,1,1,n/a]         |3,052             | 53.0|
-| GRPO      |DeepSeek V3|1-step Off   |1,536                |761              |512   |512  |512  |[16,1]            |[1,1,16,16,n/a]           |14.1              | 67.6|
+| GRPO      |DeepSeek V3|1-step Off   |1,536                |721              |512   |512  |512  |[16,1]            |[1,1,16,16,n/a]           |14.1              | 59.2|
 
 ### GB200 BF16 Benchmarks
 * GRPO Dataset: [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)
@@ -84,18 +81,18 @@ The performance data includes:
 
 | Algorithm | Model     |On/Off policy|T-Max Sequence Length|G-Average Seq len|#-GPUs|G-GBS|T-GBS|Generation [TP,PP]|Training [TP,CP,EP,PP,VPP]|Tokens / sec / GPU|Total Step time(s)|
 |---------  |-------    |--------     |-----                |-----            |------|---- |---- |----              |----                      |---               |---|
-| GRPO      |LLAMA3.1_8B|On policy    |4,096                |1,066            |8     |2,048|512  |[1,1]             |[1,1,1,1,1,1,n/a]         |3,359             | 91.0|
-| GRPO      |LLAMA3.1_8B|1-step Off   |4,096                |1,107            |8     |2,048|512  |[1,1]             |[1,1,1,1,1,1,n/a]         |4,463             | 71.1|
-| GRPO      |DeepSeek V3|On policy    |1,536                |996              |128   |512  |512  |[32,1]            |[1,1,16,8,n/a]            |34.3              | 128|
-| GRPO      |DeepSeek V3|1-step Off   |1,536                |994              |256   |512  |512  |[16,1]            |[1,1,16,8,n/a]            |31.7              | 64.5|
-| GRPO      |Qwen3-235B |On policy    |8,192                |5,711            |64    |512  |512  |[8,1]            |[2,2,16,4,n/a]            |140              | 332|
-| GRPO      |Qwen3-235B |1-step Off   |8,192                |5,711            |128   |512  |512  |[8,1]             |[4,1,16,4,n/a]            |87.9              | 268|
-| GRPO      |Qwen3-30B3A|On policy    |4,096                |3,198            |16    |2,048|512  |[1,1]             |[1,1,16,1,n/a]             |1,822               | 232|
-| GRPO      |Qwen3-30B3A|1-step Off   |4,096                |3,204            |32    |2,048|512  |[1,1]             |[1,1,16,1,n/a]             |1,558               | 136|
-| GRPO      |Qwen3-32B  |On policy    |4,096                |3,253            |16    |2,048|512  |[1,1]             |[2,1,1,1,n/a]             |1,127              | 381|
-| GRPO      |Qwen3-32B  |1-step Off   |4,096                |3,258            |32    |2,048|512  |[1,1]             |[2,1,1,1,n/a]             |1,025               | 210|
+| GRPO      |DeepSeek V3|On policy    |1,536                |711              |128   |512  |512  |[32,1]            |[1,1,16,8,n/a]            |30.2              | 108|
+| GRPO      |DeepSeek V3|On policy    |1,536                |700              |256   |512  |512  |[32,1]            |[1,1,16,8,n/a]            |16.4              | 98.7|
+| GRPO      |DeepSeek V3|1-step Off   |1,536                |708              |256   |512  |512  |[16,1]            |[1,1,16,8,n/a]            |26.7              | 61.7|
+| GRPO      |Qwen3-235B |On policy    |8,192                |5,709            |64    |512  |512  |[8,1]            |[2,2,16,4,n/a]            |163              | 286|
+| GRPO      |Qwen3-235B |On policy    |8,192                |5,693            |128   |512  |512  |[8,1]            |[2,2,16,4,n/a]            |67.4              | 345|
+| GRPO      |Qwen3-235B |1-step Off   |8,192                |5,705            |128   |512  |512  |[8,1]             |[4,1,16,4,n/a]            |85.5              | 278|
+| GRPO      |Qwen3-30B3A|On policy    |4,096                |3,199            |16    |2,048|512  |[1,1]             |[1,1,16,1,n/a]             |1,910               | 221|
+| GRPO      |Qwen3-30B3A|1-step Off   |4,096                |3,197            |16    |2,048|512  |[1,1]             |[1,1,16,1,n/a]             |1,406               | 301|
+| SWE       |Nemotron-3-Nano-30B-A3B|1-step Off   |131,072  |31,599           |128   |512  |512  |[8,1]             |[8,8,8,1,n/a]             |37.5               | 430|
 
 Note:
 
 * All Mixture-of-expert (MoE) model training uses token drop-less. 
 * The following metrics are extracted from the average of 5 steps: G-Average Seq len, Tokens/sec/gpu, Total Step time(s). Because of the averaging, the numbers in the table do not completely match the equation stated in Performance Metrics above but the difference is small.
+* There was a change in pretrained checkpoint (see [docs/guides/deepseek.md](https://github.com/NVIDIA-NeMo/RL/blob/r0.6.0/docs/guides/deepseek.md)) for DeepSeek V3 leading to lower Average Seq len. The reported throughput is not comparable across versions. Please use equivalent checkpoints for comparison. For example, using the newer checkpoint `DeepSeek V3 on-policy GRPO #-GPUs: 128` v0.5.0 performs at `26.1 Tokens / sec / GPU` compared to v0.6.0 at `30.2 Tokens / sec / GPU`.
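
Reviewer note: the DeepSeek V3 comparison in the last added bullet can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the patch. The helper names `relative_speedup` and `aggregate_throughput` are hypothetical, and the aggregate figure assumes that `Tokens / sec / GPU` is the cluster-wide token rate divided by `#-GPUs`, which this diff does not state explicitly.

```python
def relative_speedup(baseline: float, candidate: float) -> float:
    """Return candidate / baseline for a throughput metric (higher is better)."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate / baseline


def aggregate_throughput(tokens_per_sec_per_gpu: float, num_gpus: int) -> float:
    """Assumption: the per-GPU figure is the cluster total divided by #-GPUs."""
    return tokens_per_sec_per_gpu * num_gpus


# Values quoted in the note: DeepSeek V3 on-policy GRPO, 128 GPUs,
# both versions measured with the newer checkpoint.
v05_tokens_per_gpu = 26.1
v06_tokens_per_gpu = 30.2

speedup = relative_speedup(v05_tokens_per_gpu, v06_tokens_per_gpu)
print(f"v0.6.0 vs v0.5.0: {speedup:.3f}x ({(speedup - 1) * 100:.1f}% higher)")

# GB200 BF16 table, DeepSeek V3 on-policy rows: 128 GPUs at 30.2 and 256 GPUs at 16.4.
total_128 = aggregate_throughput(30.2, 128)
total_256 = aggregate_throughput(16.4, 256)
print(f"aggregate tokens/sec: 128 GPUs = {total_128:.1f}, 256 GPUs = {total_256:.1f}")
```

Under these assumptions, the like-for-like gain from v0.5.0 to v0.6.0 is roughly 16%. The aggregate figures also suggest that doubling the GPU count for the on-policy DeepSeek V3 run raises total throughput only modestly while per-GPU throughput drops.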