Update MTP training doc (#718)

guapisolo · web-flow · commit 277cb229e401 · 2025-11-13T20:23:47.000-08:00
diff --git a/docs/en/advanced/speculative-decoding.md b/docs/en/advanced/speculative-decoding.md
@@ -1,8 +1,10 @@
 # Speculative Decoding
 
-Speculative decoding is an important optimization for making faster rollout during RL training. Currently slime only supports speculative decoding without training.
+Speculative decoding is a key optimization for speeding up rollouts. Instead of having the expensive target model decode token by token during inference, a lightweight draft model first decodes several tokens ahead, and the target model then verifies all of them in a single batched forward pass.
 
-For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
+## Accelerating Inference with Speculative Decoding
+
+For models with MTP layers (e.g., GLM-4.6, DeepSeek-V3/R1), simply add:
 
 ```bash
 --sglang-speculative-algorithm EAGLE
@@ -11,10 +13,23 @@ For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
 --sglang-speculative-num-draft-tokens 4
 ```
 
-And for external draft model (e.g. draft models from [SpecForge](https://docs.sglang.ai/SpecForge/)), you need also pass:
+If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), also set:
+
+```bash
+--sglang-speculative-draft-model-path /your/draft/model/path
+```
+
+For detailed parameter meanings and configuration, see SGLang’s speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+
+## Online SFT for the Draft Model
+
+As RL progresses, the sampling distributions of the draft and target models can drift apart. Fewer draft tokens then pass verification, and speculative decoding can even become a net slowdown.
+
+slime currently supports online training of the MTP layers during RL, updating the draft model in sync with training so that sampling speed keeps improving. See this [blog post](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e) for the rationale. Use it as follows:
 
 ```bash
---speculative-draft-model-path /your/draft/model/path
+--enable-mtp-training
+--mtp-loss-scaling-factor 0.2
 ```
 
-For details on parameter meanings and configuration, see the [SGLang speculative decoding documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+Training external draft models is still a work in progress.
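The draft-then-verify loop described at the top of this page can be sketched in a few lines of Python. This is a minimal, greedy-only illustration: `speculative_step` and the toy `target`/`draft` functions are hypothetical and are not SGLang or slime APIs. Real engines score all drafted positions in one batched forward pass, may draft trees rather than a single chain, and use probabilistic acceptance when sampling with temperature.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], num_draft: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens it emits."""
    # 1. The cheap draft model decodes num_draft tokens ahead, one at a time.
    drafted: List[int] = []
    for _ in range(num_draft):
        drafted.append(draft(prefix + drafted))

    # 2. The target model checks each drafted position. A real engine scores all
    #    positions in a single batched forward pass; this loop is for clarity.
    emitted: List[int] = []
    for tok in drafted:
        expected = target(prefix + emitted)
        if tok != expected:
            # First mismatch: keep the target's own token and discard the rest.
            emitted.append(expected)
            return emitted
        emitted.append(tok)

    # 3. Every draft was accepted, so the same pass also yields one bonus token.
    emitted.append(target(prefix + emitted))
    return emitted


if __name__ == "__main__":
    target = lambda p: len(p) % 5                         # toy target model
    draft = lambda p: len(p) % 5 if len(p) < 3 else 0     # agrees only early on
    print(speculative_step(target, draft, [7], num_draft=4))  # [1, 2, 3]
```

Because a mismatch is always replaced by the target's own token, the emitted sequence matches what greedy decoding with the target alone would produce; the draft model only changes how many tokens each verification pass yields.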
diff --git a/docs/zh/advanced/speculative-decoding.md b/docs/zh/advanced/speculative-decoding.md
@@ -1,9 +1,10 @@
 # Speculative Decoding
 
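+Speculative decoding is an important optimization for speeding up rollouts. During inference, instead of having the expensive Target Model decode token by token, a lightweight draft model decodes first to generate several tokens, and the large model then verifies them in a batch.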
 
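-Speculative decoding is an important optimization for speeding up rollouts. Currently, slime supports speculative decoding only with a draft model that is not updated through training.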
+## Accelerating Inference with Speculative Decoding
 
-For models with MTP layer support (e.g., GLM-4.6, Deepseek-V3/R1), simply add:
+For models with MTP layers (e.g., GLM-4.6, DeepSeek-V3/R1), simply add:
 
 ```bash
 --sglang-speculative-algorithm EAGLE
@@ -12,10 +13,23 @@
 --sglang-speculative-num-draft-tokens 4
 ```
 
-To use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), you also need to set:
+To use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), you also need to set:
 
 ```bash
---speculative-draft-model-path /your/draft/model/path
+--sglang-speculative-draft-model-path /your/draft/model/path
 ```
 
-For detailed parameter meanings and configuration, see SGLang's speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+For detailed parameter meanings and configuration, see SGLang's speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+
+## Online SFT for the Draft Model
+
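+As RL progresses, the sampling probabilities of the draft model and the target model drift further apart, fewer draft tokens pass verification, and speculative decoding may even become a net slowdown.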
+
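+Currently, slime supports training the MTP layers online during RL, updating the draft model in sync as training proceeds, which steadily improves sampling speed. For the underlying rationale, see this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Usage: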
+
+```bash
+--enable-mtp-training
+--mtp-loss-scaling-factor 0.2
+```
+
+Training external draft models is still a work in progress.
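To see why draft/target drift can turn speculative decoding into a net slowdown, the classic analysis from Leviathan et al. (2023) helps: if each draft token is accepted independently with probability α and γ tokens are drafted per step, one verification pass emits (1 - α^(γ+1)) / (1 - α) tokens on average. The sketch below assumes that i.i.d. acceptance model, assumes a verification pass costs the same as one ordinary target decode step, and uses a hypothetical draft-to-target cost ratio `c`; the numbers are illustrative, not measurements from slime.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Mean tokens emitted per target verification pass.

    Assumes each of the gamma draft tokens is accepted independently with
    probability alpha; the sum 1 + alpha + ... + alpha**gamma includes the
    token the target always contributes itself.
    """
    if alpha >= 1.0:
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime speedup over plain target decoding under the same assumptions.

    c is the cost of one draft step relative to one target step, so one
    speculative step costs gamma * c (drafting) + 1 (verification).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)


if __name__ == "__main__":
    # gamma=3 and c=0.1 are assumed values chosen only for illustration.
    for alpha in (0.8, 0.5, 0.1):
        print(alpha, round(expected_speedup(alpha, gamma=3, c=0.1), 2))
    # prints: 0.8 2.27 / 0.5 1.44 / 0.1 0.85
```

With these assumed values the speedup falls from about 2.27x at α = 0.8 to about 0.85x at α = 0.1, i.e. once acceptance collapses, the drafting overhead outweighs the tokens saved.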
diff --git a/scripts/models/mimo-7B-rl.sh b/scripts/models/mimo-7B-rl.sh
@@ -15,7 +15,5 @@ MODEL_ARGS=(
     --vocab-size 151680
     --untie-embeddings-and-output-weights
     --max-position-embeddings 32768
-    # Notice: Currently, MTP + sequence packing is not supported in megatron yet.
     --mtp-num-layers 1
-    --mtp-loss-scaling-factor 0.1
-)
+)
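This commit drops the hardcoded `--mtp-loss-scaling-factor 0.1` from the MiMo `MODEL_ARGS`, while the updated docs pass `--mtp-loss-scaling-factor 0.2` alongside `--enable-mtp-training`. As a rough picture of what the factor controls, the sketch below assumes the DeepSeek-V3 / Megatron-style formulation, in which the per-depth MTP losses are averaged over the `--mtp-num-layers` depths and multiplied by the factor before being added to the main loss. `combined_loss` is a hypothetical helper, and how slime combines the MTP term with its RL policy loss is an assumption here, not something this diff shows.

```python
from typing import Sequence


def combined_loss(main_loss: float, mtp_losses: Sequence[float],
                  mtp_loss_scaling_factor: float) -> float:
    """Main loss plus the scaled mean of the per-depth MTP losses.

    Assumed formulation: L = L_main + (lambda / D) * sum_k L_mtp[k], where
    D = len(mtp_losses) is the number of MTP depths (--mtp-num-layers) and
    lambda is --mtp-loss-scaling-factor.
    """
    if not mtp_losses:
        return main_loss  # no MTP layers, nothing to add
    depth = len(mtp_losses)
    return main_loss + mtp_loss_scaling_factor / depth * sum(mtp_losses)


if __name__ == "__main__":
    # One MTP layer, as in scripts/models/mimo-7B-rl.sh (--mtp-num-layers 1).
    # The loss values are made up for illustration.
    for factor in (0.1, 0.2):
        print(factor, round(combined_loss(1.0, [2.0], factor), 3))
    # prints: 0.1 1.2 / 0.2 1.4
```

Under this formulation, raising the factor from 0.1 to 0.2 doubles the weight of the MTP (draft) objective relative to the main loss.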