
Commit 277cb22

Update MTP training doc (#718)

1 parent ae495a8

File tree: 3 files changed, +40 −13 lines changed
Lines changed: 20 additions & 5 deletions
@@ -1,8 +1,10 @@
 # Speculative Decoding
 
-Speculative decoding is an important optimization for making faster rollout during RL training. Currently slime only supports speculative decoding without training.
+Speculative decoding is a key optimization for speeding up rollouts. Instead of having the expensive target model decode token by token during inference, a lightweight draft model first decodes ahead to produce several tokens, and then the target model verifies them in a batch.
 
-For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
+## Accelerating Inference with Speculative Decoding
+
+For models with MTP layers (e.g., GLM-4.6, DeepSeek-V3/R1), simply add:
 
 ```bash
 --sglang-speculative-algorithm EAGLE
@@ -11,10 +13,23 @@ For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
 --sglang-speculative-num-draft-tokens 4
 ```
 
-And for external draft model (e.g. draft models from [SpecForge](https://docs.sglang.ai/SpecForge/)), you need also pass:
+If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), also set:
+
+```bash
+--sglang-speculative-draft-model-path /your/draft/model/path
+```
+
+For detailed parameter meanings and configuration, see SGLang's speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+
+## Online SFT for the Draft Model
+
+As RL progresses, the sampling distributions of the draft and target models can drift apart. Fewer draft tokens pass verification, and speculative decoding can even yield negative returns.
+
+Slime currently supports online training of the MTP layers during RL, updating the draft model in sync with training to consistently improve sampling speed. See the related rationale in this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Use it as follows:
 
 ```bash
---speculative-draft-model-path /your/draft/model/path
+--enable-mtp-training
+--mtp-loss-scaling-factor 0.2
 ```
 
-For details on parameter meanings and configuration, see the [SGLang speculative decoding documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
+Training external draft models is still a WIP.
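
The draft-then-verify loop described above can be sketched in plain Python. This is a toy, deterministic stand-in: the `draft_model` and `target_model_next` rules are invented for illustration and are not real models or SGLang's implementation, but the control flow shows the mechanism — the draft cheaply proposes several tokens, the target checks them, and the first mismatch is replaced by the target's own token.

```python
# Toy sketch of the draft-propose / target-verify loop behind
# speculative decoding. The "models" are invented deterministic rules.

def draft_model(tokens, k):
    """Cheaply propose k next tokens (hypothetical toy rule)."""
    out = list(tokens)
    proposed = []
    for _ in range(k):
        nxt = (out[-1] + 1) % 100  # stand-in for cheap greedy decoding
        proposed.append(nxt)
        out.append(nxt)
    return proposed

def target_model_next(tokens):
    """What the expensive target model would emit next (toy rule)."""
    nxt = (tokens[-1] + 1) % 100
    if len(tokens) % 4 == 0:  # diverge sometimes to force rejections
        nxt = (nxt + 1) % 100
    return nxt

def speculative_step(tokens, k=4):
    """One draft/verify step: returns (extended sequence, tokens accepted).

    Accepted draft tokens plus one corrected (or bonus) target token
    all come out of what would be a single batched verify pass.
    """
    proposal = draft_model(tokens, k)
    accepted = 0
    seq = list(tokens)
    for tok in proposal:
        if target_model_next(seq) == tok:  # draft token passes verification
            seq.append(tok)
            accepted += 1
        else:  # first mismatch: take the target's own token and stop
            seq.append(target_model_next(seq))
            break
    else:
        # all k draft tokens accepted: verify pass yields one bonus token
        seq.append(target_model_next(seq))
    return seq, accepted
```

In real systems the verification of all `k` draft tokens happens in one batched forward pass of the target model, so each accepted token costs far less than a full target decoding step — which is where the speedup comes from.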
Lines changed: 19 additions & 5 deletions
@@ -1,9 +1,10 @@
 # Speculative Decoding
 
+Speculative decoding is an important optimization for speeding up rollouts. Instead of having the expensive target model decode token by token during inference, a lightweight draft model decodes ahead first; after it generates several tokens, the large model verifies them in a batch.
 
-Speculative decoding is an important optimization for speeding up rollouts. Currently, slime supports speculative decoding that does not update the draft model through training.
+## Accelerating Inference with Speculative Decoding
 
-For models with MTP layer support (e.g. GLM-4.6, Deepseek-V3/R1), simply add:
+For models with MTP layers (e.g. GLM-4.6, Deepseek-V3/R1), simply add:
 
 ```bash
 --sglang-speculative-algorithm EAGLE
@@ -12,10 +13,23 @@
 --sglang-speculative-num-draft-tokens 4
 ```
 
-If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), you also need to set:
+If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), you also need to set:
 
 ```bash
---speculative-draft-model-path /your/draft/model/path
+--sglang-speculative-draft-model-path /your/draft/model/path
 ```
 
-For detailed parameter meanings and configuration, see SGLang's speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html)
+For detailed parameter meanings and configuration, see SGLang's speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html)
+
+## Online SFT for the Draft Model
+
+As the RL process goes on, the sampling distributions of the draft model and the target model drift further apart, fewer draft tokens pass verification, and speculative decoding can even bring negative returns.
+
+Currently, slime supports online training of the MTP layers during the RL process, updating the draft model in sync with training, which steadily improves sampling speed. For the rationale, see this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Usage:
+
+```bash
+--enable-mtp-training
+--mtp-loss-scaling-factor 0.2
+```
+
+Training external draft models is still a WIP.

scripts/models/mimo-7B-rl.sh

Lines changed: 1 addition & 3 deletions
@@ -15,7 +15,5 @@ MODEL_ARGS=(
 --vocab-size 151680
 --untie-embeddings-and-output-weights
 --max-position-embeddings 32768
-# Notice: Currently, MTP + sequence packing is not supported in megatron yet.
 --mtp-num-layers 1
---mtp-loss-scaling-factor 0.1
-)
+)
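
The `--mtp-loss-scaling-factor` flag weights the MTP layers' auxiliary next-token-prediction losses against the main language-model loss. A minimal sketch of that combination follows; the helper name and the averaging across MTP layers are assumptions for illustration, not Megatron's actual code.

```python
def combine_losses(lm_loss, mtp_losses, scaling_factor=0.2):
    """Total loss = main LM loss + scaled MTP auxiliary loss (sketch).

    mtp_losses: one auxiliary loss per MTP layer (averaging is an
    assumption here, not Megatron's confirmed behavior).
    scaling_factor: corresponds to the --mtp-loss-scaling-factor flag.
    """
    if not mtp_losses:
        return lm_loss
    aux = sum(mtp_losses) / len(mtp_losses)
    return lm_loss + scaling_factor * aux
```

With `--mtp-num-layers 1` and a scaling factor of 0.2, a main loss of 2.0 and an MTP loss of 1.0 would combine to 2.2 under this sketch.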
