
Commit d8d6ad2: [doc] update spec decoding doc (#752)

1 parent: 62e2c47

File tree

2 files changed (+7, -1 lines)


docs/en/advanced/speculative-decoding.md

Lines changed: 4 additions & 1 deletion

````diff
@@ -25,11 +25,14 @@ For detailed parameter meanings and configuration, see SGLang’s speculative decoding
 
 As RL progresses, the sampling distributions of the draft and target models can drift apart. Fewer draft tokens pass verification, and speculative decoding can even yield negative returns.
 
-Slime currently supports online training of the MTP layers during RL, updating the draft model in sync with training to consistently improve sampling speed. See the related rationale in this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Use it as follows:
+slime currently supports online training of the MTP layers during RL, updating the draft model in sync with training to consistently improve sampling speed. See the related rationale in this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Use it as follows:
 
 ```bash
+--mtp-num-layers 1
 --enable-mtp-training
 --mtp-loss-scaling-factor 0.2
 ```
 
+Note that this requires a torch dist checkpoint that includes the MTP weights: add `--mtp-num-layers 1` when converting the checkpoint from huggingface to torch dist.
+
 Training external draft models is still a WIP.
````
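The two steps described above (converting the checkpoint with MTP weights, then training with MTP enabled) could be combined as in the following sketch. This is a hypothetical illustration: the script names (`convert_hf_to_torch_dist.py`, `train.py`) and the path arguments are assumptions, not slime's confirmed CLI; only the `--mtp-num-layers`, `--enable-mtp-training`, and `--mtp-loss-scaling-factor` flags come from the documentation change itself.

```shell
# Hypothetical sketch; script names and paths are illustrative assumptions.

# Step 1: convert the Hugging Face checkpoint to torch dist format,
# keeping the MTP weights (the doc notes --mtp-num-layers 1 is required here).
python convert_hf_to_torch_dist.py \
    --hf-checkpoint /path/to/hf_ckpt \
    --save /path/to/torch_dist_ckpt \
    --mtp-num-layers 1

# Step 2: launch RL training with online MTP training enabled
# (the three MTP flags are taken verbatim from the documentation).
python train.py \
    --load /path/to/torch_dist_ckpt \
    --mtp-num-layers 1 \
    --enable-mtp-training \
    --mtp-loss-scaling-factor 0.2
```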

docs/zh/advanced/speculative-decoding.md

Lines changed: 3 additions & 0 deletions

````diff
@@ -28,8 +28,11 @@
 Currently, slime supports online training of the MTP layers in the RL pipeline, updating the draft model in sync as training proceeds, which steadily improves sampling speed; the rationale is covered in this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Usage:
 
 ```bash
+--mtp-num-layers 1
 --enable-mtp-training
 --mtp-loss-scaling-factor 0.2
 ```
 
+Note that MTP training requires a checkpoint that includes the MTP weights, so `--mtp-num-layers 1` must also be added when converting the huggingface checkpoint to torch dist.
+
 Training of external draft models is still a WIP.
````
