Speculative decoding is an important optimization for speeding up rollouts during RL training. Currently, slime supports speculative decoding only without training.
3
+
Speculative decoding is a key optimization for speeding up rollouts. Instead of having the expensive target model decode token by token during inference, a lightweight draft model first decodes ahead to produce several tokens, and then the target model verifies them in a batch.
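The draft-then-verify loop can be illustrated with a minimal sketch for the greedy case. It is not slime's or SGLang's implementation: `speculative_step`, `generate`, and the toy models are hypothetical names, each "model" is just a function that returns the next token for a prefix, and verification is written as a loop where a real engine would score all proposed positions in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Return the tokens committed by one draft-then-verify step (greedy case)."""
    # 1) The cheap draft model decodes ahead k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2) The target checks each proposed position. A real engine does this in
    #    one batched forward pass; the loop here is only for readability.
    committed: List[int] = []
    for tok in proposal:
        expected = target(prefix + committed)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            committed.append(expected)
            return committed
        committed.append(tok)

    # All k draft tokens were accepted, so the target adds one bonus token.
    committed.append(target(prefix + committed))
    return committed


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int) -> List[int]:
    """Repeat draft-then-verify steps until max_new_tokens are produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new_tokens]


# Toy models (assumptions for illustration): the target counts upward mod 10,
# and the draft agrees with it except right after token 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

In this greedy setting the output is identical to decoding with the target alone; the draft only changes how many target passes are needed, since every step commits at least one token and at most `k + 1`.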
4
4
5
-
For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
5
+
## Accelerating Inference with Speculative Decoding
6
+
7
+
For models with MTP layers (e.g., GLM-4.6, DeepSeek-V3/R1), simply add:
6
8
7
9
```bash
8
10
--sglang-speculative-algorithm EAGLE
@@ -11,10 +13,23 @@ For model with MTP layer (e.g. GLM-4.6, Deepseek-V3/R1), you can run with:
11
13
--sglang-speculative-num-draft-tokens 4
12
14
```
13
15
14
-
And for external draft model (e.g. draft models from [SpecForge](https://docs.sglang.ai/SpecForge/)), you need also pass:
16
+
If you want to use a separately trained draft model (e.g., one trained with [SpecForge](https://docs.sglang.ai/SpecForge/)), also set:
For detailed parameter meanings and configuration, see SGLang’s speculative decoding [documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).
23
+
24
+
## Online SFT for the Draft Model
25
+
26
+
As RL progresses, the sampling distributions of the draft and target models can drift apart. Fewer draft tokens then pass verification, and speculative decoding can even end up slower than plain decoding.
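A rough back-of-the-envelope model shows why a falling acceptance rate can make speculation a net loss. The sketch below rests on simplifying assumptions that are not taken from slime or SGLang: each draft token is accepted independently with probability `alpha`, one verification pass costs the same as one ordinary target decode step, and `draft_cost` is the cost of one draft step relative to one target step. The function names are hypothetical.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens committed per target pass with k draft tokens.

    Assumes each draft token is accepted independently with probability
    alpha; the value is the geometric sum 1 + alpha + ... + alpha**k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Estimated speedup over plain decoding under the assumptions above.

    One step costs k draft passes plus one target pass, measured in units
    of a single target decode step.
    """
    return expected_tokens_per_step(alpha, k) / (k * draft_cost + 1.0)


# Illustrative numbers only: k = 3 draft tokens, draft step at 20% of target cost.
well_aligned = estimated_speedup(alpha=0.8, k=3, draft_cost=0.2)
drifted = estimated_speedup(alpha=0.2, k=3, draft_cost=0.2)
```

With these illustrative inputs, `well_aligned` is above 1 while `drifted` falls below 1, which is the "slower than plain decoding" regime: the drafting overhead is still paid, but few of the drafted tokens survive verification.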
27
+
28
+
Slime currently supports online training of the MTP layers during RL, updating the draft model in sync with training to consistently improve sampling speed. See the related rationale in this [blog](https://www.notion.so/jiajunli-guapisolo/Power-Up-Speculative-Decoding-In-Reinforcement-Learning-2a92d24a293b802d9c73dbae429e581e). Use it as follows:
For details on parameter meanings and configuration, see the [SGLang speculative decoding documentation](https://docs.sglang.ai/advanced_features/speculative_decoding.html).