Commit 4c083c6

[Fix] Update deprecated sglang ep args in docs and scripts (#1344)
1 parent de7a8b5 commit 4c083c6

9 files changed, +13 -14 lines changed

docs/en/examples/deepseek-r1.md

Lines changed: 2 additions & 2 deletions
@@ -177,7 +177,7 @@ The final `--sglang-server-concurrency` is a parameter specific to slime. It is
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 64
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 64

    # dp attention
    --sglang-enable-dp-attention
@@ -186,7 +186,7 @@ SGLANG_ARGS=(
    --sglang-enable-dp-lm-head

    # enable deepep for sglang
-   --sglang-enable-deepep-moe
+   --sglang-moe-a2a-backend deepep
    --sglang-deepep-mode auto

    # make every dp rank have 128 concurrency

docs/en/examples/qwen3-30B-A3B.md

Lines changed: 2 additions & 2 deletions
@@ -62,7 +62,7 @@ Here, we will briefly introduce the MoE-related parts in the [run-qwen3-30B-A3B.
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 8
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 8
    --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
 )
 ```
@@ -109,7 +109,7 @@ In addition, you can make the following changes:
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 24
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 24
    --sglang-enable-dp-attention
    --sglang-dp-size 3

docs/en/get_started/usage.md

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ slime incorporates almost all SGLang parameters by using SGLang's `ServerArgs.ad

 - In co-located training and inference, you often need to limit `--mem-fraction-static`. This parameter should be changed to `--sglang-mem-fraction-static`.
 - During training, if you want SGLang to infer beyond the maximum context length specified in the Hugging Face checkpoint's `config.json`, you need to use `--context-length`, which becomes `--sglang-context-length` in slime.
-- For multi-node large EP inference, you might need `--enable-ep-moe`, `--enable-dp-attention`, `--dp-size`, `--enable-deepep-moe`, etc. These can be passed as `--sglang-enable-ep-moe`, `--sglang-enable-dp-attention`, `--sglang-dp-size`, and `--sglang-enable-deepep-moe` respectively.
+- For multi-node large EP inference, you might need `--ep-size`, `--enable-dp-attention`, `--dp-size`, `--moe-a2a-backend deepep`, etc. These can be passed as `--sglang-ep-size`, `--sglang-enable-dp-attention`, `--sglang-dp-size`, and `--sglang-moe-a2a-backend deepep` respectively.

 Some parameters related to slime's resource scheduling are configured by slime itself, for example:

docs/zh/examples/deepseek-r1.md

Lines changed: 2 additions & 2 deletions
@@ -177,7 +177,7 @@ Parameters required by sglang. Here `--rollout-num-gpus-per-engine` roughly corresponds to sgl
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 64
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 64

    # dp attention
    --sglang-enable-dp-attention
@@ -186,7 +186,7 @@ SGLANG_ARGS=(
    --sglang-enable-dp-lm-head

    # enable deepep for sglang
-   --sglang-enable-deepep-moe
+   --sglang-moe-a2a-backend deepep
    --sglang-deepep-mode auto

    # make every dp rank has 128 concurrency

docs/zh/examples/qwen3-30B-A3B.md

Lines changed: 2 additions & 2 deletions
@@ -61,7 +61,7 @@ bash scripts/run-qwen3-30B-A3B.sh
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 8
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 8
    --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
 )
 ```
@@ -107,7 +107,7 @@ hf download Qwen/Qwen3-30B-A3B-FP8 --local-dir /root/Qwen3-30B-A3B-FP8
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 24
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 24
    --sglang-enable-dp-attention
    --sglang-dp-size 3

docs/zh/get_started/usage.md

Lines changed: 1 addition & 1 deletion
@@ -279,7 +279,7 @@ slime pulls in sglang's `ServerArgs.add_cli_args`, thereby incorporating almost

 - In co-located training and inference, you often need to limit `--mem-fraction-static`; this parameter becomes `--sglang-mem-fraction-static`.
 - During training, if you want sglang to run inference beyond the maximum context length given in the Hugging Face checkpoint's `config.json`, you need `--context-length`, which in slime is `--sglang-context-length`.
-- For multi-node large EP inference, you need `--enable-ep-moe`, `--enable-dp-attention`, `--dp-size`, `--enable-deepep-moe`, etc., which can be passed as `--sglang-enable-ep-moe`, `--sglang-enable-dp-attention`, `--sglang-dp-size`, and `--sglang-enable-deepep-moe` respectively.
+- For multi-node large EP inference, you need `--ep-size`, `--enable-dp-attention`, `--dp-size`, `--moe-a2a-backend deepep`, etc., which can be passed as `--sglang-ep-size`, `--sglang-enable-dp-attention`, `--sglang-dp-size`, and `--sglang-moe-a2a-backend deepep` respectively.

 Some parameters are tied to slime's resource scheduling and are configured by slime itself, for example:
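As a rough illustration of the prefix convention the usage.md hunks describe, the sketch below prepends `sglang-` to each SGLang server flag and leaves flag values untouched. `to_slime_args` is a hypothetical helper written for this note, not part of slime or SGLang; the sample flags and values are the ones used in the qwen3-30B-A3B example in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of slime): rewrite SGLang server flags
# into the "--sglang-" prefixed form that slime expects.
to_slime_args() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            # A long flag: swap the leading "--" for "--sglang-".
            --*) printf '%s\n' "--sglang-${arg#--}" ;;
            # Anything else is a flag value: pass it through unchanged.
            *)   printf '%s\n' "$arg" ;;
        esac
    done
}

# Prints one token per line, flags prefixed and values as given.
to_slime_args --ep-size 24 --enable-dp-attention --dp-size 3
```

The helper only renames flags. It does not know which flags slime sets on its own for resource scheduling, so those still have to be left out by hand.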

scripts/run-deepseek-r1.sh

Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@ WANDB_ARGS=(
 SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 64
    --sglang-mem-fraction-static 0.7
-   --sglang-enable-ep-moe
+   --sglang-ep-size 64

    # dp attention
    --sglang-enable-dp-attention
@@ -122,7 +122,7 @@ SGLANG_ARGS=(
    --sglang-enable-dp-lm-head

    # enable deepep for sglang
-   --sglang-enable-deepep-moe
+   --sglang-moe-a2a-backend deepep
    --sglang-deepep-mode auto

    # make every dp rank has 128 concurrency

scripts/run-kimi-k2-Thinking.sh

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ SGLANG_ARGS=(
    --sglang-ep-size 16

    # enable deepep for sglang
-   # --sglang-enable-deepep-moe
+   # --sglang-moe-a2a-backend deepep
    # --sglang-deepep-mode auto

    # make every dp rank has 128 concurrency

scripts/run-qwen3-32B.sh

Lines changed: 0 additions & 1 deletion
@@ -110,7 +110,6 @@ SGLANG_ARGS=(
    --rollout-num-gpus-per-engine 8
    --sglang-mem-fraction-static 0.7
    --sglang-cuda-graph-bs 1 2 4 8 $(seq 16 8 256)
-   # --sglang-enable-ep-moe
 )

 MISC_ARGS=(
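
To check whether other docs or scripts still use the flags this commit replaces, a search along the following lines may help. It is a minimal sketch: the flag names come from the removed lines in this diff, while the function name `find_deprecated_ep_flags` and the choice of directories are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list lines that still mention the EP flags replaced in this commit,
# with or without the "--sglang-" prefix. The function name is hypothetical.
find_deprecated_ep_flags() {
    # "--" ends grep's option parsing so the pattern may start with dashes.
    grep -rnE -- '--(sglang-)?enable-(ep|deepep)-moe' "$@"
}

# Scan the directories touched by this commit, if they exist here.
for dir in docs scripts; do
    if [ -d "$dir" ]; then
        find_deprecated_ep_flags "$dir" || echo "$dir: no deprecated EP flags"
    fi
done
```

Commented-out occurrences, such as the one updated in scripts/run-kimi-k2-Thinking.sh, are reported as well, since the pattern does not distinguish comments from live arguments.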
