GLM PD disaggregation with MTP

Does GLM support PD disaggregation together with MTP? I tried to test it, but the accept length in the log is always 1 (the draft prediction fails every time) and performance is poor. I used the launch commands below. Is there something wrong with them?


Args for the prefill node:
SGLANG_ENABLE_SPEC_V2=1 SGLANG_DISAGGREGATION_QUEUE_SIZE=1 SGLANG_DISAGGREGATION_THREAD_POOL_SIZE=1 MC_TE_METRIC=1 SGLANG_SET_CPU_AFFINITY=true python -m sglang.launch_server --model /models/GLM-4.6-FP8/ --trust-remote-code --watchdog-timeout "1000000" --mem-fraction-static 0.8 --max-running-requests 40 --disaggregation-mode prefill --tp-size 8 --kv-cache-dtype fp8_e4m3 --host 0.0.0.0 --chunked-prefill-size 16384 --attention-backend fa3 --enable-metrics --disaggregation-ib-device mlx5_0 --page-size 64 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

Args for the decode node:
SGLANG_ENABLE_SPEC_V2=1 SGLANG_CLIP_MAX_NEW_TOKENS_ESTIMATION=512 SGLANG_SET_CPU_AFFINITY=true python -m sglang.launch_server --model /models/GLM-4.6-FP8/ --trust-remote-code --watchdog-timeout "1000000" --mem-fraction-static 0.9 --tp-size 8 --kv-cache-dtype fp8_e4m3 --disaggregation-mode decode --prefill-round-robin-balance --host 0.0.0.0 --chunked-prefill-size 16384 --attention-backend fa3 --max-running-requests 80 --enable-metrics --disaggregation-ib-device mlx5_0 --page-size 64 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
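To rule out a simple mismatch between the two nodes, the speculative flags from both commands can be compared mechanically. Below is a minimal sketch of that check. It only parses the command strings and does not call SGLang. The helper names `parse_flags` and `diff_flags` are made up for illustration, and the commands are shortened to the flags relevant here. A clean result does not explain the accept length of 1; it only confirms that both nodes were launched with the same speculative settings.

```python
import shlex

# Shortened copies of the two launch commands above (illustrative subset of flags).
PREFILL = (
    "SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server --model /models/GLM-4.6-FP8/ "
    "--trust-remote-code --disaggregation-mode prefill --tp-size 8 --page-size 64 "
    "--speculative-algorithm NEXTN --speculative-num-steps 3 "
    "--speculative-eagle-topk 1 --speculative-num-draft-tokens 4"
)
DECODE = (
    "SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server --model /models/GLM-4.6-FP8/ "
    "--trust-remote-code --disaggregation-mode decode --prefill-round-robin-balance "
    "--tp-size 8 --page-size 64 "
    "--speculative-algorithm NEXTN --speculative-num-steps 3 "
    "--speculative-eagle-topk 1 --speculative-num-draft-tokens 4"
)


def parse_flags(command: str) -> dict:
    """Map each --flag to its value, or to True for a bare switch.

    Tokens that do not start with "--" (env-var prefixes, "python", "-m", ...)
    are skipped unless they are consumed as a flag's value.
    """
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[tok] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def diff_flags(prefill_cmd: str, decode_cmd: str, prefix: str = "--speculative") -> dict:
    """Return {flag: (prefill_value, decode_value)} for flags under `prefix` that differ."""
    p, d = parse_flags(prefill_cmd), parse_flags(decode_cmd)
    keys = sorted(k for k in set(p) | set(d) if k.startswith(prefix))
    return {k: (p.get(k), d.get(k)) for k in keys if p.get(k) != d.get(k)}


if __name__ == "__main__":
    # An empty dict means the speculative flags are identical on both nodes.
    print(diff_flags(PREFILL, DECODE))
```

For the commands quoted in this issue the speculative flags match on both nodes, so the sketch prints an empty dict. Passing `prefix="--"` instead compares every flag, which also surfaces the intentional differences such as `--disaggregation-mode`.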

GLM pd disaggregation with mtp #16220
