[Bug] PD-disaggregation + pipeline parallel (PP=2) leads to corrupted decode outputs (nonsensical / repetitive tokens) across multiple model families (LLaMA, Qwen), while PP=1 works correctly. #16246

@baonudesifeizhai

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

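With PD disaggregation enabled and --pipeline-parallel-size 2 on both the prefill and decode servers, requests sent through the router return nonsensical tokens. Commands used:

MODEL="Qwen/Qwen2.5-1.5B-Instruct"

# Prefill server (GPUs starting at id 0)
python -m sglang.launch_server \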
  --model-path "$MODEL" \
  --disaggregation-mode prefill \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 2 \
  --disaggregation-decode-tp 1 \
  --base-gpu-id 0 \
  --chunked-prefill-size -1 \
  --port 30000 \
  --disable-cuda-graph \
  --mem-fraction-static 0.7 \
  --max-total-tokens 10000 \
  --disaggregation-transfer-backend nixl

# Decode server (GPUs starting at id 2)
python -m sglang.launch_server \
  --model-path "$MODEL" \
  --disaggregation-mode decode \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 2 \
  --disaggregation-prefill-pp 2 \
  --base-gpu-id 2 \
  --chunked-prefill-size -1 \
  --port 30002 \
  --disable-cuda-graph \
  --mem-fraction-static 0.7 \
  --max-total-tokens 10000 \
  --disaggregation-transfer-backend nixl


# Router in front of the prefill and decode servers
python -m sglang_router.launch_router \
  --pd-disaggregation \
  --prefill http://127.0.0.1:30000 \
  --decode  http://127.0.0.1:30002 \
  --host 0.0.0.0 \
  --port 8000 

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role":"user","content":"Say hello in one short sentence."}],
    "max_tokens": 32,
    "temperature": 0.2,
    "repetition_penalty": 1.2
  }'
{"id":"395e8aee641841ca847d6bff5b37211f","object":"chat.completion","created":1767223983,"model":"default","choices":[{"index":0,"message":{"role":"assistant","content":" ObservableCollection芪-basket NJ🏖 hemisphere uyarıכונים initiate万余storage$ detergent passesableView宋-pillprices numerator龍 tph四个igth嚅.notification analytics.Maximum Vanderbilt udakd_verify眬","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"length","matched_stop":null}],"usage":{"prompt_tokens":36,"total_tokens":68,"completion_tokens":32,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}
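
For anyone who wants to script the comparison between PP=1 and PP=2, the same request can be sent from Python. This is only a sketch using the standard library: the URL and payload are copied from the curl command above, while the function names `build_payload` and `send_request` are made up for illustration and are not part of sglang.

```python
import json
import urllib.request

# Router address taken from the launch_router command in this issue.
ROUTER_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_payload():
    """Return the same JSON body as the curl request in this issue."""
    return {
        "model": "default",
        "messages": [
            {"role": "user", "content": "Say hello in one short sentence."}
        ],
        "max_tokens": 32,
        "temperature": 0.2,
        "repetition_penalty": 1.2,
    }


def send_request(url=ROUTER_URL, timeout=60):
    """POST the payload to the router and return the assistant's text.

    Hypothetical helper: it assumes the response has the shape shown
    in the output above (choices[0].message.content).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the prefill server, decode server, and router running, `print(send_request())` should show the assistant message, which makes it easy to rerun the same prompt after changing --pipeline-parallel-size.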


Related: #15571

Reproduction

As in the title: run the commands listed under "Describe the bug" above.

Environment

8 × A6000 GPUs
8518455
