
[Bug] CUDA Graph + MTP + page_size = 64 MiMo-V2-Flash precision issue #20334

@baoqian426


Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

When running the MiMo-V2-Flash model on SGLang, I see a precision issue with the configuration CUDA Graph + MTP + page_size = 64. The precision is correct with CUDA Graph + MTP + page_size = 1. The two reproduction commands below are identical except for --page-size.
Has anyone tried to fix this issue?

Reproduction

CUDA Graph + MTP + page_size = 64 (precision issue):
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server --model-path /ssd3/MiMo-V2-Flash --max-total-tokens 835584 --disable-radix-cache --decode-log-interval 1 --host 0.0.0.0 --port 8806 --trust-remote-code --tp-size 8 --page-size 64 --cuda-graph-max-bs 64 --max-running-requests 64 --disable-overlap-schedule --attention-backend fa3 --mem-fraction-static 0.9 --dp-size 2 --enable-dp-attention --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-num-draft-tokens 4 --speculative-eagle-topk 1


CUDA Graph + MTP + page_size = 1 (correct precision):
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server --model-path /ssd3/MiMo-V2-Flash --max-total-tokens 835584 --disable-radix-cache --decode-log-interval 1 --host 0.0.0.0 --port 8806 --trust-remote-code --tp-size 8 --page-size 1 --cuda-graph-max-bs 64 --max-running-requests 64 --disable-overlap-schedule --attention-backend fa3 --mem-fraction-static 0.9 --dp-size 2 --enable-dp-attention --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-num-draft-tokens 4 --speculative-eagle-topk 1

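The two reproduction commands differ only in --page-size, so one way to A/B the configurations is a small bash helper that shares every other flag and varies just that value. This is a hypothetical sketch: the function names repro_args and launch_repro are made up here and are not part of SGLang. The flags, model path, and SGLANG_ENABLE_SPEC_V2=1 are copied verbatim from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical repro helper (not part of SGLang). The flags are copied from the
# two commands in this issue; only --page-size varies between the runs.
set -euo pipefail

# Prints the launch_server arguments, one per line, for the given page size.
repro_args() {
  local page_size="$1"
  printf '%s\n' \
    --model-path /ssd3/MiMo-V2-Flash \
    --max-total-tokens 835584 \
    --disable-radix-cache \
    --decode-log-interval 1 \
    --host 0.0.0.0 \
    --port 8806 \
    --trust-remote-code \
    --tp-size 8 \
    --page-size "$page_size" \
    --cuda-graph-max-bs 64 \
    --max-running-requests 64 \
    --disable-overlap-schedule \
    --attention-backend fa3 \
    --mem-fraction-static 0.9 \
    --dp-size 2 \
    --enable-dp-attention \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3 \
    --speculative-num-draft-tokens 4 \
    --speculative-eagle-topk 1
}

# Launches the server with the given page size (64 = bad precision, 1 = correct).
launch_repro() {
  local -a args
  mapfile -t args < <(repro_args "$1")
  SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server "${args[@]}"
}
```

For example, launch_repro 64 reproduces the run with the precision issue and launch_repro 1 the run with correct precision. Keeping the flags in one function guarantees that nothing besides the page size changes between the two runs.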

Environment

NVIDIA H200

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
