[Bug] DeepEP normal (prefill) dispatch crashes flashinfer_cutedsl FP4 MoE: "not enough values to unpack (expected 6, got 5)" #29521

Description

@JustinTong0323

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Serving an NVFP4 / modelopt_fp4 MoE model with the default high-throughput DeepEP auto mode crashes during the first prefill forward (server-init warmup) inside the flashinfer_cutedsl MoE runner:

File ".../layers/moe/moe_runner/flashinfer_cutedsl.py", line 476,
  in fused_experts_deepep_to_flashinfer_cutedsl_fp4
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output
ValueError: not enough values to unpack (expected 6, got 5)

Full crashing stack:
_execute_extend → deepseek_v2.py forward → self.experts(...) → ep_moe/layer.py forward_impl → fused_moe_triton/layer.py run_moe_core → modelopt_quant.py apply → moe_runner/runner.py run → flashinfer_cutedsl.py:476.

The only registered DeepEP handler for the flashinfer_cutedsl runner implements only the low-latency (decode) dispatch format. In deepep auto mode, prefill uses the normal dispatch format, which has a different arity, so prefill can never reach a working code path.

Reproduction

python3 -m sglang.launch_server \
  --model-path nvidia/GLM-5.2-NVFP4 --trust-remote-code \
  --tp 4 --enable-dp-attention --dp 4 \
  --moe-a2a-backend deepep --deepep-mode auto \
  --moe-runner-backend flashinfer_cutedsl

nvidia/GLM-5.2-NVFP4 is a public NVFP4 checkpoint; any modelopt_fp4 MoE model takes the same path. Crashes during init warmup with the unpack ValueError above.

Root cause

deepep auto = NORMAL dispatch for prefill, LOW_LATENCY dispatch for decode. The two dispatch outputs have different arity:

NamedTuple                   location                     fields
DeepEPNormalDispatchOutput   token_dispatcher/deepep.py   5: hidden_states, hidden_states_scale, topk_ids, topk_weights, num_recv_tokens_per_expert
DeepEPLLDispatchOutput       token_dispatcher/deepep.py   6: hidden_states, hidden_states_scale, topk_ids, topk_weights, masked_m, expected_m

The cutedsl runner registers exactly one DeepEP handler, typed and unpacked for the 6-field LL layout only:

# moe_runner/flashinfer_cutedsl.py
@register_fused_func("deepep", "flashinfer_cutedsl")
def fused_experts_deepep_to_flashinfer_cutedsl_fp4(
    dispatch_output: DeepEPLLDispatchOutput,   # <- LL only
    ...
) -> DeepEPLLCombineInput:
    ...
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output  # unconditional 6-unpack

During prefill, the 5-field DeepEPNormalDispatchOutput is handed to this LL-only function → expected 6, got 5. There is no normal-dispatch handler for the deepep + flashinfer_cutedsl FP4 path.
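The arity mismatch can be reproduced without sglang or a GPU. The sketch below uses stand-in NamedTuples that copy only the field names listed above; the class names NormalDispatchOutput and LLDispatchOutput and the helpers ll_only_unpack and try_unpack are hypothetical, not sglang code.

```python
from typing import Any, NamedTuple


class NormalDispatchOutput(NamedTuple):
    """Stand-in with the 5 fields of DeepEPNormalDispatchOutput."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    num_recv_tokens_per_expert: Any


class LLDispatchOutput(NamedTuple):
    """Stand-in with the 6 fields of DeepEPLLDispatchOutput."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    masked_m: Any
    expected_m: Any


def ll_only_unpack(dispatch_output):
    # Same unconditional 6-way unpack as the cutedsl handler.
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output
    return hidden_states, hidden_states_scale, masked_m


def try_unpack(dispatch_output) -> str:
    # Returns "ok" on success, otherwise the ValueError message.
    try:
        ll_only_unpack(dispatch_output)
        return "ok"
    except ValueError as exc:
        return str(exc)


print(try_unpack(LLDispatchOutput(1, 2, 3, 4, 5, 6)))
print(try_unpack(NormalDispatchOutput(1, 2, 3, 4, 5)))
```

The LL-shaped tuple unpacks cleanly, while the normal-shaped tuple raises the same ValueError seen in the server traceback.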

This traces back to #25525, whose description states it migrated only "CuteDSL v1 (DeepEP low-latency + NVFP4)" to MoeRunner — the new @register_fused_func("deepep", "flashinfer_cutedsl") was added for the LL path only. The deepep auto mode's NORMAL (prefill) dispatch was never wired for cutedsl-FP4, and nothing rejects the combination early, so it surfaces as this cryptic unpack instead.

(Note: SGLANG_MOE_NVFP4_DISPATCH does not change this — both LL paths still build the 6-field tuple, and the Try SGLANG_MOE_NVFP4_DISPATCH=0 hint in the file is for a different downstream stride assertion, not this unpack.)

Workaround

Force --deepep-mode low_latency so prefill also uses the LL (6-field) dispatch, which matches the only registered cutedsl DeepEP func. This avoids the unpack bug: prefill MoE no longer crashes. On B200 it then hits a separate cuda-graph-capture failure for NVFP4, which is out of scope for this report. The unpack bug itself is confirmed gone, so this issue is specifically about the missing normal-dispatch handler.

Suggested fix

Either:

  • (a) Add a normal-path handler fused_experts_deepep_normal_to_flashinfer_cutedsl_fp4 that unpacks the 5-field DeepEPNormalDispatchOutput and runs the contiguous (non-masked) cutedsl path; or
  • (b) Make fused_experts_deepep_to_flashinfer_cutedsl_fp4 branch on dispatch_output.format (DEEPEP_NORMAL vs DEEPEP_LL) and unpack accordingly.

The W4AFp8 MoE path already demonstrates this normal-vs-LL split: EPMoE.forward_cutlass_w4afp8 (NORMAL → apply_deepep_normal) vs forward_cutlass_w4afp8_masked (LL → apply_deepep_ll) in ep_moe/layer.py. The cutedsl-FP4 path is missing the NORMAL half.
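A rough sketch of the shape option (b) could take is below. It is illustrative only: DispatchFormat, NormalOut, LLOut and select_cutedsl_path are hypothetical stand-ins, and the real handler would call the masked or contiguous cutedsl kernels instead of returning a label. The point is that the handler reads fields by name after checking the format, rather than unpacking positionally.

```python
from enum import Enum, auto
from typing import Any, NamedTuple


class DispatchFormat(Enum):
    # Hypothetical stand-in for the DEEPEP_NORMAL / DEEPEP_LL format tags.
    DEEPEP_NORMAL = auto()
    DEEPEP_LL = auto()


class NormalOut(NamedTuple):
    # Mirrors the 5 fields of DeepEPNormalDispatchOutput.
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    num_recv_tokens_per_expert: Any

    @property
    def format(self) -> DispatchFormat:
        return DispatchFormat.DEEPEP_NORMAL


class LLOut(NamedTuple):
    # Mirrors the 6 fields of DeepEPLLDispatchOutput.
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    masked_m: Any
    expected_m: Any

    @property
    def format(self) -> DispatchFormat:
        return DispatchFormat.DEEPEP_LL


def select_cutedsl_path(dispatch_output):
    """Pick the kernel path from the dispatch format, using named fields."""
    fmt = dispatch_output.format
    if fmt is DispatchFormat.DEEPEP_LL:
        # Low-latency layout: per-expert valid row counts drive a masked GEMM.
        return "masked", dispatch_output.masked_m
    if fmt is DispatchFormat.DEEPEP_NORMAL:
        # Normal layout: contiguous tokens grouped by expert.
        return "contiguous", dispatch_output.num_recv_tokens_per_expert
    raise NotImplementedError(f"unsupported dispatch format: {fmt}")
```

Accessing fields by name also means that an unexpected tuple shape would fail with an AttributeError naming the missing field, instead of an arity error.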

If supporting normal dispatch for cutedsl-FP4 is out of scope, the minimum fix is to fail early with a clear message (e.g. require --deepep-mode low_latency when moe_runner_backend=flashinfer_cutedsl + modelopt_fp4), instead of crashing mid-warmup with the unpack error.
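The fail-early variant could look roughly like the following. This is a sketch under assumptions: check_cutedsl_fp4_deepep is a hypothetical helper, its parameters simply mirror the CLI flags and quantization name used in this report, and where such a check would live in sglang's argument validation is not specified here.

```python
def check_cutedsl_fp4_deepep(
    moe_a2a_backend: str,
    deepep_mode: str,
    moe_runner_backend: str,
    quantization: str,
) -> None:
    """Reject the unsupported combination before any forward pass runs."""
    unsupported = (
        moe_a2a_backend == "deepep"
        and moe_runner_backend == "flashinfer_cutedsl"
        and quantization == "modelopt_fp4"
        and deepep_mode != "low_latency"
    )
    if unsupported:
        raise ValueError(
            "flashinfer_cutedsl with modelopt_fp4 only supports DeepEP "
            "low-latency dispatch; pass --deepep-mode low_latency "
            f"(got --deepep-mode {deepep_mode})."
        )


# The configuration from the reproduction command is rejected up front.
try:
    check_cutedsl_fp4_deepep("deepep", "auto", "flashinfer_cutedsl", "modelopt_fp4")
except ValueError as exc:
    print(exc)
```

With a check like this, the reproduction command would stop at startup with an actionable message rather than failing mid-warmup.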

Related

Same underlying gap as #28412 (deepep NORMAL/prefill dispatch has no fused-func handler for a runner backend), but a different backend and failure mode: #28412 is ("deepep", "marlin") with WNA16 + --enable-prefill-cp, raising a clean NotImplementedError (registration entirely missing); this issue is ("deepep", "flashinfer_cutedsl") with modelopt_fp4, where the registration exists but is LL-only and mis-unpacks the 5-field normal tuple. Both point to the broader pattern that DeepEP auto's NORMAL (prefill) dispatch is under-supported across non-deepgemm MoE runner backends.

Origin of the LL-only handler: #25525 ([MoE Refactor] Migrate flashinfer_cutedsl + DeepEP to MoeRunner). The older DeepEP+NVFP4 tracking issue #12293 (closed/completed, pre-refactor) covered a different crash and code path.

Environment

  • 4×B200 (183 GB each).
  • flashinfer 0.6.12, sglang editable install (latest main).
  • Not covered by CI: the DeepEP CI suite runs on H100 with an FP8 model (lmsys/sglang-ci-dsv3-test), which takes the deepgemm path and never exercises the cutedsl-FP4 DeepEP path. There is no NVFP4-DeepEP test anywhere in CI.
