[Bug] DeepEP normal (prefill) dispatch crashes flashinfer_cutedsl FP4 MoE: "not enough values to unpack (expected 6, got 5)"

### Checklist

- [x] I searched related issues but found no solution.
- [x] The bug persists in the latest version.
- [x] Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- [x] If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Describe the bug

Serving an NVFP4 / `modelopt_fp4` MoE model with the default high-throughput DeepEP `auto` mode crashes during the **first prefill forward** (server-init warmup) inside the `flashinfer_cutedsl` MoE runner:

```
File ".../layers/moe/moe_runner/flashinfer_cutedsl.py", line 476,
  in fused_experts_deepep_to_flashinfer_cutedsl_fp4
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output
ValueError: not enough values to unpack (expected 6, got 5)
```

Full crashing stack:
`_execute_extend` → `deepseek_v2.py forward` → `self.experts(...)` → `ep_moe/layer.py forward_impl` → `fused_moe_triton/layer.py run_moe_core` → `modelopt_quant.py apply` → `moe_runner/runner.py run` → `flashinfer_cutedsl.py:476`.

The `flashinfer_cutedsl` runner registers a single DeepEP handler, and that handler supports **only the low-latency (decode) dispatch format**. In `deepep auto` mode, prefill uses the **normal** dispatch format, which has a different arity (5 fields instead of 6), so prefill can never reach a working code path.

### Reproduction

```bash
python3 -m sglang.launch_server \
  --model-path nvidia/GLM-5.2-NVFP4 --trust-remote-code \
  --tp 4 --enable-dp-attention --dp 4 \
  --moe-a2a-backend deepep --deepep-mode auto \
  --moe-runner-backend flashinfer_cutedsl
```

`nvidia/GLM-5.2-NVFP4` is a public NVFP4 checkpoint; any `modelopt_fp4` MoE model takes the same path. The server crashes during init warmup with the unpack `ValueError` shown above.

### Root cause

`deepep auto` = **NORMAL dispatch for prefill**, **LOW_LATENCY dispatch for decode**. The two dispatch outputs have different arity:

| NamedTuple | location | fields |
|---|---|---|
| `DeepEPNormalDispatchOutput` | `token_dispatcher/deepep.py` | **5**: `hidden_states, hidden_states_scale, topk_ids, topk_weights, num_recv_tokens_per_expert` |
| `DeepEPLLDispatchOutput` | `token_dispatcher/deepep.py` | **6**: `hidden_states, hidden_states_scale, topk_ids, topk_weights, masked_m, expected_m` |

The cutedsl runner registers exactly one DeepEP handler, typed and unpacked for the **6-field LL layout only**:

```python
# moe_runner/flashinfer_cutedsl.py
@register_fused_func("deepep", "flashinfer_cutedsl")
def fused_experts_deepep_to_flashinfer_cutedsl_fp4(
    dispatch_output: DeepEPLLDispatchOutput,   # <- LL only
    ...
) -> DeepEPLLCombineInput:
    ...
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output  # unconditional 6-unpack
```

During prefill the **5-field `DeepEPNormalDispatchOutput`** is handed to this LL-only func → `expected 6, got 5`. There is **no normal-dispatch handler** for the `deepep` → `flashinfer_cutedsl` FP4 path.
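The arity mismatch can be reproduced without sglang or a GPU. The snippet below is a minimal sketch: `NormalDispatchOutput` and `LLDispatchOutput` are stand-in NamedTuples that copy only the field names from the table above, and `ll_only_unpack` mimics the unconditional 6-way unpack. None of these are the real sglang classes, and the values are placeholders.

```python
from typing import Any, NamedTuple


class NormalDispatchOutput(NamedTuple):
    """Stand-in for DeepEPNormalDispatchOutput (5 fields). Not the real class."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    num_recv_tokens_per_expert: Any


class LLDispatchOutput(NamedTuple):
    """Stand-in for DeepEPLLDispatchOutput (6 fields). Not the real class."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    masked_m: Any
    expected_m: Any


def ll_only_unpack(dispatch_output):
    # Same shape as the unpack in the handler: assumes the 6-field LL layout.
    hidden_states, hidden_states_scale, _, _, masked_m, _ = dispatch_output
    return hidden_states, hidden_states_scale, masked_m


def try_unpack(dispatch_output) -> str:
    """Return "ok" on success, or the ValueError message on failure."""
    try:
        ll_only_unpack(dispatch_output)
        return "ok"
    except ValueError as exc:
        return str(exc)


# Placeholder values; only the tuple length matters here.
ll_result = try_unpack(LLDispatchOutput("h", "s", "ids", "w", "m", 8))
normal_result = try_unpack(NormalDispatchOutput("h", "s", "ids", "w", [4, 4]))

print(ll_result)      # LL (decode) layout unpacks fine
print(normal_result)  # normal (prefill) layout fails with the reported message
```

The second call fails with the same message as the traceback, which supports the reading that the error comes purely from tuple length and not from tensor contents.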

This traces back to #25525, whose description states it migrated only "CuteDSL v1 (DeepEP **low-latency** + NVFP4)" to `MoeRunner` — the new `@register_fused_func("deepep", "flashinfer_cutedsl")` was added for the LL path only. The `deepep auto` mode's NORMAL (prefill) dispatch was never wired for cutedsl-FP4, and nothing rejects the combination early, so it surfaces as this cryptic unpack instead.

(Note: `SGLANG_MOE_NVFP4_DISPATCH` does not change this — both LL paths still build the 6-field tuple, and the `Try SGLANG_MOE_NVFP4_DISPATCH=0` hint in the file is for a *different* downstream stride assertion, not this unpack.)

### Workaround

Force `--deepep-mode low_latency` so that prefill also uses the LL (6-field) dispatch, which matches the only registered cutedsl DeepEP func. This avoids *this* bug: prefill MoE no longer crashes. On B200 the server then hits a separate cuda-graph-capture failure for NVFP4, which is out of scope for this report. The unpack error itself is confirmed gone with this setting, so this issue is specifically about the missing normal-dispatch handler.

### Suggested fix

Either:

- **(a)** Add a normal-path handler `fused_experts_deepep_normal_to_flashinfer_cutedsl_fp4` that unpacks the 5-field `DeepEPNormalDispatchOutput` and runs the contiguous (non-masked) cutedsl path; or
- **(b)** Make `fused_experts_deepep_to_flashinfer_cutedsl_fp4` branch on `dispatch_output.format` (`DEEPEP_NORMAL` vs `DEEPEP_LL`) and unpack accordingly.

The W4AFp8 MoE path already demonstrates this normal-vs-LL split: `EPMoE.forward_cutlass_w4afp8` (NORMAL → `apply_deepep_normal`) vs `forward_cutlass_w4afp8_masked` (LL → `apply_deepep_ll`) in `ep_moe/layer.py`. The cutedsl-FP4 path is missing the NORMAL half.
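As a rough illustration of option (b), the sketch below branches on a `format` attribute before unpacking. It is not a patch: `DispatchFormat`, `NormalOut`, `LLOut`, and `select_moe_inputs` are hypothetical stand-ins, and the returned dict only marks which kernel path (masked vs. contiguous) a real handler would take. The actual contiguous cutedsl kernel call is left out because its interface is not described in this issue.

```python
from enum import Enum, auto
from typing import Any, NamedTuple


class DispatchFormat(Enum):
    # Hypothetical enum mirroring the DEEPEP_NORMAL / DEEPEP_LL distinction.
    DEEPEP_NORMAL = auto()
    DEEPEP_LL = auto()


class NormalOut(NamedTuple):
    """Stand-in for the 5-field normal (prefill) dispatch output."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    num_recv_tokens_per_expert: Any

    @property
    def format(self) -> DispatchFormat:
        return DispatchFormat.DEEPEP_NORMAL


class LLOut(NamedTuple):
    """Stand-in for the 6-field low-latency (decode) dispatch output."""
    hidden_states: Any
    hidden_states_scale: Any
    topk_ids: Any
    topk_weights: Any
    masked_m: Any
    expected_m: Any

    @property
    def format(self) -> DispatchFormat:
        return DispatchFormat.DEEPEP_LL


def select_moe_inputs(dispatch_output) -> dict:
    """Unpack according to the dispatch format instead of assuming LL."""
    if dispatch_output.format is DispatchFormat.DEEPEP_LL:
        hidden_states, scale, _, _, masked_m, _ = dispatch_output
        return {"path": "masked", "hidden_states": hidden_states,
                "scale": scale, "masked_m": masked_m}
    if dispatch_output.format is DispatchFormat.DEEPEP_NORMAL:
        hidden_states, scale, _, _, num_recv = dispatch_output
        return {"path": "contiguous", "hidden_states": hidden_states,
                "scale": scale, "num_recv_tokens_per_expert": num_recv}
    raise NotImplementedError(f"unsupported format: {dispatch_output.format}")


decode_inputs = select_moe_inputs(LLOut("h", "s", "ids", "w", "m", 8))
prefill_inputs = select_moe_inputs(NormalOut("h", "s", "ids", "w", [4, 4]))
print(decode_inputs["path"], prefill_inputs["path"])
```

Branching on the format keeps a single registration for `("deepep", "flashinfer_cutedsl")`, while option (a) would keep the two layouts in separate functions, as the W4AFp8 path does.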

If supporting normal dispatch for cutedsl-FP4 is out of scope, the minimum fix is to **fail early with a clear message** (e.g. require `--deepep-mode low_latency` when `moe_runner_backend=flashinfer_cutedsl` + `modelopt_fp4`), instead of crashing mid-warmup with the unpack error.
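The fail-early variant could be as small as the check below. This is a sketch under assumptions: `check_cutedsl_deepep_mode` is a hypothetical helper, not an existing sglang function, and its four string parameters simply mirror the CLI flags and quantization name used in this report. Where such a check would live in the server-argument validation is not something this issue determines.

```python
def check_cutedsl_deepep_mode(
    moe_a2a_backend: str,
    deepep_mode: str,
    moe_runner_backend: str,
    quantization: str,
) -> None:
    """Hypothetical startup check: reject the unsupported combination early."""
    uses_cutedsl_fp4_deepep = (
        moe_a2a_backend == "deepep"
        and moe_runner_backend == "flashinfer_cutedsl"
        and quantization == "modelopt_fp4"
    )
    # Only the low-latency dispatch layout has a registered handler today.
    if uses_cutedsl_fp4_deepep and deepep_mode != "low_latency":
        raise ValueError(
            "flashinfer_cutedsl + modelopt_fp4 with DeepEP only supports "
            "low-latency dispatch; pass --deepep-mode low_latency "
            f"(got --deepep-mode {deepep_mode})."
        )


def check_message(deepep_mode: str) -> str:
    """Return "ok" if the configuration passes, else the error message."""
    try:
        check_cutedsl_deepep_mode(
            "deepep", deepep_mode, "flashinfer_cutedsl", "modelopt_fp4"
        )
        return "ok"
    except ValueError as exc:
        return str(exc)


auto_result = check_message("auto")
ll_mode_result = check_message("low_latency")
print(auto_result)
print(ll_mode_result)
```

With a check like this, the reproduction command above would stop at argument validation with an actionable message instead of failing inside the first prefill forward.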

### Related

Same underlying gap as #28412 (deepep NORMAL/prefill dispatch has no fused-func handler for a runner backend), but a different backend and failure mode: #28412 is `("deepep", "marlin")` with WNA16 + `--enable-prefill-cp`, raising a clean `NotImplementedError` (registration entirely missing); this issue is `("deepep", "flashinfer_cutedsl")` with `modelopt_fp4`, where the registration **exists but is LL-only** and mis-unpacks the 5-field normal tuple. Both point to the broader pattern that DeepEP `auto`'s NORMAL (prefill) dispatch is under-supported across non-deepgemm MoE runner backends.

Origin of the LL-only handler: #25525 ([MoE Refactor] Migrate flashinfer_cutedsl + DeepEP to MoeRunner). The older DeepEP+NVFP4 tracking issue #12293 (closed/completed, pre-refactor) covered a different crash and code path.

### Environment

- 4×B200 (183 GB each).
- flashinfer `0.6.12`, sglang editable install (latest `main`).
- Not covered by CI: the DeepEP CI suite runs on H100 with an FP8 model (`lmsys/sglang-ci-dsv3-test`), which takes the deepgemm path and never exercises the cutedsl-FP4 DeepEP path. There is no NVFP4-DeepEP test anywhere in CI.

