Fix MegaMOE FP4 fallback runner by ronhuafeng · Pull Request #29534 · sgl-project/sglang

ronhuafeng · 2026-06-28T00:59:46Z

Motivation

DeepSeek-V4 FP4 with --moe-a2a-backend megamoe can crash when moe_runner_backend=auto and a request exceeds the MegaMOE token cap. In that case MegaMOE falls back to the normal MoE runner, but auto previously selected Triton. Triton does not consume the FP4 scale layout prepared for the MegaMOE/DeepGEMM path, so the fallback hit a shape assertion instead of serving the long prompt.

After switching the FP4 MegaMOE fallback runner to DeepGEMM, the issue-specific repro exposed a second DeepSeek-V4-only fallback mismatch: the TP attention A2A scatter optimization can shrink the MHC post-attention layout while the fallback hc_post path still expects the full-token layout.

Modifications

- In Fp8MoEMethod.create_moe_runner, keep explicit backend choices unchanged, but make auto select MoeRunnerBackend.DEEP_GEMM for FP4 experts when the configured MoE A2A backend is MegaMOE.
- Preserve the existing regular FP8 auto -> Triton behavior and the existing DeepGEMM detection logic.
- In DeepSeek-V4, detect the FP4 MegaMOE fallback path and skip SGLANG_DSV4_FIX_TP_ATTN_A2A_SCATTER only for that fallback, so the surrounding MHC state keeps the full-token layout expected by hc_post.
- Add focused unit tests for FP4 MegaMOE fallback backend selection and the DeepSeek-V4 scatter guard.
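
The backend selection rule described above can be sketched roughly as follows. This is a standalone illustration, not the code in fp8.py: the enum here is a local stand-in for MoeRunnerBackend, and select_runner_backend, is_fp4_experts, a2a_backend, and deep_gemm_detected are hypothetical names chosen for the example. The existing DeepGEMM detection logic is reduced to a single boolean.

```python
from enum import Enum


class MoeRunnerBackend(Enum):
    # Local stand-in for the real enum; only the members needed here.
    AUTO = "auto"
    TRITON = "triton"
    DEEP_GEMM = "deep_gemm"


def select_runner_backend(
    requested: MoeRunnerBackend,
    *,
    is_fp4_experts: bool,
    a2a_backend: str,
    deep_gemm_detected: bool = False,
) -> MoeRunnerBackend:
    """Hypothetical helper mirroring the selection rule in this PR."""
    # Explicit backend choices are passed through unchanged.
    if requested is not MoeRunnerBackend.AUTO:
        return requested
    # New rule: FP4 experts behind a MegaMOE A2A backend need DeepGEMM,
    # because the FP4 scale layout was prepared for that path.
    if is_fp4_experts and a2a_backend == "megamoe":
        return MoeRunnerBackend.DEEP_GEMM
    # Placeholder for the pre-existing DeepGEMM detection (assumed boolean).
    if deep_gemm_detected:
        return MoeRunnerBackend.DEEP_GEMM
    # Regular FP8 with auto keeps resolving to Triton.
    return MoeRunnerBackend.TRITON
```

Keeping the FP4 MegaMOE branch ahead of the generic detection makes the new rule easy to test in isolation, which is roughly what the added unit tests target.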

Accuracy Tests

This is a crash fix for a fallback path, not an intended output change.

B200 smoke validation with deepseek-ai/DeepSeek-V4-Flash, TP=2, --moe-a2a-backend megamoe, SGLANG_OPT_DEEPGEMM_MEGA_MOE_NUM_MAX_TOKENS_PER_RANK=4096:
- 3318-token prompt: generation succeeded.
- 5209-token prompt: generation succeeded.
The original baseline failed on the 5209-token prompt with a Triton FP4 scale shape assertion. The patched run had no scheduler exception, Triton assertion, mhc_post_tilelang mismatch, or client disconnect in the checked logs.
No full accuracy benchmark was run because unaffected paths should preserve their previous backend behavior and this PR targets a crash-only fallback.

Speed Tests and Profiling

No speed benchmark was run. The hot-path change is limited to a backend selection branch during runner creation and a fallback-only guard around the DeepSeek-V4 scatter optimization.

The native MegaMOE path still uses the existing scatter optimization. The guard only disables that optimization when FP4 MegaMOE has already exceeded its token cap and is using the fallback runner, where correctness requires the full-token MHC layout.
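
A minimal sketch of the guard condition, assuming the decision can be reduced to a few booleans. The names MoeStepState, is_fp4_megamoe_fallback, and should_scatter_tp_attn are hypothetical and do not appear in deepseek_v4.py; how the model actually determines that the token cap was exceeded is not shown in this description, so it is modeled as a precomputed flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoeStepState:
    # Hypothetical summary of the state the guard needs to inspect.
    fp4_experts: bool        # experts use FP4 weights
    a2a_backend: str         # e.g. "megamoe"
    over_token_cap: bool     # MegaMOE token cap exceeded (assumed precomputed)


def is_fp4_megamoe_fallback(state: MoeStepState) -> bool:
    """True only when FP4 MegaMOE has fallen back to the normal MoE runner."""
    return (
        state.fp4_experts
        and state.a2a_backend == "megamoe"
        and state.over_token_cap
    )


def should_scatter_tp_attn(scatter_enabled: bool, state: MoeStepState) -> bool:
    """Apply the scatter optimization unless the fallback path is active.

    scatter_enabled stands in for the SGLANG_DSV4_FIX_TP_ATTN_A2A_SCATTER
    setting; the fallback needs the full-token MHC layout for hc_post.
    """
    return scatter_enabled and not is_fp4_megamoe_fallback(state)
```

Under these assumptions, the native MegaMOE path (cap not exceeded) still returns True when the optimization is enabled, and only the over-cap FP4 MegaMOE case turns it off.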

Validation

PATH="$HOME/.cargo/bin:$PATH" /tmp/sglang27416-precommit-venv/bin/pre-commit run --all-files
PYTHONPATH=/home/bef0rewind/Projects/hobby/sglang-27416/python /tmp/sglang27416-venv/bin/python -m pytest /home/bef0rewind/Projects/hobby/sglang-27416/test/registered/unit/layers/quantization/test_fp8_megamoe_fp4_fallback.py /home/bef0rewind/Projects/hobby/sglang-27416/test/registered/unit/models/test_deepseek_v4_megamoe_scatter_guard.py -q
- 8 passed, 20 warnings
/tmp/sglang27416-venv/bin/python -m py_compile python/sglang/srt/layers/quantization/fp8.py python/sglang/srt/models/deepseek_v4.py
git diff --check HEAD~1..HEAD

Checklist

- Format your code according to "Format code with pre-commit".
- Add unit tests according to "Run and add unit tests".
- Update documentation according to "Write documentations". No docs update was needed for this runtime crash fix.
- Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed". Issue-specific accuracy/smoke validation is above; no full speed benchmark was run because this is a fallback crash fix.
- Follow the SGLang code style guidance.

CI States

Latest PR Test (Base): ❌ Run #28307017959
Latest PR Test (Extra): ❌ Run #28307017892


Fix MegaMOE FP4 fallback runner

339d935

github-actions Bot added the deepseek label Jun 28, 2026

ronhuafeng wants to merge 1 commit into sgl-project:main from ronhuafeng:sglang-27416-megamoe-fp4-fallback
