[NPU] Qwen3-VL-30B use split_qkv_rmsnorm_rope for extend by silencejade · Pull Request #29505 · sgl-project/sglang

silencejade · 2026-06-27T09:39:39Z

Motivation

When using the torch_npu.npu_mrope interface with Qwen3-VL-30B, certain precision issues occur. We need to adjust the invocation path: during the extend phase, utilize the fused operator split_qkv_rmsnorm_rope to bypass this operator.

Modifications

Adjust the branch judgment so that forward_prepare of Qwen3MoeAttention uses forward_prepare_npu by default in NPU scenarios.

Accuracy Tests

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ❌ Run #28285454272
Latest PR Test (Extra): 🚫 Run #28285454197

gemini-code-assist

Code Review

This pull request simplifies the NPU condition in forward_prepare for the Qwen3 MoE model, routing all NPU forward passes to forward_prepare_npu. However, the reviewer identified a critical issue where this change will cause runtime errors or incorrect embeddings for models like Qwen3-VL-30B because forward_prepare_npu does not handle multi-dimensional positions required by MRotaryEmbedding (MRoPE). A code suggestion was provided to resolve this issue.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-27T09:40:39Z

-            not _is_npu
-            or forward_batch.forward_mode.is_extend_or_draft_extend_or_mixed()
-        ):
+        if not _is_npu:


By changing this condition to if not _is_npu:, all NPU forward passes (both extend and decode) will now invoke forward_prepare_npu.\n\nHowever, in forward_prepare_npu (lines 561-585), self.rotary_emb.get_cos_sin_with_position(positions) is called with the 1D positions tensor. For Qwen3-VL-30B, self.rotary_emb is an instance of MRotaryEmbedding (MRoPE), which expects multi-dimensional positions (forward_batch.mrope_positions of shape [3, seq_len]) rather than 1D positions. Passing 1D positions to MRotaryEmbedding will result in incorrect rotary embedding application or runtime errors.\n\nTo prevent this, forward_prepare_npu should be updated to check if self.rotary_emb is an instance of MRotaryEmbedding and pass forward_batch.mrope_positions accordingly:\n\npython\n def forward_prepare_npu(\n self,\n positions: torch.Tensor,\n hidden_states: torch.Tensor,\n forward_batch: ForwardBatch,\n ):\n qkv, _ = self.qkv_proj(hidden_states)\n if self.attn.layer_id == self.start_layer:\n if isinstance(self.rotary_emb, MRotaryEmbedding):\n self.rotary_emb.get_cos_sin_with_position(forward_batch.mrope_positions)\n else:\n self.rotary_emb.get_cos_sin_with_position(positions)\n ...\n

sglang-npu-bot · 2026-06-27T09:45:29Z

/tag-and-rerun-ci

qwen3-vl use split_qkv_rmsnorm_rope for extend

f2771d0

gemini-code-assist Bot reviewed Jun 27, 2026

View reviewed changes

sglang-npu-bot approved these changes Jun 27, 2026

View reviewed changes

github-actions Bot added the run-ci label Jun 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NPU] Qwen3-VL-30B use split_qkv_rmsnorm_rope for extend#29505

[NPU] Qwen3-VL-30B use split_qkv_rmsnorm_rope for extend#29505
silencejade wants to merge 1 commit into
sgl-project:mainfrom
silencejade:br_skip_mrope

silencejade commented Jun 27, 2026 •

edited by github-actions Bot

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Uh oh!

sglang-npu-bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

silencejade commented Jun 27, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

CI States

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

sglang-npu-bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

silencejade commented Jun 27, 2026 •

edited by github-actions Bot

Loading