
fused_moe_kernel opt #38679

Open
SYChen123 wants to merge 1 commit into vllm-project:main from SYChen123:minimax-dev

Conversation


SYChen123 commented Apr 1, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request optimizes the fused MoE kernel by converting N and K into constant expressions and introducing a block_k_diviable flag to skip masking when the K dimension is block-aligned. Review feedback suggests fixing a typo in the variable name, ensuring consistency in tl.load parameters by including the other=0.0 argument, and using B.size(2) for the divisibility check to align with the definition of K used in the kernel call.

Comment on lines +507 to +520
    if block_k_diviable:
        a = tl.load(
            a_ptrs,
            mask=token_mask[:, None],
            other=0.0,
        )
        b = tl.load(b_ptrs)
    else:
        a = tl.load(
            a_ptrs,
            mask=token_mask[:, None] & (offs_k[None, :] < K - k * BLOCK_SIZE_K),
            other=0.0,
        )
        b = tl.load(b_ptrs, mask=offs_k[:, None] < K - k * BLOCK_SIZE_K, other=0.0)
Contributor


high

The optimization for block_k_diviable (typo intended to match existing code) correctly avoids the K dimension mask when K is a multiple of BLOCK_SIZE_K. However, the load for b in the if block_k_diviable branch (line 513) omits the other=0.0 parameter, whereas the else branch (line 520) includes it. While functionally safe due to the modulo % N in offs_bn and the final output masking, it is better to maintain consistency or explicitly document why other is omitted here. More importantly, the typo diviable instead of divisible is propagated; while it matches fused_moe_kernel_gptq_awq, it would be better to fix it in both places if possible, though strictly speaking, I should only suggest changes to the modified lines.
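The point about masked loads can be illustrated with a small NumPy sketch (not vLLM or Triton code; `masked_load` is a hypothetical stand-in for `tl.load` semantics). When K is not a multiple of BLOCK_SIZE_K, the final tile has out-of-range lanes that must be masked and filled with `other=0.0` so they contribute zero to the dot-product accumulator; when K is a multiple, every tile is full and both the mask and `other` can be dropped, which is exactly what the `block_k_diviable` fast path does.

```python
import numpy as np

BLOCK_SIZE_K = 4

def masked_load(buf, offs, mask=None, other=0.0):
    # Toy stand-in for tl.load: gather buf[offs], substituting `other`
    # where mask is False. (Real tl.load without `other` leaves masked
    # lanes undefined, which is why consistency matters in the review.)
    out = buf[np.clip(offs, 0, len(buf) - 1)].astype(float)
    if mask is not None:
        out = np.where(mask, out, other)
    return out

K = 10  # not a multiple of BLOCK_SIZE_K, so the last tile is partial
buf = np.arange(K, dtype=float)

k = K // BLOCK_SIZE_K                      # index of the final, partial tile
offs_k = np.arange(BLOCK_SIZE_K)
tail_mask = offs_k < K - k * BLOCK_SIZE_K  # only 2 of 4 lanes are in range

tile = masked_load(buf, k * BLOCK_SIZE_K + offs_k, mask=tail_mask, other=0.0)
# tile == [8.0, 9.0, 0.0, 0.0]: the out-of-range lanes are zero-filled
# and so contribute nothing to the accumulator.
```

With `K % BLOCK_SIZE_K == 0`, `tail_mask` would be all-True on every tile, so the branch that skips the mask entirely is safe.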

    naive_block_assignment=(sorted_token_ids is None),
    HAS_BIAS=HAS_BIAS,
    BLOCK_SIZE_K=BLOCK_SIZE_K,
    block_k_diviable=A.size(1) % BLOCK_SIZE_K == 0,
Contributor


high

The calculation of block_k_diviable uses A.size(1), which is the input feature dimension K. While correct, it is slightly inconsistent with the kernel call where K is explicitly passed as B.size(2). It would be cleaner to use the same source for K to avoid any confusion, although they are expected to be equal.

Suggested change
block_k_diviable=A.size(1) % BLOCK_SIZE_K == 0,
block_k_diviable=B.size(2) % BLOCK_SIZE_K == 0,
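A quick sketch of why the two expressions are interchangeable, using hypothetical shapes consistent with the kernel's layout (A as `(num_tokens, K)` activations, B as `(num_experts, N, K)` expert weights; the concrete numbers below are illustrative, not from the PR):

```python
# Hypothetical shapes: A is (num_tokens, K), B is (num_experts, N, K),
# so A.size(1) and B.size(2) both read the same K dimension.
A_shape = (128, 4096)        # num_tokens x K
B_shape = (8, 14336, 4096)   # num_experts x N x K
BLOCK_SIZE_K = 128

flag_from_a = A_shape[1] % BLOCK_SIZE_K == 0
flag_from_b = B_shape[2] % BLOCK_SIZE_K == 0
# Identical results; using B_shape[2] merely matches the source the
# kernel call already uses when it passes K explicitly.
```

The suggestion is purely about consistency of provenance for K, not correctness.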
