fix: add support check for gemm config for cutlass moe by nv-yunzheq · Pull Request #2495 · flashinfer-ai/flashinfer

nv-yunzheq · 2026-02-04T21:39:11Z

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Bug Fixes
- Added runtime checks to block unsupported kernel configurations and provide clearer, actionable error messages for incompatible settings.
- Enforced output-size alignment and disallowed FINALIZE epilogue fusion when the NO_SMEM epilogue schedule is selected, preventing silent failures and aiding profiling/tuning.

gemini-code-assist · 2026-02-04T21:39:24Z

Summary of Changes

Hello @nv-yunzheq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the CUTLASS Mixture of Experts (MoE) GEMM kernels by integrating critical validation checks. The primary goal is to prevent misconfigurations, particularly when utilizing the NO_SMEM epilogue schedule, by enforcing output alignment requirements and ensuring compatibility with various epilogue fusion types. This proactive validation helps the autotuner identify and avoid invalid configurations during the profiling phase, ultimately leading to more stable and optimized GEMM operations.

Highlights

- GEMM Configuration Validation: Introduced new validation checks for GEMM configurations within the moeGemmBiasAct and moeGemm functions to ensure compatibility with the NO_SMEM epilogue schedule.
- Output Alignment Enforcement: Added a check to verify that the output dimension N is 256-bit aligned when using the NO_SMEM epilogue schedule, which is crucial for performance and correctness.
- Epilogue Fusion Compatibility: Implemented a check to prevent the use of FINALIZE epilogue fusion type with the NO_SMEM epilogue schedule, as this combination is not supported.
- Autotuner Support: These checks are designed to help the autotuner catch invalid tactics during profiling, improving the robustness and efficiency of the MoE GEMM operations.
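How a check like this lets an autotuner discard bad tactics can be illustrated with a minimal C++ sketch. It assumes a failed configuration check surfaces as a C++ exception that the profiling loop can catch; it does not use TLLM_CHECK_WITH_INFO itself, and `EpilogueSchedule`, `EpilogueFusion`, `Tactic`, `validateTactic`, and `profileValidTactics` are hypothetical stand-ins, not the TensorRT-LLM or CUTLASS types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the real cutlass_extensions config enums.
enum class EpilogueSchedule { AUTO, NO_SMEM };
enum class EpilogueFusion { NONE, FINALIZE };

struct Tactic {
  EpilogueSchedule schedule;
  EpilogueFusion fusion;
};

// Mirrors the two NO_SMEM checks; throws the way a failed runtime check would.
void validateTactic(Tactic const& t, int64_t n, int output_bits) {
  assert(output_bits > 0 && 256 % output_bits == 0);
  if (t.schedule != EpilogueSchedule::NO_SMEM) return;
  if (t.fusion == EpilogueFusion::FINALIZE)
    throw std::runtime_error("NO_SMEM epilogue schedule is not supported with FINALIZE fusion");
  if (n % (256 / output_bits) != 0)
    throw std::runtime_error("Output N does not meet minimum alignment requirements for NO_SMEM epilogue");
}

// Profiling loop: a tactic that fails validation is skipped instead of launched.
std::vector<std::size_t> profileValidTactics(std::vector<Tactic> const& tactics, int64_t n,
                                             int output_bits) {
  std::vector<std::size_t> valid;
  for (std::size_t i = 0; i < tactics.size(); ++i) {
    try {
      validateTactic(tactics[i], n, output_bits);
      valid.push_back(i);  // a real tuner would time the kernel here
    } catch (std::runtime_error const& e) {
      std::cerr << "skipping tactic " << i << ": " << e.what() << '\n';
    }
  }
  return valid;
}
```

For example, with N = 1000 and a 16-bit output (1000 % 16 == 8), both NO_SMEM tactics are rejected and only a non-NO_SMEM tactic is kept for timing.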

Changelog

csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h
- Added a TLLM_CHECK_WITH_INFO statement in moeGemmBiasAct to ensure output N is 256-bit aligned when NO_SMEM epilogue schedule is used without gated activation.
- Added TLLM_CHECK_WITH_INFO statements in moeGemm to verify that NO_SMEM epilogue schedule is not used with FINALIZE fusion type and that output N is 256-bit aligned.
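The alignment rule in these checks is plain arithmetic: N must be a multiple of 256 / sizeof_bits(OutputType) elements, so that one output row covers a whole number of 256-bit chunks. A minimal constexpr sketch for a few illustrative bit widths follows; the helper names are hypothetical, and the listed widths do not imply which output types the MoE kernels actually support.

```cpp
#include <cstdint>

// Minimum number of output elements whose combined width is 256 bits.
// Assumes a bit width that divides 256 (e.g. 4, 8, 16, 32).
constexpr int64_t minNoSmemAlignment(int output_bits) { return 256 / output_bits; }

// True if an output row of n elements is a whole number of 256-bit chunks.
constexpr bool isNoSmemAligned(int64_t n, int output_bits) {
  return n % minNoSmemAlignment(output_bits) == 0;
}

static_assert(minNoSmemAlignment(32) == 8, "32-bit: 8 elements x 32 bits = 256 bits");
static_assert(minNoSmemAlignment(16) == 16, "16-bit: 16 elements");
static_assert(minNoSmemAlignment(8) == 32, "8-bit: 32 elements");
static_assert(minNoSmemAlignment(4) == 64, "4-bit: 64 elements");
static_assert(!isNoSmemAligned(1000, 16), "1000 % 16 == 8, so not aligned");
```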

Activity

No activity has been recorded for this pull request yet.

coderabbitai · 2026-02-04T21:39:30Z

Walkthrough

Added runtime validation in MOE GEMM dispatch to enforce NO_SMEM epilogue constraints: require output N alignment based on OutputType bit-width and disallow FINALIZE epilogue fusion when NO_SMEM is selected; checks run before dispatch and do not change public APIs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MOE GEMM Validation `csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h` | Added runtime checks in `MoeGemmRunner::dispatchToArch` (and callers) to: (1) if `epilogue_schedule == NO_SMEM`, require N % (256 / bits(OutputType)) == 0; (2) if `epilogue_schedule == NO_SMEM`, disallow `epilogue_fusion_type == FINALIZE`. Errors are emitted prior to dispatch; no public API/signature changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

djmmoss
yzh119
wenscarl

Poem

🐇 I hopped through code with careful paws,
Counting bits and checking clause by clause.
NO_SMEM needs N aligned just right,
FINALIZE forbidden when shared memory's not in sight.
A tiny hop, a safer run — kernels gleam in morning light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description only contains the template placeholders without actual implementation details, rationale, or related issue references. | Fill in the Description section explaining what the PR does and why; add any related issue links; confirm checklist items were completed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding validation checks for GEMM configuration in the cutlass MOE kernel dispatch logic. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

gemini-code-assist

Code Review

This pull request adds necessary support checks for GEMM configurations in CUTLASS MoE kernels, specifically for NO_SMEM epilogue schedules. The checks ensure output alignment and prevent unsupported fusions, which is important for correctness and for the autotuner. While the changes are good, there's an opportunity to improve code clarity and reduce duplication. The alignment calculation is repeated, and I've suggested extracting it into a constant. Furthermore, since the alignment check logic is now duplicated in two functions, consider refactoring it into a shared helper function for better maintainability.

gemini-code-assist · 2026-02-04T21:41:21Z

+  if (inputs.gemm_config.epilogue_schedule == cutlass_extensions::EpilogueScheduleType::NO_SMEM &&
+      !isGatedActivation(inputs.activation_type)) {
+    TLLM_CHECK_WITH_INFO(
+        inputs.n % (256 / cutlass::sizeof_bits<OutputType>::value) == 0,
+        "Output N %ld does not meet minimum alignment requirements for NO_SMEM epilogue %d",
+        (long)inputs.n, (int)(256 / cutlass::sizeof_bits<OutputType>::value));
+  }


The calculation for the minimum alignment is performed twice inside this check. To improve readability and avoid this repetition, you can store the result in a const auto variable. This makes the code cleaner and easier to understand.

if (inputs.gemm_config.epilogue_schedule == cutlass_extensions::EpilogueScheduleType::NO_SMEM &&
    !isGatedActivation(inputs.activation_type)) {
  const auto min_alignment = 256 / cutlass::sizeof_bits<OutputType>::value;
  TLLM_CHECK_WITH_INFO(
      inputs.n % min_alignment == 0,
      "Output N %ld does not meet minimum alignment requirements for NO_SMEM epilogue %d",
      (long)inputs.n, (int)min_alignment);
}

gemini-code-assist · 2026-02-04T21:41:21Z

+  if (inputs.gemm_config.epilogue_schedule == cutlass_extensions::EpilogueScheduleType::NO_SMEM) {
+    TLLM_CHECK_WITH_INFO(inputs.gemm_config.epilogue_fusion_type !=
+                             cutlass_extensions::CutlassGemmConfig::EpilogueFusionType::FINALIZE,
+                         "NO_SMEM epilogue schedule is not supported with FINALIZE fusion");
+    TLLM_CHECK_WITH_INFO(
+        inputs.n % (256 / cutlass::sizeof_bits<OutputType>::value) == 0,
+        "Output N %ld does not meet minimum alignment requirements for NO_SMEM epilogue %d",
+        (long)inputs.n, (int)(256 / cutlass::sizeof_bits<OutputType>::value));
+  }


Similar to the check in moeGemmBiasAct, the alignment calculation is repeated here. Extracting it into a const auto variable will improve readability. Since this alignment check logic is now present in two places, you might also consider creating a private helper function to encapsulate this check and avoid code duplication.

if (inputs.gemm_config.epilogue_schedule == cutlass_extensions::EpilogueScheduleType::NO_SMEM) {
  TLLM_CHECK_WITH_INFO(inputs.gemm_config.epilogue_fusion_type !=
                           cutlass_extensions::CutlassGemmConfig::EpilogueFusionType::FINALIZE,
                       "NO_SMEM epilogue schedule is not supported with FINALIZE fusion");
  const auto min_alignment = 256 / cutlass::sizeof_bits<OutputType>::value;
  TLLM_CHECK_WITH_INFO(
      inputs.n % min_alignment == 0,
      "Output N %ld does not meet minimum alignment requirements for NO_SMEM epilogue %d",
      (long)inputs.n, (int)min_alignment);
}
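The helper-function refactor suggested above might look like the following sketch. `checkNoSmemOutputAlignment` and `SizeofBits` are hypothetical names; `SizeofBits` stands in for `cutlass::sizeof_bits` only for ordinary byte-sized types (the real trait also handles sub-byte types), and the sketch throws `std::runtime_error` rather than calling TLLM_CHECK_WITH_INFO. The alignment is computed once and reused in both the condition and the message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>

// Hypothetical stand-in for cutlass::sizeof_bits<T>::value (byte-sized types only).
template <typename T>
struct SizeofBits {
  static constexpr int value = 8 * static_cast<int>(sizeof(T));
};

// Hypothetical shared helper for both call sites.
template <typename OutputType>
void checkNoSmemOutputAlignment(int64_t n) {
  assert(n >= 0);
  constexpr int64_t min_alignment = 256 / SizeofBits<OutputType>::value;
  if (n % min_alignment != 0) {
    char msg[128];
    std::snprintf(msg, sizeof(msg),
                  "Output N %ld does not meet minimum alignment requirements for NO_SMEM epilogue %d",
                  (long)n, (int)min_alignment);
    throw std::runtime_error(msg);
  }
}
```

With this sketch, `checkNoSmemOutputAlignment<float>(1000)` passes because 1000 is a multiple of 8, while `checkNoSmemOutputAlignment<uint16_t>(1000)` throws because 1000 is not a multiple of 16.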

djns99 · 2026-02-04T22:01:47Z

+  // For NoSmem epilogue schedule, output N must be 256-bit aligned.
+  // For gated activation, this is automatic if the usual alignment requirement is met.
+  // This check is here so the autotuner can catch invalid tactics during profiling.
+  if (inputs.gemm_config.epilogue_schedule == cutlass_extensions::EpilogueScheduleType::NO_SMEM &&


If we put this check in runGemm or dispatchToArch, we don't need two copies of it.

Maybe here, since this is only relevant for SM90+.

dispatchToArch doesn't work, because inside that function we cannot know whether the activation is gated.
runGemm works, but to stay consistent with the MoE runner code, I think it's better to keep the gemm1 and gemm2 checks separate; that is clearer and matches the original logic.

We don't need to check isGatedActivation here:

size_t const fc1_out_size = ((!use_ampere_activation_fusion) && is_gated_activation) ? inter_size * 2 : inter_size;

This line already sets N correctly. The original check only worked with inter_size, so it needed to check explicitly that we were in the non-gated case.
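The reasoning in this thread can be checked with a small sketch. `fc1OutSize` copies the quoted expression and `noSmemAligned` applies the NO_SMEM rule to the resulting N; both names are hypothetical. For illustration only, it assumes a 16-bit output type and an inter_size that is already 128-bit aligned (a multiple of 8 elements): doubling it for a gated activation then yields a 256-bit-aligned N automatically, while the non-gated N need not be aligned.

```cpp
#include <cassert>
#include <cstddef>

// Copies the quoted line: FC1's output width doubles for a gated activation
// unless the Ampere activation-fusion path is used.
std::size_t fc1OutSize(std::size_t inter_size, bool is_gated_activation,
                       bool use_ampere_activation_fusion) {
  return ((!use_ampere_activation_fusion) && is_gated_activation) ? inter_size * 2 : inter_size;
}

// NO_SMEM alignment rule applied to the actual GEMM N.
bool noSmemAligned(std::size_t n, int output_bits) {
  assert(output_bits > 0 && 256 % output_bits == 0);
  return n % static_cast<std::size_t>(256 / output_bits) == 0;
}
```

With inter_size = 1000 and a 16-bit output, the gated N is 2000 (a multiple of 16) and passes, while the non-gated N of 1000 fails. This is why a check written against inter_size had to be restricted to the non-gated case, whereas a check on the real N needs no such condition.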

Thanks. Updated to dispatchToArch

aleozlx · 2026-02-04T23:36:43Z

/bot run

flashinfer-bot · 2026-02-04T23:37:41Z

GitLab MR !298 has been created, and the CI pipeline #43308250 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot · 2026-02-05T00:14:44Z

[CANCELING] Pipeline #43308250: canceled

nv-yunzheq · 2026-02-05T00:22:18Z

/bot run

flashinfer-bot · 2026-02-05T00:23:06Z

GitLab MR !298 has been updated with latest changes, and the CI pipeline #43311572 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot · 2026-02-05T00:42:49Z

[CANCELING] Pipeline #43311572: canceled

nv-yunzheq · 2026-02-05T00:42:54Z

/bot run

flashinfer-bot · 2026-02-05T00:43:57Z

GitLab MR !298 has been updated with latest changes, and the CI pipeline #43312738 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot · 2026-02-05T09:55:52Z

[FAILED] Pipeline #43312738: 10/20 passed

aleozlx

tests clean

approved again

add support check for gemm config

5af2f49

nv-yunzheq requested review from djmmoss, wenscarl, yongwww and yzh119 as code owners February 4, 2026 21:39

gemini-code-assist Bot reviewed Feb 4, 2026

nv-yunzheq changed the title from "add support check for gemm config for cutlass moe" to "fix: add support check for gemm config for cutlass moe" Feb 4, 2026

djns99 reviewed Feb 4, 2026

djns99 approved these changes Feb 4, 2026

sricketts added the v0.6.3 label Feb 4, 2026

aleozlx approved these changes Feb 4, 2026

move to runGemm to reduce duplication

6ccb7f3

Revert "move to runGemm to reduce duplication"

0694be4

This reverts commit 6ccb7f3.

nv-yunzheq added 2 commits February 4, 2026 16:40

put logic to dispatch function

f2697d3

Merge branch 'main' into fix_moe_dispatch

30e44d5

aleozlx approved these changes Feb 5, 2026

yzh119 approved these changes Feb 5, 2026

yzh119 merged commit 8655234 into flashinfer-ai:main Feb 5, 2026
31 of 36 checks passed

coderabbitai Bot mentioned this pull request Feb 24, 2026

Implement cutlass_fused_moe mxfp8 #2581

Merged

5 tasks

nv-yunzheq deleted the fix_moe_dispatch branch March 2, 2026 23:38

coderabbitai Bot mentioned this pull request Mar 26, 2026

Add Relu2 activation support in CUTLASS MoE backend and fix autotuner async CUDA error handling #2897

Closed

coderabbitai Bot mentioned this pull request Apr 5, 2026

fix: restore SM120 CUTLASS MoE tile candidate removed by #2927 (test_trtllm_cutlass_fused_moe.py) #2984

Merged

1 task

6 participants
