Wires moe_grouped_gemm (CUTLASS grouped GEMM for MoE experts) through the MegatronConfig TypedDict and _apply_moe_config(). Enables it in every root MoE performance recipe (Qwen3-30B-A3B 4n4g/4n8g/4n8g-40K, Qwen3-235B 16n8g, DeepSeek-V3 32n8g, DAPO DeepSeek-V3 64n8g); child recipes inherit. Signed-off-by: sna <sna@nvidia.com>
Force-pushed from f7bdb19 to 1713776
/ok to test 1713776
terrykong reviewed Apr 19, 2026
Summary
This PR adds moe_grouped_gemm support through the MegatronConfig TypedDict and _apply_moe_config(), enabling it in 6 root MoE performance recipes. The concept is sound and the feature is valuable, but the implementation has a critical issue with the config application pattern that could silently regress throughput for ~14 other MoE recipes.
Ship blocker
Silent performance regression: The .get("moe_grouped_gemm", False) pattern unconditionally overwrites Bridge's default of True with False for any config that omits the field. See inline comment for details and suggested fix.
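To illustrate the failure mode, here is a minimal sketch of the anti-pattern and the suggested fix. All names (`BridgeConfig`, `apply_moe_config_*`) are illustrative stand-ins, not the actual NeMo-RL or Bridge code; the point is only that `.get(key, False)` overwrites a `True` default for any config that omits the key, while an explicit membership check preserves it:

```python
from dataclasses import dataclass


@dataclass
class BridgeConfig:
    # Stand-in for Bridge's config, whose default for this flag is True.
    moe_grouped_gemm: bool = True


def apply_moe_config_buggy(bridge_cfg: BridgeConfig, user_cfg: dict) -> None:
    # Anti-pattern: recipes that never mention the key silently get False,
    # regressing throughput for every MoE recipe that relied on the default.
    bridge_cfg.moe_grouped_gemm = user_cfg.get("moe_grouped_gemm", False)


def apply_moe_config_fixed(bridge_cfg: BridgeConfig, user_cfg: dict) -> None:
    # Fix: only override when the recipe explicitly sets the key.
    if "moe_grouped_gemm" in user_cfg:
        bridge_cfg.moe_grouped_gemm = user_cfg["moe_grouped_gemm"]


cfg = BridgeConfig()
apply_moe_config_buggy(cfg, {})
print(cfg.moe_grouped_gemm)  # False — the silent regression

cfg = BridgeConfig()
apply_moe_config_fixed(cfg, {})
print(cfg.moe_grouped_gemm)  # True — framework default preserved
```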
Suggestions
- Performance evidence: Could you share before/after throughput data (tokens/sec or step time) on at least one representative recipe? A short convergence sanity check would also be helpful.
- PR description: The "What does this PR do?", "Issues", and "Usage" sections still have template placeholders — consider filling them in.
Generated by Claude Code
Co-authored-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Seonjin <sna@nvidia.com>
/ok to test 2520299
Support moe_grouped_gemm (grouped GEMM for MoE experts) through the MegatronConfig TypedDict and _apply_moe_config(). Enables it in every root MoE performance recipe (Qwen3-30B-A3B 4n4g/4n8g/4n8g-40K, Qwen3-235B 16n8g, DeepSeek-V3 32n8g, DAPO DeepSeek-V3 64n8g); child recipes inherit.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information