[GPUHeuristics] Remove MNT boost for VeryLargeGemm on CDNA4 #23876
Merged
Yu-Zhewen merged 1 commit into iree-org:main on Mar 21, 2026
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
lialan approved these changes on Mar 20, 2026
Fixes #23831.
#23652 added `boostMNTileCountPerSubgroup=32` for CDNA4 LargeGemm but applied the same boost to VeryLargeGemm. That PR only benchmarked LargeGemm shapes and didn't cover VeryLargeGemm.
For LargeGemm the heuristic selects the `MFMA_F32_16x16x32` intrinsic, where MNT=32 fits within register limits. However, for VeryLargeGemm shapes (e.g. 16384x16384x16384), the heuristic prefers the larger `MFMA_F32_32x32x16` intrinsic, and the boosted MNT=32 results in VGPR spilling, causing a ~10x regression on MI355X.
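As a back-of-envelope illustration of why the same MNT value behaves so differently for the two intrinsics, the sketch below estimates f32 accumulator registers per lane. It is not measured data and not taken from the IREE sources: it assumes MNT counts the intrinsic output tiles each subgroup accumulates, a 64-lane subgroup, and an illustrative ceiling of 512 VGPRs per lane. The function and constant names are hypothetical.

```python
WAVE_SIZE = 64             # assumed lanes per subgroup (wavefront)
ASSUMED_VGPR_BUDGET = 512  # assumed per-lane VGPR ceiling, for illustration only


def accumulator_vgprs_per_lane(tile_m, tile_n, mn_tile_count, wave_size=WAVE_SIZE):
    """Estimate f32 accumulator VGPRs per lane for mn_tile_count output tiles."""
    # Each output tile holds tile_m * tile_n f32 values spread across the lanes.
    elements_per_tile = tile_m * tile_n
    vgprs_per_tile = elements_per_tile // wave_size
    return vgprs_per_tile * mn_tile_count


intrinsics = {"MFMA_F32_16x16x32": (16, 16), "MFMA_F32_32x32x16": (32, 32)}
for name, (m, n) in intrinsics.items():
    used = accumulator_vgprs_per_lane(m, n, mn_tile_count=32)
    share = used / ASSUMED_VGPR_BUDGET
    print(f"{name}: {used} accumulator VGPRs per lane ({share:.0%} of assumed budget)")
```

Under these assumptions the 16x16 output tile needs 128 accumulator registers per lane at MNT=32, while the 32x32 tile needs 512, which would consume the entire assumed budget before any operand registers are allocated. That is consistent with the spilling described above, though the real allocator and limits may differ.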
This patch removes `boostMNTileCountPerSubgroup` and `minUtilizationThreshold` from VeryLargeGemm CDNA4 seeds, reverting to default. LargeGemm seeds are unchanged.
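The effect of the change can be pictured with a small model of seed selection. This is a hypothetical sketch, not the IREE C++ implementation: the `Seeds` class and `cdna4_seeds` function are invented for illustration, `None` stands for "no override, use the default", and only the boost field is modeled because the description does not state a value for `minUtilizationThreshold`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seeds:
    # None means no override: the heuristic falls back to its default behavior.
    boost_mn_tile_count_per_subgroup: Optional[int] = None


def cdna4_seeds(gemm_size: str, patched: bool) -> Seeds:
    """Return illustrative CDNA4 seeds before (patched=False) or after this patch."""
    if gemm_size == "LargeGemm":
        # Unchanged by this patch: the boost from #23652 stays in place.
        return Seeds(boost_mn_tile_count_per_subgroup=32)
    if gemm_size == "VeryLargeGemm":
        # Before the patch the same boost was applied; afterwards it is dropped.
        return Seeds() if patched else Seeds(boost_mn_tile_count_per_subgroup=32)
    raise ValueError(f"unmodeled GEMM size: {gemm_size}")


print(cdna4_seeds("VeryLargeGemm", patched=False))
print(cdna4_seeds("VeryLargeGemm", patched=True))
```

Modeling the override as an optional field keeps the LargeGemm path untouched, which mirrors the scope of the patch.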