Skip to content

Origami solution selection #2126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 3, 2025

Conversation

AlexBrownAMD
Copy link
Contributor

Add a new solution selection mechanism using the Origami library to select the best performing kernel.

This PR does not include any Origami library logic files. Sample libraries are available, and initial libraries will be checked in later in a separate PR.

aazz44ss
aazz44ss previously approved these changes May 28, 2025
@hcman2
Copy link
Contributor

hcman2 commented May 28, 2025

We have both select_best_macro_tile_size and select_best_grid_size. Will we use predict MT and Gridbased combination in the future?

@ryanswann-amd
Copy link
Contributor

ryanswann-amd commented May 28, 2025

We have both select_best_macro_tile_size and select_best_grid_size. Will we use predict MT and Gridbased combination in the future?

select_best_macro_tile_size : Determines the Macro Tile dimensions and DepthU (Internally called MT_M,MT_N,MT_K)

select_best_grid_size : Is an example of a dynamic mode for streamk, this is used to predict the number of workgroups it's best to launch for streamk, which then determines StreamK's splitting.

@pghysels
Copy link
Contributor

There are two slightly conflicting declarations of compute_reuse_in_block_gemm, but they are both not used? Looks like nothing in Reuse.hpp/.cpp is currently used?

@ryanswann-amd
Copy link
Contributor

There are two slightly conflicting declarations of compute_reuse_in_block_gemm, but they are both not used? Looks like nothing in Reuse.hpp/.cpp is currently used?

This is legacy code that was determined to be too slow. Will Remove

@yenong-amd yenong-amd force-pushed the StreamKPredictionPR branch from 0bb4277 to 261685f Compare May 30, 2025 05:29
Copy link
Contributor

@bstefanuk bstefanuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few minor comments with regards to the TensileCreateLibrary code.

bethune-bryant
bethune-bryant previously approved these changes May 31, 2025
neoblizz
neoblizz previously approved these changes May 31, 2025
@pghysels
Copy link
Contributor

pghysels commented Jun 2, 2025

Perhaps you should add a description for TENSILE_STREAMK_DYNAMIC_GRID=4,5 in ValidParameters.py and for TENSILE_STREAMK_DYNAMIC_WGM in ValidParameters.py and in how-to-use-streamk.rst?

Is TENSILE_STREAMK_DYNAMIC_GRID=3 still a good default for Origami?

@ryanswann-amd
Copy link
Contributor

Perhaps you should add a description for TENSILE_STREAMK_DYNAMIC_GRID=4,5 in ValidParameters.py and for TENSILE_STREAMK_DYNAMIC_WGM in ValidParameters.py and in how-to-use-streamk.rst?

Is TENSILE_STREAMK_DYNAMIC_GRID=3 still a good default for Origami?

I think that we wanted to move from TENSILE_STREAMK_DYNAMIC_GRID=3 as the default to either 4 or 5. Thoughts @AlexBrownAMD ? I can also add Descriptions where you mentioned.

@neoblizz
Copy link
Member

neoblizz commented Jun 3, 2025

Perhaps you should add a description for TENSILE_STREAMK_DYNAMIC_GRID=4,5 in ValidParameters.py and for TENSILE_STREAMK_DYNAMIC_WGM in ValidParameters.py and in how-to-use-streamk.rst?
Is TENSILE_STREAMK_DYNAMIC_GRID=3 still a good default for Origami?

I think that we wanted to move from TENSILE_STREAMK_DYNAMIC_GRID=3 as the default to either 4 or 5. Thoughts @AlexBrownAMD ? I can also add Descriptions where you mentioned.

I think we should start using 5 for now, there's a future PR that should consolidate all of these.

@ryanswann-amd ryanswann-amd dismissed stale reviews from neoblizz and bethune-bryant via b897c6e June 3, 2025 04:12
@ryanswann-amd
Copy link
Contributor

Perhaps you should add a description for TENSILE_STREAMK_DYNAMIC_GRID=4,5 in ValidParameters.py and for TENSILE_STREAMK_DYNAMIC_WGM in ValidParameters.py and in how-to-use-streamk.rst?
Is TENSILE_STREAMK_DYNAMIC_GRID=3 still a good default for Origami?

I think that we wanted to move from TENSILE_STREAMK_DYNAMIC_GRID=3 as the default to either 4 or 5. Thoughts @AlexBrownAMD ? I can also add Descriptions where you mentioned.

I think we should start using 5 for now, there's a future PR that should consolidate all of these.

This will Impact all streamk kernels though right? Should we check other libraries being built with StreamK function well with this before changing it?

@bethune-bryant bethune-bryant merged commit 1c95e39 into ROCm:develop Jun 3, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants