-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[Bug Fix]Bypass launch grids for SM120 Kernel with SM90 Mainloop & SM100 TileScheduler #2865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug Fix]Bypass launch grids for SM120 Kernel with SM90 Mainloop & SM100 TileScheduler #2865
Conversation
|
@Junkai-Wu I believe the current solution is only temporary, and in the future we should add |
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
Agreed. @HydraQYH Do you think if you can do the corresponding changes in this PR? I think just an empty const function with false return should be OK. |
@Junkai-Wu Thank you for your suggestion, I will do it as soon as possible. |
42d17b3 to
4ece099
Compare
|
@Junkai-Wu I rebase code and just add There are no compilation errors anymore. And this PR is ready for review. |
|
@HydraQYH The changes look much cleaner now. Thanks for the quick action. Leave minor comments above. |
|
@HydraQYH I ran the internal pipeline with these changes and got a timeout issue of this unit test: https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/sm120_tensorop_gemm/CMakeLists.txt#L51 The issue disappeared when I added the macro back on the |
|
@Junkai-Wu Sorry, I don't have an SM120 device, so all I can do is compile. However, I noticed that removing this macro in Pingpong causes the program to return immediately sometime: cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp Lines 803 to 814 in d4e16f5
I think this radical approach may be risky, and may result in some semaphores not being released yet. In contrast, other changes will not affect the program's execution. I've reverted the changes to the early stop part. Could you try this? |
|
@HydraQYH After investigation, the timeout issue is caused by sm120 kernel calling sm90 scheduler where sm120 kernel didn't need to call |
This reverts commit 246cb42.
Refine name again.
67a08ba to
ae1f832
Compare
|
@Junkai-Wu Okay. I rebaseed the code and reverted to previously implementation. I changed |
|
@HydraQYH The internal pipeline still fails. After checking, I found all sm120 kernels should not call the |
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp
Outdated
Show resolved
Hide resolved
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
| else if (warp_group_role == WarpGroupRole::Consumer0 || warp_group_role == WarpGroupRole::Consumer1) { | ||
| cutlass::arch::warpgroup_reg_alloc<MmaRegisterRequirement>(); | ||
| #ifdef CUTLASS_ENABLE_GDC_FOR_SM90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add if constexpr (!IsSm120Family) condition here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp
Outdated
Show resolved
Hide resolved
|
@HydraQYH The internal pipeline passes. Leave suggested implementations in the comments. |
Skip `is_last_tile` for all sm120 kernels. Co-authored-by: Junkai-Wu <[email protected]>
|
@Junkai-Wu Thank you very much for your help. I have picked all your suggestions. |
Follow: #2719. Some SM120 kernels will use SM90 Cooperative/Pingpong Mainloop with SM100 TileScheduler. SM100 TileScheduler do not have
is_last_tilefunction. This can cause compilation errors when we remove unnecessary conditional compilation.In this PR, If SM120 blockscaled is detected at compile time, then bypass
launch_dependent_grids.