Larger work group sizes always result in better performance, unless the larger size leads to fewer compute units being used. Why not calculate the work group size as follows:
WGS_Total = max(min(Num_Threads / Compute_Units, Max_WGS), Sub_Group_Size)
This reduces the number of parameters to explore while maintaining maximum performance, which shortens tuning time.
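To make the proposed formula concrete, here is a minimal Python sketch of it. The function name `total_work_group_size` is made up for illustration, and the use of integer division for Num_Threads / Compute_Units is my assumption; the formula itself does not say how to round.

```python
def total_work_group_size(num_threads, compute_units, max_wgs, sub_group_size):
    """Sketch of WGS_Total = max(min(Num_Threads / Compute_Units, Max_WGS), Sub_Group_Size).

    Hypothetical helper, not part of any existing tuner. Integer division is an
    assumption; the proposal does not specify rounding.
    """
    # Threads each compute unit would get if the work were spread evenly.
    threads_per_cu = num_threads // compute_units
    # Never exceed the device or kernel limit on work group size.
    capped = min(threads_per_cu, max_wgs)
    # Never go below one full sub-group.
    return max(capped, sub_group_size)


# Large problem: the device limit is the binding constraint.
print(total_work_group_size(1024 * 1024, 32, 256, 32))  # 256
# Small problem: 512 // 64 = 8 threads per compute unit, raised to one sub-group.
print(total_work_group_size(512, 64, 256, 32))  # 32
```

Note that this sketch does not round the result to a multiple of the sub-group size; whether that is needed would depend on the device.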
For GEMM specifically, the ratio of WGS_M to WGS_N should be as close to the ratio of M to N as possible.
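One possible way to pick WGS_M and WGS_N from WGS_Total is sketched below. It assumes WGS_M * WGS_N must equal WGS_Total exactly, and it measures "as close as possible" by the absolute difference of the log-ratios; both are my assumptions, and `split_work_group` is a hypothetical name.

```python
import math


def split_work_group(wgs_total, m, n):
    """Pick (WGS_M, WGS_N) with WGS_M * WGS_N == wgs_total and WGS_M / WGS_N near M / N.

    Hypothetical helper. Closeness is measured on a log scale (an assumption),
    so that 2:1 and 1:2 are equally far from 1:1.
    """
    target = math.log(m / n)
    best = None
    # Try every factorization of wgs_total into two integer factors.
    for wgs_m in range(1, wgs_total + 1):
        if wgs_total % wgs_m != 0:
            continue
        wgs_n = wgs_total // wgs_m
        error = abs(math.log(wgs_m / wgs_n) - target)
        # Keep the first factorization with the smallest error.
        if best is None or error < best[0]:
            best = (error, wgs_m, wgs_n)
    return best[1], best[2]


# Square problem: the split is square too.
print(split_work_group(256, 1024, 1024))  # (16, 16)
# M is 16 times N, so the work group is 16 times taller than wide.
print(split_work_group(256, 4096, 256))  # (64, 4)
```

With this approach the only free parameter left for the work group shape is WGS_Total, which is itself derived from the formula above.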