Description
Intel provides the cl_intel_required_subgroup_size extension, which lets a kernel request a specific sub-group size through the intel_reqd_sub_group_size attribute. Specifying this value could improve GEMM performance compared with assuming the sub-group size is always 8. In my experience, 16 is the most commonly picked sub-group size on Intel: cache lines are usually 64 bytes, so with one aligned float (4 bytes) per work-item, a sub-group of 16 reads exactly 64 bytes, which is a single cache-line fetch.
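To make the request concrete, here is a rough sketch of what I mean, not a patch against the actual GEMM kernels. The kernel name `scale`, the variable `scale_kernel_src`, and the helper functions are hypothetical and only there for illustration. The attribute spelling `intel_reqd_sub_group_size` is the one I understand the Intel extension to define. The helpers just redo the cache-line arithmetic above, assuming contiguous, line-aligned reads of one element per work-item.

```c
#include <stddef.h>

/* Hypothetical OpenCL C kernel source requesting a sub-group size of 16.
   The kernel itself is a placeholder; only the attribute line matters here.
   It is assumed to compile only on devices that expose the Intel extension. */
const char *const scale_kernel_src =
    "__attribute__((intel_reqd_sub_group_size(16)))\n"
    "__kernel void scale(__global float *x, float a)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    x[i] = a * x[i];\n"
    "}\n";

/* Bytes read by one sub-group when every work-item loads one element. */
size_t sub_group_bytes(size_t sub_group_size, size_t elem_bytes)
{
    return sub_group_size * elem_bytes;
}

/* Cache lines spanned by that read, assuming the elements are contiguous
   and the first one sits on a cache-line boundary. */
size_t cache_lines_touched(size_t sub_group_size, size_t elem_bytes,
                           size_t line_bytes)
{
    size_t bytes = sub_group_bytes(sub_group_size, elem_bytes);
    return (bytes + line_bytes - 1) / line_bytes; /* round up */
}
```

With 4-byte floats and a 64-byte line, a sub-group of 8 reads 32 bytes (half a line), 16 reads 64 bytes (exactly one line), and 32 reads 128 bytes (two lines). Whether 16 actually wins for GEMM would still need to be measured per device.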