Using Required Sub Group Size for Intel #611

@EJainDev

Intel provides the cl_intel_required_subgroup_size extension (kernel attribute intel_reqd_sub_group_size) for specifying the sub-group size a kernel is compiled for. Setting this explicitly could improve GEMM performance compared with assuming the sub-group size is always 8. In my experience, 16 is the most commonly picked sub-group size on Intel: cache lines are usually 64 bytes, so with one aligned 4-byte float per work-item, a 16-wide sub-group reads exactly one cache line per fetch.
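
To illustrate the idea, here is a minimal sketch in C that embeds an OpenCL C kernel requesting a sub-group size of 16, together with the arithmetic behind that choice. The kernel `scale_example`, the string `kKernelSource`, and the helper `lanes_per_cache_line` are hypothetical names made up for this illustration and are not taken from this project's GEMM kernels. The sketch also assumes the target device actually reports 16 among its supported sizes (the extension exposes these through `CL_DEVICE_SUB_GROUP_SIZES_INTEL`).

```c
#include <stddef.h>

/* Hypothetical OpenCL C kernel source (illustration only).
 * The attribute asks the Intel compiler to build this kernel
 * for a sub-group size of exactly 16. */
static const char *kKernelSource =
    "__attribute__((intel_reqd_sub_group_size(16)))\n"
    "__kernel void scale_example(__global float *x, const float a) {\n"
    "    const size_t i = get_global_id(0);\n"
    "    x[i] = a * x[i];\n"
    "}\n";

/* Number of elements that fit in one cache line. With one element
 * per work-item, this is the sub-group width whose contiguous,
 * aligned accesses cover exactly one cache line.
 * Returns 0 if elem_bytes is 0. */
static size_t lanes_per_cache_line(size_t cache_line_bytes, size_t elem_bytes) {
    return elem_bytes ? cache_line_bytes / elem_bytes : 0;
}
```

With a 64-byte cache line and 4-byte floats, `lanes_per_cache_line(64, sizeof(float))` gives 16, which matches the sub-group size requested in the kernel string. A real integration would probably make the size a build-time parameter rather than a hard-coded literal, so that devices without 16-wide support can fall back to another size.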
