Improving Tuning Algorithm #600

@EJainDev

The tuning algorithm currently tries many different vector widths for kernels such as GEMM. To narrow this search, I think an interpolated vector width could be used instead.

Explanation:
OpenCL offers a query per scalar type (char, short, int, long, half, float, double) for both native and preferred vector widths. Suppose a device returns a native vector width of 16 for char, 8 for short, 4 for int, 2 for long, 8 for half (if supported), 1 for float, and 1 for double. The algorithm could then assume that the best vector width for float is 4, since most of the reported widths correspond to 16 bytes (16 × 1 byte, 8 × 2 bytes, 4 × 4 bytes, 2 × 8 bytes). A similar approach could be applied to the preferred vector widths. If either the native or the preferred vector widths return only 1s, random vector widths could be tried instead. A parameter could also control whether only the calculated vector width is used.
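
To make the idea concrete, here is a minimal sketch of the interpolation in Python. It is not code from the tuner. The function names, the `only_calculated` flag, and the fallback of returning every candidate width when the device reports only 1s are all assumptions made for illustration. The input dictionary stands in for the values that the native (or preferred) vector width queries would return.

```python
from collections import Counter

# Size in bytes of each OpenCL scalar type.
TYPE_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8,
              "half": 2, "float": 4, "double": 8}

# Vector widths the tuner would otherwise search (assumed candidate set).
ALL_WIDTHS = (1, 2, 4, 8, 16)


def interpolate_vector_width(reported_widths, target):
    """Guess a vector width for `target` from the widths of other types.

    `reported_widths` maps type names to the widths a device reports.
    Returns None when no type reports a width above 1.
    """
    # Each type with width > 1 implies a register size in bytes.
    # Widths of 0 (unsupported type) and 1 carry no information here.
    implied = [TYPE_SIZES[t] * w for t, w in reported_widths.items()
               if t in TYPE_SIZES and w > 1]
    if not implied:
        return None
    # Take the register size that most types agree on.
    register_bytes, _ = Counter(implied).most_common(1)[0]
    width = max(1, min(16, register_bytes // TYPE_SIZES[target]))
    # Round down to a power of two so the result is a valid candidate.
    return 1 << (width.bit_length() - 1)


def candidate_widths(reported_widths, target, only_calculated=False):
    """Return the widths to try, with the interpolated guess first."""
    guess = interpolate_vector_width(reported_widths, target)
    if guess is None:
        # Only 1s reported: nothing to interpolate, so search everything.
        return list(ALL_WIDTHS)
    if only_calculated:
        return [guess]
    return [guess] + [w for w in ALL_WIDTHS if w != guess]


if __name__ == "__main__":
    native = {"char": 16, "short": 8, "int": 4, "long": 2,
              "half": 8, "float": 1, "double": 1}
    print(candidate_widths(native, "float", only_calculated=True))
```

With the widths from the example above, every informative type implies a 16-byte register, so the guess for float is 4. Leaving `only_calculated` off keeps the full search but tries the guess first, which would let the tuner fall back on the existing behaviour if the guess turns out to be poor.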
