Commit ebf8721


Optimizing sgemm rd kernels on zen3 (flame#293)

Fixes some inefficiencies in the zen (AVX2) SUP RD kernel for SGEMM. After the 8-wide loop in the k-direction finished, the next loop to run was the 1-wide loop. This caused up to seven extra iterations whenever k was not a multiple of 8. This has been fixed by introducing masked operations for a k remainder < 8.

When the k remainder == 1, it is still handled by the original non-masked code (selected with a branch), because the masking operation would add more overhead than the single iteration it replaces.

Some unnecessary instructions in the zen4 kernels have also been removed.

AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7775
Co-authored-by: rohrayan@amd.com
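The remainder strategy described in this message can be modeled in portable C. This is only a sketch of the control flow, not the kernel itself: the real kernels are hand-written assembly, and `dot_k`, `masked_load8`, and `VEC_WIDTH` are hypothetical names introduced for illustration. It assumes the k-loop accumulates a dot product in 8 float lanes, and it emulates a masked load by zero-filling the inactive lanes without reading them.

```c
#include <stddef.h>

enum { VEC_WIDTH = 8 }; /* floats in one 256-bit AVX2 register */

/* Emulates a masked 8-lane load: lanes at index >= active are set to
   0.0f and their memory is never read (hypothetical helper). */
static void masked_load8(const float *src, size_t active,
                         float lanes[VEC_WIDTH])
{
    for (size_t i = 0; i < VEC_WIDTH; ++i)
        lanes[i] = (i < active) ? src[i] : 0.0f;
}

/* Dot product over k elements using the remainder handling from the
   commit message (hypothetical function, scalar model of the lanes). */
float dot_k(const float *a, const float *b, size_t k)
{
    float acc[VEC_WIDTH] = {0.0f};
    float tail = 0.0f;
    size_t p = 0;

    /* Main loop: consume 8 elements of k per iteration. */
    for (; p + VEC_WIDTH <= k; p += VEC_WIDTH)
        for (size_t i = 0; i < VEC_WIDTH; ++i)
            acc[i] += a[p + i] * b[p + i];

    size_t rem = k - p; /* 0..7 elements left */

    if (rem == 1) {
        /* One element left: take the plain scalar path, since
           setting up a mask is assumed to cost more than it saves. */
        tail = a[p] * b[p];
    } else if (rem > 1) {
        /* 2..7 elements left: one masked step instead of
           rem separate scalar iterations. */
        float va[VEC_WIDTH], vb[VEC_WIDTH];
        masked_load8(a + p, rem, va);
        masked_load8(b + p, rem, vb);
        for (size_t i = 0; i < VEC_WIDTH; ++i)
            acc[i] += va[i] * vb[i];
    }

    /* Horizontal reduction of the lane accumulators. */
    float sum = tail;
    for (size_t i = 0; i < VEC_WIDTH; ++i)
        sum += acc[i];
    return sum;
}
```

With k = 11 the model runs one 8-wide iteration and one masked step covering the last 3 elements. With k = 9 it runs one 8-wide iteration and then takes the scalar branch for the single leftover element.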

1 parent 50ae5a0 commit ebf8721

6 files changed

kernels
- zen4/3/sup
- zen/3/sup
