Commit ebf8721

Optimizing sgemm rd kernels on zen3 (flame#293)
Fixing some inefficiencies in the zen (AVX2) SUP RD kernel for SGEMM. After the loop that processes the k-direction in steps of 8, the next loop processed it one element at a time. This caused many unnecessary iterations when the remainder of k was less than 8. This has been fixed by introducing masked operations for a remainder of k < 8.

When the remainder of k == 1, it is handled by the original non-masked code (behind a branch), because the masking operation costs more than it saves in that case.

Some unnecessary instructions in the zen4 kernels have also been removed.

AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7775
Co-authored-by: rohrayan@amd.com
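The loop structure described above can be sketched in portable C. This is only an illustrative model and not the kernel code from this commit: the real kernels operate on vector registers, while the names `dot_rd_sketch`, `fma_lanes`, and `VEC_WIDTH` here are hypothetical, and the 8-lane accumulator array merely stands in for one 256-bit register of floats. It assumes a single dot product over the k-direction, with the remainder handled either by one masked step or, when exactly one element is left, by a plain scalar step.

```c
#include <stddef.h>

#define VEC_WIDTH 8 /* floats per 256-bit register (assumed model) */

/* Hypothetical helper: models one 8-lane multiply-accumulate in which only
 * the first `active` lanes are loaded. Masked-off lanes contribute zero and
 * are never read from memory, as with a masked vector load. */
static void fma_lanes(float acc[VEC_WIDTH], const float *a, const float *b,
                      size_t active)
{
    for (size_t lane = 0; lane < VEC_WIDTH; ++lane) {
        float av = (lane < active) ? a[lane] : 0.0f;
        float bv = (lane < active) ? b[lane] : 0.0f;
        acc[lane] += av * bv;
    }
}

/* Dot product over k elements: full steps of 8, then a single remainder step. */
float dot_rd_sketch(const float *a, const float *b, size_t k)
{
    float acc[VEC_WIDTH] = {0.0f};
    size_t k_iter = k / VEC_WIDTH; /* full 8-wide iterations */
    size_t k_left = k % VEC_WIDTH; /* remainder, 0..7 */
    float scalar_tail = 0.0f;

    for (size_t i = 0; i < k_iter; ++i) {
        fma_lanes(acc, a, b, VEC_WIDTH);
        a += VEC_WIDTH;
        b += VEC_WIDTH;
    }

    if (k_left == 1) {
        /* One element left: take the non-masked scalar path. */
        scalar_tail = a[0] * b[0];
    } else if (k_left > 1) {
        /* 2..7 elements left: one masked step instead of k_left scalar ones. */
        fma_lanes(acc, a, b, k_left);
    }

    /* Horizontal reduction of the accumulator lanes. */
    float sum = scalar_tail;
    for (size_t lane = 0; lane < VEC_WIDTH; ++lane) {
        sum += acc[lane];
    }
    return sum;
}
```

With k = 13, for example, this model runs one full 8-wide step and one masked step covering the remaining 5 elements, where a one-at-a-time remainder loop would have run 5 extra iterations. With k = 9 it takes the scalar branch for the single leftover element.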
1 parent 50ae5a0 commit ebf8721

6 files changed

Lines changed: 4769 additions & 4323 deletions
