Skip to content

Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code #5125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 12, 2025

Conversation

martin-frbg
Copy link
Collaborator

reverts part of PR #4880 as already discussed there earlier, fixes #5122

@martin-frbg martin-frbg added this to the 0.3.30 milestone Feb 12, 2025
@martin-frbg martin-frbg merged commit e8b11a1 into OpenMathLib:develop Feb 12, 2025
83 of 86 checks passed
@ChipKerchner
Copy link
Contributor

This was probably an endianness issue but I'm OK with the fix.

@martin-frbg
Copy link
Collaborator Author

We can always improve on this later, but it would seem pointless to leave broken code in place for the next release. Curious how the exact same code works on POWER9&10 ppc64le but not on POWER8 ppc64le - does the POWER8 implement the vecmerge(l/h) differently ?
(As an aside, the 8x8 versions of the sgemv kernel are currently unused - looks like Abdel wrote them "for future use" when only POWER8 was available)

@ChipKerchner
Copy link
Contributor

It requires some investigation of what exactly is the cause. vec_mergeh/l should be implemented the same on P8, P9 or P10.

@martin-frbg
Copy link
Collaborator Author

Compiler options are identical except for the -mcpu and -mtune part. I can even compile for either of the three targets on P10 (OSUOSL's cfarm120 in the GCC Compile Farm) and see only the TARGET=POWER8 build fail (without this PR)

@ChipKerchner
Copy link
Contributor

Looks like the failure is related to inc_x and/or inc_y not being equal to one. It seems P8 isn't handling that correctly for SBGEMV without your latest fix. I know I updated the SBGEMV unit test to test for inc_x/y = 2 last year. I made sure it worked for P9 & P10 when I improved the SBGEMV for those architecture.

@ChipKerchner
Copy link
Contributor

ChipKerchner commented Feb 12, 2025

I think your change will fix that failure for P8 but I see it will probably fail for P7 and earlier. Instead of

#if defined(POWER8)

it should be

#ifndef _ARCH_PWR9

@martin-frbg
Copy link
Collaborator Author

P7 and earlier do not use these kernels, they fall back to the older gemv_(n/t).S
can't check right now but I think SBGEMV was only affected through the SGEMV comparison breaking - there were actual BLAS2 test failure (the "LESS THAN HALF ACCURATE") in SGEMV and related on P8 without this PR

@ChipKerchner
Copy link
Contributor

Ok, after digging into it a bit, it looks like an epsilon difference for P8. Still trying to figure out why but as I said before, your PR is probably the best approach.

@martin-frbg
Copy link
Collaborator Author

Maybe gcc doing something wild when -mcpu=power8 ? But I do not have the energy right now to compare the generated assembly

@xry111
Copy link
Contributor

xry111 commented Apr 18, 2025

FTR https://gcc.gnu.org/PR119234 (not a valid GCC bug report) has root caused the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_sbgemm fails with FATAL ERROR SBGEMV - Return code: 16862 on powerpc64le
3 participants