Fix top LLVM: renamed NVPTX barrier intrinsics. #8631
I'm surprised this worked, because I thought module->getFunction would return declarations regardless of whether the LLVM backend will eventually provide definitions for them. Is it possible the new name has worked for a while?
Shouldn't this be guarded by preprocessor macros, anyway?
Well, the original code on main fails with that function returning
Under LLVM 19, the generated PTX is indeed quite invalid:
    { // callseq 2, 0
    .param .b32 param0;
    st.param.b32 [param0], 0;
    call.uni
    llvm.nvvm.barrier.cta.sync.aligned.all,
    (
    param0
    );
    } // callseq 2
Seems like it does return the declaration...
@steven-johnson @abadams @alexreinking Please see the revised solution. Much simpler and very reliable.
That's an elegant solution. Could you put the new name in the comment in ptx_dev.ll so that whoever does the later upgrade doesn't have to figure it out?
TODO: Diff the PTX outputs before and after this PR in LLVM 19. CUDA tests seem to hang.
Result: there are no barriers in Correctness tests are run in parallel, I believe, so perhaps there is an issue with tests involving barriers running concurrently? Update: Yes,
Okay, I figured out how to correctly check if the intrinsic exists. All tests passing. Will merge.
Problem: LLVM changed the name of their NVPTX barrier intrinsics: llvm/llvm-project#141143.
Solution: Rely on LLVM's accompanying auto-upgrade mechanism for these intrinsics. I added an alwaysinline wrapper that calls the original intrinsic name, which LLVM will upgrade to the new one if it needs to. This way, the code is compatible with all versions of LLVM without relying on any runtime or compile-time checks in the Halide code.