
[CIR][HIP] Use CUDA attributes for HIP global functions #1333

Open
wants to merge 1 commit into
base: main

koparasy
Contributor

No description provided.

@koparasy koparasy changed the title Use CUDA attributes for global functions Use CUDA attributes for HIP global functions Feb 11, 2025
@koparasy koparasy changed the title Use CUDA attributes for HIP global functions [CIR][HIP] Use CUDA attributes for HIP global functions Feb 11, 2025
@AdUhTkJm
Contributor

I'm not sure about HIP, but see this commit in upstream OG, which makes some HIP-specific changes.
Quoting from the link,

So, to summarize how the patch changes the under-the-hood kernel launch machinery:
device-side is unchanged. Kernel function is generated with the real kernel name
host-side stub is still generated with the __device_stub prefix.
host-side generates a 'handle' variable with the kernel function name, which is a pointer to the stub.
host-side registers the handle variable -> device-side kernel name association with the HIP runtime.
the address of the handle variable is used everywhere where we need a kernel pointer on the host side. I.e. passing kernel pointers around, referring to kernels across TUs, etc.
<<<>>> becomes an indirect call to a __device_stub function using the pointer retrieved from the handle.
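
As a rough illustration of the host-side part of that summary, here is a plain C++ model of the handle indirection. It is only a sketch: it is not CIR output or HIP runtime code, and every name in it (`device_stub_axpy`, `axpy`, `launch`, `lastLaunched`) is hypothetical.

```cpp
#include <cassert>
#include <string>

// Records which stub ran last; stands in for an actual kernel launch.
static std::string lastLaunched;

using KernelStub = void (*)(float, float *, float *);

// Host-side stub. In the quoted scheme it carries the __device_stub prefix;
// the name used here is illustrative, not real Clang mangling.
void device_stub_axpy(float a, float *x, float *y) {
  (void)a; (void)x; (void)y;
  lastLaunched = "axpy";
}

// The 'handle' variable: named after the kernel, holding a pointer to the stub.
KernelStub axpy = device_stub_axpy;

// On the host, the address of the handle is what serves as the kernel pointer.
using KernelHandle = KernelStub *;

// Models axpy<<<...>>>(a, x, y): load the stub pointer from the handle,
// then call the stub indirectly.
void launch(KernelHandle handle, float a, float *x, float *y) {
  assert(handle != nullptr && *handle != nullptr);
  (*handle)(a, x, y);
}
```

Calling `launch(&axpy, 2.0f, x, y)` reaches `device_stub_axpy` through the handle, which mirrors the last two points of the quoted list.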

So you might need to generate a 'handle' variable. This differs from CUDA, where the handle is just the device stub. Whether you attach the attribute to the handle or to the device stub depends on how HIP works; I don't know enough about it to say.

CUDA uses the attribute to register the correspondence between host and device: the same kernel is mangled differently on each side, so runtime registration is needed to map host names to device names. The registration function will be emitted during LLVM lowering (not written yet).
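
To make the registration idea concrete, here is a small C++ sketch of a table that maps a host-side symbol to a device-side kernel name. It only models the concept: `KernelRegistry`, `stubScale` and `scaleHandle` are hypothetical names, and the device name string is an illustrative mangled name rather than output from the compiler. The real registration goes through the CUDA or HIP runtime, which this sketch does not call.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the runtime's host-to-device kernel table.
class KernelRegistry {
public:
  // Associates the address of a host-side symbol with a device kernel name.
  void registerKernel(const void *hostKey, std::string deviceName) {
    table_[hostKey] = std::move(deviceName);
  }

  // Returns true if the host-side symbol has been registered.
  bool isRegistered(const void *hostKey) const {
    return table_.count(hostKey) != 0;
  }

  // Looks up the device name; the key must have been registered first.
  const std::string &deviceNameFor(const void *hostKey) const {
    auto it = table_.find(hostKey);
    assert(it != table_.end() && "kernel was never registered");
    return it->second;
  }

private:
  std::unordered_map<const void *, std::string> table_;
};

// Host-side stub and a handle variable that points to it (both hypothetical).
void stubScale(float *data, int n) { (void)data; (void)n; }
void (*scaleHandle)(float *, int) = stubScale;
```

A launch would then look up `registry.deviceNameFor(&scaleHandle)` to find which device kernel to run, after a one-time `registry.registerKernel(&scaleHandle, "_Z5scalePfi")`.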

@koparasy
Contributor Author

I was planning on doing this redirection when I actually generate the stub function (the respective #1332).

@AdUhTkJm
Contributor

I was planning on doing this redirection when I actually generate the stub function (the respective #1332).

That makes sense.
Now both CUDA and HIP place the attribute on the real device stub, so hopefully we can continue to reuse lots of code.

@koparasy
Contributor Author

@bcardosolopes what do you think? Should I introduce a new attribute, or reuse the CUDA one and handle it when generating the device stub?
