[CIR][CUDA] Generate device stubs #1332

AdUhTkJm · 2025-02-10T22:51:30Z

This PR adds generation of CUDA device stubs.

A brief explanation:

We first store the function arguments in a `void *args[]` array, which is later passed to `cudaLaunchKernel`.

Then we retrieve the launch configuration with `__cudaPopCallConfiguration`, which pops the configuration pushed at the call site. (We can't generate calls to kernels yet.)

At that point we have all the arguments we need, so the stub calls `cudaLaunchKernel` and is done.
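The three steps above can be sketched in plain C++. This is a minimal host-only illustration, not the code CIR emits: the `mock_` functions, `Dim3`, `Stream`, `kernelHandle`, and `kernel_stub` are hypothetical stand-ins for the CUDA runtime entry points and types, introduced here so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for dim3 and cudaStream_t (not the real CUDA types).
struct Dim3 {
  unsigned x, y, z;
  Dim3(unsigned x = 1, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
};
using Stream = void *;

struct CallConfig {
  Dim3 grid;
  Dim3 block;
  std::size_t sharedMem = 0;
  Stream stream = nullptr;
};

// One-slot store standing in for the configuration pushed at the call site.
static CallConfig g_pushed;
static bool g_hasPushed = false;

// Mock of the push done at the call site; returns 0 on success.
int mock_pushCallConfiguration(Dim3 grid, Dim3 block, std::size_t sharedMem,
                               Stream stream) {
  g_pushed.grid = grid;
  g_pushed.block = block;
  g_pushed.sharedMem = sharedMem;
  g_pushed.stream = stream;
  g_hasPushed = true;
  return 0;
}

// Mock of the pop done inside the stub; returns nonzero if nothing was pushed.
int mock_popCallConfiguration(Dim3 *grid, Dim3 *block, std::size_t *sharedMem,
                              Stream *stream) {
  if (!g_hasPushed)
    return 1;
  *grid = g_pushed.grid;
  *block = g_pushed.block;
  *sharedMem = g_pushed.sharedMem;
  *stream = g_pushed.stream;
  g_hasPushed = false;
  return 0;
}

// Records what the mocked launch received so it can be inspected afterwards.
struct LaunchRecord {
  const void *func = nullptr;
  CallConfig config;
  int a = 0;
  int b = 0;
};
static LaunchRecord g_lastLaunch;

// Mock launch with the same parameter order as the cudaLaunchKernel comment
// quoted in the diff. It assumes the kernel takes two int arguments.
int mock_launchKernel(const void *func, Dim3 grid, Dim3 block, void **args,
                      std::size_t sharedMem, Stream stream) {
  g_lastLaunch.func = func;
  g_lastLaunch.config.grid = grid;
  g_lastLaunch.config.block = block;
  g_lastLaunch.config.sharedMem = sharedMem;
  g_lastLaunch.config.stream = stream;
  g_lastLaunch.a = *static_cast<int *>(args[0]);
  g_lastLaunch.b = *static_cast<int *>(args[1]);
  return 0;
}

// Hypothetical handle identifying the kernel in this sketch.
static const char kernelHandle = 0;

// Hand-written analogue of a device stub for `kernel(int a, int b)`.
void kernel_stub(int a, int b) {
  // Step 1: collect the addresses of the arguments.
  void *args[2] = {&a, &b};
  // Step 2: pop the configuration pushed by the call site.
  Dim3 grid, block;
  std::size_t sharedMem = 0;
  Stream stream = nullptr;
  mock_popCallConfiguration(&grid, &block, &sharedMem, &stream);
  // Step 3: launch with the collected arguments and configuration.
  mock_launchKernel(&kernelHandle, grid, block, args, sharedMem, stream);
}
```

Calling `mock_pushCallConfiguration(Dim3(4), Dim3(32), 0, nullptr)` followed by `kernel_stub(7, 9)` leaves the pushed grid and block sizes and both argument values in `g_lastLaunch`.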

clang/lib/CIR/CodeGen/CIRGenFunction.h

clang/lib/CIR/CodeGen/CIRGenFunction.cpp

bcardosolopes · 2025-02-11T14:14:29Z

clang/lib/CIR/CodeGen/CIRGenCUDA.cpp

+  // Now emit the call to cudaLaunchKernel
+  // cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim, dim3 blockDim,
+  //                              void **args, size_t sharedMem,
+  //                              cudaStream_t stream);


Seems like we could have a `... = cir.cuda.setup_device_stub <name>, args, dim3_ty = <some_type>, stream_ty = <some_type2> ...` that hides both the `__cudaPopCallConfiguration` and `cudaLaunchKernel` calls. It would then be expanded into these calls in LoweringPrepare (so we don't have to postpone this to LLVMLowering).

However, I'd rather see you add this as-is first (after the other comment about OG is addressed); in a follow-up PR we can raise the representation and move it to LoweringPrepare.

AdUhTkJm · Feb 11, 2025

Similar things also happen at the call site. Shall we also generate a `cir.cuda.call_kernel` for that and expand it in LoweringPrepare?

bcardosolopes · Feb 12, 2025

Which call site do you mean? Not the call happening in the device stub? If not, does it not call `__cudaPopCallConfiguration` to retrieve the dims? Perhaps we should do a bit more direct CIRGen first, to get a better grasp of how these internal functions are used before we raise them.

AdUhTkJm · Feb 12, 2025

I mean the place where we invoke the kernel; for example, in `main` we can write `global_fn<<<1, 1>>>(a, b, c)`. This is where we call `__cudaPushCallConfiguration` so that the device stub can pop the configuration. I guess I'll generate these calls directly and adjust according to review.
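To make the call-site half concrete, here is a rough host-only sketch of what `global_fn<<<1, 1>>>(a, b, c)` amounts to: push the configuration, then call the stub. It assumes the stub is called only when the push reports success by returning zero. `mock_pushCallConfiguration`, `Dim3`, `global_fn_stub`, and `launch_global_fn` are hypothetical names used for illustration, not the real runtime API or the code CIRGen produces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for dim3.
struct Dim3 {
  unsigned x, y, z;
  Dim3(unsigned x = 1, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
};

// Test knob: the value the mocked push returns (0 means success).
static int g_pushResult = 0;
// Observations made by the mocked stub.
static int g_stubCalls = 0;
static int g_sum = 0;

// Mock of the configuration push performed at the call site.
int mock_pushCallConfiguration(Dim3, Dim3, std::size_t, void *) {
  return g_pushResult;
}

// Stand-in for the device stub; a real stub would pop the configuration and
// launch the kernel instead of summing its arguments.
void global_fn_stub(int a, int b, int c) {
  ++g_stubCalls;
  g_sum = a + b + c;
}

// Hand-written analogue of `global_fn<<<1, 1>>>(a, b, c)`.
void launch_global_fn(int a, int b, int c) {
  // Push grid = 1, block = 1, no shared memory, default stream.
  if (mock_pushCallConfiguration(Dim3(1), Dim3(1), 0, nullptr) == 0)
    global_fn_stub(a, b, c);  // reached only when the push succeeded
}
```

With `g_pushResult` left at 0 the stub runs; setting it to a nonzero value makes `launch_global_fn` skip the stub entirely.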

clang/test/CIR/CodeGen/CUDA/simple.cu

github-actions · 2025-02-11T15:36:44Z

✅ With the latest revision this PR passed the C/C++ code formatter.

clang/lib/CIR/CodeGen/CIRGenFunction.cpp

clang/lib/CIR/CodeGen/CIRGenCUDARuntime.h

This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR.

**Bug 1.** Suppose we write

```cpp
__global__ void kernel(int a, int b);
```

Then when we launch this kernel with `cudaLaunchKernel`, the 4th argument to that function has the form `void *kernel_args[2] = {&a, &b}`. OG allocates space for it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocate `alloca [2 x ptr], i32 1` instead. This means there must be an extra GEP compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before accessing the array elements. I missed that in #1332.

**Bug 2.** We missed a load instruction for the 6th argument to `cudaLaunchKernel`. It's added back in this PR.

**Bug 3.** When we launch a kernel, we first check the return value of `__cudaPushCallConfiguration`. If it's zero, the call succeeded and we should proceed to call the device stub. In #1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here.

**Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The code path is unblocked by adding a `getIndirectResult` at the same place as OG does; the function is already implemented, so we can just call it.

After this (and other pending PRs), CIR is able to compile real CUDA programs. There are still missing features, which will be addressed in follow-up PRs.
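The array-versus-pointer distinction behind Bug 1 has a direct C++ analogue, sketched below. This only illustrates the language-level decay, not the CIR that is generated; `sum_via_arg_array` is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

// Builds the argument array for a two-int kernel and reads it back through a
// decayed pointer, the way a launch function taking `void **args` would.
int sum_via_arg_array(int a, int b) {
  void *kernel_args[2] = {&a, &b};

  // The variable is an array object of two pointers, not a pointer.
  static_assert(std::is_same<decltype(kernel_args), void *[2]>::value,
                "kernel_args is an array of two void pointers");

  // Array-to-pointer decay: yields a pointer to the first element. This is
  // the step that corresponds to the explicit cast mentioned in Bug 1.
  void **decayed = kernel_args;
  static_assert(std::is_same<decltype(decayed), void **>::value,
                "decay produces a pointer to the element type");

  // Element access goes through the decayed pointer.
  return *static_cast<int *>(decayed[0]) + *static_cast<int *>(decayed[1]);
}
```

In C++ the decay is implicit, which is why it is easy to overlook; an IR that models the array type explicitly needs the conversion spelled out before indexing.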



AdUhTkJm requested review from lanza and bcardosolopes as code owners February 10, 2025 22:51

Lancern reviewed Feb 11, 2025


clang/lib/CIR/CodeGen/CIRGenFunction.h (outdated, resolved)

clang/lib/CIR/CodeGen/CIRGenFunction.cpp (outdated, resolved)

AdUhTkJm force-pushed the main branch from 18a86c8 to 3ad22db Compare February 11, 2025 10:30

bcardosolopes reviewed Feb 11, 2025


AdUhTkJm force-pushed the main branch from 3ad22db to 45ac993 Compare February 11, 2025 15:33

AdUhTkJm force-pushed the main branch 2 times, most recently from 0832402 to fe4f3c5 Compare February 11, 2025 15:48

koparasy mentioned this pull request Feb 11, 2025

[CIR][HIP] Use CUDA attributes for HIP global functions #1333

Merged

bcardosolopes reviewed Feb 12, 2025


clang/lib/CIR/CodeGen/CIRGenFunction.cpp (outdated, resolved)

bcardosolopes reviewed Feb 12, 2025


clang/lib/CIR/CodeGen/CIRGenCUDARuntime.h (outdated, resolved)

[CIR][CUDA] Generate device stubs

324e4c2

AdUhTkJm force-pushed the main branch from fe4f3c5 to 324e4c2 Compare February 12, 2025 11:56

bcardosolopes approved these changes Feb 12, 2025


bcardosolopes merged commit e342308 into llvm:main Feb 12, 2025
6 checks passed

AdUhTkJm mentioned this pull request Mar 9, 2025

[CIR][CUDA] Miscellanous bugfixes #1462

Merged
