gpu offload host code generation #142097
@oli-obk Feature-wise, I am almost done. I'll add a few more lines to describe the layout of Rust types to the offload library, but in this PR I only intend to support one or two types (maybe arrays, raw pointers, or slices). I might even hardcode the length in the very first approach. In a follow-up PR I'll do proper type parsing at a higher level, similar to what I did in the past with Rust TypeTrees. This work is much simpler and more reliable, though, since offload doesn't care what type something has, only how many bytes it occupies and therefore how many bytes need to be moved to/from the GPU. I was able to just move a few of the builder methods I needed to the generic builder.
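To illustrate that point, the information the offload runtime ultimately needs per kernel argument is just a base pointer and a byte count. The sketch below is hypothetical (the `OffloadArg` struct and `describe_slice` helper are made-up names, not part of this PR); it only shows how an arbitrary Rust slice reduces to that pair.

```rust
// Hypothetical sketch: the offload runtime does not need the element type,
// only where the buffer lives on the host and how many bytes to move.
struct OffloadArg {
    ptr: *mut u8, // host address of the buffer
    bytes: usize, // number of bytes to copy to/from the device
}

fn describe_slice<T>(data: &mut [T]) -> OffloadArg {
    OffloadArg {
        ptr: data.as_mut_ptr() as *mut u8,
        bytes: data.len() * std::mem::size_of::<T>(),
    }
}

fn main() {
    let mut buf = [0.0f32; 256];
    let arg = describe_slice(&mut buf);
    println!("{} bytes would be moved", arg.bytes); // 256 * 4 = 1024
}
```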
r? ghost
This will generate most of the host-side code needed to use LLVM's offload feature.
This first PR will only handle automatic mem-transfers to and from the device.
So if a user calls a kernel, we will copy the inputs back and forth, but we won't do the actual kernel launch yet.
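For context, the pattern the generated host code has to implement around each kernel call looks roughly like the Rust sketch below. This is only a conceptual illustration, not the IR this PR emits: the `launch_kernel_1` wrapper is a made-up name, and the `omp_target_*` declarations mirror the public OpenMP device-memory routines rather than the libomptarget entry points the codegen actually targets.

```rust
use std::ffi::{c_int, c_void};
use std::mem::size_of;

// Public OpenMP device-memory routines, declared here only for illustration.
extern "C" {
    fn omp_get_default_device() -> c_int;
    fn omp_get_initial_device() -> c_int;
    fn omp_target_alloc(size: usize, device_num: c_int) -> *mut c_void;
    fn omp_target_free(device_ptr: *mut c_void, device_num: c_int);
    fn omp_target_memcpy(
        dst: *mut c_void,
        src: *const c_void,
        length: usize,
        dst_offset: usize,
        src_offset: usize,
        dst_device: c_int,
        src_device: c_int,
    ) -> c_int;
}

// Made-up wrapper: what the compiler-generated host code conceptually does.
unsafe fn launch_kernel_1(data: &mut [f32]) {
    let bytes = data.len() * size_of::<f32>();
    let dev = omp_get_default_device();
    let host = omp_get_initial_device();

    // 1. Copy the input buffer host -> device.
    let dev_buf = omp_target_alloc(bytes, dev);
    omp_target_memcpy(dev_buf, data.as_ptr() as *const c_void, bytes, 0, 0, dev, host);

    // 2. The actual kernel launch is left to a follow-up PR.

    // 3. Copy the (possibly updated) buffer device -> host and free it.
    omp_target_memcpy(data.as_mut_ptr() as *mut c_void, dev_buf, bytes, 0, 0, host, dev);
    omp_target_free(dev_buf, dev);
}
```

The two copies in this sketch are exactly what the equivalent OpenMP offload region produces in C++, which is what the LIBOMPTARGET_INFO check below is meant to confirm.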
Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what OpenMP offload generates in C++. Running

`LIBOMPTARGET_INFO=-1 ./my_rust_binary`

should print that a memcpy to and later from the device is happening.

A follow-up PR will generate the actual device-side kernel, which will then do the computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to override our default handling due to performance issues.
I'm trying to get a full MVP out first, so this PR just recognizes GPU functions based on magic names. The final frontend will obviously move this over to proper macros, as I'm already doing for the autodiff work.
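To make the MVP flow concrete, the sketch below shows what that looks like from the user's side. The name `kernel_1` is only a placeholder for whatever magic naming scheme the MVP recognizes; it is not taken from this PR.

```rust
// Hypothetical example: in the MVP, this function would be treated as a GPU
// kernel purely because its name matches the magic pattern, not because of
// any attribute or macro.
pub fn kernel_1(x: &mut [f32; 256]) {
    for v in x.iter_mut() {
        *v *= 2.0;
    }
}

fn main() {
    let mut data = [1.0f32; 256];
    // With offload enabled, the compiler would wrap this call with the
    // automatic host<->device copies described above.
    kernel_1(&mut data);
    assert_eq!(data[0], 2.0);
}
```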
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.
Tracking: