Hey!
Great project, I've been using it very successfully over in https://github.com/acts-project/traccc/.
One gap we've noticed in comparison to regular CUDA/HIP etc. is that there doesn't seem to be a way to pass CUDA `__launch_bounds__` hints through to the compiler. We've found these can have a pretty large impact on some of our code (e.g. a ~20% performance uplift from a few-line change in acts-project/traccc#1037, though that PR also includes another change, which makes the effect a bit harder to isolate).
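To make the request concrete, here is a minimal sketch of what forwarding the hint could look like for a single kernel. The `SKETCH_KERNEL`/`SKETCH_LAUNCH_BOUNDS` macros and the `scale` kernel are hypothetical illustrations, not alpaka API and not the code in the branch linked below. Under nvcc/hipcc they expand to the native `__global__` and `__launch_bounds__` attributes; in a plain host build they expand to nothing, so the same source still compiles.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macros (NOT an existing alpaka API). Under nvcc/hipcc they
// forward to the native attributes; in a plain host build they expand to nothing.
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define SKETCH_KERNEL __global__
#    define SKETCH_LAUNCH_BOUNDS(...) __launch_bounds__(__VA_ARGS__)
#else
#    define SKETCH_KERNEL
#    define SKETCH_LAUNCH_BOUNDS(...)
#endif

// Native CUDA placement is: __global__ void __launch_bounds__(...) name(...).
// 256 = maxThreadsPerBlock; 2 = minBlocksPerMultiprocessor in CUDA terms
// (HIP reads the second argument as minWarpsPerExecutionUnit instead).
SKETCH_KERNEL void SKETCH_LAUNCH_BOUNDS(256, 2)
    scale(float* data, float factor, std::size_t n)
{
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
    // Device path: one element per thread.
    std::size_t const i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if(i < n)
        data[i] *= factor;
#else
    // Host-only fallback so the sketch can be exercised without a GPU.
    assert(data != nullptr || n == 0);
    for(std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
#endif
}
```

Under nvcc this would be launched as `scale<<<blocks, 256>>>(...)`. The compiler may then assume at most 256 threads per block when allocating registers, and a launch with more threads per block than that would fail.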
I've taken a stab at adding an equivalent (https://github.com/CrossR/alpaka/tree/CrossR/LaunchBoundsTest), but figured I'd check in here before spending any extra time on it. I don't know if it's something someone else is already working on, or if my test approach is entirely idiotic (or I'm just missing something and it already exists!).
Even in my brief foray, I already hit a few snags:
- CUDA supports three options (https://docs.nvidia.com/cuda/cuda-c-programming-guide/#launch-bounds): `maxThreadsPerBlock`, `minBlocksPerMultiprocessor` and `maxBlocksPerCluster`, so arguably you could pass one, two or all three of them (though since they are positional, you seemingly couldn't pass just the first and third, say). My test supports nothing, `maxThreadsPerBlock`, and `maxThreadsPerBlock, minBlocksPerMultiprocessor`, as it gets more complicated after that (HIP doesn't support `maxBlocksPerCluster` at all, and CUDA only supports it on certain targets).
- HIP: the code I've got so far works, but the second parameter, `minBlocksPerMultiprocessor`, isn't the same in HIP, where it is `minWarpsPerExecutionUnit`. It's arguable whether we need to support all of the CUDA and HIP variants, as they are "just names" (though I guess it could be moved out of the shared code and into the backend-specific code to help there?).
- SYCL... is confusing. There are references to similar variables, but I honestly couldn't tell when they apply (SYCL as a whole? Just cross-compilation to CUDA? Just FPGAs? The docs are a maze...). I left it for that reason, and because our SYCL testing is still at the "ensuring things work" stage, so it isn't as useful for us yet. I'm also less sure how it all interacts with the existing `reqd_work_group` code.
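The CUDA combinations described in the first two bullets could be encoded on the host side roughly as follows. This is a hypothetical sketch: the `LaunchBounds` and `blockSizeAllowed` names are made up here and are not the API of the linked branch or of alpaka. It only covers the subset mentioned (no hint, `maxThreadsPerBlock`, or `maxThreadsPerBlock` plus `minBlocksPerMultiprocessor`), uses 0 to mean "not set", and leaves `maxBlocksPerCluster` and any HIP translation out.

```cpp
#include <cstdint>

// Hypothetical trait (NOT an existing alpaka API). 0 means "not set".
// As in CUDA, the arguments are positional, so the second may only be set
// if the first one is.
template<std::uint32_t TMaxThreadsPerBlock = 0, std::uint32_t TMinBlocksPerMultiprocessor = 0>
struct LaunchBounds
{
    static_assert(
        TMinBlocksPerMultiprocessor == 0 || TMaxThreadsPerBlock != 0,
        "minBlocksPerMultiprocessor requires maxThreadsPerBlock");

    static constexpr std::uint32_t maxThreadsPerBlock = TMaxThreadsPerBlock;
    // CUDA naming. HIP's second argument is minWarpsPerExecutionUnit, a different
    // quantity, so a HIP backend would have to translate or reject this value.
    static constexpr std::uint32_t minBlocksPerMultiprocessor = TMinBlocksPerMultiprocessor;
    static constexpr bool hasHint = TMaxThreadsPerBlock != 0;
};

// No hint at all: the backend emits no attribute.
using NoBounds = LaunchBounds<>;

// CUDA fails a launch whose block size exceeds maxThreadsPerBlock, so a portable
// layer could check the requested block size on the host before launching.
template<typename TBounds>
constexpr bool blockSizeAllowed(std::uint32_t threadsPerBlock)
{
    return !TBounds::hasHint || threadsPerBlock <= TBounds::maxThreadsPerBlock;
}
```

Keeping the hint as a type would let the CUDA and HIP backends each decide how to spell (or translate) it, which is one way of moving the backend-specific part out of the shared code.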
I can certainly open a PR for easier code discussion if my approach isn't completely insane; otherwise, I'm happy to test if someone else is working on it.