
__launch_bounds__ Alpaka Equivalent #2548

@CrossR

Hey!

Great project, I've been using it very successfully over in https://github.com/acts-project/traccc/.

One gap we've noticed compared to plain CUDA/HIP is that there doesn't seem to be a way to pass the CUDA __launch_bounds__ hints to the compiler, which we've found can have a pretty large impact on some of our code (i.e. a ~20% performance uplift for a few-line change in acts-project/traccc#1037, though that PR also includes another change, which makes the effect a bit harder to isolate).

I've taken a stab at adding an equivalent (https://github.com/CrossR/alpaka/tree/CrossR/LaunchBoundsTest), but figured I'd check in here before spending any more time on it. I don't know if it's something someone else is already working on, or if my test approach is entirely idiotic (or I'm just missing something and it already exists!).

Even in my brief foray, I already hit a few snags:

  • CUDA supports three options (https://docs.nvidia.com/cuda/cuda-c-programming-guide/#launch-bounds): maxThreadsPerBlock, minBlocksPerMultiprocessor and maxBlocksPerCluster. Arguably you could pass one, two or all three of them (though seemingly not just the first and third, say). My test supports passing nothing, just maxThreadsPerBlock, or both maxThreadsPerBlock and minBlocksPerMultiprocessor, as it gets more complicated after that (HIP doesn't support maxBlocksPerCluster at all, and CUDA only supports it on certain targets).
  • HIP: the code I've got so far works, but the second parameter doesn't mean the same thing in HIP, where it is minWarpsPerExecutionUnit rather than minBlocksPerMultiprocessor. It's arguable whether we need to support all the CUDA + HIP values, as they are "just names" (though I guess the parameter could be moved out of the shared code and into the backend-specific code to help there?).
  • SYCL... is confusing. There are references to similar variables, but I honestly couldn't tell when they apply (SYCL as a whole? Just cross compilation to CUDA? Just FPGAs? The docs are a maze...). I left it out for that reason, and because our SYCL testing is still at the "ensuring things work" stage, so it isn't as useful for us yet. I'm also less sure how it all interacts with the existing reqd_work_group code.
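
To make the "nothing / one / two arguments" idea from the first two bullets concrete, here is a minimal sketch of how a kernel could opt in to launch bounds through a trait. Everything named here (`LaunchBounds`, `BoundsKind`, `boundsKind`, the `SKETCH_*` macros and the `entry*` functions) is hypothetical and is not alpaka API, nor necessarily what the linked branch does. The macros expand to the real CUDA/HIP attributes only under those compilers, so the sketch also builds with a plain host compiler; the device path is an assumption and has not been verified here.

```cpp
// Expand to the real attributes only under a CUDA/HIP compiler, so the
// sketch also builds with a plain host compiler (where they vanish).
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define SKETCH_GLOBAL __global__
#    define SKETCH_LAUNCH_BOUNDS(...) __launch_bounds__(__VA_ARGS__)
#else
#    define SKETCH_GLOBAL
#    define SKETCH_LAUNCH_BOUNDS(...)
#endif

// Hypothetical trait (not alpaka API): a value of 0 means "no hint".
template<typename TKernel>
struct LaunchBounds
{
    static constexpr int maxThreadsPerBlock = 0;
    // CUDA name; HIP reads the second argument as minWarpsPerExecutionUnit.
    static constexpr int minBlocksPerMultiprocessor = 0;
};

enum class BoundsKind { None, MaxThreads, MaxThreadsAndMinBlocks };

// Decide at compile time which of the three supported forms applies.
template<typename TKernel>
constexpr BoundsKind boundsKind()
{
    using B = LaunchBounds<TKernel>;
    return B::maxThreadsPerBlock <= 0           ? BoundsKind::None
           : B::minBlocksPerMultiprocessor <= 0 ? BoundsKind::MaxThreads
                                                : BoundsKind::MaxThreadsAndMinBlocks;
}

// One entry point per form; a backend would pick one via boundsKind<TKernel>().
// The bodies are left empty: a real entry point would invoke the kernel functor.
template<typename TKernel>
SKETCH_GLOBAL void entryNoBounds(TKernel) {}

template<typename TKernel>
SKETCH_GLOBAL void SKETCH_LAUNCH_BOUNDS(LaunchBounds<TKernel>::maxThreadsPerBlock)
entryMaxThreads(TKernel) {}

template<typename TKernel>
SKETCH_GLOBAL void SKETCH_LAUNCH_BOUNDS(
    LaunchBounds<TKernel>::maxThreadsPerBlock,
    LaunchBounds<TKernel>::minBlocksPerMultiprocessor)
entryMaxThreadsAndMinBlocks(TKernel) {}

// Example kernels (illustrative values only).
struct PlainKernel {};  // no specialisation, so no hint is passed
struct CappedKernel {}; // only maxThreadsPerBlock
struct TunedKernel {};  // both values

template<>
struct LaunchBounds<CappedKernel>
{
    static constexpr int maxThreadsPerBlock = 128;
    static constexpr int minBlocksPerMultiprocessor = 0;
};

template<>
struct LaunchBounds<TunedKernel>
{
    static constexpr int maxThreadsPerBlock = 256;
    static constexpr int minBlocksPerMultiprocessor = 2;
};

static_assert(boundsKind<PlainKernel>() == BoundsKind::None, "no hint expected");
static_assert(boundsKind<CappedKernel>() == BoundsKind::MaxThreads, "one hint expected");
static_assert(boundsKind<TunedKernel>() == BoundsKind::MaxThreadsAndMinBlocks, "two hints expected");
```

Keeping the values in a per-kernel trait means the shared code never has to name the second parameter's meaning; each backend could interpret or ignore it, which is one way to handle the CUDA/HIP naming mismatch mentioned above.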

I can certainly open a PR for easier code discussion if my approach isn't completely insane; otherwise, I'm happy to test if someone else is working on it.
