Commit fee4b21
Always provide occupancy API methods with CUfunction for jitify2::KernelData
Jitify2's jitify2::KernelData::function returns a jitify2::CudaFunction, which is either a CUkernel or a CUfunction depending on whether JITIFY_USE_CONTEXT_INDEPENDENT_LOADING is being used.
If JITIFY_USE_CONTEXT_INDEPENDENT_LOADING is not defined in advance, a CUkernel is returned for CUDA >= 12; otherwise a CUfunction is returned.
In our case, given a jitify2::KernelData for an agent function or condition, we want to pass the CUfunction to cuOccupancyMaxPotentialBlockSize to get the block size for the kernel launch. Jitify2 includes a method for this, but it configures the kernel for a grid-stride loop using the full device, whereas we want to launch the minimum grid size possible that includes at least one thread per agent (without a grid-stride loop).
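The grid-size arithmetic described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `min_grid_size` is a hypothetical helper name, and `block_size` stands for whatever value the occupancy query returned. The zero check is an assumption added for illustration; it shows how a block size of zero would otherwise become an integer division by zero.

```cpp
// Hypothetical helper (not from the commit): the smallest number of blocks
// that gives every agent its own thread, with no grid-stride loop.
constexpr unsigned int min_grid_size(unsigned int agent_count,
                                     unsigned int block_size) {
    // A block size of 0 (e.g. left unset by a failed occupancy query)
    // would otherwise cause a division by zero below.
    if (block_size == 0) {
        return 0;
    }
    // Ceiling division, written to avoid overflow in agent_count + block_size.
    return agent_count / block_size + (agent_count % block_size != 0 ? 1u : 0u);
}
```

For example, 1000 agents with a block size of 256 need 4 blocks (1024 threads, of which the last 24 are idle), whereas a grid-stride configuration would size the grid to fill the device regardless of the agent count.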
In the upgrade from Jitify 1 to Jitify 2, we did not update our use of this to reflect the change in type. We missed this because passing a CUkernel to cuOccupancyMaxPotentialBlockSize works as expected on recent CUDA drivers (i.e. R575 and R580).
However, on systems with older CUDA drivers (R550), such as Google Colab and TUoS Stanage, uncaught CUDA errors within cuOccupancyMaxPotentialBlockSize were leading to division-by-zero floating point exceptions.
This is because cuOccupancyMaxPotentialBlockSize expects a CUfunction, but we were providing a CUkernel cast to a CUfunction. The documentation describes passing a CUkernel, obtained via cuLibraryGetKernel, by casting it to CUfunction:
> the API can also be used with context-less kernel CUkernel by querying the handle using cuLibraryGetKernel() and then passing it to the API by casting to CUfunction. Here, the context to use for calculations will be the current context.
This commit adds a function in an unnamed namespace which, given a jitify2::KernelData, returns a CUfunction regardless of the value of JITIFY_USE_CONTEXT_INDEPENDENT_LOADING. It uses if constexpr in a C++20 templated lambda (to avoid both sides of a non-templated if constexpr needing to compile at the same time).
1 parent a941005 commit fee4b21
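The if constexpr point can be illustrated with a small sketch that does not depend on CUDA. Everything here is a stand-in and an assumption: `FakeFunction` and `FakeKernel` model CUfunction and CUkernel, `kernel_to_function` models whichever driver call performs the conversion (plausibly something like cuKernelGetFunction, but the commit text does not say), and `USE_CONTEXT_INDEPENDENT_LOADING` models JITIFY_USE_CONTEXT_INDEPENDENT_LOADING. The actual helper in the commit may well differ.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the two distinct driver handle types.
struct FunctionTag {};
struct KernelTag {};
using FakeFunction = FunctionTag*;
using FakeKernel = KernelTag*;

// Stand-in for the driver call that converts a kernel handle to a function
// handle. It only accepts FakeKernel.
inline FakeFunction kernel_to_function(FakeKernel k) {
    return reinterpret_cast<FakeFunction>(k);
}

// Models JITIFY_USE_CONTEXT_INDEPENDENT_LOADING selecting the handle type.
#ifndef USE_CONTEXT_INDEPENDENT_LOADING
#define USE_CONTEXT_INDEPENDENT_LOADING 1
#endif
#if USE_CONTEXT_INDEPENDENT_LOADING
using Handle = FakeKernel;
#else
using Handle = FakeFunction;
#endif

namespace {
// Returns a FakeFunction whichever type Handle currently is.
FakeFunction get_function(Handle h) {
    // In a non-template function, both branches of if constexpr must still
    // type-check. Inside a templated lambda the condition depends on T, so
    // the discarded branch is never instantiated for the type passed in.
    return [&]<typename T>(T handle) -> FakeFunction {
        if constexpr (std::is_same_v<T, FakeKernel>) {
            return kernel_to_function(handle);  // only valid for FakeKernel
        } else {
            return handle;  // only valid when T is already FakeFunction
        }
    }(h);
}
}  // namespace
```

Writing the same two branches directly in the body of `get_function` would fail to compile for either setting of the macro, since one branch always contains a conversion that is invalid for the active `Handle` type.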
1 file changed
Lines changed: 29 additions & 2 deletions
Diff content was not preserved; only the line numbering survives:
- 27 lines added, at new lines 60-86
- original line 758 replaced by new line 785
- original line 986 replaced by new line 1013