What is the issue
The function CudaFunction::occupancy_max_potential_block_size expects a function pointer for the parameter block_size_to_dynamic_smem_size, as well as a usize parameter for dynamic_smem_size. This is fine when the smem size must be calculated based on the block_size since, in that case, dynamic_smem_size is ignored completely. However, when the smem size does not depend on the block size, the official Nvidia-Docs state that block_size_to_dynamic_smem_size must be NULL, otherwise dynamic_smem_size will not be read at all.
This makes it impossible to use this function through the save Cudarc wrapper if we want to work with a fixed dynamic shared memory size.
Proposed Fix
To comply with the CUDA docs, CudaFunction::occupancy_max_potential_block_size should accept block_size_to_dynamic_smem_size as an Option. With that, we should also add documentation to that function, so its more clear what it actually does with different parameters.
For futureproofing, I also think that the CUfunction wrapped in CudaFunction should be exposed through something like a cu_function() getter.
That would be consistent with the abstraction pattern in other parts of the wrapper and future issues like this can be temporarily circumvented by falling back to the unsafe API until the safe wrapper is fixed.
Why this is Important
CudaFunction::occupancy_max_potential_block_size and its siblings are pretty good to get a first approximation for optimal load out on different hardware.
Even though manual tweaking can lead to better performance in many cases, this machine usually does a better job at guessing things like occupancy then the programmer will.
So, having access to this API is actually pretty useful.
What is the issue
The function
CudaFunction::occupancy_max_potential_block_sizeexpects a function pointer for the parameterblock_size_to_dynamic_smem_size, as well as ausizeparameter fordynamic_smem_size. This is fine when the smem size must be calculated based on theblock_sizesince, in that case,dynamic_smem_sizeis ignored completely. However, when the smem size does not depend on the block size, the official Nvidia-Docs state thatblock_size_to_dynamic_smem_sizemust beNULL, otherwisedynamic_smem_sizewill not be read at all.This makes it impossible to use this function through the save Cudarc wrapper if we want to work with a fixed dynamic shared memory size.
Proposed Fix
To comply with the CUDA docs,
CudaFunction::occupancy_max_potential_block_sizeshould acceptblock_size_to_dynamic_smem_sizeas anOption. With that, we should also add documentation to that function, so its more clear what it actually does with different parameters.For futureproofing, I also think that the CUfunction wrapped in
CudaFunctionshould be exposed through something like acu_function()getter.That would be consistent with the abstraction pattern in other parts of the wrapper and future issues like this can be temporarily circumvented by falling back to the unsafe API until the safe wrapper is fixed.
Why this is Important
CudaFunction::occupancy_max_potential_block_sizeand its siblings are pretty good to get a first approximation for optimal load out on different hardware.Even though manual tweaking can lead to better performance in many cases, this machine usually does a better job at guessing things like occupancy then the programmer will.
So, having access to this API is actually pretty useful.