Bug: occupancy_max_potential_block_size accepts wrong parameter combinations, according to the official NVIDIA documentation. #586

# What is the issue

The function `CudaFunction::occupancy_max_potential_block_size` expects a function pointer for the parameter `block_size_to_dynamic_smem_size`, as well as a `usize` for `dynamic_smem_size`. This is fine when the dynamic shared memory (smem) size must be calculated from the `block_size`, since in that case `dynamic_smem_size` is ignored completely. However, when the smem size does not depend on the block size, the official [NVIDIA docs](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__OCCUPANCY.html#group__CUDA__OCCUPANCY_1g04c0bb65630f82d9b99a5ca0203ee5aa) state that `block_size_to_dynamic_smem_size` *must* be `NULL`; otherwise `dynamic_smem_size` will not be read at all.

Because the safe cudarc wrapper always requires a callback, it is impossible to pass a fixed dynamic shared memory size through `dynamic_smem_size` with this function.
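To make the rule described above concrete, here is a minimal sketch that models which value ends up being used. `effective_dynamic_smem` and `one_f32_per_thread` are hypothetical names invented for illustration. They are not part of cudarc or the CUDA driver, and the model only reflects the behaviour as described in the linked docs.

```rust
use std::ffi::c_int;

// Hypothetical model of the documented rule; NOT a cudarc or driver function.
// If a callback is given, it decides the size and the fixed value is ignored.
// If no callback is given (NULL in C), the fixed value is used.
fn effective_dynamic_smem(
    block_size_to_dynamic_smem_size: Option<fn(c_int) -> usize>,
    dynamic_smem_size: usize,
    block_size: c_int,
) -> usize {
    match block_size_to_dynamic_smem_size {
        Some(callback) => callback(block_size),
        None => dynamic_smem_size,
    }
}

// Example callback (assumption): one f32 of shared memory per thread.
fn one_f32_per_thread(block_size: c_int) -> usize {
    block_size as usize * std::mem::size_of::<f32>()
}

fn main() {
    // No callback: the fixed size of 4096 bytes is used.
    let fixed = effective_dynamic_smem(None, 4096, 256);
    assert_eq!(fixed, 4096);

    // With a callback: the fixed size is ignored, 256 threads * 4 bytes are used.
    let per_block = effective_dynamic_smem(
        Some(one_f32_per_thread as fn(c_int) -> usize),
        4096,
        256,
    );
    assert_eq!(per_block, 1024);

    println!("fixed = {fixed}, per_block = {per_block}");
}
```

With a signature that always takes a callback, only the second branch is reachable, which is the problem this issue describes.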

# Proposed Fix

To comply with the CUDA docs, `CudaFunction::occupancy_max_potential_block_size` should accept `block_size_to_dynamic_smem_size` as an `Option`. We should also add documentation to that function, so it's clearer what it does with the different parameter combinations.
For future-proofing, I also think that the `CUfunction` wrapped in `CudaFunction` should be exposed through something like a `cu_function()` getter.
That would be consistent with the abstraction pattern in other parts of the wrapper, and future issues like this could be temporarily circumvented by falling back to the unsafe API until the safe wrapper is fixed.
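A rough sketch of what the proposed shape could look like, using stand-in types so it compiles without CUDA. Everything here is an assumption for illustration: `MockCudaFunction`, `RawFunction`, `SmemCallback`, `ForwardedArgs`, and the method name `occupancy_max_potential_block_size_args` are hypothetical and do not mirror the actual cudarc internals. The method only records what a real wrapper would forward to the driver.

```rust
use std::ffi::{c_int, c_void};

// Stand-in for the driver's opaque `CUfunction` handle (assumption: a raw pointer).
type RawFunction = *mut c_void;

// Stand-in for the callback type the driver expects.
type SmemCallback = extern "C" fn(block_size: c_int) -> usize;

// Hypothetical mock of `CudaFunction`; NOT the real cudarc type.
struct MockCudaFunction {
    cu_function: RawFunction,
}

// The arguments a real wrapper would hand to the driver call.
struct ForwardedArgs {
    callback: Option<SmemCallback>, // `None` would reach C as NULL
    dynamic_smem_size: usize,
    block_size_limit: c_int,
}

impl MockCudaFunction {
    // Proposed getter: exposes the raw handle for use with the unsafe API.
    fn cu_function(&self) -> RawFunction {
        self.cu_function
    }

    // Proposed signature: the callback is optional, so a fixed size can be passed.
    fn occupancy_max_potential_block_size_args(
        &self,
        block_size_to_dynamic_smem_size: Option<SmemCallback>,
        dynamic_smem_size: usize,
        block_size_limit: c_int,
    ) -> ForwardedArgs {
        ForwardedArgs {
            callback: block_size_to_dynamic_smem_size,
            dynamic_smem_size,
            block_size_limit,
        }
    }
}

// Example callback (assumption): one f32 of shared memory per thread.
extern "C" fn smem_per_thread(block_size: c_int) -> usize {
    block_size as usize * std::mem::size_of::<f32>()
}

fn main() {
    let func = MockCudaFunction { cu_function: std::ptr::null_mut() };

    // Fixed 4096 bytes of dynamic shared memory, no callback.
    let fixed = func.occupancy_max_potential_block_size_args(None, 4096, 0);
    assert!(fixed.callback.is_none());
    assert_eq!(fixed.dynamic_smem_size, 4096);
    assert_eq!(fixed.block_size_limit, 0);

    // Size depends on the block size; the fixed value is left at 0.
    let varying =
        func.occupancy_max_potential_block_size_args(Some(smem_per_thread as SmemCallback), 0, 0);
    assert_eq!(varying.callback.map(|cb| cb(256)), Some(1024));

    // An optional function pointer is pointer-sized, so it fits a nullable C pointer.
    assert_eq!(
        std::mem::size_of::<Option<SmemCallback>>(),
        std::mem::size_of::<usize>()
    );

    println!("raw handle is null in this mock: {}", func.cu_function().is_null());
}
```

Using `Option` around a function pointer keeps the call site explicit about which of the two modes is intended, without adding any cost at the FFI boundary.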

# Why this is Important

`CudaFunction::occupancy_max_potential_block_size` and its siblings are a good way to get a first approximation of a suitable launch configuration on different hardware.
Even though manual tweaking can lead to better performance in many cases, the machine usually does a better job of guessing things like occupancy than the programmer will.
So having access to this API through the safe wrapper is genuinely useful.

