Skip to content

Bug: occupancy_max_potential_block_size accepts wrong parameter combinations, going from the official Nvidia documentation. #586

@LateinCecer

Description

@LateinCecer

What is the issue

The function CudaFunction::occupancy_max_potential_block_size expects a function pointer for the parameter block_size_to_dynamic_smem_size, as well as a usize parameter for dynamic_smem_size. This is fine when the smem size must be calculated based on the block_size since, in that case, dynamic_smem_size is ignored completely. However, when the smem size does not depend on the block size, the official Nvidia-Docs state that block_size_to_dynamic_smem_size must be NULL, otherwise dynamic_smem_size will not be read at all.

This makes it impossible to use this function through the save Cudarc wrapper if we want to work with a fixed dynamic shared memory size.

Proposed Fix

To comply with the CUDA docs, CudaFunction::occupancy_max_potential_block_size should accept block_size_to_dynamic_smem_size as an Option. With that, we should also add documentation to that function, so its more clear what it actually does with different parameters.
For futureproofing, I also think that the CUfunction wrapped in CudaFunction should be exposed through something like a cu_function() getter.
That would be consistent with the abstraction pattern in other parts of the wrapper and future issues like this can be temporarily circumvented by falling back to the unsafe API until the safe wrapper is fixed.

Why this is Important

CudaFunction::occupancy_max_potential_block_size and its siblings are pretty good to get a first approximation for optimal load out on different hardware.
Even though manual tweaking can lead to better performance in many cases, this machine usually does a better job at guessing things like occupancy then the programmer will.
So, having access to this API is actually pretty useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions