Skip to content

[FEA] Add __launch_bounds__ specifier to all kernels #3002

Open
@ttnghia

Description

@ttnghia

We've experience a compiler regression issue when compiling the kernel on GB100 with CUDA 12.8, in which situation the kernels are compiled to use much more registers than usual. As such, some kernels failed to launch in our tests, resulting in test failures.

For example, when compiling with sm_75:

ptxas info    : Compiling entry function '_ZN16spark_rapids_jni6detail12copy_to_rowsINS0_30fixed_width_row_offset_functorEEEviiiN4cudf11device_spanIKNS0_9tile_infoELm18446744073709551615EEEPPKaPKiSC_T_SC_PPa' for 'sm_75'
ptxas info    : Function properties for _ZN16spark_rapids_jni6detail12copy_to_rowsINS0_30fixed_width_row_offset_functorEEEviiiN4cudf11device_spanIKNS0_9tile_infoELm18446744073709551615EEEPPKaPKiSC_T_SC_PPa
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 34 registers, 16 bytes smem, 432 bytes cmem[0], 12 bytes cmem[2]

vs compiling with sm_100:

ptxas info    : Compiling entry function '_ZN16spark_rapids_jni6detail12copy_to_rowsINS0_30fixed_width_row_offset_functorEEEviiiN4cudf11device_spanIKNS0_9tile_infoELm18446744073709551615EEEPPKaPKiSC_T_SC_PPa' for 'sm_100'
ptxas info    : Function properties for _ZN16spark_rapids_jni6detail12copy_to_rowsINS0_30fixed_width_row_offset_functorEEEviiiN4cudf11device_spanIKNS0_9tile_infoELm18446744073709551615EEEPPKaPKiSC_T_SC_PPa
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 72 registers, used 1 barriers, 16 bytes smem

We should proactively prevent such situations from happening again by adding __launch_bounds__ to all our kernels, which can tell the compiler hint to allocate resources better (register/cmem etc.) including not to use too many registers.
That should help the kernels to request a more predictable amount of resources (registers) on any GPU so they can be launched successfully.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions