Description
The triplet counting kernel is our third-hottest kernel in terms of throughput. It also suffers from some atomic contention because of the way it pushes data to the output array:
traccc/device/common/include/traccc/seeding/device/impl/count_triplets.ipp, lines 105 to 115 in dee541f
Although modern GPGPU architectures do automatically coalesce atomic accesses to some extent, we might still benefit from first coalescing the atomic addition at block scale (using, e.g., barrier::blockCount) and then issuing only a single atomic increment per block.
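To make the idea concrete, here is a minimal host-side sketch in plain C++. It is not the traccc kernel: the names `launch_result`, `reserve_per_thread`, and `reserve_per_block` are hypothetical, and the "threads" and "blocks" are emulated with sequential loops. On a device, the per-block scan would run in shared memory between barriers and the single `fetch_add` would be issued by one thread of the block. The sketch assumes each thread may contribute a different number of elements, which is why it uses a block-local exclusive scan rather than a plain count.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of one emulated launch (hypothetical helper type).
struct launch_result {
    std::vector<unsigned int> offsets;  // first output slot reserved per thread
    unsigned int total;                 // final value of the global counter
    std::size_t atomic_ops;             // fetch_add calls on the global counter
};

// Baseline: every thread with work reserves its slots with its own atomic.
launch_result reserve_per_thread(const std::vector<unsigned int>& counts) {
    std::atomic<unsigned int> counter{0u};
    launch_result r{std::vector<unsigned int>(counts.size(), 0u), 0u, 0u};
    for (std::size_t tid = 0; tid < counts.size(); ++tid) {
        if (counts[tid] == 0u) continue;  // nothing to push, no atomic needed
        r.offsets[tid] = counter.fetch_add(counts[tid]);
        ++r.atomic_ops;
    }
    r.total = counter.load();
    return r;
}

// Block-aggregated: sum within the block first, then one atomic per block.
launch_result reserve_per_block(const std::vector<unsigned int>& counts,
                                std::size_t block_size) {
    assert(block_size > 0);
    std::atomic<unsigned int> counter{0u};
    launch_result r{std::vector<unsigned int>(counts.size(), 0u), 0u, 0u};
    for (std::size_t begin = 0; begin < counts.size(); begin += block_size) {
        const std::size_t end = std::min(begin + block_size, counts.size());
        // Step 1: exclusive prefix sum over the block's per-thread counts.
        unsigned int block_total = 0u;
        for (std::size_t tid = begin; tid < end; ++tid) {
            r.offsets[tid] = block_total;
            block_total += counts[tid];
        }
        if (block_total == 0u) continue;  // whole block is idle
        // Step 2: a single atomic reserves space for the entire block.
        const unsigned int base = counter.fetch_add(block_total);
        ++r.atomic_ops;
        // Step 3: every thread adds the block's base to its local offset.
        for (std::size_t tid = begin; tid < end; ++tid) {
            r.offsets[tid] += base;
        }
    }
    r.total = counter.load();
    return r;
}
```

Calling both functions on the same per-thread counts reserves the same total number of slots, but the block-aggregated variant issues at most one atomic per block instead of one per active thread. Whether this pays off on real hardware would still need to be measured, given the hardware-level coalescing mentioned above.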
This relatively simple and well-contained issue should be very suitable for developers trying to get started with traccc or with GPGPU programming in general.