CUB 1.16.0
Summary
CUB 1.16.0 is a major release providing several improvements to the device-scope algorithms. DeviceRadixSort now supports large (64-bit indexed) input data. A new UniqueByKey algorithm has been added to DeviceSelect. DeviceAdjacentDifference provides new SubtractLeft and SubtractRight functionality.
This release also deprecates several obsolete APIs, including type traits and BlockAdjacentDifference algorithms. Many bugfixes and documentation updates are also included.
64-bit Offsets in DeviceRadixSort Public APIs
Users frequently want to process large datasets using CUB’s device-scope algorithms, but the current public APIs limit input data sizes to those that can be indexed by a 32-bit integer. Beginning with this release, CUB is updating these APIs to support 64-bit offsets, as discussed in #212.
The device-scope algorithms will be updated with 64-bit offset support incrementally, starting with the cub::DeviceRadixSort family of algorithms. Thanks to @canonizer for contributing this functionality.
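To make the size limit concrete, the following host-only sketch checks whether an item count can be expressed with a 32-bit offset. The helper `fits_in_32bit_offset` is hypothetical and not part of CUB, and the sketch assumes the previous limit corresponds to a signed 32-bit integer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of CUB): reports whether an item count can be
// represented by a signed 32-bit offset, assumed here to be the old API limit.
constexpr bool fits_in_32bit_offset(std::int64_t num_items)
{
  return num_items >= 0 &&
         num_items <= std::numeric_limits<std::int32_t>::max();
}

// 2^31 - 1 items is the largest count a signed 32-bit offset can express.
static_assert(fits_in_32bit_offset(2147483647LL), "fits in 32 bits");

// 2^32 items can only be indexed with a 64-bit offset.
static_assert(!fits_in_32bit_offset(std::int64_t{1} << 32), "needs 64 bits");
```

Inputs for which this check fails are the ones that benefit from the 64-bit offset support described above.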
New DeviceSelect::UniqueByKey Algorithm
cub::DeviceSelect now provides a UniqueByKey algorithm, which has been ported from Thrust. Thanks to @zasdfgbnm for this contribution.
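As a rough illustration of the expected result, here is a sequential host-side reference. It is only a sketch of the semantics, assuming they match thrust::unique_by_key with equality comparison. It is not the CUB kernel or its API, and the function name `unique_by_key_reference` is made up for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side reference sketch (not CUB code): for every run of consecutive
// equal keys, keep the first key of the run and the value paired with it.
// Assumes values.size() >= keys.size().
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
unique_by_key_reference(const std::vector<Key>& keys,
                        const std::vector<Value>& values)
{
  std::vector<Key> out_keys;
  std::vector<Value> out_values;
  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    // An element starts a new run if it is the first element or if its key
    // differs from the preceding key.
    if (i == 0 || !(keys[i] == keys[i - 1]))
    {
      out_keys.push_back(keys[i]);
      out_values.push_back(values[i]);
    }
  }
  return std::make_pair(out_keys, out_values);
}
```

For keys `{0, 2, 2, 9, 5, 5, 5, 8}` paired with values `{'a', ..., 'h'}`, this reference keeps keys `{0, 2, 9, 5, 8}` and values `{'a', 'b', 'd', 'e', 'h'}`.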
New DeviceAdjacentDifference Algorithms
The new cub::DeviceAdjacentDifference interface, also ported from Thrust, provides SubtractLeft and SubtractRight algorithms as CUB kernels.
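The two variants differ in which neighbor is subtracted and which boundary element is passed through unchanged. The sequential host-side sketch below illustrates this, assuming plain subtraction as the difference operator. The function names are hypothetical, and this is not the CUB implementation.

```cpp
#include <cstddef>
#include <vector>

// Reference sketch of SubtractLeft semantics (assumed):
//   out[0] = in[0]; out[i] = in[i] - in[i - 1] for i > 0.
template <typename T>
std::vector<T> subtract_left_reference(const std::vector<T>& in)
{
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i == 0) ? in[i] : in[i] - in[i - 1];
  }
  return out;
}

// Reference sketch of SubtractRight semantics (assumed):
//   out[n - 1] = in[n - 1]; out[i] = in[i] - in[i + 1] for i < n - 1.
template <typename T>
std::vector<T> subtract_right_reference(const std::vector<T>& in)
{
  std::vector<T> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i + 1 == in.size()) ? in[i] : in[i] - in[i + 1];
  }
  return out;
}
```

For the input `{1, 2, 1, 2, 1, 2, 1, 2}`, the left variant yields `{1, 1, -1, 1, -1, 1, -1, 1}` and the right variant yields `{-1, 1, -1, 1, -1, 1, -1, 2}`.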
Deprecation Notices
Synchronous CUDA Dynamic Parallelism Support
A future version of CUB will change the debug_synchronous behavior of device-scope algorithms when invoked via CUDA Dynamic Parallelism (CDP).
This will only affect calls to CUB device-scope algorithms launched from device-side code with debug_synchronous = true. Such invocations will continue to print extra debugging information, but they will no longer synchronize after kernel launches.
Deprecated Traits
CUB provided a variety of metaprogramming type traits in order to support C++03. Since C++14 is now required, these traits have been deprecated in favor of their STL equivalents, as shown below:
| Deprecated CUB Trait | Replacement STL Trait |
|---|---|
| cub::If | std::conditional |
| cub::Equals | std::is_same |
| cub::IsPointer | std::is_pointer |
| cub::IsVolatile | std::is_volatile |
| cub::RemoveQualifiers | std::remove_cv |
| cub::EnableIf | std::enable_if |
CUB now uses the STL traits internally, resulting in a ~6% improvement in compile time.
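For code that still relies on the deprecated traits, the sketch below shows each STL replacement from the table in use. The names `accumulator_t` and `is_null` are invented for illustration and do not come from CUB.

```cpp
#include <type_traits>

// std::conditional selects between two types at compile time.
template <bool UseWide>
using accumulator_t = typename std::conditional<UseWide, double, float>::type;

// std::is_same compares two types for identity.
static_assert(std::is_same<accumulator_t<true>, double>::value, "wide");
static_assert(std::is_same<accumulator_t<false>, float>::value, "narrow");

// std::is_pointer and std::is_volatile query properties of a type.
static_assert(std::is_pointer<int*>::value, "pointer");
static_assert(std::is_volatile<volatile int>::value, "volatile");

// std::remove_cv strips top-level const and volatile qualifiers.
static_assert(
  std::is_same<std::remove_cv<const volatile int>::type, int>::value,
  "qualifiers removed");

// std::enable_if restricts an overload to types satisfying a condition;
// this hypothetical function only participates for pointer types.
template <typename T>
typename std::enable_if<std::is_pointer<T>::value, bool>::type
is_null(T ptr)
{
  return ptr == nullptr;
}
```

All of these are available from `<type_traits>` in C++11 and later, so they fit within the C++14 requirement mentioned above.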
Misnamed cub::BlockAdjacentDifference APIs
The algorithms in cub::BlockAdjacentDifference have been deprecated, as their names did not clearly describe their intent. The FlagHeads method is now SubtractLeft, and FlagTails has been replaced by SubtractRight.
Breaking Changes
- #331: Deprecate the misnamed BlockAdjacentDifference::FlagHeads and FlagTails methods. Use the new SubtractLeft and SubtractRight methods instead.
- #364: Deprecate some obsolete type traits. These should be replaced by the equivalent traits in <type_traits> as described above.
New Features
- #331: Port the thrust::adjacent_difference kernel and expose it as cub::DeviceAdjacentDifference.
- #405: Port the thrust::unique_by_key kernel and expose it as cub::DeviceSelect::UniqueByKey. Thanks to @zasdfgbnm for this contribution.
Enhancements
- #340: Allow 64-bit offsets in DeviceRadixSort public APIs. Thanks to @canonizer for this contribution.
- #400: Implement a significant reduction in DeviceMergeSort compilation time.
- #415: Support user-defined CMAKE_INSTALL_INCLUDEDIR values in Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
Bug Fixes
- #381: Fix shared memory alignment in dyn_smem example.
- #393: Fix some collisions with the min/max macros defined in windows.h.
- #404: Fix bad cast in util_device.
- #410: Fix CDP issues in DeviceSegmentedSort.
- #411: Ensure that the nv_exec_check_disable pragma is only used on nvcc.
- #418: Fix -Wsizeof-array-div warning on gcc 11. Thanks to @robertmaynard for this contribution.
- #420: Fix new uninitialized variable warning in DiscardIterator on gcc 10.
- #423: Fix some collisions with the small macro defined in windows.h.
- #426: Fix some issues with version handling in CUB’s CMake packages.
- #430: Remove documentation for DeviceSpmv parameters that are absent from public APIs.
- #432: Remove incorrect documentation for DeviceScan algorithms that guaranteed run-to-run deterministic results for floating-point addition.