|
| 1 | +# CUB 1.14.0 (NVIDIA HPC SDK 21.9) |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +CUB 1.14.0 is a major release accompanying the NVIDIA HPC SDK 21.9. |
| 6 | + |
| 7 | +This release provides the often-requested merge sort algorithm, ported from the |
| 8 | +`thrust::sort` implementation. Merge sort provides more flexibility than the |
| 9 | +existing radix sort by supporting arbitrary data types and comparators, though |
| 10 | +radix sorting is still faster for supported inputs. This functionality is |
| 11 | +provided through the new `cub::DeviceMergeSort` and `cub::BlockMergeSort` |
| 12 | +algorithms. |
| 13 | + |
| 14 | +The namespace wrapping mechanism has been overhauled for 1.14. The existing |
| 15 | +macros (`CUB_NS_PREFIX`/`CUB_NS_POSTFIX`) can now be replaced by a single macro, |
| 16 | +`CUB_WRAPPED_NAMESPACE`, which is set to the name of the desired wrapped |
| 17 | +namespace. Defining a similar `THRUST_CUB_WRAPPED_NAMESPACE` macro will embed |
| 18 | +both `thrust::` and `cub::` symbols in the same external namespace. The |
| 19 | +prefix/postfix macros are still supported, but now require a new |
| 20 | +`CUB_NS_QUALIFIER` macro to be defined, which provides the fully qualified CUB |
| 21 | +namespace (e.g. `::foo::cub`). See `cub/util_namespace.cuh` for details. |
| 22 | + |
| 23 | +## Breaking Changes |
| 24 | + |
| 25 | +- NVIDIA/cub#350: When the `CUB_NS_[PRE|POST]FIX` macros are set, |
| 26 | + `CUB_NS_QUALIFIER` must also be defined to the fully qualified CUB namespace |
| 27 | + (e.g. `#define CUB_NS_QUALIFIER ::foo::cub`). Note that this is handled |
| 28 | + automatically when using the new `[THRUST_]CUB_WRAPPED_NAMESPACE` mechanism. |
| 29 | + |
| 30 | +## New Features |
| 31 | + |
| 32 | +- NVIDIA/cub#322: Ported the merge sort algorithm from Thrust: |
| 33 | + `cub::BlockMergeSort` and `cub::DeviceMergeSort` are now available. |
| 34 | +- NVIDIA/cub#326: Simplified the namespace wrapper macros; CUB now detects when
| 35 | + Thrust's symbols are in a wrapped namespace.
| 36 | + |
| 37 | +## Bug Fixes |
| 38 | + |
| 39 | +- NVIDIA/cub#160, NVIDIA/cub#163, NVIDIA/cub#352: Fixed several bugs in |
| 40 | + `cub::DeviceSpmv` and added basic tests for this algorithm. Thanks to James |
| 41 | + Wyles and Seunghwa Kang for their contributions. |
| 42 | +- NVIDIA/cub#328: Fixed error handling bug and incorrect debugging output in |
| 43 | + `cub::CachingDeviceAllocator`. Thanks to Felix Kallenborn for this |
| 44 | + contribution. |
| 45 | +- NVIDIA/cub#335: Fixed a compile error affecting clang and NVRTC. Thanks to |
| 46 | + Jiading Guo for this contribution. |
| 47 | +- NVIDIA/cub#351: Fixed some errors in the `cub::DeviceHistogram` documentation. |
| 48 | + |
| 49 | +## Enhancements |
| 50 | + |
| 51 | +- NVIDIA/cub#348: Added an example that demonstrates how to use dynamic shared
| 52 | + memory with a CUB block algorithm. Thanks to Matthias Jouanneaux for this |
| 53 | + contribution. |
| 54 | + |
1 | 55 | # CUB 1.13.1 (CUDA Toolkit 11.5) |
2 | 56 |
|
3 | 57 | CUB 1.13.1 is a minor release accompanying the CUDA Toolkit 11.5. |
|
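For readers of the 1.14.0 notes above, here is a minimal sketch of how the new `cub::DeviceMergeSort` might be called with a custom comparator, which is the flexibility the summary says radix sort lacks. The comparator `DescendingOp` and the helper `sort_descending_on_device` are hypothetical names introduced only for illustration. Error checking is omitted for brevity, and the sketch assumes a CUDA-capable device and compilation with `nvcc`.

```cuda
#include <cub/device/device_merge_sort.cuh>
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical comparator (not part of CUB): orders ints from largest to
// smallest. Any strict weak ordering callable on the device should work.
struct DescendingOp
{
  __device__ bool operator()(const int &lhs, const int &rhs) const
  {
    return lhs > rhs;
  }
};

// Hypothetical helper (not part of CUB): copies the keys to the device,
// sorts them in place with cub::DeviceMergeSort, and copies them back.
// CUDA error codes are ignored here to keep the sketch short.
std::vector<int> sort_descending_on_device(std::vector<int> keys)
{
  if (keys.empty())
  {
    return keys;
  }

  const int num_items     = static_cast<int>(keys.size());
  const std::size_t bytes = keys.size() * sizeof(int);

  int *d_keys = nullptr;
  cudaMalloc(&d_keys, bytes);
  cudaMemcpy(d_keys, keys.data(), bytes, cudaMemcpyHostToDevice);

  // First call with a null temp-storage pointer only reports the required
  // temporary storage size; no sorting happens yet.
  void *d_temp_storage           = nullptr;
  std::size_t temp_storage_bytes = 0;
  cub::DeviceMergeSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                 d_keys, num_items, DescendingOp{});

  // Second call with allocated temp storage performs the sort.
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  cub::DeviceMergeSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                 d_keys, num_items, DescendingOp{});

  // A blocking copy on the default stream waits for the sort to finish.
  cudaMemcpy(keys.data(), d_keys, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_temp_storage);
  cudaFree(d_keys);
  return keys;
}

int main()
{
  const std::vector<int> sorted = sort_descending_on_device({3, 1, 2});
  for (int key : sorted)
  {
    std::printf("%d ", key);
  }
  std::printf("\n");
  return 0;
}
```

The two-call pattern (size query, then the actual sort) follows the usual CUB device-algorithm convention. For built-in arithmetic keys in ascending or descending order, the notes above suggest that `cub::DeviceRadixSort` is likely still the faster choice.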