Releases · NVIDIA/cub

This repository was archived by the owner on Mar 21, 2024. It is now read-only.

19 May 08:59

brycelelbach

1.8.0

c3cceac

CUB 1.8.0

Summary

CUB 1.8.0 introduces changes to the cub::Shuffle* interfaces.

Breaking Changes

The interfaces of cub::ShuffleIndex, cub::ShuffleUp, and cub::ShuffleDown have been changed to allow for better computation of the PTX SHFL control constant for logical warps smaller than 32 threads.
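To make the "SHFL control constant" concrete, here is a host-side C++ sketch of how the PTX `shfl` instruction's `c` operand packs a segment mask (bits 12:8) and a clamp value (bits 4:0), and how `shfl.idx` then picks the source lane inside a logical warp. The helper names `ShflControl` and `ShflIdxSourceLane` are hypothetical, and the packing follows the PTX ISA description of `shfl`; the exact expression CUB uses internally is not shown in these notes and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack the shfl "c" operand for a logical warp of
// `logical_warp_threads` lanes (assumed to be a power of two in 1..32).
// Bits 12:8 hold the segment mask, bits 4:0 hold the clamp value.
inline std::uint32_t ShflControl(int logical_warp_threads,
                                 std::uint32_t clamp = 0x1f)
{
    const std::uint32_t seg_mask =
        static_cast<std::uint32_t>(32 - logical_warp_threads) & 0x1f;
    return (seg_mask << 8) | (clamp & 0x1f);
}

// Hypothetical host-side model of shfl.idx source-lane selection:
// returns the physical lane that `lane` reads from when it asks for
// logical lane `src_lane` under control constant `c`.
inline int ShflIdxSourceLane(int lane, int src_lane, std::uint32_t c)
{
    const std::uint32_t seg_mask = (c >> 8) & 0x1f;
    const std::uint32_t clamp    = c & 0x1f;
    const std::uint32_t this_ln  = static_cast<std::uint32_t>(lane) & 0x1f;
    const std::uint32_t src      = static_cast<std::uint32_t>(src_lane) & 0x1f;

    const std::uint32_t min_lane = this_ln & seg_mask;             // segment base
    const std::uint32_t max_lane = min_lane | (clamp & ~seg_mask); // segment end
    const std::uint32_t j        = min_lane | (src & ~seg_mask);   // requested lane

    // Out-of-range requests fall back to the calling lane's own value.
    return (j <= max_lane) ? static_cast<int>(j) : lane;
}
```

In this model, physical lane 19 in an 8-thread logical warp that requests logical lane 2 reads from physical lane 18, because its segment starts at lane 16.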

Bug Fixes

#112: Fix cub::WarpScan's broadcast of warp-wide aggregate for logical warps smaller than 32 threads.

19 May 08:56

brycelelbach

1.7.5

1fafcc0

CUB 1.7.5

Summary

CUB 1.7.5 adds support for radix sorting __half keys and improved sorting performance for 1 byte keys. It was incorporated into Thrust 1.9.2.

Enhancements

Radix sort support for __half keys.
Radix sort tuning policy updates to improve 1 byte key performance.
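Radix sort orders keys by their unsigned bit patterns, so floating-point keys are usually remapped first. The sketch below shows the common remapping for 16-bit IEEE 754 half-precision bit patterns: flip every bit of negative values and set the sign bit of non-negative ones. The function names are hypothetical, and that CUB's `__half` path uses exactly this form is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map the raw bits of a binary16 value to an unsigned
// key whose integer order matches the floating-point order (NaNs aside).
inline std::uint16_t HalfBitsToSortableKey(std::uint16_t bits)
{
    const std::uint16_t sign = 0x8000;
    // Negative values: flip all bits, so larger magnitudes sort first.
    // Non-negative values: set the sign bit, so they sort after negatives.
    return (bits & sign) ? static_cast<std::uint16_t>(~bits)
                         : static_cast<std::uint16_t>(bits | sign);
}

// Hypothetical inverse: recover the original binary16 bits after sorting.
inline std::uint16_t SortableKeyToHalfBits(std::uint16_t key)
{
    const std::uint16_t sign = 0x8000;
    return (key & sign) ? static_cast<std::uint16_t>(key ^ sign)
                        : static_cast<std::uint16_t>(~key);
}
```

For example, the bit patterns for -2.0 (`0xC000`), -1.0 (`0xBC00`), 0.0 (`0x0000`), 1.0 (`0x3C00`), and 2.0 (`0x4000`) map to strictly increasing unsigned keys.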

Bug Fixes

Syntax tweaks to mollify Clang.
#127: cub::DeviceRunLengthEncode::Encode returns incorrect results.
#128: 7-bit sorting passes fail for SM61 with large values.

19 May 08:56

brycelelbach

1.7.4

d622848

CUB 1.7.4

Summary

CUB 1.7.4 is a minor release that was incorporated into Thrust 1.9.1-2.

Bug Fixes

#114: Can't pair non-trivially-constructible values in radix sort.
#115: cub::WarpReduce segmented reduction is broken in CUDA 9 for logical warp sizes smaller than 32.

19 May 08:56

brycelelbach

1.7.3

68a50fa

CUB 1.7.3

Summary

CUB 1.7.3 is a minor release.

Bug Fixes

#110: cub::DeviceHistogram null-pointer exception bug for iterator inputs.

19 May 08:56

brycelelbach

1.7.2

53cfb10

CUB 1.7.2

Summary

CUB 1.7.2 is a minor release.

Bug Fixes

#104: Device-wide reduction is now "run-to-run" deterministic for pseudo-associative reduction operators (like floating point addition).
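The reason determinism needs care here is that floating-point addition is not associative, so the order in which partial sums are combined changes the result. A minimal host-side C++ sketch, with hypothetical function names, shows two summation orders disagreeing on the same data, while a fixed order always reproduces its own answer. It illustrates the problem only and is not CUB's reduction scheme.

```cpp
#include <cassert>
#include <vector>

// Sum in index order: ((v0 + v1) + v2) + ...
inline double SumLeftToRight(const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
        sum += *it;
    return sum;
}

// Sum in reverse index order: ((vN + vN-1) + vN-2) + ...
inline double SumRightToLeft(const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::vector<double>::const_reverse_iterator it = v.rbegin(); it != v.rend(); ++it)
        sum += *it;
    return sum;
}

// Sample data where the small term is absorbed if it meets a huge term first.
inline std::vector<double> SampleValues()
{
    std::vector<double> v;
    v.push_back(1e100);
    v.push_back(-1e100);
    v.push_back(1.0);
    return v;
}
```

With `SampleValues()`, the left-to-right sum is 1.0 and the right-to-left sum is 0.0. A reduction is run-to-run deterministic when the combination order is fixed for a given input size and configuration, regardless of how work happens to be scheduled.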

19 May 08:55

brycelelbach

1.7.1

b165e1f

CUB 1.7.1

Summary

CUB 1.7.0 brings support for CUDA 9.0 and SM7x (Volta) GPUs.
It is compatible with independent thread scheduling.

Breaking Changes

Remove cub::WarpAll and cub::WarpAny. These functions served to emulate __all and __any functionality for SM1x devices, which did not have those operations. However, SM1x devices are now deprecated in CUDA, and the interfaces of these two functions are now lacking the lane-mask needed for collectives to run on SM7x and newer GPUs which have independent thread scheduling.

Other Enhancements

Remove any assumptions of implicit warp synchronization to be compatible with SM7x's (Volta) independent thread scheduling.

Bug Fixes

#86: Incorrect results with reduce-by-key.

19 May 08:55

brycelelbach

1.7.0

b20808b

CUB 1.7.0

Summary

CUB 1.7.0 brings support for CUDA 9.0 and SM7x (Volta) GPUs. It is compatible with independent thread scheduling. It was incorporated into Thrust 1.9.2.

Breaking Changes

Remove cub::WarpAll and cub::WarpAny. These functions served to emulate __all and __any functionality for SM1x devices, which did not have those operations. However, SM1x devices are now deprecated in CUDA, and the interfaces of these two functions are now lacking the lane-mask needed for collectives to run on SM7x and newer GPUs which have independent thread scheduling.
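CUDA 9 provides `__all_sync(mask, predicate)` and `__any_sync(mask, predicate)`, which take the lane mask that the removed functions lacked. The host-side C++ sketch below models only the result of such a masked vote, to show why the mask matters; the names `MaskedAll` and `MaskedAny` are hypothetical, and the real intrinsics additionally synchronize the participating lanes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a masked warp vote.
// Bit i of `predicates` is lane i's predicate value.
// Bit i of `member_mask` says whether lane i takes part in the vote.

// True only if every participating lane's predicate is set.
inline bool MaskedAll(std::uint32_t member_mask, std::uint32_t predicates)
{
    return (predicates & member_mask) == member_mask;
}

// True if at least one participating lane's predicate is set.
inline bool MaskedAny(std::uint32_t member_mask, std::uint32_t predicates)
{
    return (predicates & member_mask) != 0;
}
```

For instance, if only lane 16 has its predicate set, a vote restricted to lanes 0 to 15 (`member_mask = 0x0000FFFF`) reports "any" as false, while a full-warp vote reports it as true.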

Other Enhancements

Remove any assumptions of implicit warp synchronization to be compatible with SM7x's (Volta) independent thread scheduling.

Bug Fixes

#86: Incorrect results with reduce-by-key.

19 May 08:45

brycelelbach

1.6.4

7ba78ae

CUB 1.6.4

Summary

CUB 1.6.4 improves radix sorting performance for SM5x (Maxwell) and SM6x (Pascal) GPUs.

Enhancements

Radix sort tuning policies updated for SM5x (Maxwell) and SM6x (Pascal) - 3.5B and 3.4B 32-bit keys/s on TitanX and GTX 1080, respectively.

Bug Fixes

Restore fence work-around for scan (reduce-by-key, etc.) hangs in CUDA 8.5.
#65: cub::DeviceSegmentedRadixSort should allow inputs to have pointer-to-const type.
Mollify Clang device-side warnings.
Remove outdated MSVC project files.

19 May 08:45

brycelelbach

1.6.3

af70707

CUB 1.6.3

Summary

CUB 1.6.3 improves support for Windows, changes cub::BlockLoad/cub::BlockStore interface to take the local data type, and enhances radix sort performance for SM6x (Pascal) GPUs.

Breaking Changes

cub::BlockLoad and cub::BlockStore are now templated by the local data type, instead of the Iterator type. This allows for output iterators having void as their value_type (e.g. discard iterators).
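The motivation can be sketched on the host in plain C++: if a store routine derives its element type from the iterator's `value_type`, an output iterator whose `value_type` is `void` cannot be used, whereas a routine templated on the local data type `T` works with any iterator that accepts assignment from `T`. `DiscardIterator` and `StoreItems` below are hypothetical stand-ins, not CUB or Thrust types.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical discard-style output iterator: writes are accepted and
// dropped, and value_type is void, as is common for pure output iterators.
struct DiscardIterator
{
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    // Proxy that swallows any assigned value.
    struct Sink
    {
        template <typename U>
        void operator=(const U&) const {}
    };

    Sink operator[](std::ptrdiff_t) const { return Sink(); }
};

// Hypothetical store routine templated on the local data type T rather
// than on a type derived from OutputIteratorT::value_type.
template <typename T, int ITEMS, typename OutputIteratorT>
void StoreItems(OutputIteratorT out, const T (&items)[ITEMS])
{
    for (int i = 0; i < ITEMS; ++i)
        out[i] = items[i];
}
```

Calling `StoreItems(DiscardIterator(), items)` compiles and discards the data, and the same routine writes through a raw pointer unchanged. A version that declared its local storage as `typename std::iterator_traits<OutputIteratorT>::value_type` would fail to compile for `DiscardIterator`, since arrays of `void` are ill-formed.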

Other Enhancements

Radix sort tuning policies updated for SM6x (Pascal) GPUs - 6.2B 4 byte keys/s on GP100.
Improved support for Windows (warnings, alignment, etc.).

Bug Fixes

#74: cub::WarpReduce executes reduction operator for out-of-bounds items.
#72: cub::InequalityWrapper::operator should be non-const.
#71: cub::KeyValuePair won't work if Key has non-trivial constructor.
#69: cub::BlockStore::Store doesn't compile if OutputIteratorT::value_type isn't T.
#68: cub::TilePrefixCallbackOp::WarpReduce doesn't permit PTX arch specialization.

19 May 08:45

brycelelbach

1.6.2

b14ba0d

CUB 1.6.2 (previously 1.5.5)

Summary

CUB 1.6.2 (previously 1.5.5) improves radix sort performance for SM6x (Pascal) GPUs.

Enhancements

Radix sort tuning policies updated for SM6x (Pascal) GPUs.

Bug Fixes

Fix AArch64 compilation of cub::CachingDeviceAllocator.
