Releases: uxlfoundation/oneCCL
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17
What's New 2021.17:
- New API: Technical preview of NCCL*-like API alignment, adding the onecclcommDestroy, onecclGetErrorstring, and onecclGetLastError APIs
- Support for a single process with multiple threads: currently covers Allgather, Allreduce, Alltoall, ReduceScatter, Broadcast, point-to-point (pt2pt), and the Group API for scale-up
- Added operations: support for user-defined reduction operations for scale-up; the Group API is extended to also support pt2pt operations
- Improved performance: Allgather optimizations for large messages in scale-out configurations of up to 8 nodes
- Support for BMG: added BMG support, currently available only in the open-source version
- Bug fixes and performance optimizations
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.6
This ccl_2021.15.6-arc branch introduces several enhancements for Intel ARC A and B Series GPU:
- Bug Fixes
- Added an OFI barrier implementation to optimize the CCL barrier in the OFI transport
- Applied chunking in Allgather scale-up (LL protocol) to work around a PCIe hardware bug
- Code refactoring
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
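The two configure commands above differ only in the final flag (-DCCL_ENABLE_ARCA=1 or -DCCL_ENABLE_ARCB=1). As a minimal sketch, a small shell helper can print the matching command for a given series. The function name arc_cmake_cmd is hypothetical and not part of oneCCL; it only assembles the flags already shown in these notes.

```shell
# Hypothetical helper (not part of oneCCL): print the configure command
# for the requested Intel ARC series, using only the flags shown above.
arc_cmake_cmd() {
  case "$1" in
    A|a) series_flag="-DCCL_ENABLE_ARCA=1" ;;
    B|b) series_flag="-DCCL_ENABLE_ARCB=1" ;;
    *)   echo "usage: arc_cmake_cmd A|B" >&2; return 1 ;;
  esac
  echo "cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp ${series_flag}"
}

# Print the command for review instead of running it directly.
arc_cmake_cmd B
```

Printing the command rather than executing it keeps the sketch safe to try on a machine without the Intel compilers installed; the printed line can then be run from a build directory.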
Attached binaries:
2021.15.6.2 package is built using 2025.0.0 version of Intel® oneAPI DPC++/C++ Compiler
2021.15.6.9 package is built using 2025.2.0 version of Intel® oneAPI DPC++/C++ Compiler
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.2
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.5
This ccl_2021.15.5-arc branch introduces several enhancements for Intel ARC A and B Series GPU:
- Bug fixes
- Code refactoring
- New implementations of Alltoall LL and one-way RDMA send-receive
The cmake command is the same as before:
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.1
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.4
This ccl_2021.15.4-arc branch introduces several enhancements for Intel ARC A and B Series GPU:
- Support for Reduce-Scatter and Point-To-Point in addition to previously enabled Allreduce and Allgather
- Support for 8 bit datatypes (int8, uint8)
- Bug fixes, including removal of the previously required IGC_VISAOptions=-activeThreadsOnlyBarrier setting, which is no longer needed.
The cmake command is the same as before:
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16
What's New 2021.16:
- Added SYCL graph support for Record and Replay for Allgather, Allreduce, Alltoall, ReduceScatter and Broadcast
- Added SYCL-based implementation of ring algorithm for Allgather
- Added SYCL-based implementation for Broadcast
- Added multithread support for the Allgather and ReduceScatter scale-up implementation
- Added attribute in the communicator to specify blocking operations for CPU
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.3
This ccl_2021.15.3-arc branch adds support for Intel ARC A and B Series GPU and includes some bug fixes.
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
If the system does not have GPU Peer-to-Peer (P2P) support, set the environment variable before compiling (export IGC_VISAOptions=-activeThreadsOnlyBarrier). The same setting is also required at run time on such a system: export IGC_VISAOptions=-activeThreadsOnlyBarrier in the shell before running the application.
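As a minimal sketch of the setting described above for 2021.15.3 on a system without P2P support: exporting the variable once in a shell makes it visible to every process started from that shell afterwards, so a build and an application run launched from the same session both pick it up. The variable name and value come from the note above; the sh -c check is only an illustration.

```shell
# Variable and value are taken from the note above; this applies only to
# systems without GPU P2P support.
export IGC_VISAOptions=-activeThreadsOnlyBarrier

# Exported variables are inherited by child processes, so a compiler or
# application started from this shell sees the same setting.
sh -c 'echo "IGC_VISAOptions=${IGC_VISAOptions}"'
```

If the build and the run happen in separate shell sessions, the export has to be repeated in each one.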
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.2
What's new:
- Bug fix: improved user experience when setting environment variables.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.1
What's new:
- Bug fixes