-
Notifications
You must be signed in to change notification settings - Fork 62
Enable oneCCL v2 C API and add runtime switch #1579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR merges oneCCL v1 and v2 API support into a unified XCCL interface by creating an abstraction layer that can dynamically switch between the two API versions based on an environment variable USE_CCL_V2.
- Introduces a unified abstraction layer using C++ variants to handle both API versions
- Adds runtime switching between oneCCL v1 and v2 based on environment variable
- Consolidates duplicate code and functions into shared utilities
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/xccl/xccl.h | New header defining unified API abstractions, data types, and utility functions for both oneCCL versions |
| src/xccl/xccl.cpp | Implementation of unified collective operations that dispatch to appropriate API version |
| src/xccl/ProcessGroupXCCL.hpp | Updated to use new unified types and removed duplicate utility functions |
| src/xccl/ProcessGroupXCCL.cpp | Refactored to use unified API calls and removed version-specific implementations |
| cmake/XCCL.cmake | Added linking to v2.0 library |
| cmake/Modules/FindXCCL.cmake | Added discovery of v2.0 library file |
Comments suppressed due to low confidence (1)
src/xccl/ProcessGroupXCCL.cpp:1818
- Missing 'opts.asyncOp' parameter in the collective call. This should be the fourth parameter before the profiling title.
outputSplitSizes, output, &recv_lengths, &recv_offsets);
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
src/xccl/xccl.h
Outdated
| struct XCCLStream { | ||
| at::xpu::XPUStream xpuStream; | ||
| ccl::stream cclStream; | ||
| void* syclQueue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| void* syclQueue; |
I think syclQueue is a redundant member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always can be fetched from xpuStream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right — previously, converting an XPU stream to a CCL stream incurred significant overhead, so it was stored in a map. Since sycl::queue is essentially an alias for the XPU stream, I’ll make the necessary modification.
This PR introduces the new oneCCL C API (aligned with NCCL’s API) and implements runtime switching between the oneCCL v1 API and the new v2 C API (defaulting to the legacy API, with
USE_CCL_V2=1enabling the new one). In CMake, sincelibcclhas not yet merged the v1 and v2 shared libraries,libccl.so.2is located separately.