

Contributor

@Chao1Han Chao1Han commented Apr 16, 2025

This PR introduces the new oneCCL C API (aligned with NCCL’s API) and implements runtime switching between the oneCCL v1 API and the new v2 C API. The legacy v1 API remains the default; setting USE_CCL_V2=1 selects v2. In CMake, libccl.so.2 is located separately because libccl has not yet merged the v1 and v2 shared libraries.

  • Integration of oneCCL v2 C API into the communication backend.
  • Addition of runtime switch mechanisms to dynamically select the CCL API implementation.
  • Refactoring affected modules to support both v1 and v2 APIs.
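A minimal sketch of the runtime switch described above, assuming the variable is read with std::getenv and that only the exact value "1" selects v2 (any other value, or an unset variable, keeps the legacy v1 API). The enum CclApiVersion and the function selectCclApi are hypothetical names, not taken from the PR.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical: which oneCCL API flavor the backend should call into.
enum class CclApiVersion { V1, V2 };

// Hypothetical helper: the legacy v1 API is the default, and only
// USE_CCL_V2=1 selects v2. Any other value, or an unset variable,
// falls back to v1.
inline CclApiVersion selectCclApi() {
  const char* value = std::getenv("USE_CCL_V2");
  if (value != nullptr && std::strcmp(value, "1") == 0) {
    return CclApiVersion::V2;
  }
  return CclApiVersion::V1;
}
```

A real backend might read the variable once at startup and cache the choice; this sketch rereads it on every call to keep it simple.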

Copilot AI review requested due to automatic review settings July 28, 2025 08:22

This comment was marked as outdated.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR merges oneCCL v1 and v2 API support into a unified XCCL interface by creating an abstraction layer that dynamically switches between the two API versions based on the USE_CCL_V2 environment variable.

  • Introduces a unified abstraction layer using C++ variants to handle both API versions
  • Adds runtime switching between oneCCL v1 and v2 based on an environment variable
  • Consolidates duplicate code and functions into shared utilities
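The overview mentions C++ variants. As a rough illustration of that pattern (not the PR's actual code), a std::variant can hold either version's communicator, and std::visit forwards each call to whichever alternative is active. CommV1, CommV2, XcclComm, and allreduce are hypothetical placeholders; the real types would wrap oneCCL handles.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the two communicator types; in a real
// backend these would wrap oneCCL v1 and v2 handles respectively.
struct CommV1 {
  std::string allreduce() const { return "v1 allreduce"; }
};
struct CommV2 {
  std::string allreduce() const { return "v2 allreduce"; }
};

// One handle type that can hold either API version's communicator.
using XcclComm = std::variant<CommV1, CommV2>;

// Unified entry point: std::visit dispatches to the alternative the
// variant currently holds, so callers never branch on the version.
inline std::string allreduce(const XcclComm& comm) {
  return std::visit([](const auto& c) { return c.allreduce(); }, comm);
}
```

The design keeps version checks in one place: code that builds the communicator picks the alternative once, and every collective call goes through the same visit.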

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

  • src/xccl/xccl.h: New header defining unified API abstractions, data types, and utility functions for both oneCCL versions
  • src/xccl/xccl.cpp: Implementation of unified collective operations that dispatch to the appropriate API version
  • src/xccl/ProcessGroupXCCL.hpp: Updated to use the new unified types; duplicate utility functions removed
  • src/xccl/ProcessGroupXCCL.cpp: Refactored to use unified API calls; version-specific implementations removed
  • cmake/XCCL.cmake: Added linking to the v2.0 library
  • cmake/Modules/FindXCCL.cmake: Added discovery of the v2.0 library file
Comments suppressed due to low confidence (1)

src/xccl/ProcessGroupXCCL.cpp:1818

  • Missing 'opts.asyncOp' argument in the collective call. It should be passed as the fourth argument, before the profiling title.
            outputSplitSizes, output, &recv_lengths, &recv_offsets);

@Chao1Han Chao1Han changed the title from "[wip] merge oneccl v1&v2 api" to "Enable oneCCL v2 C API and add runtime switch" on Nov 5, 2025
@Chao1Han Chao1Han requested a review from guangyey November 5, 2025 02:48
src/xccl/xccl.h (outdated)
struct XCCLStream {
at::xpu::XPUStream xpuStream;
ccl::stream cclStream;
void* syclQueue;
Contributor


Suggested change (delete this line):
void* syclQueue;

I think syclQueue is a redundant member.

Contributor


It can always be fetched from xpuStream.
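The reviewer's point can be sketched with stand-in types: if the stream owns its queue, exposing the queue through an accessor removes the duplicate member and the risk of the two diverging. Queue, DeviceStream, and StreamPair are hypothetical placeholders for sycl::queue, at::xpu::XPUStream, and XCCLStream; the ccl::stream member is omitted for brevity.

```cpp
#include <cassert>

// Hypothetical stand-in for sycl::queue.
struct Queue {
  int id;
};

// Hypothetical stand-in for at::xpu::XPUStream: the stream owns its queue.
class DeviceStream {
 public:
  explicit DeviceStream(int id) : queue_{id} {}
  Queue& queue() { return queue_; }

 private:
  Queue queue_;
};

// Hypothetical stand-in for XCCLStream without a stored syclQueue:
// the queue is always looked up through the stream, so the two can
// never drift out of sync.
struct StreamPair {
  DeviceStream xpuStream;
  Queue& syclQueue() { return xpuStream.queue(); }
};
```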

Contributor Author


You’re right. Previously, converting an XPU stream to a CCL stream incurred significant overhead, so the converted stream was stored in a map. Since sycl::queue is essentially an alias for the XPU stream, I’ll remove the redundant member.
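The map-based caching the author describes can be sketched as a small memoizing lookup keyed by a stream identifier, so the costly conversion runs only once per stream. CclStreamHandle, StreamCache, and the integer stream-id key are hypothetical; the sketch is also not thread-safe, which a process group shared across threads might require.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical handle standing in for ccl::stream.
struct CclStreamHandle {
  std::int64_t streamId;
};

// Memoizes an expensive stream conversion: the converter runs once per
// stream id, and later lookups return the cached handle. References into
// an unordered_map stay valid across rehashing, so returning one is safe.
class StreamCache {
 public:
  explicit StreamCache(std::function<CclStreamHandle(std::int64_t)> convert)
      : convert_(std::move(convert)) {}

  const CclStreamHandle& get(std::int64_t streamId) {
    auto it = cache_.find(streamId);
    if (it == cache_.end()) {
      it = cache_.emplace(streamId, convert_(streamId)).first;
    }
    return it->second;
  }

 private:
  std::function<CclStreamHandle(std::int64_t)> convert_;
  std::unordered_map<std::int64_t, CclStreamHandle> cache_;
};
```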

@Chao1Han Chao1Han requested a review from guangyey November 5, 2025 07:55

6 participants