Commit 7e4ff57

Merge pull request #134 from rscohn2/dev/new-api
[RFC]: C API
2 parents fb5dd2a + 5340048 commit 7e4ff57

1 file changed

+127
-0
lines changed

rfcs/20240806-c-api/README.md

@@ -0,0 +1,127 @@
# C API Design Document (RFC)

## Introduction

The oneCCL communication library's current API is defined in the [oneAPI
specification][ccl-spec]. However, similar collective communication libraries
expose APIs that differ from the one used by oneCCL. For example, see
[NCCL][nccl-spec] from Nvidia, [RCCL][rccl-spec] from AMD, and
[HCCL][hccl-spec] from Habana. This RFC asks for feedback about aligning the
oneCCL API more closely with these vendor libraries, since doing so facilitates
integration with frameworks and upstreaming to open source projects.

One difference between oneCCL and other vendors' communication libraries is
that all the others have a C API, while oneCCL has a C++ API. This is because
oneCCL was designed to integrate with SYCL, which is based on C++. One of the
goals of oneCCL is to support different hardware and vendors, such as Intel
Data Center GPU Max Series, Intel Core and Intel Xeon families, Intel Gaudi,
and Nvidia or AMD GPUs, among others.

[ccl-spec]: https://uxlfoundation.github.io/oneAPI-spec/spec/elements/oneCCL/source/index.html
[hccl-spec]: https://docs.habana.ai/en/latest/API_Reference_Guides/HCCL_APIs/C_API.html
[nccl-spec]: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api.html
[rccl-spec]: https://rocm.docs.amd.com/projects/rccl/en/latest/api-reference/api-library.html#api-library

## Proposal

The proposal is to define a C-like API that aligns with the current APIs of
other communication libraries, while introducing a few changes, as described
next:

1. Most APIs are C-based, like those of other communication libraries. C++ data
   structures, such as `ccl::stream` and `ccl::comm`, are hidden behind handles
   returned to the user.

2. The API is extended to support different types of streams or queues:

   - `onecclResult_t onecclCreateStreamXPU(onecclStream_t* oneccl_stream, void *args)`:
     `args` is a pointer to the vendor-specific stream or queue.
   - `onecclResult_t onecclStreamCreateCPU(onecclStream_t* oneccl_stream, void* args)`:
     this API is explicitly for CPU.
   - `onecclResult_t onecclStreamDestroy(onecclStream_t oneccl_stream)`

   Once the `sycl::queue` is registered, it is hidden behind the
   `onecclStream_t` handle.

3. Add functions that allow users to explicitly control the lifetime of
   objects, instead of relying on C++ destructors:

   - `onecclResult_t onecclCommFinalize(comm)`
   - `onecclResult_t onecclCommDestroy(comm)`

4. Drop support for out-of-order SYCL queues. The current oneCCL library
   supports out-of-order SYCL queues, but users of the library do not use this
   feature. In general, collective operations are submitted to an in-order
   queue. When out-of-order behavior is required, commands are submitted to a
   different in-order queue, and the two queues are synchronized.

5. Drop support for SYCL buffers. Only [Unified Shared Memory][usm-example] is
   supported.

[usm-example]: https://www.intel.com/content/www/us/en/developer/articles/code-sample/dpcpp-usm-code-sample.html

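To make items 1 to 3 concrete, the following is a minimal sketch of the opaque-handle pattern, not the oneCCL implementation. The two function signatures come from the list above; the result codes (`onecclSuccess`, `onecclInvalidArgument`, `onecclSystemError`), the contents of `struct onecclStream`, and the helper `stream_lifetime_example` are assumptions made only for illustration. The function bodies are stand-ins so the example compiles without the library.

```c
#include <stdlib.h>

/* Hypothetical result codes; the RFC only names the type onecclResult_t. */
typedef enum {
    onecclSuccess = 0,
    onecclInvalidArgument = 1,
    onecclSystemError = 2
} onecclResult_t;

/* Opaque handle: callers only ever hold a pointer to an incomplete type. */
typedef struct onecclStream *onecclStream_t;

/* Stand-in for the hidden state. In a real library this definition would
 * live in the implementation and could wrap C++ objects. */
struct onecclStream {
    void *native_queue; /* vendor-specific stream or queue owned by the caller */
};

/* Mock body: records the caller's queue pointer behind a new handle. */
onecclResult_t onecclCreateStreamXPU(onecclStream_t *oneccl_stream, void *args) {
    if (oneccl_stream == NULL || args == NULL) return onecclInvalidArgument;
    struct onecclStream *s = malloc(sizeof *s);
    if (s == NULL) return onecclSystemError;
    s->native_queue = args;
    *oneccl_stream = s;
    return onecclSuccess;
}

/* Mock body: releases the handle; the caller's queue itself is untouched. */
onecclResult_t onecclStreamDestroy(onecclStream_t oneccl_stream) {
    if (oneccl_stream == NULL) return onecclInvalidArgument;
    free(oneccl_stream);
    return onecclSuccess;
}

/* Hypothetical helper: wraps a placeholder queue, then destroys the handle.
 * Returns 0 on success. */
int stream_lifetime_example(void) {
    int dummy_queue = 0; /* placeholder for, e.g., a sycl::queue */
    onecclStream_t stream = NULL;
    if (onecclCreateStreamXPU(&stream, &dummy_queue) != onecclSuccess) return 1;
    if (onecclStreamDestroy(stream) != onecclSuccess) return 2;
    return 0;
}
```

Because the caller never sees the layout of `struct onecclStream`, the implementation stays free to keep C++ objects behind the handle, and destruction becomes an explicit call rather than a destructor.
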
### APIs

The tables below contain the NCCL API, the corresponding new proposed oneCCL
API, and the current oneCCL API.

#### APIs related to communicator creation

| NCCL              | oneCCL (proposed C)          | oneCCL (current, C++)   |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclGetUniqueId(id)`|`onecclResult_t onecclGetUniqueId(id)`|`ccl::create_main_kvs(); ccl::create_kvs(main_addr);`|
|`ncclResult_t ncclCommInitRank(comm, size, id, rank)`|`onecclResult_t onecclCommInitRank(comm, size, id, rank)` (1)|`comm ccl::create_communicator(size, rank, device, context, kvs)` `comms ccl::create_communicators(size, rank, device, context, kvs)`|
|`ncclResult_t ncclCommInitRankConfig(comm, size, id, rank, attr)`|`onecclResult_t onecclCommInitRankConfig(comm, size, id, rank, attr)`|`comm ccl::create_communicator(size, rank, device, context, kvs, attr)`|
|`ncclResult_t ncclCommInitAll(comms, ndev, dev_list)`|`onecclResult_t onecclCommInitAll(comms, ndev, dev_list)`| Not currently available. Working on adding support. |
|`ncclCommSplit`    | Not implemented              | Not implemented         |
|`ncclResult_t ncclCommFinalize(comm)`|`onecclResult_t onecclCommFinalize(comm)`| N/A |
|`ncclResult_t ncclCommDestroy(comm)`|`onecclResult_t onecclCommDestroy(comm)`| Destructor |

(1) This assumes that each rank is associated with a device, which has been set
before calling this function (`ncclCommInitRank`).

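The call order implied by the table can be sketched as follows. This is a single-process mock under stated assumptions, not the library's behavior: the parameter types (`int` for `size` and `rank`, a by-value `onecclUniqueId`), the layout of `onecclUniqueId` and `struct onecclComm`, the result codes, and the helper `comm_lifetime_example` are all hypothetical. The table gives only function names and argument lists.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes and types; the RFC does not define them. */
typedef enum {
    onecclSuccess = 0,
    onecclInvalidArgument = 1,
    onecclSystemError = 2
} onecclResult_t;

typedef struct { char internal[128]; } onecclUniqueId; /* assumed layout */
typedef struct onecclComm *onecclComm_t;               /* opaque handle */
struct onecclComm { int size; int rank; int finalized; };

/* Mock: a real implementation would encode rendezvous information here. */
onecclResult_t onecclGetUniqueId(onecclUniqueId *id) {
    if (id == NULL) return onecclInvalidArgument;
    memset(id->internal, 0, sizeof id->internal);
    return onecclSuccess;
}

/* Mock: validates the arguments and allocates the hidden state. */
onecclResult_t onecclCommInitRank(onecclComm_t *comm, int size,
                                  onecclUniqueId id, int rank) {
    (void)id; /* unused in this mock */
    if (comm == NULL || size < 1 || rank < 0 || rank >= size)
        return onecclInvalidArgument;
    struct onecclComm *c = malloc(sizeof *c);
    if (c == NULL) return onecclSystemError;
    c->size = size;
    c->rank = rank;
    c->finalized = 0;
    *comm = c;
    return onecclSuccess;
}

/* Mock: marks outstanding work as complete. */
onecclResult_t onecclCommFinalize(onecclComm_t comm) {
    if (comm == NULL) return onecclInvalidArgument;
    comm->finalized = 1;
    return onecclSuccess;
}

/* Mock: frees the hidden state. */
onecclResult_t onecclCommDestroy(onecclComm_t comm) {
    if (comm == NULL) return onecclInvalidArgument;
    free(comm);
    return onecclSuccess;
}

/* Hypothetical helper showing the order of calls; returns 0 on success.
 * With several ranks, one rank would create the id and share it with the
 * others out of band before every rank calls onecclCommInitRank. */
int comm_lifetime_example(int size, int rank) {
    onecclUniqueId id;
    onecclComm_t comm = NULL;
    if (onecclGetUniqueId(&id) != onecclSuccess) return 1;
    if (onecclCommInitRank(&comm, size, id, rank) != onecclSuccess) return 2;
    if (onecclCommFinalize(comm) != onecclSuccess) return 3;
    if (onecclCommDestroy(comm) != onecclSuccess) return 4;
    return 0;
}
```

The explicit finalize and destroy calls take the place of the destructor listed in the current C++ column.
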
#### APIs related to collective communication operations

| NCCL              | oneCCL (proposed C)          | oneCCL (current, C++)   |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclAllgather(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclAllgather(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::allgather(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)` (2)|
|`ncclResult_t ncclAllreduce(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclAllreduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::allreduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclBroadcast(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclBroadcast(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::broadcast(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)` (3)|
|`ncclResult_t ncclReduce(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclReduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::reduce(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
|`ncclResult_t ncclReduceScatter(sendbuff, recvbuff, count, datatype, op, comm, stream)`|`onecclResult_t onecclReduceScatter(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream)`|`ccl::event communicator::reduce_scatter(sendbuff, recvbuff, count, datatype, op, comm, oneccl_stream, deps)`|
| N/A |`onecclAlltoall`, `onecclAlltoallv` (could be deprecated)|`communicator::alltoall`, `communicator::alltoallv`|
| N/A |`onecclBarrier` (could be deprecated in favor of an Allreduce of 1 byte)|`ccl::event communicator::barrier`|

- (2) Currently oneCCL contains Allgatherv, but this will be deprecated in the
  future.
- (3) The current API is slightly different, but the next oneCCL release will
  align the Broadcast with the one shown here.

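The table suggests that `onecclBarrier` could be replaced by an Allreduce of 1 byte. A rough sketch of that idea follows, assuming hypothetical names for everything the RFC leaves unspecified: the enumerators `onecclChar` and `onecclSum`, the types `onecclDataType_t` and `onecclRedOp_t`, the concrete parameter types, and the helper `barrier_via_allreduce`. The `onecclAllreduce` body is a single-rank stand-in that only copies its input, so the example compiles and runs without the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the RFC gives only the argument list. */
typedef enum { onecclSuccess = 0, onecclInvalidArgument = 1 } onecclResult_t;
typedef enum { onecclChar = 0 } onecclDataType_t;
typedef enum { onecclSum = 0 } onecclRedOp_t;
typedef struct onecclComm *onecclComm_t;
typedef struct onecclStream *onecclStream_t;

/* Single-rank mock: with one participant the reduction result equals the
 * input. A real implementation would exchange data between all ranks. */
onecclResult_t onecclAllreduce(const void *sendbuff, void *recvbuff,
                               size_t count, onecclDataType_t datatype,
                               onecclRedOp_t op, onecclComm_t comm,
                               onecclStream_t oneccl_stream) {
    (void)op; (void)comm; (void)oneccl_stream; /* unused in this mock */
    if (sendbuff == NULL || recvbuff == NULL || datatype != onecclChar)
        return onecclInvalidArgument;
    memmove(recvbuff, sendbuff, count); /* onecclChar: 1 byte per element */
    return onecclSuccess;
}

/* Hypothetical helper: no rank can obtain the reduced value before every
 * rank has contributed, so a 1-byte Allreduce acts as a barrier. In a real
 * program the buffers would be USM allocations and the caller would also
 * wait on the stream; this mock completes synchronously. */
onecclResult_t barrier_via_allreduce(onecclComm_t comm, onecclStream_t stream) {
    char token = 0;
    char result = 0;
    return onecclAllreduce(&token, &result, 1, onecclChar, onecclSum,
                           comm, stream);
}
```
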
#### Group APIs

| NCCL              | oneCCL (proposed C)          | oneCCL (current, C++)   |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclGroupStart()`|`onecclResult_t onecclGroupStart()`| N/A |
|`ncclResult_t ncclGroupEnd()`  |`onecclResult_t onecclGroupEnd()`  | N/A |

#### Point to Point APIs

| NCCL              | oneCCL (proposed C)          | oneCCL (current, C++)   |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclSend(sendbuf, count, datatype, peer, comm, stream)`|`onecclResult_t onecclSend(sendbuf, count, datatype, peer, comm, oneccl_stream)`|`ccl::event communicator::send(sendbuf, count, datatype, peer, comm, oneccl_stream)`|
|`ncclResult_t ncclRecv(…)`|`onecclResult_t onecclRecv(…)`|`communicator::recv`|

#### Other APIs

| NCCL              | oneCCL (proposed C)          | oneCCL (current, C++)   |
|-------------------|------------------------------|-------------------------|
|`ncclResult_t ncclCommCount(comm, size)`|`onecclResult_t onecclCommCount(comm, size)`|`size communicator::size()`|
|`ncclResult_t ncclCommCuDevice(comm, device)`|`onecclResult_t onecclCommGetDevice(comm, device)`|`device communicator::get_device()`|
|`ncclResult_t ncclCommUserRank(comm, rank)`|`onecclResult_t onecclCommUserRank(comm, rank)`|`rank communicator::rank()`|
|`ncclResult_t ncclGetVersion(version)`|`onecclResult_t onecclGetVersion(version)`|`version ccl::get_library_version()`|
|`ncclCommAbort`    | `onecclCommAbort`            | N/A |
|`ncclCommGetAsyncError`| `onecclCommGetAsyncError` | N/A |
|`ncclGetLastError` | `onecclGetLastError`         | N/A |
|`ncclGetErrorString`| `onecclGetErrorString`      | N/A |

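Since every proposed function returns an `onecclResult_t`, callers would typically check results in one place. The sketch below shows one way to do that, modeled on the common pattern used with `ncclGetErrorString`. It is an assumption-laden illustration: the signature of `onecclGetErrorString`, the result codes, the message strings, and the `ONECCL_CHECK` macro with its `oneccl_check` helper are hypothetical and not part of the RFC.

```c
#include <stdio.h>

/* Hypothetical result codes; the RFC only names the type. */
typedef enum { onecclSuccess = 0, onecclInvalidArgument = 1 } onecclResult_t;

/* Mock with an assumed signature, mirroring ncclGetErrorString. */
const char *onecclGetErrorString(onecclResult_t result) {
    switch (result) {
    case onecclSuccess:         return "no error";
    case onecclInvalidArgument: return "invalid argument";
    default:                    return "unknown result code";
    }
}

/* Hypothetical helper: logs a failed call with its location and passes the
 * result through unchanged, so the caller decides how to recover. */
static onecclResult_t oneccl_check(onecclResult_t result, const char *expr,
                                   const char *file, int line) {
    if (result != onecclSuccess) {
        fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, expr,
                onecclGetErrorString(result));
    }
    return result;
}

/* Evaluates the call exactly once and yields its result. */
#define ONECCL_CHECK(call) oneccl_check((call), #call, __FILE__, __LINE__)
```

A call site would then read, for example, `if (ONECCL_CHECK(onecclCommDestroy(comm)) != onecclSuccess) return 1;`.
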
