Commit 5ab276f

Add TorchComms support documentation
- docs/quickstart.md: build instructions, usage example, supported collectives table, environment variables, test/benchmark commands
- Consistent with existing doc style (dollar prompts, MSCCLPP_BUILD var)
1 parent 8716b6c commit 5ab276f

1 file changed

Lines changed: 73 additions & 0 deletions

docs/quickstart.md

@@ -92,6 +92,7 @@ There are a few optional CMake options you can set:
- `-DMSCCLPP_BUILD_PYTHON_BINDINGS=OFF`: Don't build the Python module.
- `-DMSCCLPP_BUILD_TESTS=OFF`: Don't build the tests.
- `-DMSCCLPP_BUILD_APPS_NCCL=OFF`: Don't build the NCCL API.
- `-DMSCCLPP_BUILD_EXT_TORCHCOMMS=ON`: Build [TorchComms](https://github.com/meta-pytorch/torchcomms) support for MSCCL++ (off by default). Requires PyTorch and pybind11.
```

(install-from-source-python-module)=
@@ -205,6 +206,78 @@ export LD_LIBRARY_PATH=$MSCCLPP_INSTALL_DIR:$LD_LIBRARY_PATH
torchrun --nnodes=1 --nproc_per_node=8 your_script.py
```

(torchcomms-support)=
### TorchComms Support

MSCCL++ integrates with [TorchComms](https://github.com/meta-pytorch/torchcomms), enabling PyTorch users to use MSCCL++ collectives through the TorchComms API. This is the recommended way to use MSCCL++ in PyTorch training for mixed-backend setups (e.g., MSCCL++ for allreduce, NCCL for broadcast/barrier).

#### Building

Prerequisites: PyTorch, pybind11, and [torchcomms](https://github.com/meta-pytorch/torchcomms) (`pip install --pre torchcomms`).

```bash
$ mkdir -p build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release \
    -DMSCCLPP_BUILD_EXT_TORCHCOMMS=ON \
    ..
$ make -j$(nproc)
$ cd ..
```

This produces `_comms_mscclpp.*.so` in the build output. TorchComms discovers the MSCCL++ backend through the `TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP` environment variable, which must point to that file. In the commands below, `MSCCLPP_BUILD` is your MSCCL++ build directory.

#### Usage

```bash
$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
$ torchrun --nproc_per_node=8 your_script.py
```

236+
```python
237+
import torch
238+
import torchcomms
239+
240+
# Create an MSCCL++ communicator
241+
comm = torchcomms.new_comm("mscclpp", torch.device(f"cuda:{local_rank}"), name="my_comm")
242+
243+
# Run allreduce (MSCCL++ automatically selects the best algorithm)
244+
comm.all_reduce(tensor, torchcomms.ReduceOp.SUM, False)
245+
246+
# Cleanup
247+
comm.finalize()
248+
```
249+
#### Supported Collectives

| Collective | Status | Notes |
|---|---|---|
| AllReduce | Supported | SUM, MIN. Auto-selects from ~10 native algorithms by message size and topology |
| AllGather | Supported | Fullmesh algorithms |
| ReduceScatter | Dispatched | Requires a registered DSL algorithm |
| AllToAll | Dispatched | Requires a registered DSL algorithm |
| All others | Not supported | Throws with guidance to use a separate NCCL/RCCL communicator |

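The table suggests a simple routing rule for mixed-backend setups: send the collectives MSCCL++ handles to the MSCCL++ communicator and everything else to a separate NCCL/RCCL communicator. A minimal sketch of that rule follows. The names `MSCCLPP_STATUS` and `pick_backend`, and the lowercase spelling of collective names, are assumptions made for illustration and are not part of MSCCL++ or TorchComms:

```python
# Hypothetical routing table mirroring the "Supported Collectives" table.
MSCCLPP_STATUS = {
    "allreduce": "supported",
    "allgather": "supported",
    "reducescatter": "dispatched",  # needs a registered DSL algorithm
    "alltoall": "dispatched",       # needs a registered DSL algorithm
}


def pick_backend(collective: str, dsl_registered: bool = False) -> str:
    """Return "mscclpp" when the table says MSCCL++ can run the collective.

    Anything else falls back to "nccl" (or RCCL on AMD GPUs), matching the
    "All others" row of the table.
    """
    status = MSCCLPP_STATUS.get(collective.lower(), "unsupported")
    if status == "supported":
        return "mscclpp"
    if status == "dispatched" and dsl_registered:
        return "mscclpp"
    return "nccl"
```

A training script could create one communicator per backend name and look up the right one with `pick_backend` before each call, so unsupported collectives never reach the MSCCL++ communicator.
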
#### Environment Variables

| Variable | Description |
|---|---|
| `TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP` | **Required.** Path to the built `_comms_mscclpp.*.so` module |

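Because the variable is required, a quick pre-flight check can catch a missing or stale setting before a multi-GPU launch. The sketch below only inspects the environment and the filesystem; `check_backend_env` is a hypothetical helper, and how TorchComms itself reports a bad path is not covered here:

```python
import os


def check_backend_env(environ=os.environ) -> list[str]:
    """Return a list of problems with the MSCCL++ backend setting (empty if none)."""
    name = "TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP"
    path = environ.get(name)
    if not path:
        return [f"{name} is not set"]
    # The value should name a real file, not a directory or a wildcard pattern.
    if not os.path.isfile(path):
        return [f"{name}={path!r} does not point to an existing file"]
    return []
```

Calling `check_backend_env()` at the top of a training script and aborting on a non-empty result gives every rank the same clear message.
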
#### Running Tests

```bash
$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
$ torchrun --nproc_per_node=8 test/torchcomms/test_correctness.py --all
```

#### Running Benchmarks

```bash
$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
$ torchrun --nproc_per_node=8 test/torchcomms/bench_torchcomms.py --collective allreduce --warmup 100 --iters 200
$ torchrun --nproc_per_node=8 test/torchcomms/bench_torchcomms.py --collective allgather --warmup 100 --iters 200
```

## Version Tracking

The MSCCL++ Python package includes comprehensive version tracking that captures git repository information at build time. This feature allows users to identify the exact source code version of their installed package.
