diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 1 addition & 1 deletion
diff --git a/docs/index.rst b/docs/index.rst
Lines changed: 2 additions & 0 deletions
diff --git a/docs/quickstart.md b/docs/quickstart.md
Lines changed: 4 additions & 56 deletions
@@ -276,5 +276,5 @@ endif()
 
 # TorchComms MSCCL++ backend
 if(MSCCLPP_BUILD_EXT_TORCHCOMMS)
-    add_subdirectory(python/mscclpp_torchcomm)
+    add_subdirectory(python/mscclpp_torchcomms)
 endif()
@@ -10,6 +10,7 @@ You can find the followings from this documentation.
 
 - **Overview:** An overview of MSCCL++ and its features. :doc:`🔗 <overview>`
 - **Quick Start:** A guide to build, install, and run MSCCL++. :doc:`🔗 <quickstart>`
+- **TorchComms:** Using MSCCL++ as a TorchComms backend for PyTorch training. :doc:`🔗 <torchcomms>`
 - **MSCCL++ DSL:** A guide to get started with the MSCCL++ DSL. :doc:`🔗 <dsl>`
 - **Tutorials:** A step-by-step guide for GPU communication using MSCCL++. :doc:`🔗 <tutorials>`
 - **Programming Guide:** Advanced topics and best practices for using MSCCL++. :doc:`🔗 <programming_guide>`
@@ -22,6 +23,7 @@ You can find the followings from this documentation.
 
    overview
    quickstart
+   torchcomms
    dsl
    tutorials
    programming_guide
 
@@ -232,72 +232,20 @@ torchrun --nnodes=1 --nproc_per_node=8 your_script.py
 
 MSCCL++ integrates with [TorchComms](https://github.com/meta-pytorch/torchcomms), enabling PyTorch users to use MSCCL++ collectives through the TorchComms API. This is the recommended way to use MSCCL++ in PyTorch training for mixed-backend setups (e.g., MSCCL++ for allreduce, NCCL for broadcast/barrier).
 
-#### Building
-
-Prerequisites: PyTorch, pybind11, and [torchcomms](https://github.com/meta-pytorch/torchcomms) (`pip install --pre torchcomms`).
-
 ```bash
-$ mkdir -p build && cd build
-$ cmake -DCMAKE_BUILD_TYPE=Release \
-      -DMSCCLPP_BUILD_EXT_TORCHCOMMS=ON \
-      ..
-$ make -j$(nproc)
-$ cd ..
-```
-
-This produces `_comms_mscclpp.*.so` in the build output. TorchComms discovers MSCCL++ via the `TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP` environment variable, where `MSCCLPP_BUILD` is your MSCCL++ build directory.
-
-#### Usage
-
-```bash
-$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
-$ torchrun --nproc_per_node=8 your_script.py
+$ python -m pip install ./python/mscclpp_torchcomms
 ```
 
 ```python
-import torch
 import torchcomms
+import mscclpp_torchcomms  # auto-registers the backend
 
-# Create an MSCCL++ communicator
-comm = torchcomms.new_comm("mscclpp", torch.device(f"cuda:{local_rank}"), name="my_comm")
-
-# Run allreduce (MSCCL++ automatically selects the best algorithm)
+comm = torchcomms.new_comm("mscclpp", device, name="my_comm")
 comm.all_reduce(tensor, torchcomms.ReduceOp.SUM, False)
-
-# Cleanup
 comm.finalize()
 ```
 
-#### Supported Collectives
-
-| Collective | Status | Notes |
-|---|---|---|
-| AllReduce | Supported | SUM, MIN. Auto-selects from ~10 native algorithms by message size and topology |
-| AllGather | Supported | Fullmesh algorithms |
-| ReduceScatter | Dispatched | Requires a registered DSL algorithm |
-| AllToAll | Dispatched | Requires a registered DSL algorithm |
-| All others | Not supported | Throws with guidance to use a separate NCCL/RCCL communicator |
-
-#### Environment Variables
-
-| Variable | Description |
-|---|---|
-| `TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP` | **Required.** Path to the built `_comms_mscclpp.*.so` module |
-
-#### Running Tests
-
-```bash
-$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
-$ torchrun --nproc_per_node=8 test/torchcomms/test_correctness.py --all
-```
-
-#### Running Benchmarks
-
-```bash
-$ export TORCHCOMMS_BACKEND_LIB_PATH_MSCCLPP=$MSCCLPP_BUILD/lib/_comms_mscclpp.cpython-*.so
-$ torchrun --nproc_per_node=8 test/torchcomms/bench_torchcomms.py --collective allreduce --warmup 100 --iters 200
-$ torchrun --nproc_per_node=8 test/torchcomms/bench_torchcomms.py --collective allgather --warmup 100 --iters 200
-```
+See [TorchComms Integration](torchcomms.md) for full documentation including architecture, algorithm selection, user-defined algorithms, testing, benchmarks, and troubleshooting.
 
 ## Version Tracking
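
The condensed quickstart snippet leaves `device` and `tensor` undefined, and it mentions mixed-backend setups (MSCCL++ for allreduce, NCCL for everything else) only in passing. The following is a minimal sketch of both under stated assumptions, not the documented API of this repository. The helper names `local_device_string`, `pick_comm`, and `run_allreduce_demo`, and the `MSCCLPP_NATIVE` set, are hypothetical. The `"nccl"` backend name for the fallback communicator is an assumption. The `torchcomms.new_comm`, `all_reduce`, and `finalize` calls and the `mscclpp_torchcomms` import mirror the snippet in the diff. The sketch relies on `torchrun` exporting `LOCAL_RANK` to each worker.

```python
import os

# Assumption: collectives the removed "Supported Collectives" table lists as
# natively supported by the MSCCL++ backend. Check torchcomms.md for your version.
MSCCLPP_NATIVE = frozenset({"all_reduce", "all_gather"})


def local_device_string() -> str:
    """Return the CUDA device string for this worker.

    torchrun exports LOCAL_RANK to every process it launches; fall back to 0
    so the helper also works in a single-process run.
    """
    return f"cuda:{int(os.environ.get('LOCAL_RANK', '0'))}"


def pick_comm(collective: str, mscclpp_comm, fallback_comm):
    """Hypothetical router: MSCCL++ for its native collectives, NCCL/RCCL otherwise."""
    return mscclpp_comm if collective in MSCCLPP_NATIVE else fallback_comm


def run_allreduce_demo() -> None:
    """Call from the script launched by torchrun; requires GPUs and torchcomms."""
    # Imported lazily so the helpers above can be used without a GPU stack.
    import torch
    import torchcomms
    import mscclpp_torchcomms  # noqa: F401  # importing registers the "mscclpp" backend

    device = torch.device(local_device_string())
    torch.cuda.set_device(device)

    mscclpp_comm = torchcomms.new_comm("mscclpp", device, name="mscclpp_comm")
    # Assumption: "nccl" is the TorchComms backend name for the fallback communicator.
    nccl_comm = torchcomms.new_comm("nccl", device, name="nccl_comm")

    tensor = torch.ones(1 << 20, device=device)
    # all_reduce is in MSCCLPP_NATIVE, so this call goes to the MSCCL++ communicator.
    pick_comm("all_reduce", mscclpp_comm, nccl_comm).all_reduce(
        tensor, torchcomms.ReduceOp.SUM, False
    )

    for comm in (mscclpp_comm, nccl_comm):
        comm.finalize()
```

To try it, call `run_allreduce_demo()` from the script passed to `torchrun --nproc_per_node=8 your_script.py`. Reduce-scatter and all-to-all go to the fallback communicator here. According to the removed Supported Collectives table, they run on MSCCL++ only when a DSL algorithm is registered, so widen `MSCCLPP_NATIVE` in that case.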