Commit 78af9ac
Address PR #771 review feedback, add pip install and docs
Review feedback (chhwang):
- TorchCommMSCCLPP::init(): replace raw cudaSetDevice with RAII
CudaDeviceGuard to restore previous device on return/exception
- TorchCommMSCCLPP::init(): remove redundant cudaGetDevice call, use
device_.index() directly for compute capability queries
- Add pip install support via separate mscclpp-torchcomms package with
pyproject.toml, scikit-build-core, and auto-discovery of backend .so
- docs/quickstart.md: add tested version table
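The RAII device guard from the first item above can be illustrated with a minimal sketch. It does not reproduce the repository's CudaDeviceGuard: the fake_cuda namespace is a hypothetical stand-in for cudaGetDevice/cudaSetDevice so the example builds without CUDA, and DeviceGuardSketch and initSketch are illustrative names only.

```cpp
#include <stdexcept>

// ASSUMPTION: stand-ins for cudaGetDevice/cudaSetDevice so this builds without CUDA.
namespace fake_cuda {
inline int& currentDevice() {
  static int dev = 0;  // process-wide "current device" for the sketch
  return dev;
}
inline void setDevice(int dev) { currentDevice() = dev; }
}  // namespace fake_cuda

// Hypothetical guard; the real CudaDeviceGuard may differ in name and details.
class DeviceGuardSketch {
 public:
  // Remember the current device, then switch to the target device.
  explicit DeviceGuardSketch(int target) : previous_(fake_cuda::currentDevice()) {
    fake_cuda::setDevice(target);
  }
  // Restore the previous device on normal return and during stack unwinding.
  ~DeviceGuardSketch() { fake_cuda::setDevice(previous_); }

  DeviceGuardSketch(const DeviceGuardSketch&) = delete;
  DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

 private:
  int previous_;
};

// Hypothetical init-like function: switches device, optionally fails,
// and returns the device that was active inside the guarded scope.
inline int initSketch(int target, bool fail) {
  DeviceGuardSketch guard(target);
  if (fail) throw std::runtime_error("init failed");
  return fake_cuda::currentDevice();
}
```

Because the destructor runs on every exit path, the caller's device is restored whether initSketch returns or throws, which a bare set-device call does not guarantee.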
Review feedback (Copilot bot):
- TorchCommMSCCLPPBootstrap: add "_" delimiter between name and counter
in store key to prevent collisions, make counter_ std::atomic<int>
- TorchCommMSCCLPP::finalize(): wrap cudaStreamSynchronize and
cudaStreamDestroy with MSCCLPP_CUDATHROW for error surfacing
- All 4 supported collectives: replace tensor.contiguous() with
TORCH_CHECK(tensor.is_contiguous()) to prevent silently dropping
results for non-contiguous tensors
- CMakeLists.txt: replace manual glog search with find_package(glog
REQUIRED) for consistency with codebase conventions
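The store-key item above can be illustrated with a short sketch of why a delimiter matters when a name and a counter are concatenated. The names below (keyWithoutDelimiter, keyWithDelimiter, KeyGeneratorSketch) are hypothetical and are not taken from TorchCommMSCCLPPBootstrap; only the "_" delimiter and the std::atomic<int> counter come from the commit message.

```cpp
#include <atomic>
#include <string>
#include <utility>

// Plain concatenation: ("rank1", 12) and ("rank11", 2) both give "rank112".
inline std::string keyWithoutDelimiter(const std::string& name, int counter) {
  return name + std::to_string(counter);
}

// With "_" between the parts, the two pairs map to distinct keys.
inline std::string keyWithDelimiter(const std::string& name, int counter) {
  return name + "_" + std::to_string(counter);
}

// Hypothetical generator: an atomic counter hands each caller a unique value
// even when next() is called from several threads.
class KeyGeneratorSketch {
 public:
  explicit KeyGeneratorSketch(std::string name) : name_(std::move(name)) {}

  // fetch_add returns the value before the increment, so keys start at 0.
  std::string next() { return keyWithDelimiter(name_, counter_.fetch_add(1)); }

 private:
  std::string name_;
  std::atomic<int> counter_{0};
};
```

The delimiter makes the boundary between name and counter recoverable, since the counter part never contains "_".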
Rename and documentation:
- Rename python/mscclpp_torchcomm to python/mscclpp_torchcomms for
consistency with the torchcomms library naming
- Add docs/torchcomms.md: standalone doc covering architecture,
algorithm selection, user-defined algorithms, testing, benchmarks,
limitations, and troubleshooting
- Slim down quickstart.md TorchComms section to brief snippet + link
- Add torchcomms entry to docs/index.rst
- Add import mscclpp_torchcomms to all test/benchmark files for
automatic backend .so discovery (no env var needed)
1 parent db92aee commit 78af9ac
23 files changed
Lines changed: 464 additions & 115 deletions
File tree
- docs
- python
- mscclpp_torchcomms
- csrc
- mscclpp_torchcomm
- test/torchcomms