Commit b8a1059
Add Part ②: NCCL algorithms (Korean draft)
Full rewrite in the tone of Part ①. Follows the cost model and algorithm
selection flow through the code, based on NCCL master (v2.30).
- §1-2 motivation + αβγ cost model
- §3 Tree / Ring / Butterfly families + 8-rank AllReduce comparison
- §4 NCCL's 7 algorithms / 6 patterns / 3 protocols
- §5 Algorithm deep-dive: Ring, Double Binary Tree (Sanders 2007),
PAT (1-GPU-per-node constraint + producer/worker kernel), NVLS, CollNet
- §6 Protocol Simple / LL / LL128 + path-type conditions for enabling LL128
- §7 Selection machinery: ncclTopoTuneModel α/β tables → eligibility filter
→ topoGetAlgoInfo argmin → channel/chunk
- §7.2 baseLatencies / hwLatencies / nvlsEfficiency verbatim
- §8 Numerical example: ring/tree/NVLS comparison on 8-GPU H100 NVLink + 8-node IB
- §9 NCCL_ALGO override + determinism + NCCL_DEBUG_SUBSYS=TUNING
- Appendix A/B
1 parent edcb0c8
1 file changed
Lines changed: 713 additions & 0 deletions