
Commit b8a1059

Add Part ②: NCCL algorithms (Korean draft)

Full rewrite in the tone of Part ①. Follows the cost model and algorithm selection flow through the code, based on NCCL master (v2.30).
- §1-2 Motivation + αβγ cost model
- §3 Tree / Ring / Butterfly families + 8-rank AllReduce comparison
- §4 NCCL's 7 algorithms / 6 patterns / 3 protocols
- §5 Algorithm deep-dive: Ring, Double Binary Tree (Sanders 2007), PAT (1-GPU-per-node constraint + producer/worker kernel), NVLS, CollNet
- §6 Protocols Simple / LL / LL128 + path-type conditions for enabling LL128
- §7 Selection machinery: ncclTopoTuneModel α/β tables → eligibility filter → topoGetAlgoInfo argmin → channel/chunk
- §7.2 baseLatencies / hwLatencies / nvlsEfficiency verbatim
- §8 Numerical example: ring/tree/NVLS comparison for 8-GPU H100 NVLink + 8-node IB
- §9 NCCL_ALGO override + determinism + NCCL_DEBUG_SUBSYS=TUNING
- Appendices A/B
1 parent edcb0c8 commit b8a1059

1 file changed

Lines changed: 713 additions & 0 deletions

