
Commit b207120

Clarify intro line on 1:1 communication cost
Replace the awkward "becomes inefficient fast" / "빠르게 비효율로 간다" with a concrete claim: communication time scales linearly with node count.
1 parent 1f0c89f commit b207120

2 files changed

Lines changed: 2 additions & 2 deletions


_posts/2026-04-21-nccl-collectives.ko.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ mermaid: true

 ## 1. Why Collective?

-With multiple processes, building collective operations such as broadcast or reduce out of 1:1 communication alone quickly becomes inefficient. Parallel computing therefore provides group-level communication patterns (collectives) as official APIs. The abstraction has been settled since the MPI era, and NCCL carries the same concept over to GPUs and NVLink / InfiniBand (IB) / RDMA (Remote Direct Memory Access).
+With multiple processes, building collective operations such as broadcast or reduce out of 1:1 communication alone makes communication time grow linearly with the number of nodes. Parallel computing therefore provides group-level communication patterns (collectives) as official APIs. The abstraction has been settled since the MPI era, and NCCL carries the same concept over to GPUs and NVLink / InfiniBand (IB) / RDMA (Remote Direct Memory Access).

 This post is based on NCCL, but the vocabulary itself is compatible with MPI. Names such as AllReduce and AllGather are the same, and algorithm selection uses a similar cost model.

_posts/2026-04-21-nccl-collectives.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ mermaid: true

 ## 1. Why Collective?

-When many processes are involved, building group-wide actions like broadcast or reduce out of 1:1 communication alone becomes inefficient fast. So parallel computing exposes group-level communication patterns (collectives) as first-class APIs. The abstraction has been settled since the MPI era, and NCCL is the same idea ported onto GPUs and NVLink / InfiniBand (IB) / RDMA (Remote Direct Memory Access).
+When many processes are involved, building group-wide actions like broadcast or reduce out of 1:1 communication alone makes communication time scale linearly with node count. So parallel computing exposes group-level communication patterns (collectives) as first-class APIs. The abstraction has been settled since the MPI era, and NCCL is the same idea ported onto GPUs and NVLink / InfiniBand (IB) / RDMA (Remote Direct Memory Access).

 This post is NCCL-centric, but the vocabulary is MPI-compatible. Names like AllReduce, AllGather are identical, and the algorithm-selection logic uses a similar cost model.
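
Note on the new wording: the "scales linearly with node count" claim can be checked with a toy step count. The sketch below is not taken from the post and is not NCCL's cost model. It assumes a root that sends the full message to each peer one at a time, with every send costing one time unit, and compares that against a binomial-tree broadcast in which every rank that already holds the data forwards it in each round. The function names are hypothetical.

```python
def naive_broadcast_steps(n: int) -> int:
    """Sequential 1:1 sends: the root sends to each of the other n-1 ranks in turn."""
    return max(n - 1, 0)


def tree_broadcast_steps(n: int) -> int:
    """Binomial tree: the set of ranks holding the data doubles every round."""
    have = 1      # only the root holds the data at the start
    rounds = 0
    while have < n:
        have *= 2  # every holder forwards to one new rank in this round
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (2, 8, 64, 1024):
        print(n, naive_broadcast_steps(n), tree_broadcast_steps(n))
```

Under these assumptions the naive count grows as n - 1 while the tree count grows as ceil(log2 n), e.g. 1023 versus 10 steps at 1024 ranks. Real systems also depend on message size, link bandwidth, and topology, which this sketch ignores.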
