Commit dbedc53

Merge pull request #20 from AmourWaltz/main
Revision of 3D Tensor Parallelism
2 parents ef3881e + 127bb03 commit dbedc53

1 file changed: +9 -9 lines changed

docs/chapter8/chapter8_4.md

Lines changed: 9 additions & 9 deletions
@@ -106,16 +106,16 @@ x = m(x)
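 
 3D tensor parallelism is a more advanced parallelism technique. When scaling to more devices, it can further reduce memory and communication costs compared with 1D and 2D tensor parallelism. It splits tensors into a cube shape and partitions the first and last dimensions. For the matrix multiplication $Y=XW$ on $2\times2\times2=8$ processors, we split the input $X$ and the weight $W$ into $[X_{000}\ X_{001}\ X_{010}\ X_{011}\ X_{100} \ X_{101}\ X_{110} \ X_{111}]$ and $[W_{000}\ W_{001}\ W_{010}\ W_{011}\ W_{100} \ W_{101}\ W_{110} \ W_{111}]$ respectively. Let $a,b,c$ index the three dimensions of the cube; each $X_{abc}$ and $W_{cba}$ is stored on node $(a,b,c)$. The operations performed on each node are listed in the table below.
 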
-| Rank $$a$$ | Rank $$b$$ | Rank $$c$$ | $$X$$ | $$W$$ | All Gather (+)$$X_{ac}$$ | All Gather (+)$$W_{cb}$$ | Reduce-scatter (-)$$Y$$ |
+| Rank $$a$$ | Rank $$b$$ | Rank $$c$$ | $$X$$ | $$W$$ | All Gather (+)$$X_{ac}$$ | All Gather (+)$$W_{cb}$$ | Reduce-scatter (-)$$Y_{abc}=X_{ac}W_{cb}$$ |
 | --- | --- | --- | --- | --- | --- | --- | --- |
-| 0 | 0 | 0 | $$X_{000}$$ | $$W_{000}$$ | $$X_{00}=X_{000}+X_{010}$$ | $$W_{00}=W_{000}+W_{001}$$ | $$Y_{000}=X_{00}W_{00}-X_{01}W_{10}$$ |
-| 0 | 0 | 1 | $$X_{001}$$ | $$W_{100}$$ | $$X_{01}=X_{001}+X_{011}$$ | $$W_{10}=W_{100}+W_{101}$$ | $$Y_{001}=X_{01}W_{10}-X_{00}W_{00}$$ |
-| 0 | 1 | 0 | $$X_{010}$$ | $$W_{010}$$ | $$X_{00}=X_{000}+X_{010}$$ | $$W_{01}=W_{010}+W_{011}$$ | $$Y_{010}=X_{00}W_{01}-X_{01}W_{11}$$ |
-| 0 | 1 | 1 | $$X_{011}$$ | $$W_{110}$$ | $$X_{01}=X_{000}+X_{011}$$ | $$W_{11}=W_{110}+W_{111}$$ | $$Y_{011}=X_{01}W_{11}-X_{00}W_{01}$$ |
-| 1 | 0 | 0 | $$X_{100}$$ | $$W_{001}$$ | $$X_{10}=X_{100}+X_{110}$$ | $$W_{00}=W_{000}+W_{001}$$ | $$Y_{100}=X_{10}W_{00}-X_{11}W_{10}$$ |
-| 1 | 0 | 1 | $$X_{101}$$ | $$W_{101}$$ | $$X_{11}=X_{101}+X_{111}$$ | $$W_{10}=W_{100}+W_{101}$$ | $$Y_{100}=X_{11}W_{10}-X_{10}W_{00}$$ |
-| 1 | 1 | 0 | $$X_{110}$$ | $$W_{011}$$ | $$X_{10}=X_{100}+X_{110}$$ | $$W_{01}=W_{010}+W_{011}$$ | $$Y_{110}=X_{10}W_{01}-X_{11}W_{11}$$ |
-| 1 | 1 | 1 | $$X_{111}$$ | $$W_{111}$$ | $$X_{11}=X_{101}+X_{111}$$ | $$W_{11}=W_{110}+W_{111}$$ | $$Y_{111}=X_{11}W_{11}-X_{10}W_{01}$$ |
+| 0 | 0 | 0 | $$X_{000}$$ | $$W_{000}$$ | $$X_{00}=[X_{000}, X_{010}]$$ | $$W_{00}=[W_{000}, W_{001}]$$ | $$Y_{000}=X_{00}W_{00}$$ |
+| 0 | 0 | 1 | $$X_{001}$$ | $$W_{100}$$ | $$X_{01}=[X_{001}, X_{011}]$$ | $$W_{10}=[W_{100}, W_{101}]$$ | $$Y_{001}=X_{01}W_{10}$$ |
+| 0 | 1 | 0 | $$X_{010}$$ | $$W_{010}$$ | $$X_{00}=[X_{000}, X_{010}]$$ | $$W_{01}=[W_{010}, W_{011}]$$ | $$Y_{010}=X_{00}W_{01}$$ |
+| 0 | 1 | 1 | $$X_{011}$$ | $$W_{110}$$ | $$X_{01}=[X_{001}, X_{011}]$$ | $$W_{11}=[W_{110}, W_{111}]$$ | $$Y_{011}=X_{01}W_{11}$$ |
+| 1 | 0 | 0 | $$X_{100}$$ | $$W_{001}$$ | $$X_{10}=[X_{100}, X_{110}]$$ | $$W_{00}=[W_{000}, W_{001}]$$ | $$Y_{100}=X_{10}W_{00}$$ |
+| 1 | 0 | 1 | $$X_{101}$$ | $$W_{101}$$ | $$X_{11}=[X_{101}, X_{111}]$$ | $$W_{10}=[W_{100}, W_{101}]$$ | $$Y_{101}=X_{11}W_{10}$$ |
+| 1 | 1 | 0 | $$X_{110}$$ | $$W_{011}$$ | $$X_{10}=[X_{100}, X_{110}]$$ | $$W_{01}=[W_{010}, W_{011}]$$ | $$Y_{110}=X_{10}W_{01}$$ |
+| 1 | 1 | 1 | $$X_{111}$$ | $$W_{111}$$ | $$X_{11}=[X_{101}, X_{111}]$$ | $$W_{11}=[W_{110}, W_{111}]$$ | $$Y_{111}=X_{11}W_{11}$$ |
 
 ```python
 # Parallel settings
