
Commit bb6fb12

add link
1 parent b32ad27 commit bb6fb12

1 file changed

Lines changed: 2 additions & 1 deletion


examples/distributed_training/mfu_benchmark.py

@@ -134,7 +134,8 @@ def train_step():
 elapsed = elapsed_tensor.item()
 
 step_seconds = elapsed / measure_steps
-# Linear layer training FLOPs: forward matmul + dInput + dWeight.
+# Linear layer training FLOPs: forward matmul + dInput + dWeight (6ND rule).
+# See PaLM paper, Appendix B: https://arxiv.org/abs/2204.02311
 flops_per_rank_step = 6.0 * batch_size * hidden_size * hidden_size * layers
 tflops_per_gpu = flops_per_rank_step / step_seconds / 1e12
 device_name = torch.cuda.get_device_name(device)
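The "6ND" factor in the updated comment can be checked with a small sketch. It assumes each of the `layers` layers is a square `[hidden_size, hidden_size]` linear layer applied to a `[batch_size, hidden_size]` activation, so N = `hidden_size**2 * layers` parameters and D = `batch_size` rows per step. The helper names (`linear_stack_training_flops`, `mfu`) and the peak value are hypothetical; the benchmark itself computes the FLOPs inline. Like the benchmark, the count covers only the three matmuls and excludes bias adds, activations, and optimizer updates.

```python
# Sketch of the FLOP accounting described in the updated comment.
# Function names are hypothetical; the benchmark computes this inline.


def linear_stack_training_flops(batch_size: int, hidden_size: int, layers: int) -> float:
    """Training FLOPs for one step through `layers` square linear layers.

    Each layer multiplies a [batch_size, hidden_size] activation by a
    [hidden_size, hidden_size] weight. An [m, k] x [k, n] matmul costs
    2 * m * k * n FLOPs (one multiply and one add per product term).
    """
    matmul = 2.0 * batch_size * hidden_size * hidden_size
    forward = matmul   # Y = X @ W
    d_input = matmul   # dX = dY @ W.T
    d_weight = matmul  # dW = X.T @ dY
    # 3 matmuls * 2 FLOPs each = 6 * D * N, with D = batch_size, N = hidden_size**2 * layers.
    return (forward + d_input + d_weight) * layers


def tflops_per_gpu(flops_per_rank_step: float, step_seconds: float) -> float:
    """Achieved throughput in TFLOP/s, mirroring the benchmark's formula."""
    return flops_per_rank_step / step_seconds / 1e12


def mfu(achieved_tflops: float, peak_tflops: float) -> float:
    """Model FLOPs utilization: achieved throughput as a fraction of peak."""
    return achieved_tflops / peak_tflops


if __name__ == "__main__":
    # Hypothetical inputs: substitute the measured step time and your GPU's
    # datasheet dense peak for the dtype being benchmarked.
    flops = linear_stack_training_flops(batch_size=8192, hidden_size=8192, layers=4)
    achieved = tflops_per_gpu(flops, step_seconds=0.05)
    ASSUMED_PEAK_TFLOPS = 1000.0  # placeholder, not a real device spec
    print(f"{achieved:.1f} TFLOP/s, MFU {mfu(achieved, ASSUMED_PEAK_TFLOPS):.1%}")
```

Splitting the count into forward, dInput, and dWeight terms makes the origin of the 6 explicit: the backward pass costs twice the forward pass because it performs two matmuls of the same size.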
