forked from NVIDIA/Megatron-LM
Your question
--no-pre-communication-optimization
By default, the zb runtime dispatches a tiny communication before the real communication to optimize computation. Why is this necessary?
Code:
zero-bubble-pipeline-parallelism/megatron/core/pipeline_parallel/zerobubble/runtime.py
Line 756 in 7e03eac
```python
# Cannot fuse "pre_send" with other send kernels, or they will get stuck,
# possibly as there will be 2 send-recv with the same source and target.
with nvtx_range_ctx("pre_send"):
    pre_send, _ = multi_pipeline_ops(
        pre_sp_tensors, [],
        pre_sn_tensors, [],
        batch_p2p,
    )
with nvtx_range_ctx(send_fused_name):
    send_reqs, _ = multi_pipeline_ops(
        sp_tensors, [],
        sn_tensors, [],
        batch_p2p,
    )
```
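For context on what is being asked: the dispatch order in the snippet (a tiny `pre_send` issued on its own, then the real fused send) can be modeled with a purely illustrative, process-local sketch. Threads and queues stand in for ranks and communication channels here; `rank`, the queue wiring, and the 1-byte handshake are hypothetical names for illustration only, not the repository's actual API or NCCL behavior:

```python
import threading
import queue

def rank(send_q, recv_q, payload, received, pre_communication=True):
    """Hypothetical stand-in for one pipeline rank's send/recv step."""
    if pre_communication:
        # tiny "pre_send" dispatched on its own, never fused with the
        # real send below (mirrors the comment in runtime.py)
        send_q.put(b"\x00")
        recv_q.get()
    send_q.put(payload)              # the real communication
    received.append(recv_q.get())

# two "ranks" connected by a pair of queues
q01, q10 = queue.Queue(), queue.Queue()
got0, got1 = [], []
t0 = threading.Thread(target=rank, args=(q01, q10, b"fwd-act", got0))
t1 = threading.Thread(target=rank, args=(q10, q01, b"bwd-grad", got1))
t0.start(); t1.start(); t0.join(); t1.join()
print(got0, got1)  # → [b'bwd-grad'] [b'fwd-act']
```

Passing `pre_communication=False` would skip the handshake, which is roughly what `--no-pre-communication-optimization` appears to control in the runtime.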