The overlap_p2p_comm optimization relies on the presence of independent virtual pipeline stages to hide communication latency behind computation.

In the non-interleaved (standard 1F1B) pipeline schedule, each rank owns a single pipeline stage with a strict Recv -> Compute -> Send dependency chain. There is no independent virtual-stage work available to execute while P2P communication is in flight.
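The difference between the two schedules can be illustrated with a toy timing model. This is only a sketch, not Megatron code: the `Op` class, the `makespan` helper, the op names, and the costs (1 unit per transfer, 4 units per compute step) are all assumptions made for illustration. It models one rank with a compute stream and a communication stream that can run concurrently, and compares a single Recv -> Compute -> Send chain against two hypothetical virtual chunks whose work is independent.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    kind: str            # "comm" or "compute" (two streams that may run concurrently)
    cost: float          # arbitrary time units, chosen for illustration only
    deps: tuple = ()     # names of ops that must finish before this one starts


def makespan(ops):
    """Toy scheduler: ops are issued in list order, each stream runs one op
    at a time, and an op starts once its stream is free and its deps are done."""
    finish = {}
    stream_free = {"comm": 0.0, "compute": 0.0}
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0.0)
        start = max(ready, stream_free[op.kind])
        finish[op.name] = start + op.cost
        stream_free[op.kind] = finish[op.name]
    return max(finish.values())


# Non-interleaved: one stage per rank, strict Recv -> Compute -> Send chain.
single_stage = [
    Op("recv", "comm", 1.0),
    Op("compute", "compute", 4.0, deps=("recv",)),
    Op("send", "comm", 1.0, deps=("compute",)),
]

# Interleaved (hypothetical): two virtual chunks "a" and "b" on the same rank.
# Chunk b's recv and chunk a's send do not depend on the other chunk's compute.
two_chunks = [
    Op("recv_a", "comm", 1.0),
    Op("compute_a", "compute", 4.0, deps=("recv_a",)),
    Op("recv_b", "comm", 1.0),                          # in flight during compute_a
    Op("send_a", "comm", 1.0, deps=("compute_a",)),     # in flight during compute_b
    Op("compute_b", "compute", 4.0, deps=("recv_b",)),
    Op("send_b", "comm", 1.0, deps=("compute_b",)),
]

serial_single = sum(op.cost for op in single_stage)   # 6.0
serial_chunks = sum(op.cost for op in two_chunks)     # 12.0

print(makespan(single_stage), serial_single)  # 6.0 6.0   -> nothing is hidden
print(makespan(two_chunks), serial_chunks)    # 10.0 12.0 -> 2 units of comm hidden
```

In this model the single-stage chain finishes no sooner with an asynchronous communication stream than it would fully serialized, because every op waits on the previous one. With two chunks, one transfer per chunk boundary runs while the other chunk computes.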

As a result, enabling overlap in this setting would either break the dependency graph by issuing sends and recvs out of order across ranks, which risks NCCL ordering violations, or collapse back to serialized execution, which defeats the purpose of overlap. For this reason, Megatron explicitly disables o…

Answer selected by new-TonyWang

This discussion was converted from issue #1069 on September 04, 2024 18:13.