[QUESTION] why non-interleaved pipeline does not support overlap_p2p_comm? #1070
Your question

    if args.num_layers_per_virtual_pipeline_stage is not None:
        if args.overlap_p2p_comm:
            assert args.pipeline_model_parallel_size > 1, \
                'when interleaved schedule is used, pipeline-model-parallel size '\
                'should be greater than 1'
        else:
            assert args.pipeline_model_parallel_size > 2, \
                'when interleaved schedule is used and p2p communication overlap is disabled, '\
                'pipeline-model-parallel size should be greater than 2 to avoid having multiple '\
                'p2p sends and recvs between same 2 ranks per communication batch'
        assert args.num_layers % args.transformer_pipeline_model_parallel_size == 0, \
            'number of layers should be divisible by the pipeline parallel size'
        num_layers_per_pipeline_stage = args.num_layers // args.transformer_pipeline_model_parallel_size
        assert num_layers_per_pipeline_stage % args.num_layers_per_virtual_pipeline_stage == 0, \
            'number of layers per pipeline stage must be divisible number of layers per virtual pipeline stage'
        args.virtual_pipeline_model_parallel_size = num_layers_per_pipeline_stage // \
            args.num_layers_per_virtual_pipeline_stage
    else:
        args.virtual_pipeline_model_parallel_size = None
        # Overlap P2P communication is disabled if not using the interleaved schedule.
        args.overlap_p2p_comm = False
        if args.rank == 0:
            print('WARNING: Setting args.overlap_p2p_comm to False since non-interleaved '
                  'schedule does not support overlapping p2p communication')
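
To make the arithmetic in the quoted branch concrete, here is a minimal standalone sketch. It is not Megatron's code: resolve_virtual_pipeline is a hypothetical helper, SimpleNamespace stands in for the parsed arguments, and it assumes transformer_pipeline_model_parallel_size equals pipeline_model_parallel_size.

```python
from types import SimpleNamespace


def resolve_virtual_pipeline(args):
    """Hypothetical re-statement of the quoted branch, for illustration only."""
    if args.num_layers_per_virtual_pipeline_stage is not None:
        # Interleaved schedule: the minimum pipeline size depends on overlap.
        min_exclusive = 1 if args.overlap_p2p_comm else 2
        assert args.pipeline_model_parallel_size > min_exclusive
        assert args.num_layers % args.pipeline_model_parallel_size == 0
        layers_per_stage = args.num_layers // args.pipeline_model_parallel_size
        assert layers_per_stage % args.num_layers_per_virtual_pipeline_stage == 0
        args.virtual_pipeline_model_parallel_size = (
            layers_per_stage // args.num_layers_per_virtual_pipeline_stage
        )
    else:
        # Non-interleaved schedule: no virtual stages, overlap forced off.
        args.virtual_pipeline_model_parallel_size = None
        args.overlap_p2p_comm = False
    return args


# Assumed example configuration: 24 layers over 4 pipeline ranks.
interleaved = resolve_virtual_pipeline(SimpleNamespace(
    num_layers=24, pipeline_model_parallel_size=4,
    num_layers_per_virtual_pipeline_stage=2, overlap_p2p_comm=True))
plain = resolve_virtual_pipeline(SimpleNamespace(
    num_layers=24, pipeline_model_parallel_size=4,
    num_layers_per_virtual_pipeline_stage=None, overlap_p2p_comm=True))

print(interleaved.virtual_pipeline_model_parallel_size, interleaved.overlap_p2p_comm)
print(plain.virtual_pipeline_model_parallel_size, plain.overlap_p2p_comm)
```

With these assumed values, each rank holds 6 layers split into 3 virtual stages of 2 layers, and the overlap flag survives. In the second call the flag is reset to False even though it was requested, which is the behavior the question asks about.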
Replies: 3 comments 1 reply

Hi, I have the same question. Have you managed to solve it?

The overlap_p2p_comm optimization relies on the presence of independent virtual pipeline stages to hide communication latency behind computation.

In the non-interleaved (standard 1F1B) pipeline schedule, each rank owns a single pipeline stage with a strict Recv -> Compute -> Send dependency chain. There is no independent virtual-stage work available to execute while P2P communication is in flight.

As a result, enabling overlap in this setting would either break the dependency graph (by issuing sends/recvs out of order across ranks) or collapse back to serialized execution, defeating the purpose of overlap and risking NCCL ordering violations. For this reason, Megatron explicitly disables o…

Hope this helps; please correct me otherwise. Thank you!!

@CodersAcademy006 What he said is correct. Non-interleaved pipeline parallelism itself has no room for p2p_comm overlap; this is an algorithmic issue rather than an engineering problem.
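
The argument in the replies above can be illustrated with a toy per-rank timeline. This is only a sketch of the reasoning, not Megatron's scheduler: exposed_comm_time is a hypothetical function, the durations are made up, and it assumes that the recv for a model chunk can only be posted once the previous step of that same chunk has finished computing.

```python
def exposed_comm_time(num_steps, num_model_chunks, comm=1.0, comp=2.0):
    """Toy model: time one rank sits idle waiting on recvs.

    Steps are assigned to model chunks round-robin. Assumption (not taken
    from Megatron): the recv feeding a step is posted when the previous
    step of the same chunk finishes, and compute cannot start before it lands.
    """
    if num_steps == 0:
        return 0.0
    compute_end = []
    for step in range(num_steps):
        prev_same_chunk = step - num_model_chunks
        posted_at = compute_end[prev_same_chunk] if prev_same_chunk >= 0 else 0.0
        recv_done = posted_at + comm
        prev_end = compute_end[-1] if compute_end else 0.0
        # Compute starts once the rank is free and the input has arrived.
        start = max(prev_end, recv_done)
        compute_end.append(start + comp)
    # Anything beyond pure compute time is communication that was not hidden.
    return compute_end[-1] - num_steps * comp


single_stage = exposed_comm_time(num_steps=4, num_model_chunks=1)
two_virtual_stages = exposed_comm_time(num_steps=4, num_model_chunks=2)
print(single_stage, two_virtual_stages)
```

Under these assumptions, the single-stage case pays the full communication cost on every step, because there is nothing else to compute while the recv is in flight. With two chunks per rank, only the first recv is exposed and the rest are hidden behind the other chunk's compute.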