Is your feature request related to a problem? Please describe.
In megatron/core/pipeline_parallel/schedules.py#get_forward_backward_func, the current comments/docstrings say seq_length is ignored when variable_seq_lengths=True. That is correct for PP=1, but for PP>1 the pipelined schedules still use seq_length to size the P2P activation tensors: forward_backward_pipelining_without_interleaving(...) calls get_tensor_shapes(seq_length, ...), and forward_backward_pipelining_with_interleaving(...) builds tensor_shape = [seq_length, micro_batch_size, hidden_size]. This mismatch confuses users and can cause shape errors (if seq_length is smaller than an actual microbatch's sequence length) or wasted memory and bandwidth (if it is larger). A minimal sketch of the shape flow is below.
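For illustration only, here is a minimal sketch of that shape flow, using simplified stand-ins (the `_sketch` functions are hypothetical and do not match the real Megatron-LM signatures):

```python
import torch

# Simplified stand-in for schedules.get_tensor_shapes: with PP > 1, the P2P
# activation buffers are sized from the seq_length argument, even when
# variable_seq_lengths=True.
def get_tensor_shapes_sketch(seq_length, micro_batch_size, hidden_size):
    return [(seq_length, micro_batch_size, hidden_size)]

# Simplified stand-in for the receive side of the P2P exchange: the receiving
# pipeline rank allocates a buffer of exactly the advertised shape.
def recv_forward_sketch(tensor_shape):
    return torch.empty(tensor_shape)

# seq_length acts as a per-step maximum: a microbatch shorter than 4096 still
# pays for a [4096, 1, 4096] buffer (wasted memory/bandwidth), while a longer
# microbatch no longer matches the receive buffer (shape error).
(shape,) = get_tensor_shapes_sketch(seq_length=4096, micro_batch_size=1, hidden_size=4096)
activation_buffer = recv_forward_sketch(shape)
print(activation_buffer.shape)  # torch.Size([4096, 1, 4096]), fixed per step
```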
Describe the solution you'd like
Documentation-only: update the comments/docstrings near get_forward_backward_func (and the two pipelined schedule functions) to state that with variable_seq_lengths=True, seq_length is ignored when PP=1, while for PP>1 it is still required as the per-step maximum sequence length used to size the P2P tensors (actual microbatches may have sequence length ≤ seq_length).
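As a concrete starting point, one possible shape for the note (draft wording only; the existing docstring text is elided, and the final phrasing and placement are up to the maintainers):

```python
def get_forward_backward_func():
    """...existing summary...

    Note on variable_seq_lengths (proposed addition):
        * pipeline parallel size == 1: the seq_length passed to the returned
          function is ignored; each microbatch may use its own sequence length.
        * pipeline parallel size > 1: seq_length is still required and is used
          to size the P2P activation tensors, so it must be the per-step
          maximum sequence length; individual microbatches may be shorter.
    """
    ...
```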