Is your feature request related to a problem? Please describe.
In megatron/core/pipeline_parallel/schedules.py#get_forward_backward_func, the current comments/docstrings say seq_length is ignored when variable_seq_lengths=True. That is correct for PP=1, but for PP>1 the pipelined schedules still use seq_length to size the P2P activation tensors: forward_backward_pipelining_without_interleaving(...) calls get_tensor_shapes(seq_length, ...), and forward_backward_pipelining_with_interleaving(...) builds tensor_shape = [seq_length, micro_batch_size, hidden_size]. This mismatch confuses users and can cause shape errors (if seq_length is smaller than an actual microbatch's sequence length) or wasted memory and bandwidth (if it is larger). A minimal sketch of the shape flow is below.
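For illustration only, here is a minimal sketch of that shape flow, using simplified stand-ins (the `_sketch` functions are hypothetical and do not match the real Megatron-LM signatures):

```python
import torch

# Simplified stand-in for schedules.get_tensor_shapes: with PP > 1, the P2P
# activation buffers are sized from the seq_length argument, even when
# variable_seq_lengths=True.
def get_tensor_shapes_sketch(seq_length, micro_batch_size, hidden_size):
    return [(seq_length, micro_batch_size, hidden_size)]

# Simplified stand-in for the receive side of the P2P exchange: the receiving
# pipeline rank allocates a buffer of exactly the advertised shape.
def recv_forward_sketch(tensor_shape):
    return torch.empty(tensor_shape)

# seq_length acts as a per-step maximum: a microbatch shorter than 4096 still
# pays for a [4096, 1, 4096] buffer (wasted memory/bandwidth), while a longer
# microbatch no longer matches the receive buffer (shape error).
(shape,) = get_tensor_shapes_sketch(seq_length=4096, micro_batch_size=1, hidden_size=4096)
activation_buffer = recv_forward_sketch(shape)
print(activation_buffer.shape)  # torch.Size([4096, 1, 4096]), fixed per step
```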
Describe the solution you'd like
Documentation-only: update the comments/docstrings near get_forward_backward_func (and the two pipelined schedule functions) to state that with variable_seq_lengths=True, seq_length is ignored when PP=1, while for PP>1 it is still required as the per-step maximum sequence length used to size the P2P tensors (actual microbatches may have sequence length ≤ seq_length).
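As a concrete starting point, one possible shape for the note (draft wording only; the existing docstring text is elided, and the final phrasing and placement are up to the maintainers):

```python
def get_forward_backward_func():
    """...existing summary...

    Note on variable_seq_lengths (proposed addition):
        * pipeline parallel size == 1: the seq_length passed to the returned
          function is ignored; each microbatch may use its own sequence length.
        * pipeline parallel size > 1: seq_length is still required and is used
          to size the P2P activation tensors, so it must be the per-step
          maximum sequence length; individual microbatches may be shorter.
    """
    ...
```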