[megatron, GRPO] fix: CP/padding_free repeat_interleave mismatch #6720

HollowMan6 · 2025-11-23T22:49:47Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

This PR fixes the logic so that template encoding captures the true packed lengths and propagates them through seq_lengths, advantages expansion, and truncation masks. It normalizes packed-sequence metadata in loss_func by aligning lengths_with_padding with the completion tensor width and padding/trimming per-token log-probabilities so downstream splits stay consistent. It also guards against invalid metadata by validating the adjusted final length, and reuses the same helper for ref/old log-probs.
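The length alignment described above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `align_lengths` and the example values are hypothetical, and plain Python lists stand in for tensors. It assumes that only the final packed length absorbs the difference between the summed lengths and the completion tensor width, and that a negative adjusted length is treated as invalid metadata.

```python
def align_lengths(lengths_with_padding, target_token_count):
    """Return a copy of the lengths whose sum equals target_token_count.

    Hypothetical helper: only the final length is adjusted, and a negative
    result is rejected as invalid metadata.
    """
    if not lengths_with_padding:
        raise ValueError("expected at least one packed sequence length")
    aligned = list(lengths_with_padding)
    # Push the whole difference (positive or negative) onto the last sequence.
    aligned[-1] += target_token_count - sum(aligned)
    if aligned[-1] < 0:
        raise ValueError(
            f"adjusted final length is negative: {aligned[-1]} "
            f"(lengths={list(lengths_with_padding)}, target={target_token_count})"
        )
    return aligned


# Hypothetical values: two packed sequences, completion tensor of width 7 or 10.
trimmed = align_lengths([4, 4], 7)
extended = align_lengths([4, 4], 10)

try:
    align_lengths([4, 4], 3)
    rejected = False
except ValueError:
    rejected = True
```

Adjusting only the last entry keeps every earlier sequence boundary where it was, so splits of the per-token tensors by these lengths still line up with the original sequences.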

Error stack trace:

Traceback (most recent call last):
 File "ms-swift/swift/cli/_megatron/rlhf.py", line 5, in <module>
   megatron_rlhf_main()
 File "ms-swift/swift/megatron/train/rlhf.py", line 70, in megatron_rlhf_main
   return MegatronRLHF(args).main()
 File "ms-swift/swift/llm/base.py", line 49, in main
   result = self.run()
 File "ms-swift/swift/megatron/train/sft.py", line 63, in run
   self.trainer.train(train_dataset, val_dataset, data_collator)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 66, in train
   super().train(train_dataset, val_dataset, data_collator)
 File "ms-swift/swift/megatron/trainers/base.py", line 990, in train
   pretrain(
 File "megatron/training/training.py", line 710, in pretrain
   iteration, num_floating_point_operations_so_far = train(
 File "megatron/training/training.py", line 2122, in train
   ) = train_step(
 File "ms-swift/swift/megatron/trainers/base.py", line 496, in train_step
   new_data_iterator = self._replace_data_iterator(data_iterator, model)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 468, in _replace_data_iterator
   micro_batch_data = self._generate_and_score_completions(rollout_batch)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 557, in _generate_and_score_completions
   micro_batch_data = _get_encoded_batch(micro_batch_data, micro_batch_advantages)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 521, in _get_encoded_batch
   advantages = torch.repeat_interleave(advantages, lengths)
RuntimeError: repeats must have the same size as input along dim, but got repeats.size(0) = 2 and input.size(0) = 1
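
The size check behind this error can be reproduced without a GPU. The sketch below is a pure-Python stand-in for the 1-D behaviour of torch.repeat_interleave, not the PyTorch implementation; the advantage and length values are hypothetical. It shows that one advantage paired with two lengths fails, while one length per advantage expands cleanly.

```python
def repeat_interleave(values, repeats):
    """Pure-Python stand-in for 1-D torch.repeat_interleave(values, repeats)."""
    if len(repeats) == 1:
        # A single repeat count is broadcast to every input element.
        repeats = list(repeats) * len(values)
    if len(repeats) != len(values):
        raise RuntimeError(
            "repeats must have the same size as input along dim, but got "
            f"repeats.size(0) = {len(repeats)} and input.size(0) = {len(values)}"
        )
    out = []
    for value, count in zip(values, repeats):
        out.extend([value] * count)
    return out


# One advantage for the micro-batch but two lengths: the failing shape.
advantages = [0.5]
mismatched_lengths = [3, 2]  # hypothetical values
try:
    repeat_interleave(advantages, mismatched_lengths)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# With exactly one length per advantage, the expansion succeeds.
expanded = repeat_interleave(advantages, [5])
```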

Experiment results

Script to reproduce with 8 GPUs:

HF_HOME=${HF_HOME:-"/models"}
MODEL=${MODEL:-"Qwen/Qwen3-30B-A3B"}
LORA_RANK=${LORA_RANK:-8}
LORA_ALPHA=${LORA_ALPHA:-16}
COMMON_TP=${COMMON_TP:-4}
COMMON_EP=${COMMON_EP:-8}
COMMON_PP=${COMMON_PP:-1}
COMMON_CP=${COMMON_CP:-2}
INFER_TP=${INFER_TP:-4}

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=1 \
NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
MASTER_PORT=29500 \
NPROC_PER_NODE=8 \
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
megatron rlhf \
    --rlhf_type grpo \
    --model "$HF_HOME/$MODEL" \
    --load_safetensors true \
    --save_safetensors true \
    --context_parallel_size $COMMON_CP \
    --tensor_model_parallel_size $COMMON_TP \
    --expert_model_parallel_size $COMMON_EP \
    --expert_tensor_parallel_size 1 \
    --pipeline_model_parallel_size $COMMON_PP \
    --dataset 'zouxuhong/Countdown-Tasks-3to4#50000' \
    --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
    --max_epochs 1 \
    --global_batch_size 16 \
    --micro_batch_size 1 \
    --steps_per_generation 2 \
    --num_generations 8 \
    --external_plugins /workspace/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_countdown format \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_tensor_parallel_size $INFER_TP \
    --vllm_max_model_len 4096 \
    --max_length 2048 \
    --max_completion_length 2048 \
    --train_type lora \
    --lora_rank $LORA_RANK \
    --lora_alpha $LORA_ALPHA \
    --lr 5e-5 \
    --bf16 true \
    --beta 0.001 \
    --importance_sampling_level sequence \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --dynamic_sample false \
    --overlong_filter true \
    --loss_type grpo \
    --sleep_level 2 \
    --offload_model false \
    --offload_optimizer true \
    --log_interval 1 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim \
    --no_save_rng \
    --attention_backend flash \
    --temperature 1.0 \
    --padding_free true \
    --sequence_parallel true \
    --log_completions true

✨ Presented to you with Mind Lab - A Lab for Experiential Intelligence.

Signed-off-by: Hollow Man <[email protected]>

gemini-code-assist · 2025-11-23T22:50:03Z

Summary of Changes

Hello @HollowMan6, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the Megatron GRPO trainer related to tensor dimension mismatches, specifically when using padding_free sequences. The fix ensures that sequence lengths, advantages, and log-probability tensors are correctly aligned and padded/trimmed, preventing RuntimeError during torch.repeat_interleave operations and maintaining data consistency throughout the training pipeline.

Highlights

Corrected repeat_interleave mismatch: The primary fix addresses a RuntimeError in torch.repeat_interleave by ensuring the advantages tensor is repeated according to accurately calculated sequence lengths (lengths_tensor) derived directly from encoded batch lengths, rather than inferred from position_ids.
Normalized packed-sequence metadata: The loss_func now includes logic to align lengths_with_padding with the completion_mask's width. It dynamically adjusts the last length if there's a mismatch and adds a validation check to prevent negative padded lengths.
Consistent log-probability tensor dimensions: A new helper function, _pad_or_trim_last_dim, is introduced and applied to per_token_logps, ref_per_token_logps, and old_per_token_logps to ensure their last dimension consistently matches the target_token_count, preventing downstream dimension mismatches.
Simplified length calculation: The method for determining sequence lengths for repeat_interleave has been simplified by directly using the lengths returned during template encoding, removing complex position_ids manipulation.
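
As a rough illustration of the pad-or-trim step in the highlights above, here is a minimal sketch. It is not the PR's `_pad_or_trim_last_dim`: the name `pad_or_trim_last_dim`, the use of nested lists instead of tensors, and the default pad value of 0.0 are all assumptions made for the example.

```python
def pad_or_trim_last_dim(rows, target_width, pad_value=0.0):
    """Make every row exactly target_width long by trimming or right-padding.

    Hypothetical list-based sketch; pad_value=0.0 is an assumed default.
    """
    if target_width < 0:
        raise ValueError("target_width must be non-negative")
    result = []
    for row in rows:
        if len(row) >= target_width:
            # Too long (or exact): keep only the first target_width entries.
            result.append(list(row[:target_width]))
        else:
            # Too short: append pad_value until the row reaches target_width.
            result.append(list(row) + [pad_value] * (target_width - len(row)))
    return result


# Hypothetical per-token log-probabilities with a target_token_count of 4.
per_token_logps = [[-0.1, -0.2, -0.3, -0.4, -0.5]]
ref_per_token_logps = [[-0.1, -0.2, -0.3]]

trimmed = pad_or_trim_last_dim(per_token_logps, 4)
padded = pad_or_trim_last_dim(ref_per_token_logps, 4)
```

Running the same helper over the current, reference, and old log-probabilities gives all three the same last dimension, which is the property the later splits depend on.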

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively resolves a repeat_interleave mismatch error that occurred during GRPO training with context parallelism. The fix correctly simplifies the logic for determining sequence lengths by using the length returned directly from the template's encode method, rather than relying on brittle calculations from position_ids. Furthermore, the introduction of a padding and trimming mechanism within the loss function is a robust solution to prevent tensor shape mismatches, which is crucial for stability in distributed training environments. The changes are well-reasoned and directly address the bug described.

swift/megatron/trainers/grpo_trainer.py

gemini-code-assist bot reviewed Nov 23, 2025
