@HollowMan6 HollowMan6 commented Nov 23, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

This PR fixes the logic so that template encoding captures the true packed lengths and propagates them through seq_lengths, advantages expansion, and truncation masks. It normalizes packed-sequence metadata in loss_func by aligning lengths_with_padding with the completion tensor width and by padding or trimming per-token log-probabilities so that downstream splits stay consistent. It also guards against invalid metadata by validating the adjusted final length, and it reuses the same helper for the ref and old log-probs.
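The length normalization described above can be illustrated with a small sketch. This is not the PR's code: the helper name `align_lengths_to_width` and its arguments are hypothetical, and it only assumes that the per-sequence lengths must sum to the width of the completion tensor before `torch.split` is called.

```python
import torch


def align_lengths_to_width(lengths_with_padding: torch.Tensor, target_width: int) -> torch.Tensor:
    """Hypothetical helper: adjust the final length so the lengths sum to target_width."""
    lengths = lengths_with_padding.clone()
    diff = target_width - int(lengths.sum())
    if diff != 0:
        adjusted_last = int(lengths[-1]) + diff
        # Guard against invalid metadata: a negative length cannot be split on.
        if adjusted_last < 0:
            raise ValueError(
                f"adjusted final length is negative ({adjusted_last}); "
                "packed-sequence metadata does not match the tensor width"
            )
        lengths[-1] = adjusted_last
    return lengths


# Illustrative values: metadata claims 5 + 4 = 9 tokens, but the tensor holds 8.
per_token_logps = torch.zeros(1, 8)
lengths = align_lengths_to_width(torch.tensor([5, 4]), per_token_logps.shape[-1])
chunks = torch.split(per_token_logps, lengths.tolist(), dim=-1)
```

Without the adjustment, `torch.split` would reject the sizes `[5, 4]` because they do not sum to the last dimension of the tensor.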

Error stack trace:

Traceback (most recent call last):
 File "ms-swift/swift/cli/_megatron/rlhf.py", line 5, in <module>
   megatron_rlhf_main()
 File "ms-swift/swift/megatron/train/rlhf.py", line 70, in megatron_rlhf_main
   return MegatronRLHF(args).main()
 File "ms-swift/swift/llm/base.py", line 49, in main
   result = self.run()
 File "ms-swift/swift/megatron/train/sft.py", line 63, in run
   self.trainer.train(train_dataset, val_dataset, data_collator)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 66, in train
   super().train(train_dataset, val_dataset, data_collator)
 File "ms-swift/swift/megatron/trainers/base.py", line 990, in train
   pretrain(
 File "megatron/training/training.py", line 710, in pretrain
   iteration, num_floating_point_operations_so_far = train(
 File "megatron/training/training.py", line 2122, in train
   ) = train_step(
 File "ms-swift/swift/megatron/trainers/base.py", line 496, in train_step
   new_data_iterator = self._replace_data_iterator(data_iterator, model)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 468, in _replace_data_iterator
   micro_batch_data = self._generate_and_score_completions(rollout_batch)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 557, in _generate_and_score_completions
   micro_batch_data = _get_encoded_batch(micro_batch_data, micro_batch_advantages)
 File "ms-swift/swift/megatron/trainers/grpo_trainer.py", line 521, in _get_encoded_batch
   advantages = torch.repeat_interleave(advantages, lengths)
RuntimeError: repeats must have the same size as input along dim, but got repeats.size(0) = 2 and input.size(0) = 1
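The final frame of the trace can be reproduced in isolation. The sketch below uses made-up advantage values and lengths; it only shows that `torch.repeat_interleave` requires one repeat count per input element, which is the condition violated in the trace (two lengths for a single advantage).

```python
import torch

# One advantage for the packed micro-batch; the value is illustrative only.
advantages = torch.tensor([0.5])
# Two per-sequence token counts, matching repeats.size(0) = 2 in the trace.
lengths = torch.tensor([3, 2])

try:
    torch.repeat_interleave(advantages, lengths)
    raised = False
except RuntimeError:
    # Same class of failure as in the stack trace above.
    raised = True

# With exactly one length per advantage, the expansion succeeds and
# yields one advantage value per token.
matched = torch.repeat_interleave(torch.tensor([0.5, -0.5]), lengths)
```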

Experiment results

Script to reproduce with 8 GPUs:

HF_HOME=${HF_HOME:-"/models"}
MODEL=${MODEL:-"Qwen/Qwen3-30B-A3B"}
LORA_RANK=${LORA_RANK:-8}
LORA_ALPHA=${LORA_ALPHA:-16}
COMMON_TP=${COMMON_TP:-4}
COMMON_EP=${COMMON_EP:-8}
COMMON_PP=${COMMON_PP:-1}
COMMON_CP=${COMMON_CP:-2}
INFER_TP=${INFER_TP:-4}

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=1 \
NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
MASTER_PORT=29500 \
NPROC_PER_NODE=8 \
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
megatron rlhf \
    --rlhf_type grpo \
    --model "$HF_HOME/$MODEL" \
    --load_safetensors true \
    --save_safetensors true \
    --context_parallel_size $COMMON_CP \
    --tensor_model_parallel_size $COMMON_TP \
    --expert_model_parallel_size $COMMON_EP \
    --expert_tensor_parallel_size 1 \
    --pipeline_model_parallel_size $COMMON_PP \
    --dataset 'zouxuhong/Countdown-Tasks-3to4#50000' \
    --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
    --max_epochs 1 \
    --global_batch_size 16 \
    --micro_batch_size 1 \
    --steps_per_generation 2 \
    --num_generations 8 \
    --external_plugins /workspace/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_countdown format \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_tensor_parallel_size $INFER_TP \
    --vllm_max_model_len 4096 \
    --max_length 2048 \
    --max_completion_length 2048 \
    --train_type lora \
    --lora_rank $LORA_RANK \
    --lora_alpha $LORA_ALPHA \
    --lr 5e-5 \
    --bf16 true \
    --beta 0.001 \
    --importance_sampling_level sequence \
    --epsilon 3e-4 \
    --epsilon_high 4e-4 \
    --dynamic_sample false \
    --overlong_filter true \
    --loss_type grpo \
    --sleep_level 2 \
    --offload_model false \
    --offload_optimizer true \
    --log_interval 1 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim \
    --no_save_rng \
    --attention_backend flash \
    --temperature 1.0 \
    --padding_free true \
    --sequence_parallel true \
    --log_completions true

✨ Presented to you with Mind Lab - A Lab for Experiential Intelligence.

Signed-off-by: Hollow Man <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @HollowMan6, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug in the Megatron GRPO trainer related to tensor dimension mismatches, specifically when using padding_free sequences. The fix ensures that sequence lengths, advantages, and log-probability tensors are correctly aligned and padded/trimmed, preventing RuntimeError during torch.repeat_interleave operations and maintaining data consistency throughout the training pipeline.

Highlights

  • Corrected repeat_interleave mismatch: The primary fix addresses a RuntimeError in torch.repeat_interleave by ensuring the advantages tensor is repeated according to accurately calculated sequence lengths (lengths_tensor) derived directly from encoded batch lengths, rather than inferred from position_ids.
  • Normalized packed-sequence metadata: The loss_func now includes logic to align lengths_with_padding with the completion_mask's width. It dynamically adjusts the last length if there's a mismatch and adds a validation check to prevent negative padded lengths.
  • Consistent log-probability tensor dimensions: A new helper function, _pad_or_trim_last_dim, is introduced and applied to per_token_logps, ref_per_token_logps, and old_per_token_logps to ensure their last dimension consistently matches the target_token_count, preventing downstream dimension mismatches.
  • Simplified length calculation: The method for determining sequence lengths for repeat_interleave has been simplified by directly using the lengths returned during template encoding, removing complex position_ids manipulation.
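
A helper along the lines of `_pad_or_trim_last_dim` might look like the sketch below. The body is an assumption, not the PR's implementation: in particular, padding on the right with `0.0` is a guess, and the function is named `pad_or_trim_last_dim` here to make clear that it is illustrative.

```python
import torch
import torch.nn.functional as F


def pad_or_trim_last_dim(tensor: torch.Tensor, target_len: int, pad_value: float = 0.0) -> torch.Tensor:
    """Illustrative sketch: force the last dimension of `tensor` to `target_len`."""
    current = tensor.shape[-1]
    if current > target_len:
        # Too wide: drop the trailing positions.
        return tensor[..., :target_len]
    if current < target_len:
        # Too narrow: right-pad the last dimension with pad_value (assumed 0.0).
        return F.pad(tensor, (0, target_len - current), value=pad_value)
    return tensor


# Illustrative input of shape (2, 3).
logps = torch.arange(6.0).reshape(2, 3)
padded = pad_or_trim_last_dim(logps, 5)
trimmed = pad_or_trim_last_dim(logps, 2)
```

Applying the same function to the current, reference, and old per-token log-probabilities gives all three an identical last dimension, so they can be split with the same list of lengths.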

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a repeat_interleave mismatch error that occurred during GRPO training with context parallelism. The fix correctly simplifies the logic for determining sequence lengths by using the length returned directly from the template's encode method, rather than relying on brittle calculations from position_ids. Furthermore, the introduction of a padding and trimming mechanism within the loss function is a robust solution to prevent tensor shape mismatches, which is crucial for stability in distributed training environments. The changes are well-reasoned and directly address the bug described.
