Bug encountered applying LoRA to GRPO #234

@Waylon-John-777

Describe the bug
I encountered the following issues while using EasyDeL with the GPU backend:

  1. When combining LoRA with GRPO, only the first token appears correct during the sampling phase, while the subsequent outputs from the LLM are garbled.

  2. In addition, when running LoRA + GRPO in multi-GPU mode, the partitioning of the LoRA matrices seems to be inconsistent with that of the base model.
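
For the second issue, one way to make the mismatch concrete is to list the shape and sharding of each LoRA matrix next to the base weights. The sketch below is a hypothetical helper, not part of EasyDeL. It assumes the parameters are already available as a nested dict of arrays (how to extract that from an EasyDeL model is not shown here), and that LoRA leaves can be recognized by `lora_a` / `lora_b` in their path, which is a naming assumption. It only reads the `.shape` and `.sharding` attributes that a `jax.Array` exposes.

```python
from types import SimpleNamespace


def leaf_layouts(tree, prefix=""):
    """Flatten a nested dict of arrays into {path: (shape, sharding)}."""
    out = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            out.update(leaf_layouts(value, path))
        else:
            # jax.Array has .shape and .sharding; use None if sharding is absent.
            out[path] = (tuple(value.shape), getattr(value, "sharding", None))
    return out


def split_layouts(params, lora_markers=("lora_a", "lora_b")):
    """Split leaf layouts into (lora, base) dicts based on path markers.

    The marker names are an assumption; adjust them to the actual parameter names.
    """
    lora, base = {}, {}
    for path, layout in leaf_layouts(params).items():
        target = lora if any(m in path for m in lora_markers) else base
        target[path] = layout
    return lora, base


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without a model; real use would pass jax arrays.
    fake = lambda shape, sharding: SimpleNamespace(shape=shape, sharding=sharding)
    demo = {
        "q_proj": {
            "kernel": fake((8, 8), "sharded"),
            "lora_a": fake((8, 2), "replicated"),
            "lora_b": fake((2, 8), "replicated"),
        }
    }
    lora, base = split_layouts(demo)
    for name, layouts in (("base", base), ("lora", lora)):
        for path, (shape, sharding) in sorted(layouts.items()):
            print(f"{name:4s} {path:20s} {shape} {sharding}")
```

Printing both groups side by side for the same layer should show whether the LoRA matrices follow the base kernel's partitioning along the shared dimension or end up laid out differently.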

To Reproduce
Steps to reproduce the behavior:

The script is similar to the ones in tutorials/post-training/easy_peft/.

Inside your ORPO/GRPO script's main() function...

1. Load the model as usual

logger.info(f"Loading base model: {MODEL_ID}")
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    MODEL_ID,
    # ... your model config ...
)

2. Add the magic line

logger.info(f"Applying LoRA with rank={LORA_RANK} to pattern='{LORA_PATTERN}'")
model = model.apply_lora_to_layers(LORA_RANK, LORA_PATTERN)

3. Create your ORPOTrainer or GRPOTrainer as usual.

trainer = ed.ORPOTrainer(  # Or ed.GRPOTrainer(...)
    arguments=orpo_arguments,
    model=model,  # Pass the LoRA-modified model
    # ... other trainer arguments ...
)

trainer.train()
