Bug encountered applying LoRA to GRPO #234

**Describe the bug**
I encountered the following issues while using EasyDeL with the GPU backend:

1. When combining LoRA with GRPO, only the first token appears correct during the sampling phase; all subsequent output from the LLM is garbled.

2. When running LoRA + GRPO in multi-GPU mode, the partitioning (sharding) of the LoRA matrices appears to be inconsistent with that of the base model.
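
To make "consistent partitioning" in point 2 concrete, here is a minimal NumPy sketch of what I would expect, not EasyDeL's actual implementation. It assumes a column-parallel layout in which the base weight `W` is split along its output dimension, the LoRA `B` matrix is split the same way, and the LoRA `A` matrix is replicated. All names and shapes are hypothetical.

```python
import numpy as np

def lora_forward(x, w, a, b, scale):
    """Dense layer with a LoRA update: x @ W + scale * (x @ A) @ B."""
    return x @ w + scale * (x @ a) @ b

def sharded_lora_forward(x, w, a, b, scale, num_shards):
    """Simulate column-parallel sharding (an assumed layout):
    W and B are split along the output dimension, A is replicated."""
    w_shards = np.split(w, num_shards, axis=1)
    b_shards = np.split(b, num_shards, axis=1)
    outs = [lora_forward(x, ws, a, bs, scale) for ws, bs in zip(w_shards, b_shards)]
    # Gathering the per-shard outputs along the output dimension
    # should reproduce the unsharded result.
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
batch, d_in, d_out, rank = 2, 8, 12, 4  # hypothetical sizes
x = rng.normal(size=(batch, d_in))
w = rng.normal(size=(d_in, d_out))
a = rng.normal(size=(d_in, rank))
b = rng.normal(size=(rank, d_out))

full = lora_forward(x, w, a, b, scale=0.5)
sharded = sharded_lora_forward(x, w, a, b, scale=0.5, num_shards=4)
shards_match = np.allclose(full, sharded)
```

If `B` were split along a different axis than `W`, or split when `W` is replicated, the per-shard results would no longer line up with the base layer's output. That is the kind of mismatch I suspect here.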

**To Reproduce**
Steps to reproduce the behavior:

Use a script similar to those in tutorials/post-training/easy_peft/:

### Inside your ORPO/GRPO script's main() function...

### 1. Load the model as usual
logger.info(f"Loading base model: {MODEL_ID}")
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    MODEL_ID,
    # ... your model config ...
)

### 2. Add the magic line
logger.info(f"Applying LoRA with rank={LORA_RANK} to pattern='{LORA_PATTERN}'")
model = model.apply_lora_to_layers(LORA_RANK, LORA_PATTERN)

### 3. Create your ORPOTrainer or GRPOTrainer as usual.
trainer = ed.ORPOTrainer( # Or ed.GRPOTrainer(...)
    arguments=orpo_arguments,
    model=model, # Pass the LoRA-modified model
    # ... other trainer arguments ...
)

trainer.train()
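
To quantify the "only the first token is correct" symptom, a small helper like the one below can compare greedy-decoded token ids from the base model with those from the freshly LoRA-wrapped model. This is only a sketch: the helper name and the token ids are hypothetical, and how the ids are obtained depends on your setup. It also assumes the usual LoRA initialization (the `B` matrix starts at zero, so the adapter is a no-op before training), which I have not confirmed for EasyDeL.

```python
from typing import Optional, Sequence

def first_divergence(reference: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Return the index of the first position where the two token-id
    sequences differ, or None if they are identical."""
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    # One sequence is a strict prefix of the other.
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None

# Hypothetical token ids, for illustration only.
base_ids = [101, 2023, 2003, 1037, 3231]
lora_ids = [101, 9999, 8888, 7777, 6666]
print(first_divergence(base_ids, lora_ids))  # -> 1
```

Under that assumption, a result of `None` before any training step would be the expected outcome; a divergence at index 1 matches what I observe.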
