Describe the bug
I encountered the following issues while using EasyDeL with the GPU backend:
- When combining LoRA with GRPO, only the first token appears correct during the sampling phase; the subsequent outputs from the LLM are garbled.
- When running LoRA + GRPO in multi-GPU mode, the partitioning of the LoRA matrices seems to be inconsistent with that of the base model.
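To make the second point more concrete, here is a minimal sketch of what "consistent" partitioning could mean for a single linear layer. It is only an illustration under stated assumptions, not EasyDeL's actual partition rules: the base kernel is assumed to have shape (in_features, out_features), the LoRA factors are assumed to be named `lora_a` (in_features, rank) and `lora_b` (rank, out_features), and the mesh axis names `fsdp` and `tp` are hypothetical. Under those assumptions, `lora_a` would be expected to inherit the kernel's input-axis sharding and `lora_b` its output-axis sharding, with the small rank dimension left replicated.

```python
from typing import Dict, List, Optional, Tuple

# A partition spec is modeled as a tuple of mesh-axis names (or None for
# "replicated"), one entry per array dimension.
Spec = Tuple[Optional[str], ...]


def expected_lora_specs(kernel_spec: Spec) -> Tuple[Spec, Spec]:
    """Derive the specs one might expect for LoRA factors of a 2-D kernel.

    Assumption: kernel is (in, out), lora_a is (in, rank), lora_b is (rank, out).
    """
    if len(kernel_spec) != 2:
        raise ValueError("expected a 2-D kernel spec")
    in_axis, out_axis = kernel_spec
    # The rank dimension is small, so it is left replicated (None).
    return (in_axis, None), (None, out_axis)


def find_mismatches(layers: Dict[str, Dict[str, Spec]]) -> List[tuple]:
    """Return (layer, param, actual, expected) for each spec that differs."""
    mismatches = []
    for name, specs in layers.items():
        want_a, want_b = expected_lora_specs(specs["kernel"])
        if specs["lora_a"] != want_a:
            mismatches.append((name, "lora_a", specs["lora_a"], want_a))
        if specs["lora_b"] != want_b:
            mismatches.append((name, "lora_b", specs["lora_b"], want_b))
    return mismatches


if __name__ == "__main__":
    # Hypothetical specs, written by hand for illustration only.
    layers = {
        "q_proj": {
            "kernel": ("fsdp", "tp"),
            "lora_a": ("fsdp", None),
            "lora_b": ("tp", None),  # sharded along the wrong dimension
        },
    }
    for layer, param, actual, expected in find_mismatches(layers):
        print(f"{layer}.{param}: got {actual}, expected {expected}")
```

In a real run, the per-parameter specs would have to be read from the sharded model state rather than written by hand.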
To Reproduce
Steps to reproduce the behavior:
Use a script similar to those in tutorials/post-training/easy_peft/:
# Inside your ORPO/GRPO script's main() function...

# 1. Load the model as usual
logger.info(f"Loading base model: {MODEL_ID}")
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    MODEL_ID,
    # ... your model config ...
)

# 2. Add the magic line
logger.info(f"Applying LoRA with rank={LORA_RANK} to pattern='{LORA_PATTERN}'")
model = model.apply_lora_to_layers(LORA_RANK, LORA_PATTERN)

# 3. Create your ORPOTrainer or GRPOTrainer as usual.
trainer = ed.ORPOTrainer(  # Or ed.GRPOTrainer(...)
    arguments=orpo_arguments,
    model=model,  # Pass the LoRA-modified model
    # ... other trainer arguments ...
)
trainer.train()