Add LoRA support for AsyncGRPO #5610
Conversation
@qgallouedec I'm not sure if this is something you guys were interested in merging, let me know
Hey! Thanks for the PR, yes it's definitely something we want. We will review it at some point, please keep this open.
Hi @qgallouedec, any chance you could prioritize this PR integration? AsyncGRPO improves training time, and with LoRA support it would be super useful. Thanks!
qgallouedec
left a comment
What I don't quite understand is, what do you mean by adding LoRA? Is it on the server side? Client (i.e. trainer) side? Some configurations seem incompatible. Plus, I don't get the need for additional parameters. If LoRA is enabled, we should be able to know it directly from the trained model (if you mean training a LoRA adapter) or from the server (if you mean inference with a LoRA adapter).
This change seems unrelated, no?

What does this PR do?
AsyncGRPO seems to only support full fine-tuning and NCCL weight sync to vLLM. This PR adds LoRA support (tested with Gemma 4). HTTP reload was chosen over NCCL because LoRA parameter names don't match vLLM's internal names, and fixing that would require vLLM-side changes. It also includes a fix to unfreeze LoRA parameters after model loading, since `AutoModelForCausalLM.from_pretrained` freezes them on load by default. I tested with Gemma 4 and GSM8K.

I added a few config fields (`use_lora`, `lora_adapter_path`, `lora_name`) to `AsyncGRPOConfig`. The Gemma 4 schema is taken from the transformers library (`transformers/tests/utils/test_chat_parsing_utils.py`), and the `gemma4.jinja` file comes from `tokenizer.save_pretrained` on that same Gemma 4 model. But it might make sense to remove these from the PR and make them a separate PR.

Fixes # (issue)
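The unfreeze fix above can be pictured with a minimal sketch. The helper below is hypothetical (it is not code from this PR) and assumes PEFT's usual `lora_` naming convention for adapter parameters:

```python
import torch.nn as nn

def unfreeze_lora_parameters(model: nn.Module) -> None:
    """Re-enable gradients for LoRA parameters only; the base model stays frozen."""
    for name, param in model.named_parameters():
        # PEFT-style adapters name their tensors with a "lora_" prefix
        # (e.g. "lora_A", "lora_B"); everything else keeps requires_grad=False.
        param.requires_grad = "lora_" in name
```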
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Changes weight sync and parameter-freezing behavior in the async training pipeline and adds an HTTP-based vLLM adapter reload path, which could impact training stability and rollout correctness if misconfigured.
Overview
Adds LoRA mode to `AsyncGRPOTrainer`, including new config flags (`use_lora`, `lora_adapter_path`, `lora_name`) and validation to ensure an adapter path is provided. When enabled, training unfreezes only LoRA parameters and switches weight synchronization from NCCL streaming to a save-to-disk + HTTP hot-reload flow: the trainer saves the adapter with `save_pretrained()`, pauses/resumes vLLM around the write, and instructs the rollout worker to reload via `/v1/load_lora_adapter`, while generation requests target the configured LoRA `model` name.

Separately, adds Gemma 4 chat support by introducing `trl/chat_templates/gemma4.jinja` plus a `gemma4_schema` hook in `add_response_schema()` for response/tool-call parsing.
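As a rough sketch of the hot-reload side of that flow, assuming the rollout worker exposes vLLM's OpenAI-compatible server with runtime LoRA updating enabled; the URL, adapter name, and paths below are placeholders, not values from this PR:

```python
import requests

VLLM_URL = "http://localhost:8000"  # placeholder rollout-server address

def hot_reload_adapter(lora_name: str, lora_path: str) -> None:
    """Ask the vLLM server to load a LoRA adapter that was just saved to disk."""
    resp = requests.post(
        f"{VLLM_URL}/v1/load_lora_adapter",
        json={"lora_name": lora_name, "lora_path": lora_path},
    )
    resp.raise_for_status()

# Trainer-side order of operations (sketch):
#   model.save_pretrained("outputs/lora_adapter")            # write adapter to shared disk
#   hot_reload_adapter("grpo_lora", "outputs/lora_adapter")   # tell the server to pick it up
# Generation requests then pass the adapter name as the `model` field.
```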