Add LoRA support for AsyncGRPO #5610
Conversation
@qgallouedec I'm not sure if this is something you guys were interested in merging, let me know
Hey! Thanks for the PR, yes it's definitely something we want. We will review it at some point, please keep this open.
Hi @qgallouedec, any chance you could prioritize this PR integration? AsyncGRPO improves training time, and with LoRA support it would be super useful. Thanks!
qgallouedec
left a comment
What I don't quite understand is, what do you mean by adding LoRA? Is it on the server side? Client (i.e. trainer) side? Some configurations seem incompatible. Plus, I don't get the need for additional parameters. If LoRA is enabled, we should be able to know it directly from the trained model (if you mean training a LoRA adapter) or from the server (if you mean inference with a LoRA adapter).
This change seems unrelated, no?

What does this PR do?
AsyncGRPO seems to only support full fine-tuning and NCCL weight sync to vLLM. This PR adds LoRA support (tested with Gemma 4). HTTP reload was chosen over NCCL because LoRA parameter names don't match vLLM's internal names, and fixing that would require vLLM-side changes. It also includes a fix to unfreeze LoRA parameters after model loading, since `AutoModelForCausalLM.from_pretrained` freezes them on load by default. I tested with Gemma 4 and GSM8K.

I added a few config fields (`use_lora`, `lora_adapter_path`, `lora_name`) to `AsyncGRPOConfig`. The Gemma 4 schema is taken from the transformers library (`transformers/tests/utils/test_chat_parsing_utils.py`), and the `gemma4.jinja` file comes from `tokenizer.save_pretrained` on that same Gemma 4 model. But it might make sense to remove these from the PR and make them a separate PR.

Fixes # (issue)
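The unfreeze fix above can be pictured with a minimal sketch. The helper below is hypothetical (it is not code from this PR) and assumes PEFT's usual `lora_` naming convention for adapter parameters:

```python
import torch.nn as nn

def unfreeze_lora_parameters(model: nn.Module) -> None:
    """Re-enable gradients for LoRA parameters only; the base model stays frozen."""
    for name, param in model.named_parameters():
        # PEFT-style adapters name their tensors with a "lora_" prefix
        # (e.g. "lora_A", "lora_B"); everything else keeps requires_grad=False.
        param.requires_grad = "lora_" in name
```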
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Medium Risk
Changes weight sync and parameter-freezing behavior in the async training pipeline and adds an HTTP-based vLLM adapter reload path, which could impact training stability and rollout correctness if misconfigured.
Overview
Adds LoRA mode to `AsyncGRPOTrainer`, including new config flags (`use_lora`, `lora_adapter_path`, `lora_name`) and validation to ensure an adapter path is provided. When enabled, training unfreezes only LoRA parameters and switches weight synchronization from NCCL streaming to a save-to-disk + HTTP hot-reload flow: the trainer saves the adapter with `save_pretrained()`, pauses/resumes vLLM around the write, and instructs the rollout worker to reload via `/v1/load_lora_adapter`, while generation requests target the configured LoRA `model` name.

Separately, adds Gemma 4 chat support by introducing `trl/chat_templates/gemma4.jinja` plus a `gemma4_schema` hook in `add_response_schema()` for response/tool-call parsing.
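As a rough sketch of the hot-reload side of that flow, assuming the rollout worker exposes vLLM's OpenAI-compatible server with runtime LoRA updating enabled; the URL, adapter name, and paths below are placeholders, not values from this PR:

```python
import requests

VLLM_URL = "http://localhost:8000"  # placeholder rollout-server address

def hot_reload_adapter(lora_name: str, lora_path: str) -> None:
    """Ask the vLLM server to load a LoRA adapter that was just saved to disk."""
    resp = requests.post(
        f"{VLLM_URL}/v1/load_lora_adapter",
        json={"lora_name": lora_name, "lora_path": lora_path},
    )
    resp.raise_for_status()

# Trainer-side order of operations (sketch):
#   model.save_pretrained("outputs/lora_adapter")            # write adapter to shared disk
#   hot_reload_adapter("grpo_lora", "outputs/lora_adapter")   # tell the server to pick it up
# Generation requests then pass the adapter name as the `model` field.
```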