Align GRPO and RLOO initialization #4685

qgallouedec · 2025-12-12T21:26:49Z

GRPO recently benefited from some improvements in initialization that were not applied to RLOO. This PR aligns the two initializations.

HuggingFaceDocBuilderDev · 2025-12-12T21:30:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-12-12T21:28:35Z

trl/trainer/rloo_trainer.py

    from datasets import load_dataset
    from trl import RLOOTrainer
+    from trl.rewards import accuracy_reward

-    dataset = load_dataset("trl-lib/tldr", split="train")
-
-
-    def reward_func(completions, **kwargs):
-        # Dummy reward function that rewards completions with more unique letters.
-        return [float(len(set(completion))) for completion in completions]
-
+    dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

    trainer = RLOOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
-        reward_funcs=reward_func,
+        reward_funcs=accuracy_reward,
        train_dataset=dataset,
    )
-
    trainer.train()
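
For context on what the docstring example swaps out, here is a minimal sketch of the reward-function shape shown in the removed lines: a callable that takes `completions` plus arbitrary keyword arguments and returns one float per completion. It reuses the removed dummy function verbatim. The sample inputs are made up for illustration and assume plain-string completions. It does not show how `accuracy_reward` is implemented.

```python
def reward_func(completions, **kwargs):
    # Dummy reward from the removed docstring example:
    # the score is the number of distinct characters in each completion.
    return [float(len(set(completion))) for completion in completions]


# Illustrative inputs (assumption: completions are plain strings).
sample_completions = ["aab", "hello"]
rewards = reward_func(sample_completions)
print(rewards)
```

Any extra dataset columns the trainer forwards would land in `**kwargs`, which is why the signature accepts them even though this toy function ignores them.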


qgallouedec · 2025-12-12T21:30:31Z

trl/trainer/rloo_trainer.py

            model_name = model_name.split("/")[-1]
            args = RLOOConfig(f"{model_name}-RLOO")

-        # Models


All other changes come from #4577.
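
The two context lines in the hunk above derive a default config name from the model identifier. A small sketch of that string logic, using a hypothetical helper `default_run_name` (not part of TRL) so it runs without the library installed:

```python
def default_run_name(model_id: str, suffix: str = "RLOO") -> str:
    # Hypothetical helper mirroring the two context lines in the diff:
    # keep only the last path component of the model id, then append the suffix.
    model_name = model_id.split("/")[-1]
    return f"{model_name}-{suffix}"


name = default_run_name("Qwen/Qwen2-0.5B-Instruct")
print(name)
```

In the trainer this string is passed as the first positional argument to `RLOOConfig`, presumably the output directory; that reading is an assumption, since the hunk does not show the surrounding condition.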

qgallouedec added 2 commits December 12, 2025 21:24

align rloo and grpo

4d7345a

style

21549b2

qgallouedec commented Dec 12, 2025

qgallouedec requested review from albertvillanova, edbeeching, kashif and lewtun December 12, 2025 21:31

Merge branch 'main' into align-rloo

c336c9a
