Introduce backend rollout-completions interface and decouple OpenEnv helper from vLLM internals by rycerzes · Pull Request #5256 · huggingface/trl

rycerzes · 2026-03-10T06:12:36Z

Summary

Closes #5194 (previous step: #5244; part of #5119). Adds an internal rollout-completions capability to backend.py and refactors utils.py to dispatch through trainer.generation_backend, removing direct trainer/backend introspection from the helper flow.

Changes

backend.py:
- add RolloutCompletion dataclass
- add internal RolloutCompletionsBackend protocol
- implement generate_rollout_completions(...) in VLLMBackendAdapter (server + colocate)
- raise explicit unsupported-capability errors in non-vLLM adapters
utils.py:
- remove direct branching on trainer.use_vllm / trainer.vllm_mode
- remove direct trainer.vllm_generation.* calls
- route via trainer.generation_backend.generate_rollout_completions(...)
- preserve output schema: prompt_ids, completion_ids, logprobs, text
Tests:
- expanded test_generation_backend.py
- new test_openenv_utils.py
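The backend.py changes amount to a small capability interface: a result dataclass, a protocol describing the capability, and adapters that either implement it or fail loudly. The sketch below shows that shape under stated assumptions. The four fields come from the preserved output schema. The method's arguments are elided as "(...)" in this PR, so `prompts: list[str]` is a hypothetical signature. `ToyVLLMLikeAdapter`, `ToyNonVLLMAdapter` and `first_text` are hypothetical stand-ins, not the real adapters.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RolloutCompletion:
    # Fields mirror the output schema the PR preserves.
    prompt_ids: list[int]
    completion_ids: list[int]
    logprobs: list[float]
    text: str


class RolloutCompletionsBackend(Protocol):
    # Hypothetical signature: the real arguments are not shown in the PR.
    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]: ...


class ToyVLLMLikeAdapter:
    """Hypothetical stand-in for an adapter that supports rollouts."""

    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]:
        out = []
        for prompt in prompts:
            # Toy "generation": reverse the prompt, one fake token id per character.
            completion = prompt[::-1]
            out.append(
                RolloutCompletion(
                    prompt_ids=[ord(c) for c in prompt],
                    completion_ids=[ord(c) for c in completion],
                    logprobs=[0.0] * len(completion),
                    text=completion,
                )
            )
        return out


class ToyNonVLLMAdapter:
    """Hypothetical adapter without the capability: it raises instead of degrading silently."""

    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]:
        raise NotImplementedError(
            f"{type(self).__name__} does not support rollout completions; use a vLLM-backed adapter."
        )


def first_text(backend: RolloutCompletionsBackend, prompt: str) -> str:
    # Callers are typed against the protocol, not against a concrete adapter.
    return backend.generate_rollout_completions([prompt])[0].text
```

Because both adapters define the method, a type checker accepts either one; the unsupported case is reported at call time with an explicit error rather than through `use_vllm`-style branching in the caller.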

Preserved

Existing OpenEnv helper API and return contract
Existing OpenEnv example call sites
Backend-specific generation/runtime details remain in adapter layer
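With dispatch moved into the backend, the OpenEnv helper reduces to orchestration: call `trainer.generation_backend` and reshape the result into the existing contract. The sketch below assumes the contract is one dict per completion with the four preserved keys; that container shape, the helper name `rollout_helper`, and `EchoBackend` are hypothetical and only illustrate the dependency direction.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class RolloutCompletion:
    prompt_ids: list[int]
    completion_ids: list[int]
    logprobs: list[float]
    text: str


class EchoBackend:
    """Hypothetical backend; a real adapter would drive vLLM (server or colocate)."""

    def generate_rollout_completions(self, prompts):
        return [
            RolloutCompletion(prompt_ids=[1, 2], completion_ids=[3], logprobs=[-0.5], text=p.upper())
            for p in prompts
        ]


def rollout_helper(trainer, prompts):
    """Hypothetical orchestration-only helper.

    It never reads trainer.use_vllm / trainer.vllm_mode and never touches
    trainer.vllm_generation; it only relies on trainer.generation_backend.
    """
    completions = trainer.generation_backend.generate_rollout_completions(prompts)
    # Assumed return contract: one dict per completion with the preserved keys.
    return [asdict(c) for c in completions]


trainer = SimpleNamespace(generation_backend=EchoBackend())
print(rollout_helper(trainer, ["hi"]))
```

Since the helper depends on a single attribute, a stub backend is enough to exercise it without vLLM installed.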

CC: @albertvillanova



cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-03-10T06:21:11Z

trl/generation/backend.py

+
+    # Use the base Trainer input preparation path, not trainer-specific overrides
+    # like GRPO/RLOO _prepare_inputs, to avoid recursive generation.
+    base_prepare_inputs = super(type(trainer), trainer)._prepare_inputs


super(type(trainer)) breaks for subclassed trainers

Medium Severity

super(type(trainer), trainer)._prepare_inputs resolves relative to the trainer's runtime class, not a fixed class. If a user subclasses GRPOTrainer (or RLOOTrainer), type(trainer) is the subclass, so the lookup starts just after the subclass in the MRO and lands on the trainer-specific _prepare_inputs override (the one that triggers generation) instead of the base Trainer._prepare_inputs. The old inline code used Python 3's zero-argument super() inside the trainer method, which always resolves relative to the defining class (GRPOTrainer) and therefore correctly skipped its own override. The new standalone factory function cannot use __class__-based super(), and the type()-based workaround skips only one MRO level, which is not enough for subclassed trainers.
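The MRO behaviour described above can be reproduced with toy classes. `Trainer`, `GRPOTrainer` and `MyGRPOTrainer` below are minimal stand-ins named after the real classes, not the TRL or transformers code, and `base_prepare_via_fixed_class` is one possible fix (binding the base-class method explicitly), not necessarily the one adopted in this PR.

```python
class Trainer:
    def _prepare_inputs(self, inputs):
        return f"Trainer.prepare({inputs})"


class GRPOTrainer(Trainer):
    def _prepare_inputs(self, inputs):
        # In the real trainer this override generates completions, so calling it
        # from inside the generation path would recurse.
        return f"GRPOTrainer.generate+prepare({inputs})"


class MyGRPOTrainer(GRPOTrainer):
    """A user subclass that does not override _prepare_inputs."""


def base_prepare_via_runtime_type(trainer):
    # Pattern flagged in review: lookup starts right after type(trainer) in the MRO.
    return super(type(trainer), trainer)._prepare_inputs


def base_prepare_via_fixed_class(trainer):
    # Possible fix: bind Trainer's implementation directly, whatever the subclass depth.
    return Trainer._prepare_inputs.__get__(trainer)


print(base_prepare_via_runtime_type(GRPOTrainer())("x"))    # Trainer.prepare(x)
print(base_prepare_via_runtime_type(MyGRPOTrainer())("x"))  # GRPOTrainer.generate+prepare(x)
print(base_prepare_via_fixed_class(MyGRPOTrainer())("x"))   # Trainer.prepare(x)
```

For a direct GRPOTrainer instance the two approaches agree; they diverge only once a subclass sits between the instance's type and GRPOTrainer, which is exactly the case the review describes.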

rycerzes added 8 commits March 7, 2026 14:42

add GenerationBackend and associated adapters for model generation

d43f9de

refactor GRPOTrainer to utilize generation_backend for single turn generation and sync weights

96c57ee

use generation_backend in RLOOTrainer

80f80b0

refactor create_generation_backend to use base trainer input preparation

- prevent recursive generation

5f161d8

tests for single generation backend

1862c15

add rollout capability to backend layer

0f390bc

refactor openenv/utils to be orchestration-only

65fafaa

add tests for generate_rollout_completions in VLLMBackendAdapter

f9bb56b

cursor bot reviewed Mar 10, 2026


rycerzes mentioned this pull request Mar 10, 2026

Introduce minimal generation backend interface for GRPO and RLOO trainers #5244

Open

rycerzes wants to merge 8 commits into huggingface:main from rycerzes:fix-5194-openenv
