Introduce backend rollout-completions interface and decouple OpenEnv helper from vLLM internals #5256

Open
rycerzes wants to merge 8 commits into huggingface:main from rycerzes:fix-5194-openenv

@rycerzes commented Mar 10, 2026

Summary

Closes #5194 (previous step: #5244; part of #5119). This PR adds an internal rollout-completions capability to backend.py and refactors utils.py to dispatch through trainer.generation_backend, so the helper flow no longer introspects the trainer or backend directly.

Changes

  • backend.py:
    • add RolloutCompletion dataclass
    • add internal RolloutCompletionsBackend protocol
    • implement generate_rollout_completions(...) in VLLMBackendAdapter (server + colocate)
    • raise explicit unsupported-capability errors in non-vLLM adapters
  • utils.py:
    • remove direct branching on trainer.use_vllm / trainer.vllm_mode
    • remove direct trainer.vllm_generation.* calls
    • route via trainer.generation_backend.generate_rollout_completions(...)
    • preserve output schema: prompt_ids, completion_ids, logprobs, text
  • Tests:
    • expanded test_generation_backend.py
    • new test_openenv_utils.py
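
For reviewers skimming the shape of the backend.py change, here is a minimal sketch of how a dataclass plus an internal protocol could fit together. The field types, the `prompts` parameter, and both adapter classes (`EchoBackendAdapter`, `UnsupportedBackendAdapter`) are assumptions made for illustration; only the names `RolloutCompletion`, `RolloutCompletionsBackend`, `generate_rollout_completions`, and the four schema fields come from this PR description.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class RolloutCompletion:
    # Field names follow the output schema in this PR; the types are assumed.
    prompt_ids: list[int]
    completion_ids: list[int]
    logprobs: list[float]
    text: str


@runtime_checkable
class RolloutCompletionsBackend(Protocol):
    # Assumed signature; the real method may take sampling options as well.
    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]: ...


class UnsupportedBackendAdapter:
    """Hypothetical non-vLLM adapter that declines the capability explicitly."""

    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]:
        raise NotImplementedError(
            f"{type(self).__name__} does not support rollout completions"
        )


class EchoBackendAdapter:
    """Hypothetical toy adapter: 'tokenizes' by character code, no model involved."""

    def generate_rollout_completions(self, prompts: list[str]) -> list[RolloutCompletion]:
        completions = []
        for prompt in prompts:
            text = prompt.upper()
            completions.append(
                RolloutCompletion(
                    prompt_ids=[ord(c) for c in prompt],
                    completion_ids=[ord(c) for c in text],
                    logprobs=[0.0] * len(text),
                    text=text,
                )
            )
        return completions
```

The point of the explicit error in the unsupported adapter is that callers get a clear capability failure instead of an attribute error deep inside the helper.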

Preserved

  • Existing OpenEnv helper API and return contract
  • Existing OpenEnv example call sites
  • Backend-specific generation/runtime details remain in adapter layer
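
To illustrate the preserved return contract, the sketch below shows a helper that only talks to `trainer.generation_backend` and still hands back records with the same four keys. Everything here is a stand-in: `FakeBackend`, the `SimpleNamespace` trainer, the helper body, and the assumption that each record is a plain dict are hypothetical, not the code in this PR.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class RolloutCompletion:
    # Same four fields as the preserved output schema; types are assumed.
    prompt_ids: list[int]
    completion_ids: list[int]
    logprobs: list[float]
    text: str


class FakeBackend:
    """Hypothetical backend used only to exercise the helper."""

    def generate_rollout_completions(self, prompts):
        return [
            RolloutCompletion(prompt_ids=[1, 2], completion_ids=[3], logprobs=[-0.5], text="ok")
            for _ in prompts
        ]


def generate_rollout_completions(trainer, prompts):
    """Hypothetical helper: no use_vllm / vllm_mode branching, just dispatch."""
    results = trainer.generation_backend.generate_rollout_completions(prompts)
    # Convert to plain dicts so existing call sites see the same keys as before.
    return [asdict(result) for result in results]


trainer = SimpleNamespace(generation_backend=FakeBackend())
rows = generate_rollout_completions(trainer, ["hello", "world"])
```

With this shape, swapping the backend changes nothing for call sites as long as the adapter returns the same four fields.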

CC: @albertvillanova


Note

Cursor Bugbot is generating a summary for commit f9bb56b.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


# Use the base Trainer input preparation path, not trainer-specific overrides
# like GRPO/RLOO _prepare_inputs, to avoid recursive generation.
base_prepare_inputs = super(type(trainer), trainer)._prepare_inputs

super(type(trainer)) breaks for subclassed trainers

Medium Severity

super(type(trainer), trainer)._prepare_inputs resolves based on the runtime class, not a fixed class. If a user subclasses GRPOTrainer (or RLOOTrainer), type(trainer) is the subclass, and super() lands on the trainer-specific _prepare_inputs override instead of the base Trainer._prepare_inputs. The old inline code used Python 3's argument-free super() inside the trainer method, which always resolved relative to the defining class (GRPOTrainer), correctly skipping its own override. The new standalone factory function can't use __class__-based super(), and the type()-based workaround doesn't skip enough MRO levels for subclassed trainers.
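
The MRO behavior described here can be reproduced with toy classes. In the sketch below, `Trainer`, `GRPOTrainer`, and `MyGRPOTrainer` are minimal stand-ins rather than the real TRL classes, and `pinned_lookup` is only one possible fix (binding the base implementation explicitly), not necessarily what the PR should adopt.

```python
class Trainer:
    def _prepare_inputs(self, inputs):
        return ("base", inputs)


class GRPOTrainer(Trainer):
    def _prepare_inputs(self, inputs):
        # Stand-in for the trainer-specific override that triggers generation.
        return ("grpo", inputs)


class MyGRPOTrainer(GRPOTrainer):
    """A user subclass that does not override _prepare_inputs."""


def runtime_class_lookup(trainer):
    # The pattern under review: starts the MRO search after type(trainer).
    return super(type(trainer), trainer)._prepare_inputs


def pinned_lookup(trainer):
    # Possible fix: call the base implementation by name, independent of
    # how many subclasses sit between it and the runtime class.
    return lambda inputs: Trainer._prepare_inputs(trainer, inputs)


direct = runtime_class_lookup(GRPOTrainer())("x")        # skips GRPOTrainer's override
subclassed = runtime_class_lookup(MyGRPOTrainer())("x")  # lands on GRPOTrainer's override
pinned = pinned_lookup(MyGRPOTrainer())("x")             # always the base implementation

print(direct, subclassed, pinned)
```

For the direct instance the lookup reaches the base method, but for the subclass instance it stops one level too early and returns the `"grpo"` result, which is the recursion risk flagged above.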

Development

Successfully merging this pull request may close these issues.

Make openenv/utils.py Backend-Agnostic
