
[rl] Add native Reasoning Gym environment support#4684

Open
taivu1998 wants to merge 3 commits into marin-community:main from taivu1998:tdv/reasoning-gym-pr1

Conversation

@taivu1998

Add a native Marin RL adapter for Reasoning Gym plus a minimal curriculum example. This wires direct dataset scoring through the upstream API, preserves pass metrics via correctness_reward for fractional rewards, and adds focused tests for environment loading, composite configs, and rollout statistics.

This adds a native Marin RL adapter for Reasoning Gym so lessons can score completions through the upstream dataset API while keeping rollout metrics compatible with existing workers. It also includes focused tests and a small curriculum example so the MVP is usable and reviewable.

🤖 Specification

Problem
Marin RL had a Prime Intellect environment path but no native Reasoning Gym adapter, so Reasoning Gym tasks could not be configured as first-class RL lessons through EnvConfig. The missing piece was direct use of reasoning_gym.create_dataset(...) and dataset.score_answer(...) while keeping Marin rollout metrics and worker contracts intact.

Approach
The PR adds ReasoningGymEnv in lib/marin/src/marin/rl/environments/reasoning_gym_env.py, introduces an optional reasoning-gym extra in lib/marin/pyproject.toml, adds a minimal curriculum example in experiments/reasoning_gym_curriculum_examples.py, and covers the integration with focused tests. The environment normalizes composite dataset specs, requires explicit train/eval seeds and sizes, samples deterministically from the dataset, scores completions with the upstream API, records raw episode_reward, and maps binary success into correctness_reward so pass@k-style metrics stay correct for fractional-reward tasks.
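The scoring contract the adapter relies on can be sketched with a stand-in dataset. The `FakeEntry`/`FakeDataset` shapes below are illustrative, not the real reasoning_gym types; the real datasets come from reasoning_gym.create_dataset(...) and expose score_answer(...), but the exact-match scoring here is only a sketch of that interface:

```python
from dataclasses import dataclass


@dataclass
class FakeEntry:
    """Illustrative stand-in for a Reasoning Gym dataset entry."""
    question: str
    answer: str


class FakeDataset:
    """Stand-in for a reasoning_gym dataset; exact-match scoring for brevity."""

    def __init__(self, entries: list[FakeEntry]):
        self.entries = entries

    def __getitem__(self, index: int) -> FakeEntry:
        return self.entries[index]

    def score_answer(self, answer: str, entry: FakeEntry) -> float:
        # Upstream datasets return a float score; real tasks may give
        # partial (fractional) credit rather than just 0.0 or 1.0.
        return 1.0 if answer.strip() == entry.answer else 0.0


dataset = FakeDataset([FakeEntry(question="A dog has 4 legs. How many legs do 3 dogs have?", answer="12")])
print(dataset.score_answer("12", dataset[0]))  # 1.0
print(dataset.score_answer("11", dataset[0]))  # 0.0
```

The environment's job is then to iterate sampled entries, score each model completion through this interface, and translate the raw score into Marin's rollout fields.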

Key code

for choice in completion.choices:
    response_text = choice.message.content or ""
    # Raw score from the Reasoning Gym dataset; may be fractional.
    reward = self._score_choice(dataset, example.raw_entry, response_text)
    # Binary success flag, thresholded separately so pass@k-style
    # rollout statistics stay meaningful for fractional-reward tasks.
    solved = float(reward >= self.success_threshold)

    rollout = inference_ctx.create_rollout_from_choice(
        prompt=example.prompt,
        choice=choice,
        env_name=self.env_name,
        env_example_id=example.example_id,
        reward=reward,
        correctness_reward=solved,
        temperature=temperature,
        top_k=top_k,
        system_prompt=system_prompt,
    )

This is the main contract seam: Reasoning Gym supplies the raw score, while Marin keeps binary correctness separate for rollout statistics.
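The split can be illustrated with a tiny helper (a hypothetical name for illustration; the real logic is inlined in ReasoningGymEnv):

```python
def to_reward_pair(raw_score: float, success_threshold: float = 1.0) -> tuple[float, float]:
    """Map a raw Reasoning Gym score to Marin's (reward, correctness_reward).

    reward preserves partial credit from the dataset, while
    correctness_reward is binary so pass@k-style statistics stay
    meaningful on fractional-reward tasks.
    """
    return raw_score, float(raw_score >= success_threshold)


print(to_reward_pair(1.0))  # (1.0, 1.0): full credit counts as solved
print(to_reward_pair(0.6))  # (0.6, 0.0): partial credit, but not solved
```

Collapsing the two into one field would either discard partial credit or inflate pass metrics, which is why the rollout carries both.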

Tests
Focused environment and rollout tests were added in tests/rl/environments/test_reasoning_gym_env.py, tests/rl/environments/test_load_environment.py, and tests/rl/test_rollout_worker.py. I also ran ./infra/pre-commit.py --all-files --fix, pytest tests/rl/environments -q, pytest tests/rl/test_curriculum.py tests/rl/test_rollout_worker.py -q, and live package sanity checks against the real reasoning_gym==0.1.19 API for leg_counting, composite, and a ReasoningGymEnv.sample(...) smoke path.

@taivu1998 taivu1998 marked this pull request as ready for review April 13, 2026 05:17

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09df51cfa6


def test_load_reasoning_gym_environment(monkeypatch):
    """Test loading ReasoningGymEnv via EnvConfig."""
    install_fake_reasoning_gym(monkeypatch)
    from marin.rl.environments.reasoning_gym_env import ReasoningGymEnv


P1: Move ReasoningGymEnv import to module scope

This introduces a local import that violates the repository rule in /workspace/marin/AGENTS.md (“All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps.”). ReasoningGymEnv is not optional at import time here and there is no circular dependency, so keeping this import inside the test function hides import errors until runtime and breaks the enforced project convention.
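The failure mode being flagged can be demonstrated with a deliberately missing module (a hypothetical name, standing in for any broken dependency):

```python
def uses_local_import():
    # A local import defers ImportError from module-import (or test
    # collection) time to the moment the function is first called.
    import nonexistent_module_xyz  # hypothetical missing dependency
    return nonexistent_module_xyz


# Defining the function above succeeds; the error only surfaces here.
try:
    uses_local_import()
except ImportError:
    print("ImportError surfaced at call time, not at import time")
```

With the import at module scope, the same breakage would fail fast at import time, which is the behavior the AGENTS.md rule is enforcing.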

