
[rl] Add native Reasoning Gym environment support#4684

Open
taivu1998 wants to merge 3 commits into marin-community:main from taivu1998:tdv/reasoning-gym-pr1

Conversation

@taivu1998

Add a native Marin RL adapter for Reasoning Gym plus a minimal curriculum example. This wires direct dataset scoring through the upstream API, preserves pass metrics via correctness_reward for fractional rewards, and adds focused tests for environment loading, composite configs, and rollout statistics.

This adds a native Marin RL adapter for Reasoning Gym so lessons can score completions through the upstream dataset API while keeping rollout metrics compatible with existing workers. It also includes focused tests and a small curriculum example so the MVP is usable and reviewable.

🤖 Specification

Problem
Marin RL had a Prime Intellect environment path but no native Reasoning Gym adapter, so Reasoning Gym tasks could not be configured as first-class RL lessons through EnvConfig. The missing piece was direct use of reasoning_gym.create_dataset(...) and dataset.score_answer(...) while keeping Marin rollout metrics and worker contracts intact.

Approach
The PR adds ReasoningGymEnv in lib/marin/src/marin/rl/environments/reasoning_gym_env.py, introduces an optional reasoning-gym extra in lib/marin/pyproject.toml, adds a minimal curriculum example in experiments/reasoning_gym_curriculum_examples.py, and covers the integration with focused tests. The environment normalizes composite dataset specs, requires explicit train/eval seeds and sizes, samples deterministically from the dataset, scores completions with the upstream API, records raw episode_reward, and maps binary success into correctness_reward so pass@k-style metrics stay correct for fractional-reward tasks.
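The scoring contract the adapter relies on can be sketched with a stand-in dataset. The `FakeEntry`/`FakeDataset` shapes below are illustrative, not the real reasoning_gym types; the real datasets come from reasoning_gym.create_dataset(...) and expose score_answer(...), but the exact-match scoring here is only a sketch of that interface:

```python
from dataclasses import dataclass


@dataclass
class FakeEntry:
    """Illustrative stand-in for a Reasoning Gym dataset entry."""
    question: str
    answer: str


class FakeDataset:
    """Stand-in for a reasoning_gym dataset; exact-match scoring for brevity."""

    def __init__(self, entries: list[FakeEntry]):
        self.entries = entries

    def __getitem__(self, index: int) -> FakeEntry:
        return self.entries[index]

    def score_answer(self, answer: str, entry: FakeEntry) -> float:
        # Upstream datasets return a float score; real tasks may give
        # partial (fractional) credit rather than just 0.0 or 1.0.
        return 1.0 if answer.strip() == entry.answer else 0.0


dataset = FakeDataset([FakeEntry(question="A dog has 4 legs. How many legs do 3 dogs have?", answer="12")])
print(dataset.score_answer("12", dataset[0]))  # 1.0
print(dataset.score_answer("11", dataset[0]))  # 0.0
```

The environment's job is then to iterate sampled entries, score each model completion through this interface, and translate the raw score into Marin's rollout fields.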

Key code

for choice in completion.choices:
    response_text = choice.message.content or ""
    # Raw score from the Reasoning Gym dataset; may be fractional.
    reward = self._score_choice(dataset, example.raw_entry, response_text)
    # Binary success flag, thresholded separately so pass@k-style
    # rollout statistics stay meaningful for fractional-reward tasks.
    solved = float(reward >= self.success_threshold)

    rollout = inference_ctx.create_rollout_from_choice(
        prompt=example.prompt,
        choice=choice,
        env_name=self.env_name,
        env_example_id=example.example_id,
        reward=reward,
        correctness_reward=solved,
        temperature=temperature,
        top_k=top_k,
        system_prompt=system_prompt,
    )

This is the main contract seam: Reasoning Gym supplies the raw score, while Marin keeps binary correctness separate for rollout statistics.
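The split can be illustrated with a tiny helper (a hypothetical name for illustration; the real logic is inlined in ReasoningGymEnv):

```python
def to_reward_pair(raw_score: float, success_threshold: float = 1.0) -> tuple[float, float]:
    """Map a raw Reasoning Gym score to Marin's (reward, correctness_reward).

    reward preserves partial credit from the dataset, while
    correctness_reward is binary so pass@k-style statistics stay
    meaningful on fractional-reward tasks.
    """
    return raw_score, float(raw_score >= success_threshold)


print(to_reward_pair(1.0))  # (1.0, 1.0): full credit counts as solved
print(to_reward_pair(0.6))  # (0.6, 0.0): partial credit, but not solved
```

Collapsing the two into one field would either discard partial credit or inflate pass metrics, which is why the rollout carries both.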

Tests
Focused environment and rollout tests were added in tests/rl/environments/test_reasoning_gym_env.py, tests/rl/environments/test_load_environment.py, and tests/rl/test_rollout_worker.py. I also ran ./infra/pre-commit.py --all-files --fix, pytest tests/rl/environments -q, pytest tests/rl/test_curriculum.py tests/rl/test_rollout_worker.py -q, and live package sanity checks against the real reasoning_gym==0.1.19 API for leg_counting, composite, and a ReasoningGymEnv.sample(...) smoke path.

@taivu1998 taivu1998 marked this pull request as ready for review April 13, 2026 05:17

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09df51cfa6


def test_load_reasoning_gym_environment(monkeypatch):
    """Test loading ReasoningGymEnv via EnvConfig."""
    install_fake_reasoning_gym(monkeypatch)
    from marin.rl.environments.reasoning_gym_env import ReasoningGymEnv


P1: Move ReasoningGymEnv import to module scope

This introduces a local import that violates the repository rule in /workspace/marin/AGENTS.md (“All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps.”). ReasoningGymEnv is not optional at import time here and there is no circular dependency, so keeping this import inside the test function hides import errors until runtime and breaks the enforced project convention.
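The failure mode being flagged can be demonstrated with a deliberately missing module (a hypothetical name, standing in for any broken dependency):

```python
def uses_local_import():
    # A local import defers ImportError from module-import (or test
    # collection) time to the moment the function is first called.
    import nonexistent_module_xyz  # hypothetical missing dependency
    return nonexistent_module_xyz


# Defining the function above succeeds; the error only surfaces here.
try:
    uses_local_import()
except ImportError:
    print("ImportError surfaced at call time, not at import time")
```

With the import at module scope, the same breakage would fail fast at import time, which is the behavior the AGENTS.md rule is enforcing.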

