[rl] Add native Reasoning Gym environment support #4684
taivu1998 wants to merge 3 commits into marin-community:main from
This adds a native Marin RL adapter for Reasoning Gym so lessons can score completions through the upstream dataset API while keeping rollout metrics compatible with existing workers. It also includes focused tests and a small curriculum example so the MVP is usable and reviewable.
🤖 Specification

Problem

Approach

Key code

    for choice in completion.choices:
        response_text = choice.message.content or ""
        reward = self._score_choice(dataset, example.raw_entry, response_text)
        solved = float(reward >= self.success_threshold)
        rollout = inference_ctx.create_rollout_from_choice(
            prompt=example.prompt,
            choice=choice,
            env_name=self.env_name,
            env_example_id=example.example_id,
            reward=reward,
            correctness_reward=solved,
            temperature=temperature,
            top_k=top_k,
            system_prompt=system_prompt,
        )

This is the main contract seam: Reasoning Gym supplies the raw score, while Marin keeps binary correctness separate for rollout statistics.

Tests
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09df51cfa6
    def test_load_reasoning_gym_environment(monkeypatch):
        """Test loading ReasoningGymEnv via EnvConfig."""
        install_fake_reasoning_gym(monkeypatch)
        from marin.rl.environments.reasoning_gym_env import ReasoningGymEnv
Move ReasoningGymEnv import to module scope
This introduces a local import that violates the repository rule in /workspace/marin/AGENTS.md (“All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps.”). ReasoningGymEnv is not optional at import time here and there is no circular dependency, so keeping this import inside the test function hides import errors until runtime and breaks the enforced project convention.
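One way to satisfy the rule is sketched below. It assumes the adapter module imports the optional `reasoning_gym` package lazily, which this PR excerpt does not confirm. Under that assumption the adapter class can be imported at module scope in the test file, and the fake package only needs to be in place before the environment is constructed. `EnvSketch` and `fake_reasoning_gym` are hypothetical stand-ins, not names from the PR.

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def fake_reasoning_gym():
    """Temporarily register a stub `reasoning_gym` module (hypothetical helper)."""
    previous = sys.modules.get("reasoning_gym")
    fake = types.ModuleType("reasoning_gym")
    # The stub only mimics the shape of a dataset factory for this sketch.
    fake.create_dataset = lambda name, **kwargs: {"name": name, **kwargs}
    sys.modules["reasoning_gym"] = fake
    try:
        yield fake
    finally:
        # Restore whatever was registered before, so other tests are unaffected.
        if previous is None:
            sys.modules.pop("reasoning_gym", None)
        else:
            sys.modules["reasoning_gym"] = previous


class EnvSketch:
    """Hypothetical adapter: the optional dependency is imported at use time."""

    def __init__(self, dataset_name: str):
        import reasoning_gym  # guarded optional dep, the one case the rule allows

        self.dataset = reasoning_gym.create_dataset(dataset_name)


# The class itself is defined (or imported) at module scope; only the
# optional package is resolved inside the constructor.
with fake_reasoning_gym():
    env = EnvSketch("leg_counting")
dataset = env.dataset
```

If the real adapter imports `reasoning_gym` at module import time instead, the local import in the test would count as guarding an optional dependency, and the review suggestion would need a different fix.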
Add a native Marin RL adapter for Reasoning Gym plus a minimal curriculum example. This wires direct dataset scoring through the upstream API, preserves pass metrics via correctness_reward for fractional rewards, and adds focused tests for environment loading, composite configs, and rollout statistics.
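The split between the fractional reward and `correctness_reward` can be illustrated with a minimal, self-contained sketch. `ScoredRollout`, `score_rollout`, and `summarize` are hypothetical names for illustration only, and the default threshold of `1.0` is an assumption; the PR excerpt shows `self.success_threshold` being compared against the reward but not its value.

```python
from dataclasses import dataclass


@dataclass
class ScoredRollout:
    reward: float              # raw, possibly fractional score from the dataset
    correctness_reward: float  # 1.0 if the score meets the threshold, else 0.0


def score_rollout(raw_score: float, success_threshold: float = 1.0) -> ScoredRollout:
    """Keep the raw score as the reward and derive a binary pass signal from it."""
    solved = float(raw_score >= success_threshold)
    return ScoredRollout(reward=raw_score, correctness_reward=solved)


def summarize(rollouts: list[ScoredRollout]) -> dict[str, float]:
    """Aggregate mean reward and pass rate separately."""
    n = len(rollouts)
    return {
        "mean_reward": sum(r.reward for r in rollouts) / n,
        "pass_rate": sum(r.correctness_reward for r in rollouts) / n,
    }


# Made-up scores: two exact answers, one partial credit, one miss.
raw_scores = [1.0, 0.5, 0.0, 1.0]
stats = summarize([score_rollout(s) for s in raw_scores])
print(stats)  # {'mean_reward': 0.625, 'pass_rate': 0.5}
```

With partial credit in the mix, the mean reward (0.625) and the pass rate (0.5) diverge, which is why a pass metric computed from the raw reward alone would be misleading.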