Add the Reasoning Gym set of environments #326
base: main
Conversation
Integrate the reasoning_gym library to provide single-step reasoning tasks. Each episode presents one question from a configurable dataset; the agent submits an answer and receives a score (0.0 to 1.0).

Features:
- Single-step episodes: reset() provides the question, step() validates the answer
- Dataset persistence: the dataset is reused across resets until the config changes
- Flexible configuration: supports simple and composite datasets
- Concurrent sessions: multiple clients can connect simultaneously

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Replace EchoEnv template content with accurate documentation for the Reasoning Gym environment. The update includes:
- Single-step reasoning task workflow
- Dataset configuration (simple and composite)
- Dataset persistence behavior
- Correct action/observation models (answer, score, question)
- Reward structure (score-based, not length-based)
- Use cases for LLM evaluation and agent training

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Show how to access the dataset_metadata field in the Quick Start example, demonstrating the full observation interface.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
- Add comprehensive test suite with 26 tests covering environment behavior, models, client, and integration workflows
- Fix imports in server files to support both Docker (direct import) and local testing (relative import)
- Fix minor formatting issue in docstring

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
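The dual-import fix mentioned above is presumably the standard try/except pattern, a minimal sketch of which is shown below. The module name `models` and the imported symbols are assumptions about the PR's layout, not its exact files:

```python
# Sketch of a dual-import pattern (assumed layout, not the PR's exact code).
try:
    # Docker image: server files are copied flat, so direct imports resolve.
    from models import ReasoningGymAction, ReasoningGymObservation
except ImportError:
    # Local testing: the server module is imported as part of a package,
    # so fall back to a relative import.
    from .models import ReasoningGymAction, ReasoningGymObservation
```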
Greptile Overview

Greptile Summary

Added Reasoning Gym environment integration to OpenEnv, providing 100+ single-step reasoning tasks with verifiable rewards.

Key Implementation Details:
Architecture Alignment:
Design Philosophy:

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant Client as ReasoningGymEnv<br/>(Client)
participant WS as WebSocket<br/>Connection
participant Server as FastAPI<br/>Server
participant Env as ReasoningGymEnvironment
participant RG as reasoning_gym<br/>Library
Note over Client,RG: Initial Setup & First Episode
Client->>Server: Connect (WebSocket)
Server->>Env: Create environment instance
Client->>WS: reset(dataset_name='leg_counting',<br/>seed=42, size=10)
WS->>Server: Forward reset request
Server->>Env: reset(...)
Env->>RG: create_dataset('leg_counting',<br/>seed=42, size=10)
RG-->>Env: Dataset instance
Env->>Env: Create iterator from dataset
Env->>Env: Get next question from iterator
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
Note over Client,RG: Agent Answers Question
Client->>WS: step(ReasoningGymAction(answer="4"))
WS->>Server: Forward step request
Server->>Env: step(action)
Env->>RG: score_answer(answer, entry)
RG-->>Env: score (0.0-1.0)
Env-->>Server: ReasoningGymObservation<br/>(score, correct_answer, done=True)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with score
Note over Client,RG: Next Question (Reuse Dataset)
Client->>WS: reset() [no params]
WS->>Server: Forward reset request
Server->>Env: reset()
Env->>Env: Reuse existing dataset
Env->>Env: Get next question from iterator
Note over Env: If iterator exhausted,<br/>wrap around to start
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
Note over Client,RG: New Dataset Configuration
Client->>WS: reset(dataset_name='composite',<br/>dataset_specs=[...], seed=99, size=30)
WS->>Server: Forward reset request
Server->>Env: reset(...)
Env->>RG: create_dataset('composite',<br/>datasets=specs, seed=99, size=30)
RG-->>Env: New dataset instance
Env->>Env: Create new iterator
Env->>Env: Get first question
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
```
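For orientation, the two reasoning_gym calls at the heart of the diagram (`create_dataset` and `score_answer`) are part of the library's documented API. A minimal standalone sketch, separate from the PR's actual server code:

```python
import reasoning_gym

# reset(dataset_name="leg_counting", seed=42, size=10) maps to:
dataset = reasoning_gym.create_dataset("leg_counting", seed=42, size=10)

# Each entry is a dict with "question", "answer", and "metadata" fields.
entry = next(iter(dataset))
print(entry["question"])

# step(ReasoningGymAction(answer="4")) maps to: the dataset verifies the
# answer itself and returns a float score between 0.0 and 1.0.
score = dataset.score_answer(answer="4", entry=entry)
print(score)
```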
tagging @burtenshaw @Darktex for visibility :)
Summary
Hey there, I am one of the core contributors of Reasoning Gym - a suite of 100+ environments with verifiable rewards. I would be really happy to contribute this set of procedural data generators to OpenEnv!
Since these are all single-step environments, I went with the following design philosophy (a sketch follows the list):

- `env.reset(...)` creates an environment with the passed arguments and returns the first question.
- A single `env.step(...)` scores the answer and ends the episode with `done=True`. After this, simply calling `env.reset()` with no arguments will yield a new generated sample from the previously instantiated environment.
- When calling `env.reset(...)` with new dataset configs, it will re-instantiate a new dataset and continue yielding data from there.
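A minimal sketch of those three reset modes, assuming `env` is a connected `ReasoningGymEnv` client and using the reset parameters shown in the sequence diagram (`dataset_name`, `seed`, `size`, `dataset_specs`); the dict-based spec format for composite datasets is an assumption:

```python
# 1. Initial configuration: instantiate a simple dataset.
result = env.reset(dataset_name="leg_counting", seed=42, size=10)
print(result.observation.question)

# 2. No arguments: reuse the existing dataset and pull the next sample.
result = env.reset()

# 3. New configuration: re-instantiate as a composite dataset
#    (the spec format here is illustrative).
result = env.reset(
    dataset_name="composite",
    dataset_specs=[
        {"name": "leg_counting", "weight": 1.0},
        {"name": "basic_arithmetic", "weight": 1.0},
    ],
    seed=99,
    size=30,
)
```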
Type of Change

Alignment Checklist
Before submitting, verify:
- I have read `.claude/docs/PRINCIPLES.md` and this PR aligns with our principles
- I have read `.claude/docs/INVARIANTS.md` and no invariants are violated
- I have run `/pre-submit-pr` (or `bash .claude/hooks/lint.sh` and tests) and addressed all issues

RFC Status
Test Plan
After building the Docker image, I created a small script to test out the calls to the environment.
Sample script
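The script itself is collapsed above; below is a hypothetical reconstruction of what such a smoke test might look like, assuming the client and action names from the sequence diagram and the observation fields named in the commits. The import path and connection details are illustrative, not the PR's actual code:

```python
# Hypothetical reconstruction -- module path and base_url are illustrative.
from envs.reasoning_gym_env import ReasoningGymAction, ReasoningGymEnv

# Connect to the environment server started from the Docker image.
env = ReasoningGymEnv(base_url="http://localhost:8000")

# Episode 1: reset with an explicit dataset config.
result = env.reset(dataset_name="leg_counting", seed=42, size=10)
print("Question:", result.observation.question)
print("Metadata:", result.observation.dataset_metadata)

# Single step: submit an answer and receive a score in [0.0, 1.0].
result = env.step(ReasoningGymAction(answer="4"))
print("Score:   ", result.observation.score)
print("Correct: ", result.observation.correct_answer)

# Episode 2: a no-arg reset reuses the same dataset.
result = env.reset()
print("Next question:", result.observation.question)
```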
Script Output
Claude Code Review