
Conversation

Darktex (Contributor) commented Jan 29, 2026

Summary

Implements the Rubric abstraction for reward computation as specified in RFC 004:

  • Base Rubric class with forward hooks, child registration, introspection, and serialization
  • Container rubrics: Sequential, Gate, WeightedSum, RubricList, RubricDict
  • Trajectory rubrics: TrajectoryRubric, ExponentialDiscountingTrajectoryRubric for delayed rewards
  • Environment integration: rubric attribute, _apply_rubric(), _reset_rubric() helpers

This provides a composable, PyTorch-like API for defining reward signals that can be introspected by training infrastructure.
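
As a rough sketch of that shape (the class names Rubric and WeightedSum come from this PR, but the forward() signature and the WeightedSum constructor used here are assumptions, not the confirmed API):

```python
# Hypothetical sketch of the composable, PyTorch-like rubric API.
from openenv.core.rubrics import Rubric, WeightedSum


class FormatRubric(Rubric):
    """Illustrative child rubric: full reward when the observation has text."""

    def forward(self, action, observation) -> float:
        return 1.0 if getattr(observation, "text", "") else 0.0


class CorrectnessRubric(Rubric):
    """Illustrative child rubric: reward taken from an assumed metadata flag."""

    def forward(self, action, observation) -> float:
        return 1.0 if (getattr(observation, "metadata", None) or {}).get("correct") else 0.0


# Compose children into one weighted signal, nn.Module-style; the environment
# would call reward(action, observation), which runs hooks around forward().
reward = WeightedSum(
    {"format": (FormatRubric(), 0.2), "correctness": (CorrectnessRubric(), 0.8)}
)
```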

Changes

New files

  • src/openenv/core/rubrics/base.py - Base Rubric class
  • src/openenv/core/rubrics/containers.py - Container rubrics (Sequential, Gate, WeightedSum, etc.)
  • src/openenv/core/rubrics/trajectory.py - Trajectory-based rubrics for delayed rewards
  • src/openenv/core/rubrics/__init__.py - Package exports

Modified files

  • src/openenv/core/env_server/interfaces.py - Added optional rubric attribute and helper methods

Tests

  • tests/core/test_rubrics/ - 86 comprehensive tests covering all rubric functionality

Test plan

  • All 86 rubric tests pass
  • All 143 core tests pass
  • Code formatted with ruff
  • No lint errors

Follow-up PRs

  • PR 2 will add TextArena and Connect4 environment examples using this rubric system

meta-cla bot added the CLA Signed label on Jan 29, 2026

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

This PR implements RFC 004's Rubric system for composable reward computation in OpenEnv environments. The implementation provides a PyTorch-inspired API where environment authors define rubrics that compute rewards from actions and observations.

Key additions:

  • Base Rubric class with forward hooks, automatic child registration, and introspection methods (named_rubrics(), get_rubric()); see the sketch after this list
  • Container rubrics for composition: Sequential (fail-fast gating), Gate (threshold filtering), WeightedSum, RubricList, RubricDict
  • Trajectory rubrics for delayed rewards: TrajectoryRubric (self-accumulating) and ExponentialDiscountingTrajectoryRubric (gamma-based credit assignment)
  • Environment integration: Optional rubric attribute in Environment base class with helper methods _apply_rubric() and _reset_rubric()
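
For training infrastructure, introspection might look roughly like this (the method and attribute names are taken from this PR; the exact return shapes assumed here may differ):

```python
def log_component_scores(env) -> None:
    """Walk a composed rubric and report each component's most recent score."""
    # named_rubrics() is assumed to yield (dotted_name, child_rubric) pairs.
    for name, child in env.rubric.named_rubrics():
        print(f"{name}: last_score={child.last_score}")

    # Fetch a single component by name; the path string is illustrative.
    correctness = env.rubric.get_rubric("correctness")
    print("correctness component:", correctness.last_score)
```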

Design alignment:

  • Follows the "rewards inside environment" principle (RFC 002): rubrics are server-side only and live within the environment boundary
  • Exposes no agent-facing APIs, preserving the dual-API boundary invariant
  • Uses a TYPE_CHECKING guard to avoid circular imports while preserving type hints (see the sketch after this list)
  • Follows an optional integration pattern: existing environments continue to work without modification
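
That guard is the standard Python pattern shown below; the module path is from this PR, but the imported type names are assumptions:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # evaluated by type checkers only, never at runtime
    from openenv.core.env_server.interfaces import Action, Observation


class Rubric:
    # Annotations stay as strings at runtime (PEP 563), so the import cycle
    # between rubrics and env_server is never triggered.
    def forward(self, action: Action, observation: Observation) -> float:
        raise NotImplementedError
```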

Test coverage:

  • 86 comprehensive tests covering all rubric functionality
  • Tests include edge cases, hook behavior, child registration, serialization, and trajectory accumulation
  • Integration tests demonstrate usage within mock environments

Confidence Score: 5/5

  • This PR is safe to merge with high confidence - it implements a well-specified RFC with comprehensive tests and no breaking changes
  • Score reflects excellent RFC alignment, thorough test coverage (86 tests), clean architectural design with no breaking changes to existing code, proper separation of concerns, and successful integration following OpenEnv principles. The optional integration pattern ensures backward compatibility.
  • No files require special attention - all implementations are clean and well-tested

Important Files Changed

Filename - Overview
src/openenv/core/rubrics/base.py - Core Rubric base class with PyTorch-like API: forward hooks, automatic child registration, and introspection methods
src/openenv/core/rubrics/containers.py - Container rubrics (Sequential, Gate, WeightedSum, RubricList, RubricDict) for composing reward computations
src/openenv/core/rubrics/trajectory.py - Trajectory-based rubrics for delayed rewards: TrajectoryRubric base and ExponentialDiscountingTrajectoryRubric for credit assignment
src/openenv/core/env_server/interfaces.py - Adds an optional rubric attribute to the Environment base class plus helper methods _apply_rubric() and _reset_rubric() (see the sketch below)
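
A minimal sketch of that optional integration, assuming the helper names from this PR; the subclass, its private helpers, and the reward assignment are illustrative and follow the sequence diagram below rather than the actual code:

```python
from openenv.core.env_server.interfaces import Environment


class EchoEnv(Environment):
    def __init__(self):
        super().__init__()
        # Optional: environments that never set self.rubric behave as before.
        self.rubric = FormatRubric()  # e.g. the rubric sketched earlier

    def reset(self, seed=None):
        obs = self._initial_observation(seed)   # hypothetical helper
        self._reset_rubric()                    # clear accumulated trajectory state
        return obs

    def step(self, action):
        obs = self._execute(action)             # hypothetical helper
        obs.reward = self._apply_rubric(action, obs)
        return obs
```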

Sequence Diagram

sequenceDiagram
    participant Env as Environment
    participant Rubric as Rubric System
    participant Agent as Agent/Infrastructure
    
    Note over Env,Agent: Episode Initialization
    Agent->>Env: reset(seed, episode_id)
    Env->>Rubric: _reset_rubric()
    Rubric->>Rubric: Clear trajectory state
    Env-->>Agent: Initial Observation
    
    Note over Env,Agent: Step Loop
    Agent->>Env: step(action)
    Env->>Env: Execute action logic
    Env->>Env: Create observation
    Env->>Rubric: _apply_rubric(action, obs)
    
    alt Immediate Reward (Base Rubric)
        Rubric->>Rubric: forward(action, obs)
        Rubric->>Rubric: Run pre-forward hooks
        Rubric->>Rubric: Compute reward
        Rubric->>Rubric: Update last_score
        Rubric->>Rubric: Run post-forward hooks
        Rubric-->>Env: Return reward score
    else Trajectory-Based Reward
        Rubric->>Rubric: Accumulate (action, obs)
        alt Not done
            Rubric-->>Env: Return intermediate_reward (0.0)
        else Done
            Rubric->>Rubric: score_trajectory()
            Rubric-->>Env: Return final score
        end
    end
    
    Env->>Env: observation.reward = score
    Env-->>Agent: Observation with reward
    
    opt Infrastructure Introspection
        Agent->>Env: env.rubric.named_rubrics()
        Env-->>Agent: Iterator of (name, rubric)
        loop For each rubric
            Agent->>Rubric: rubric.last_score
            Rubric-->>Agent: Component score
        end
    end
    
    opt Training with Trajectory Rubrics
        Agent->>Rubric: compute_step_rewards()
        Rubric->>Rubric: Apply credit assignment
        Rubric-->>Agent: Per-step rewards list
    end
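
A rough sketch of the delayed-reward branch of this flow; the class and method names (ExponentialDiscountingTrajectoryRubric, score_trajectory, compute_step_rewards) come from this PR, while the constructor argument, trajectory shape, and discounting formula are assumptions inferred from the class name:

```python
from openenv.core.rubrics import ExponentialDiscountingTrajectoryRubric


class WinLossRubric(ExponentialDiscountingTrajectoryRubric):
    def score_trajectory(self, trajectory) -> float:
        # Score the finished episode once: +1 for a win, -1 otherwise.
        _, final_obs = trajectory[-1]
        return 1.0 if (getattr(final_obs, "metadata", None) or {}).get("won") else -1.0


rubric = WinLossRubric(gamma=0.95)
# During the episode, each call accumulates (action, obs) and returns the
# intermediate reward (0.0). After the terminal step, training code can ask
# for per-step credit, e.g. r_t = gamma ** (T - 1 - t) * final_score:
#   step_rewards = rubric.compute_step_rewards()
```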

