
Conversation

Darktex (Contributor) commented Jan 29, 2026

Summary

Implements the Rubric abstraction for reward computation as specified in RFC 004:

  • Base Rubric class with forward hooks, child registration, introspection, and serialization
  • Container rubrics: Sequential, Gate, WeightedSum, RubricList, RubricDict
  • Trajectory rubrics: TrajectoryRubric, ExponentialDiscountingTrajectoryRubric for delayed rewards
  • Environment integration: rubric attribute, _apply_rubric(), _reset_rubric() helpers

This provides a composable, PyTorch-like API for defining reward signals that can be introspected by training infrastructure.
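
As a rough sketch of that shape (the class names Rubric and WeightedSum come from this PR, but the forward() signature and the WeightedSum constructor used here are assumptions, not the confirmed API):

```python
# Hypothetical sketch of the composable, PyTorch-like rubric API.
from openenv.core.rubrics import Rubric, WeightedSum


class FormatRubric(Rubric):
    """Illustrative child rubric: full reward when the observation has text."""

    def forward(self, action, observation) -> float:
        return 1.0 if getattr(observation, "text", "") else 0.0


class CorrectnessRubric(Rubric):
    """Illustrative child rubric: reward taken from an assumed metadata flag."""

    def forward(self, action, observation) -> float:
        return 1.0 if (getattr(observation, "metadata", None) or {}).get("correct") else 0.0


# Compose children into one weighted signal, nn.Module-style; the environment
# would call reward(action, observation), which runs hooks around forward().
reward = WeightedSum(
    {"format": (FormatRubric(), 0.2), "correctness": (CorrectnessRubric(), 0.8)}
)
```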

Changes

New files

  • src/openenv/core/rubrics/base.py - Base Rubric class
  • src/openenv/core/rubrics/containers.py - Container rubrics (Sequential, Gate, WeightedSum, etc.)
  • src/openenv/core/rubrics/trajectory.py - Trajectory-based rubrics for delayed rewards
  • src/openenv/core/rubrics/__init__.py - Package exports

Modified files

  • src/openenv/core/env_server/interfaces.py - Added optional rubric attribute and helper methods

Tests

  • tests/core/test_rubrics/ - 86 comprehensive tests covering all rubric functionality

Test plan

  • All 86 rubric tests pass
  • All 143 core tests pass
  • Code formatted with ruff
  • No lint errors

Follow-up PRs

  • PR 2 will add TextArena and Connect4 environment examples using this rubric system

meta-cla bot added the CLA Signed label on Jan 29, 2026

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

This PR implements RFC 004's Rubric system for composable reward computation in OpenEnv environments. The implementation provides a PyTorch-inspired API where environment authors define rubrics that compute rewards from actions and observations.

Key additions:

  • Base Rubric class with forward hooks, automatic child registration, and introspection methods (named_rubrics(), get_rubric()); see the sketch after this list
  • Container rubrics for composition: Sequential (fail-fast gating), Gate (threshold filtering), WeightedSum, RubricList, RubricDict
  • Trajectory rubrics for delayed rewards: TrajectoryRubric (self-accumulating) and ExponentialDiscountingTrajectoryRubric (gamma-based credit assignment)
  • Environment integration: Optional rubric attribute in Environment base class with helper methods _apply_rubric() and _reset_rubric()
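
For training infrastructure, introspection might look roughly like this (the method and attribute names are taken from this PR; the exact return shapes assumed here may differ):

```python
def log_component_scores(env) -> None:
    """Walk a composed rubric and report each component's most recent score."""
    # named_rubrics() is assumed to yield (dotted_name, child_rubric) pairs.
    for name, child in env.rubric.named_rubrics():
        print(f"{name}: last_score={child.last_score}")

    # Fetch a single component by name; the path string is illustrative.
    correctness = env.rubric.get_rubric("correctness")
    print("correctness component:", correctness.last_score)
```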

Design alignment:

  • Follows the "rewards inside environment" principle (RFC 002): rubrics are server-side only and live within the environment boundary
  • Exposes no agent-facing APIs, preserving the dual-API boundary invariant
  • Uses a TYPE_CHECKING guard to avoid circular imports while preserving type hints (see the sketch after this list)
  • Follows an optional integration pattern: existing environments continue to work without modification
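
That guard is the standard Python pattern shown below; the module path is from this PR, but the imported type names are assumptions:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # evaluated by type checkers only, never at runtime
    from openenv.core.env_server.interfaces import Action, Observation


class Rubric:
    # Annotations stay as strings at runtime (PEP 563), so the import cycle
    # between rubrics and env_server is never triggered.
    def forward(self, action: Action, observation: Observation) -> float:
        raise NotImplementedError
```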

Test coverage:

  • 86 comprehensive tests covering all rubric functionality
  • Tests include edge cases, hook behavior, child registration, serialization, and trajectory accumulation
  • Integration tests demonstrate usage within mock environments

Confidence Score: 5/5

  • This PR is safe to merge with high confidence - it implements a well-specified RFC with comprehensive tests and no breaking changes
  • Score reflects excellent RFC alignment, thorough test coverage (86 tests), clean architectural design with no breaking changes to existing code, proper separation of concerns, and successful integration following OpenEnv principles. The optional integration pattern ensures backward compatibility.
  • No files require special attention - all implementations are clean and well-tested

Important Files Changed

Filename - Overview
src/openenv/core/rubrics/base.py - Core Rubric base class with PyTorch-like API: forward hooks, automatic child registration, and introspection methods
src/openenv/core/rubrics/containers.py - Container rubrics (Sequential, Gate, WeightedSum, RubricList, RubricDict) for composing reward computations
src/openenv/core/rubrics/trajectory.py - Trajectory-based rubrics for delayed rewards: TrajectoryRubric base and ExponentialDiscountingTrajectoryRubric for credit assignment
src/openenv/core/env_server/interfaces.py - Adds an optional rubric attribute to the Environment base class plus helper methods _apply_rubric() and _reset_rubric() (see the sketch below)
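
A minimal sketch of that optional integration, assuming the helper names from this PR; the subclass, its private helpers, and the reward assignment are illustrative and follow the sequence diagram below rather than the actual code:

```python
from openenv.core.env_server.interfaces import Environment


class EchoEnv(Environment):
    def __init__(self):
        super().__init__()
        # Optional: environments that never set self.rubric behave as before.
        self.rubric = FormatRubric()  # e.g. the rubric sketched earlier

    def reset(self, seed=None):
        obs = self._initial_observation(seed)   # hypothetical helper
        self._reset_rubric()                    # clear accumulated trajectory state
        return obs

    def step(self, action):
        obs = self._execute(action)             # hypothetical helper
        obs.reward = self._apply_rubric(action, obs)
        return obs
```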

Sequence Diagram

sequenceDiagram
    participant Env as Environment
    participant Rubric as Rubric System
    participant Agent as Agent/Infrastructure
    
    Note over Env,Agent: Episode Initialization
    Agent->>Env: reset(seed, episode_id)
    Env->>Rubric: _reset_rubric()
    Rubric->>Rubric: Clear trajectory state
    Env-->>Agent: Initial Observation
    
    Note over Env,Agent: Step Loop
    Agent->>Env: step(action)
    Env->>Env: Execute action logic
    Env->>Env: Create observation
    Env->>Rubric: _apply_rubric(action, obs)
    
    alt Immediate Reward (Base Rubric)
        Rubric->>Rubric: forward(action, obs)
        Rubric->>Rubric: Run pre-forward hooks
        Rubric->>Rubric: Compute reward
        Rubric->>Rubric: Update last_score
        Rubric->>Rubric: Run post-forward hooks
        Rubric-->>Env: Return reward score
    else Trajectory-Based Reward
        Rubric->>Rubric: Accumulate (action, obs)
        alt Not done
            Rubric-->>Env: Return intermediate_reward (0.0)
        else Done
            Rubric->>Rubric: score_trajectory()
            Rubric-->>Env: Return final score
        end
    end
    
    Env->>Env: observation.reward = score
    Env-->>Agent: Observation with reward
    
    opt Infrastructure Introspection
        Agent->>Env: env.rubric.named_rubrics()
        Env-->>Agent: Iterator of (name, rubric)
        loop For each rubric
            Agent->>Rubric: rubric.last_score
            Rubric-->>Agent: Component score
        end
    end
    
    opt Training with Trajectory Rubrics
        Agent->>Rubric: compute_step_rewards()
        Rubric->>Rubric: Apply credit assignment
        Rubric-->>Agent: Per-step rewards list
    end
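
A rough sketch of the delayed-reward branch of this flow; the class and method names (ExponentialDiscountingTrajectoryRubric, score_trajectory, compute_step_rewards) come from this PR, while the constructor argument, trajectory shape, and discounting formula are assumptions inferred from the class name:

```python
from openenv.core.rubrics import ExponentialDiscountingTrajectoryRubric


class WinLossRubric(ExponentialDiscountingTrajectoryRubric):
    def score_trajectory(self, trajectory) -> float:
        # Score the finished episode once: +1 for a win, -1 otherwise.
        _, final_obs = trajectory[-1]
        return 1.0 if (getattr(final_obs, "metadata", None) or {}).get("won") else -1.0


rubric = WinLossRubric(gamma=0.95)
# During the episode, each call accumulates (action, obs) and returns the
# intermediate reward (0.0). After the terminal step, training code can ask
# for per-step credit, e.g. r_t = gamma ** (T - 1 - t) * final_score:
#   step_rewards = rubric.compute_step_rewards()
```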

