
Conversation

Contributor

@Darktex Darktex commented Jan 29, 2026

Summary

Demonstrates rubric integration patterns with two environments per RFC 004:

TextArena (Wordle)

  • WordleRubric composite rubric with greens, yellows, repetitions, and correct sub-rubrics
  • Migrates from legacy RewardProvider to Rubric pattern
  • Full backwards compatibility via get_reward_signals()
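As a sketch of the composite pattern described above, a weighted combination of sub-rubrics might look like the following. The SubRubric/CompositeRubric class shapes, the weights, and the g/y/x feedback encoding are illustrative assumptions, not the PR's actual WordleRubric implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubRubric:
    name: str
    weight: float
    score_fn: Callable[[dict], float]  # maps step info -> score in [0, 1]

class CompositeRubric:
    """Combines weighted sub-rubric scores into one scalar reward."""
    def __init__(self, sub_rubrics: list[SubRubric]):
        self.sub_rubrics = sub_rubrics

    def __call__(self, step_info: dict) -> float:
        return sum(r.weight * r.score_fn(step_info) for r in self.sub_rubrics)

    def get_reward_signals(self, step_info: dict) -> dict[str, float]:
        # Per-signal breakdown, mirroring the legacy RewardProvider
        # interface the PR says it stays backwards compatible with.
        return {r.name: r.score_fn(step_info) for r in self.sub_rubrics}

# Example: score one Wordle guess, feedback encoded per letter as
# g=green, y=yellow, x=gray (encoding and weights are assumptions).
wordle = CompositeRubric([
    SubRubric("greens", 0.5, lambda s: s["feedback"].count("g") / 5),
    SubRubric("yellows", 0.2, lambda s: s["feedback"].count("y") / 5),
    SubRubric("repetitions", 0.1, lambda s: 0.0 if s["repeated_guess"] else 1.0),
    SubRubric("correct", 0.2, lambda s: 1.0 if s["feedback"] == "ggggg" else 0.0),
])
reward = wordle({"feedback": "ggyxx", "repeated_guess": False})  # ≈ 0.34
```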

Connect4

  • Connect4WinLossRubric trajectory rubric for terminal games
  • Demonstrates exponential discounting for credit assignment
  • Shows reset() integration with environment lifecycle
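The trajectory-rubric pattern with exponential discounting can be sketched roughly as below. The terminal values (win=1.0, loss=0.0, draw=0.5) follow the PR description; the class shape, the gamma default, and the outcome encoding are assumptions, and the PR's Connect4WinLossRubric may differ in detail:

```python
class TrajectoryRubric:
    """Accumulates steps; scores only at the terminal step, then
    redistributes credit across the trajectory via discounting."""

    def __init__(self, gamma: float = 0.9):
        self.gamma = gamma
        self.trajectory: list = []
        self.final_score = 0.0

    def reset(self) -> None:
        # Called from the environment's reset() to clear trajectory state.
        self.trajectory.clear()
        self.final_score = 0.0

    def __call__(self, action, observation, done: bool) -> float:
        self.trajectory.append((action, observation))
        if not done:
            return 0.0  # intermediate reward until the game terminates
        self.final_score = self.score_trajectory(observation)
        return self.final_score

    def score_trajectory(self, terminal_obs) -> float:
        outcome = terminal_obs["outcome"]  # "win" | "loss" | "draw" (assumed encoding)
        return {"win": 1.0, "loss": 0.0, "draw": 0.5}[outcome]

    def compute_step_rewards(self) -> list[float]:
        # Credit assignment: r_t = gamma**(T - 1 - t) * final_score, so
        # moves closer to the terminal state receive more credit.
        T = len(self.trajectory)
        return [self.gamma ** (T - 1 - t) * self.final_score for t in range(T)]
```

For a three-move winning game with gamma=0.5, `compute_step_rewards()` would return `[0.25, 0.5, 1.0]`.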

Changes

New files

  • envs/textarena_env/rubrics.py - Wordle rubric implementation
  • envs/connect4_env/rubrics.py - Connect4 trajectory rubric
  • tests/envs/test_textarena_rubrics.py - 18 tests
  • tests/envs/test_connect4_rubrics.py - 12 tests

Modified files

  • envs/textarena_env/server/environment.py - Use rubric instead of RewardProvider
  • envs/connect4_env/server/connect4_environment.py - Add optional rubric support

Test plan

  • All 30 environment rubric tests pass
  • All 116 total rubric tests pass
  • Code formatted with ruff

Dependencies

This PR depends on #340 (Rubric base system).

@meta-cla meta-cla bot added the CLA Signed label Jan 29, 2026

greptile-apps bot commented Jan 29, 2026

Greptile Summary

This PR demonstrates successful rubric integration patterns for two environments per RFC 004.

Key Changes

  • TextArena (Wordle): Migrates from legacy RewardProvider to composite WordleRubric with sub-rubrics for greens, yellows, repetitions, and correct scoring. Maintains full backward compatibility via get_reward_signals() method.

  • Connect4: Implements Connect4WinLossRubric as a trajectory rubric using exponential discounting for credit assignment. Returns terminal rewards (1.0 win, 0.0 loss, 0.5 draw) with proper reset() integration.

  • Test Coverage: 30 new tests (18 for TextArena, 12 for Connect4) covering rubric behavior, discounting, serialization, and environment integration.

Architecture Alignment

The implementation correctly follows RFC 004 patterns:

  • Rubrics live inside environments as self.rubric attribute
  • Environments call self._reset_rubric() in reset() to clear trajectory state
  • TrajectoryRubric accumulates steps internally, returns intermediate reward until done
  • compute_step_rewards() provides per-step rewards with discounting for training

Both implementations respect the "rewards inside environment" principle (INVARIANTS.md), keeping reward computation server-side within the environment boundary.
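That server-side wiring might look roughly like the following minimal sketch. The Env/DummyRubric names, the step signature, and the max_steps termination are illustrative assumptions, not the PR's actual classes:

```python
class DummyRubric:
    """Stand-in rubric: 0.0 until done, 1.0 at the terminal step."""
    def reset(self):
        pass
    def __call__(self, action, observation, done):
        return 1.0 if done else 0.0

class Env:
    """Minimal environment keeping reward computation inside its boundary."""
    def __init__(self, rubric=None, max_steps=3):
        self.rubric = rubric  # rubric lives inside the environment
        self.max_steps = max_steps
        self.t = 0

    def _reset_rubric(self):
        if self.rubric is not None:
            self.rubric.reset()

    def reset(self):
        self._reset_rubric()  # clear trajectory state before each episode
        self.t = 0
        return {"board": "empty"}

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        obs = {"board": f"after move {self.t}"}
        # Reward is computed server-side by the rubric, never by the client.
        reward = self.rubric(action, obs, done) if self.rubric else 0.0
        return obs, reward, done
```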

Confidence Score: 5/5

  • This PR is safe to merge - clean implementation of RFC 004 patterns with comprehensive tests and no breaking changes
  • Score reflects excellent code quality: proper RFC 004 implementation, comprehensive test coverage (30 tests), maintains backward compatibility, follows all system invariants, and includes clear documentation
  • No files require special attention

Important Files Changed

  • envs/connect4_env/rubrics.py - New trajectory rubric for Connect4 win/loss scoring with exponential discounting; clean implementation following RFC 004
  • envs/connect4_env/server/connect4_environment.py - Adds optional rubric support to the Connect4 environment with proper reset integration
  • envs/textarena_env/rubrics.py - Migrates Wordle from the legacy RewardProvider to the new Rubric system with composite scoring for greens/yellows/repetitions/correct
  • envs/textarena_env/server/environment.py - Replaces the RewardProvider pattern with Rubric integration; maintains backward compatibility via get_reward_signals()

Sequence Diagram

```mermaid
sequenceDiagram
    participant Training as Training Loop
    participant Env as Environment
    participant Rubric as Rubric
    participant Game as Game Logic

    Note over Training,Game: Episode Start
    Training->>Env: reset()
    Env->>Rubric: reset()
    Note over Rubric: Clear trajectory buffer
    Env->>Game: Initialize game state
    Env-->>Training: Initial observation

    Note over Training,Game: Game Loop
    loop Until done
        Training->>Env: step(action)
        Env->>Game: Apply action
        Game-->>Env: New game state
        Env->>Rubric: __call__(action, observation)

        alt Not Done (intermediate step)
            Rubric->>Rubric: Append to trajectory
            Rubric-->>Env: 0.0 (intermediate reward)
        else Done (terminal step)
            Rubric->>Rubric: Append to trajectory
            Rubric->>Rubric: score_trajectory()
            Note over Rubric: Compute final score<br/>(win=1.0, loss=0.0, draw=0.5)
            Rubric-->>Env: Final score
        end

        Env-->>Training: Observation with reward
    end

    Note over Training,Game: Episode Complete
    Training->>Rubric: compute_step_rewards()
    Note over Rubric: Apply discounting:<br/>r_t = gamma^(T-1-t) * final_score
    Rubric-->>Training: Per-step rewards for training
```
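As a worked example of the discounting step in the diagram, r_t = gamma^(T-1-t) * final_score (gamma = 0.9 here is an assumed value, not taken from the PR):

```python
# A 4-move winning game: final_score = 1.0, discounted back over T steps.
gamma, final_score, T = 0.9, 1.0, 4
step_rewards = [gamma ** (T - 1 - t) * final_score for t in range(T)]
# The earliest move gets gamma**3 = 0.729; the terminal move gets the full 1.0.
```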

