feat(rl): Filter incomplete trajectories that hit max_tokens limit #160

EvanZhuang · 2025-12-10T18:01:34Z

Description

When training with RL, trajectories that terminate due to hitting the max_tokens limit (rather than naturally completing via stop sequence) can introduce noise into training. This PR:

- Tracks the stop_reason from the Tinker API through the data pipeline
- Adds a configurable filter to exclude incomplete trajectories from training
- Logs metrics and warnings when trajectories are filtered

Disabled by default; set filter_incomplete_trajectories=True to enable.

Changes

Files Modified (5 files)

tinker_cookbook/completers.py
- Added StopReason type alias (Literal["length", "stop"])
- Added stop_reason field to TokensWithLogprobs dataclass (default "stop" for backward compatibility)
- Added is_complete property, which returns True if the generation ended on a stop sequence
- Updated TinkerTokenCompleter to capture and propagate stop_reason from the Tinker API
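A minimal sketch of how the new completers.py pieces could fit together. StopReason, stop_reason and is_complete come from the description above; the tokens and maybe_logprobs fields are assumptions about the rest of TokensWithLogprobs, and the sketch is not the PR's actual code.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# "stop": generation ended on a stop sequence; "length": it hit max_tokens.
StopReason = Literal["length", "stop"]


@dataclass
class TokensWithLogprobs:
    # Assumed fields for this sketch; only stop_reason is from the PR text.
    tokens: list[int]
    maybe_logprobs: Optional[list[float]] = None
    # Defaults to "stop" so callers that never set it keep the old behavior.
    stop_reason: StopReason = "stop"

    @property
    def is_complete(self) -> bool:
        """True if the sample ended on a stop sequence rather than max_tokens."""
        return self.stop_reason == "stop"


truncated = TokensWithLogprobs(tokens=[1, 2, 3], stop_reason="length")
finished = TokensWithLogprobs(tokens=[1, 2, 3, 4])
print(truncated.is_complete, finished.is_complete)  # False True
```

Deriving is_complete from stop_reason, rather than storing it separately, keeps the two from ever disagreeing.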
tinker_cookbook/rl/types.py
- Added is_complete: bool = True field to Transition dataclass
tinker_cookbook/rl/rollouts.py
- Updated do_single_rollout() to pass is_complete=ac_with_logprobs.is_complete when creating Transition
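The types.py and rollouts.py changes amount to copying the sample's completeness onto each Transition. The sketch below uses a trimmed Transition whose ac, reward and episode_done fields are assumptions, and make_transition is a hypothetical helper standing in for the relevant step of do_single_rollout().

```python
from dataclasses import dataclass
from typing import Literal

StopReason = Literal["length", "stop"]


@dataclass
class TokensWithLogprobs:
    tokens: list[int]
    stop_reason: StopReason = "stop"

    @property
    def is_complete(self) -> bool:
        return self.stop_reason == "stop"


@dataclass
class Transition:
    # Hypothetical subset of fields; the real Transition carries more.
    ac: TokensWithLogprobs
    reward: float
    episode_done: bool
    # Defaults to True so existing code that builds Transitions still works.
    is_complete: bool = True


def make_transition(ac_with_logprobs: TokensWithLogprobs, reward: float, done: bool) -> Transition:
    # Mirrors the rollout change: propagate completeness from the sampled action.
    return Transition(
        ac=ac_with_logprobs,
        reward=reward,
        episode_done=done,
        is_complete=ac_with_logprobs.is_complete,
    )


t = make_transition(TokensWithLogprobs(tokens=[5, 6], stop_reason="length"), 0.0, True)
print(t.is_complete)  # False
```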
tinker_cookbook/rl/data_processing.py
- Added filter_incomplete_trajectories() function that:
  - Filters out any trajectory in which at least one transition hit max_tokens
  - Returns the filtered groups plus a statistics dict for metrics
  - Logs a warning when trajectories are filtered
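One way filter_incomplete_trajectories() could behave, sketched with simplified stand-in types (Trajectory and TrajectoryGroup here hold only what the filter needs). The metric key names and the choice to drop groups left empty are assumptions, not taken from the PR.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class Transition:
    is_complete: bool = True


@dataclass
class Trajectory:
    transitions: list[Transition] = field(default_factory=list)


@dataclass
class TrajectoryGroup:
    trajectories: list[Trajectory] = field(default_factory=list)


def filter_incomplete_trajectories(
    groups: list[TrajectoryGroup],
) -> tuple[list[TrajectoryGroup], dict[str, float]]:
    total = 0
    kept_total = 0
    filtered_groups: list[TrajectoryGroup] = []
    for group in groups:
        # Keep a trajectory only if every one of its transitions completed.
        kept = [
            traj for traj in group.trajectories
            if all(t.is_complete for t in traj.transitions)
        ]
        total += len(group.trajectories)
        kept_total += len(kept)
        if kept:  # assumption: groups with nothing left are dropped entirely
            filtered_groups.append(TrajectoryGroup(trajectories=kept))
    num_filtered = total - kept_total
    stats = {  # hypothetical metric names
        "filter/num_trajectories": float(total),
        "filter/num_filtered": float(num_filtered),
        "filter/frac_filtered": num_filtered / total if total else 0.0,
    }
    if num_filtered:
        logger.warning("Filtered %d/%d trajectories that hit max_tokens", num_filtered, total)
    return filtered_groups, stats
```

If groups also carry per-trajectory rewards or metrics, those lists would need to be filtered in step with the trajectories.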
tinker_cookbook/rl/train.py
- Added filter_incomplete_trajectories: bool = False config option
- Updated prepare_minibatch() to filter incomplete trajectories when enabled
- Added handling for cases where all trajectories are filtered
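A rough sketch of the train.py gating, assuming trajectories are reduced to lists of per-transition is_complete flags. maybe_filter is a hypothetical helper, and returning None so the caller can skip the step is only one plausible way to handle a fully filtered batch; the PR text does not say which approach it takes.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class Config:
    # Hypothetical subset of the training config; only the new flag is shown.
    filter_incomplete_trajectories: bool = False


def maybe_filter(config: Config, trajectories: list[list[bool]]) -> Optional[list[list[bool]]]:
    """Each trajectory is a list of per-transition is_complete flags."""
    if not config.filter_incomplete_trajectories:
        return trajectories  # default: behavior unchanged
    kept = [traj for traj in trajectories if all(traj)]
    if not kept:
        # Nothing left to train on; signal the caller to skip this batch.
        logger.warning("All trajectories hit max_tokens; skipping batch")
        return None
    return kept
```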

Usage

The filtering is disabled by default (filter_incomplete_trajectories=False). To enable:

# In your training config
config = Config(
    # ... other config ...
    filter_incomplete_trajectories=True,  # Enable filtering
)

Backward Compatibility

All changes are backward compatible:

- stop_reason defaults to "stop" (assumes complete)
- is_complete defaults to True on Transition
- filter_incomplete_trajectories config defaults to False

EvanZhuang added 2 commits December 10, 2025 09:55

- 54c91fd: add an option to filter out incomplete rollouts in training
- 11885f5: ruff format fix
