@EvanZhuang

Description

When training with RL, trajectories that terminate because they hit the max_tokens limit (rather than completing naturally via a stop sequence) can introduce noise into the training signal. This PR:

  1. Tracks the stop_reason from the Tinker API through the data pipeline
  2. Adds a configurable filter to exclude incomplete trajectories from training
  3. Logs metrics and warnings when trajectories are filtered

Disabled by default; set filter_incomplete_trajectories=True to enable.

Changes

Files Modified (5 files)

  1. tinker_cookbook/completers.py

    • Added StopReason type alias (Literal["length", "stop"])
    • Added stop_reason field to TokensWithLogprobs dataclass (default "stop" for backward compatibility)
    • Added is_complete property that returns True if generation ended on a stop sequence (stop_reason == "stop")
    • Updated TinkerTokenCompleter to capture and propagate stop_reason from the Tinker API
  2. tinker_cookbook/rl/types.py

    • Added is_complete: bool = True field to Transition dataclass
  3. tinker_cookbook/rl/rollouts.py

    • Updated do_single_rollout() to pass is_complete=ac_with_logprobs.is_complete when creating Transition
  4. tinker_cookbook/rl/data_processing.py

    • Added filter_incomplete_trajectories() function that:
      • Filters out any trajectory where any transition hit max_tokens
      • Returns filtered groups + statistics dict for metrics
      • Logs warning when trajectories are filtered
  5. tinker_cookbook/rl/train.py

    • Added filter_incomplete_trajectories: bool = False config option
    • Updated prepare_minibatch() to filter incomplete trajectories when enabled
    • Added handling for cases where all trajectories are filtered

Usage

The filtering is disabled by default (filter_incomplete_trajectories=False). To enable:

```python
# In your training config
config = Config(
    # ... other config ...
    filter_incomplete_trajectories=True,  # Enable filtering
)
```
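With the flag on, prepare_minibatch() drops truncated trajectories before building the batch. The sketch below shows that gating under stated assumptions: the Trajectory/Transition shapes, the list-of-groups layout, the rule that groups left empty are dropped, the metric names, and the helper prepare_groups are all hypothetical stand-ins, not the PR's actual code.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Transition:
    # Hypothetical, reduced Transition: only what the filter needs.
    reward: float
    is_complete: bool = True


@dataclass
class Trajectory:
    transitions: list[Transition]


def filter_incomplete_trajectories(
    groups: list[list[Trajectory]],
) -> tuple[list[list[Trajectory]], dict[str, float]]:
    """Drop every trajectory in which any transition hit max_tokens."""
    total = sum(len(group) for group in groups)
    kept_groups: list[list[Trajectory]] = []
    for group in groups:
        kept = [traj for traj in group if all(t.is_complete for t in traj.transitions)]
        if kept:  # assumption: groups left empty are dropped entirely
            kept_groups.append(kept)
    num_filtered = total - sum(len(group) for group in kept_groups)
    # Hypothetical metric names.
    stats = {
        "filter/num_filtered": float(num_filtered),
        "filter/frac_filtered": num_filtered / total if total else 0.0,
    }
    if num_filtered:
        logger.warning("Filtered %d/%d incomplete trajectories", num_filtered, total)
    return kept_groups, stats


def prepare_groups(
    groups: list[list[Trajectory]], filter_incomplete: bool = False
) -> tuple[list[list[Trajectory]], dict[str, float]]:
    """Hypothetical stand-in for the gating done in prepare_minibatch()."""
    if not filter_incomplete:
        return groups, {}  # default: pass everything through unchanged
    filtered, stats = filter_incomplete_trajectories(groups)
    if not filtered:
        # Assumption: with nothing left to train on, the caller skips this batch.
        logger.warning("All trajectories were filtered; skipping batch")
    return filtered, stats


groups = [
    [Trajectory([Transition(1.0)]), Trajectory([Transition(0.0, is_complete=False)])],
    [Trajectory([Transition(0.0, is_complete=False)])],
]
kept, stats = prepare_groups(groups, filter_incomplete=True)
print(len(kept), stats["filter/num_filtered"])  # 1 2.0
```

Dropping whole trajectories rather than individual transitions matches the description ("any trajectory where any transition hit max_tokens"), since a single truncated turn leaves the episode's final reward unreliable.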

Backward Compatibility

All changes are backward compatible:

  • stop_reason defaults to "stop" (assumes complete)
  • is_complete defaults to True on Transition
  • filter_incomplete_trajectories config defaults to False
