feat(rl): Filter incomplete trajectories that hit max_tokens limit #160
Description
When training with RL, trajectories that terminate because they hit the `max_tokens` limit (rather than completing naturally via a stop sequence) can introduce noise into training. This PR propagates `stop_reason` from the Tinker API through the data pipeline so that such trajectories can be filtered out.

Disabled by default - set `filter_incomplete_trajectories=true` to enable.

Changes
Files Modified (5 files)
`tinker_cookbook/completers.py`
- `StopReason` type alias (`Literal["length", "stop"]`)
- `stop_reason` field on the `TokensWithLogprobs` dataclass (default `"stop"` for backward compatibility)
- `is_complete` property that returns `True` if generation ended at a stop sequence
- `TinkerTokenCompleter` captures and propagates `stop_reason` from the Tinker API

`tinker_cookbook/rl/types.py`
- `is_complete: bool = True` field on the `Transition` dataclass

`tinker_cookbook/rl/rollouts.py`
- `do_single_rollout()` passes `is_complete=ac_with_logprobs.is_complete` when creating a `Transition`

`tinker_cookbook/rl/data_processing.py`
- `filter_incomplete_trajectories()` function that filters out trajectories that hit `max_tokens`

`tinker_cookbook/rl/train.py`
- `filter_incomplete_trajectories: bool = False` config option
- `prepare_minibatch()` filters incomplete trajectories when enabled

Usage
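As a rough illustration of what turning the option on does, the sketch below gates a filtering step behind a boolean flag. It is not the PR's code: the `Transition` here is a simplified stand-in that models only a reward and the new `is_complete` field, `filter_incomplete` and `prepare` are hypothetical stand-ins for `filter_incomplete_trajectories()` and the gating in `prepare_minibatch()`, and the rule "drop a trajectory if any transition was cut off" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    # Simplified stand-in for the cookbook's Transition (assumption):
    # only a reward and the new completeness flag are modeled.
    reward: float
    is_complete: bool = True  # default taken from the PR description


def filter_incomplete(trajectories: list[list[Transition]]) -> list[list[Transition]]:
    # Assumed rule: keep a trajectory only if every step finished on a stop
    # sequence, i.e. no step was truncated by max_tokens.
    return [traj for traj in trajectories if all(step.is_complete for step in traj)]


def prepare(
    trajectories: list[list[Transition]],
    filter_incomplete_trajectories: bool = False,
) -> list[list[Transition]]:
    # Hypothetical stand-in for the flag check in prepare_minibatch():
    # with the flag off, the batch passes through unchanged.
    if filter_incomplete_trajectories:
        return filter_incomplete(trajectories)
    return trajectories


finished = [Transition(reward=1.0)]
truncated = [Transition(reward=0.0), Transition(reward=0.0, is_complete=False)]
batch = [finished, truncated]

kept_default = prepare(batch)
kept_filtered = prepare(batch, filter_incomplete_trajectories=True)
print(len(kept_default), len(kept_filtered))
```

With the flag left at its default the truncated trajectory stays in the batch; with the flag on, only the finished trajectory remains.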
The filtering is disabled by default (`filter_incomplete_trajectories=False`). To enable it, set `filter_incomplete_trajectories=true`.

Backward Compatibility
All changes are backward compatible:
- `stop_reason` defaults to `"stop"` (assumes complete)
- `is_complete` defaults to `True` on `Transition`
- `filter_incomplete_trajectories` config defaults to `False`
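The first of these defaults can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `tinker_cookbook/completers.py` code: the `tokens` and `logprobs` field names are hypothetical, and only the `StopReason` alias, the `stop_reason` default, and the `is_complete` property follow the description above.

```python
from dataclasses import dataclass
from typing import Literal

# Type alias as given in the PR description.
StopReason = Literal["length", "stop"]


@dataclass
class TokensWithLogprobs:
    # Field names `tokens` and `logprobs` are assumptions for illustration.
    tokens: list[int]
    logprobs: list[float]
    # Defaulting to "stop" means callers that never set it are treated as complete.
    stop_reason: StopReason = "stop"

    @property
    def is_complete(self) -> bool:
        # True when generation ended at a stop sequence rather than at max_tokens.
        return self.stop_reason == "stop"


# Existing call sites that do not pass stop_reason keep working.
legacy = TokensWithLogprobs(tokens=[1, 2], logprobs=[-0.1, -0.2])
# A generation cut off by the token limit reports "length".
cut_off = TokensWithLogprobs(tokens=[1, 2], logprobs=[-0.1, -0.2], stop_reason="length")
print(legacy.is_complete, cut_off.is_complete)
```

Because the new field carries a default and comes after the existing fields, older code that constructs the dataclass without it is unaffected.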