[Bugfix][Reasoning] Properly detect reasoning end when using thinking_token_budget#43210
Open
schoennenbeck wants to merge 2 commits into
Open
[Bugfix][Reasoning] Properly detect reasoning end when using thinking_token_budget#43210schoennenbeck wants to merge 2 commits into
schoennenbeck wants to merge 2 commits into
Conversation
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to distinguish between a user-configured reasoning_end_str and a reasoning parser's intrinsic end marker. It adds parser_reasoning_end_token_ids to ReasoningConfig and updates ThinkingBudgetStateHolder to use these IDs for detecting the end of reasoning. Validation logic is also added to ensure that the configured end string contains the parser's intrinsic marker. A typo was identified in a ValueError message where 'overrun' and 'Ensure' were concatenated without a space.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes issue: #39697
When setting a
thinking_token_budgetthere is currently a missmatch between the reasoning parser and budget enforcer considering what it means to be in thinking mode. Specifically the budget enforcer is looking for its fullreasoning_end_strin the output. However, this string might include a transition phase or otherwise differ from the end string of the reasoning parser. In this case the budget enforcer does not notice when reasoning ends naturally (i.e. reasoning ended before exceeding the budget). This leads to thereasoning_end_strbeing forcibly added to the output when the model is already producing content and generally undesirable behaviour.This PR fixes this behaviour by making the reasoning parsers end string (if it exists) available to the budget enforcer and using it to determine if thinking has ended while still injecting the full configured
reasoning_end_strif the budget is exceeded. It also adds a sanity check to make sure that thereasoning_end_strwill actually be recognized by the parser as an end to the reasoning.Disclosure
This PR was authored with the help of Claude Code (Opus 4.7). All code has been checked by me.