
[Bugfix][Reasoning] Properly detect reasoning end when using thinking_token_budget #43210

Open
schoennenbeck wants to merge 2 commits into vllm-project:main from schoennenbeck:fix/thinking_token_budget


@schoennenbeck
Contributor

Fixes issue: #39697

When a thinking_token_budget is set, there is currently a mismatch between the reasoning parser and the budget enforcer concerning what it means to be in thinking mode. Specifically, the budget enforcer looks for its full reasoning_end_str in the output. However, this string might include a transition phase or otherwise differ from the end string of the reasoning parser. In that case the budget enforcer does not notice when reasoning ends naturally (i.e. reasoning ended before the budget was exceeded). As a result, the reasoning_end_str is forcibly added to the output while the model is already producing content, which is generally undesirable behaviour.

This PR fixes this behaviour by making the reasoning parser's end string (if it exists) available to the budget enforcer, which uses it to determine whether thinking has ended, while still injecting the full configured reasoning_end_str if the budget is exceeded. It also adds a sanity check to make sure that the reasoning_end_str will actually be recognized by the parser as an end to the reasoning.
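
To illustrate the intended behaviour, here is a minimal, self-contained sketch. It is not the code in this PR: the class `BudgetSketch`, the helper `ends_with`, and the token IDs are hypothetical and only model the idea that detection uses the parser's end marker while injection uses the full configured end sequence.

```python
from dataclasses import dataclass, field


def ends_with(tokens: list[int], suffix: list[int]) -> bool:
    """Return True if `tokens` ends with the non-empty sequence `suffix`."""
    return (
        bool(suffix)
        and len(tokens) >= len(suffix)
        and tokens[-len(suffix):] == suffix
    )


@dataclass
class BudgetSketch:
    """Hypothetical stand-in for a thinking-budget enforcer (illustration only)."""

    budget: int
    # Full configured end sequence as token IDs; injected when the budget runs out.
    reasoning_end_token_ids: list[int]
    # The parser's own end marker as token IDs; used to detect a natural end.
    parser_reasoning_end_token_ids: list[int]
    output: list[int] = field(default_factory=list)
    thinking_tokens: int = 0
    in_thinking: bool = True

    def step(self, token_id: int) -> list[int]:
        """Record one sampled token and return the tokens to force next, if any."""
        self.output.append(token_id)
        if not self.in_thinking:
            return []
        self.thinking_tokens += 1
        # Prefer the parser's marker; fall back to the full end sequence
        # when the parser exposes no marker.
        marker = self.parser_reasoning_end_token_ids or self.reasoning_end_token_ids
        if ends_with(self.output, marker):
            self.in_thinking = False  # reasoning ended naturally, nothing to inject
            return []
        if self.thinking_tokens >= self.budget:
            self.in_thinking = False  # budget exhausted, inject the full end sequence
            return list(self.reasoning_end_token_ids)
        return []


# Assumed IDs: 9 is the parser's end marker, [7, 8, 9, 10] the configured end sequence.
enforcer = BudgetSketch(
    budget=5,
    reasoning_end_token_ids=[7, 8, 9, 10],
    parser_reasoning_end_token_ids=[9],
)
forced_per_step = [enforcer.step(t) for t in [1, 2, 9, 3, 4, 5]]
```

In this sketch the model emits the marker `9` after two thinking tokens, so the enforcer leaves thinking mode and never forces anything. With `parser_reasoning_end_token_ids=[]` the same token stream would go unnoticed and the full end sequence would be forced once the fifth token is counted, which corresponds to the behaviour described in the issue.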

Disclosure

This PR was authored with the help of Claude Code (Opus 4.7). All code has been checked by me.

Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a mechanism to distinguish between a user-configured reasoning_end_str and a reasoning parser's intrinsic end marker. It adds parser_reasoning_end_token_ids to ReasoningConfig and updates ThinkingBudgetStateHolder to use these IDs for detecting the end of reasoning. Validation logic is also added to ensure that the configured end string contains the parser's intrinsic marker. A typo was identified in a ValueError message where 'overrun' and 'Ensure' were concatenated without a space.
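
The containment check described above could look roughly like the following sketch. The function names `contains_subsequence` and `validate_end_ids` and the error wording are assumptions made for illustration; they are not taken from vllm/config/reasoning.py.

```python
def contains_subsequence(haystack: list[int], needle: list[int]) -> bool:
    """Return True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def validate_end_ids(configured: list[int], parser_marker: list[int]) -> None:
    """Raise ValueError if the configured end sequence lacks the parser's marker.

    Hypothetical helper: if the parser exposes no marker, nothing is checked.
    """
    if parser_marker and not contains_subsequence(configured, parser_marker):
        raise ValueError(
            "The configured reasoning_end_str does not contain the reasoning "
            "parser's end marker, so the parser would not treat the injected "
            "text as the end of reasoning. Ensure the string includes the marker."
        )


# Assumed IDs: the marker [9] is embedded in the configured sequence, so this passes.
validate_end_ids(configured=[7, 8, 9, 10], parser_marker=[9])
```

Comparing token IDs rather than raw strings is a design choice of this sketch; a string-level containment check would follow the same shape.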

Comment thread vllm/config/reasoning.py Outdated
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>

Labels

bug Something isn't working v1
