[Bugfix][Reasoning] Properly detect reasoning end when using thinking_token_budget by schoennenbeck · Pull Request #43210 · vllm-project/vllm

schoennenbeck · 2026-05-20T12:44:13Z

Fixes issue: #39697

When thinking_token_budget is set, the reasoning parser and the budget enforcer currently disagree about what it means to be in thinking mode. Specifically, the budget enforcer looks for its full reasoning_end_str in the output. However, this string might include a transition phrase or otherwise differ from the reasoning parser's end string. In that case the budget enforcer does not notice when reasoning ends naturally (i.e. before the budget is exceeded). As a result, the reasoning_end_str is forcibly added to the output while the model is already producing content, along with other undesirable behaviour.

This PR fixes this by making the reasoning parser's end string (if it exists) available to the budget enforcer, which uses it to determine whether thinking has ended, while still injecting the full configured reasoning_end_str if the budget is exceeded. It also adds a sanity check to make sure that the reasoning_end_str will actually be recognized by the parser as an end of reasoning.
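The mismatch can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the vLLM implementation: the token IDs, the prerecorded model stream, and the helper names `ends_with` and `run_budget` are all hypothetical. It shows that detecting the end of reasoning with the full configured end sequence misses a natural end, while detecting it with the parser's shorter marker does not.

```python
# Hypothetical token IDs, chosen only for illustration.
PARSER_END = [2]            # the parser's intrinsic end marker, e.g. "</think>"
CONFIGURED_END = [7, 8, 2]  # transition phrase followed by the same marker


def ends_with(tokens: list[int], suffix: list[int]) -> bool:
    """Return True if `tokens` ends with the non-empty sequence `suffix`."""
    return bool(suffix) and tokens[-len(suffix):] == suffix


def run_budget(
    stream: list[int],
    budget: int,
    detect_end_ids: list[int],
    inject_ids: list[int],
) -> tuple[list[int], bool]:
    """Toy budget enforcer over a prerecorded token stream.

    Thinking is considered finished once the output ends with
    `detect_end_ids`. If the budget runs out while still "thinking",
    `inject_ids` is forced into the output.
    Returns the output tokens and whether a forced injection happened.
    """
    output: list[int] = []
    thinking = True
    thinking_tokens = 0
    forced = False
    for tok in stream:
        if thinking and thinking_tokens >= budget:
            output.extend(inject_ids)  # force the end of reasoning
            thinking = False
            forced = True
        output.append(tok)
        if thinking:
            thinking_tokens += 1
            if ends_with(output, detect_end_ids):
                thinking = False  # reasoning ended naturally
    return output, forced


# The model stops thinking after three tokens, then produces content.
stream = [10, 11, 2, 20, 21, 22, 23]

# Detecting with the full configured string: the natural end is missed and
# the end sequence is injected in the middle of the content.
old_output, old_forced = run_budget(stream, 5, CONFIGURED_END, CONFIGURED_END)

# Detecting with the parser's marker: the natural end is seen, nothing is injected.
new_output, new_forced = run_budget(stream, 5, PARSER_END, CONFIGURED_END)
```

In this toy run, the first call injects `CONFIGURED_END` after the model has already emitted two content tokens, while the second call leaves the stream untouched. The full configured sequence is still what gets injected when the budget is actually exceeded.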

Disclosure

This PR was authored with the help of Claude Code (Opus 4.7). All code has been checked by me.

Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>

gemini-code-assist

Code Review

This pull request introduces a mechanism to distinguish between a user-configured reasoning_end_str and a reasoning parser's intrinsic end marker. It adds parser_reasoning_end_token_ids to ReasoningConfig and updates ThinkingBudgetStateHolder to use these IDs for detecting the end of reasoning. Validation logic is also added to ensure that the configured end string contains the parser's intrinsic marker. A typo was identified in a ValueError message where 'overrun' and 'Ensure' were concatenated without a space.
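The validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the function names `contains_subsequence` and `validate_reasoning_end` and the error message are hypothetical, and the check is assumed to operate on token IDs.

```python
from typing import Optional


def contains_subsequence(haystack: list[int], needle: list[int]) -> bool:
    """Return True if the non-empty `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def validate_reasoning_end(
    configured_end_ids: list[int],
    parser_end_ids: Optional[list[int]],
) -> None:
    """Raise ValueError if the configured end sequence lacks the parser's marker.

    If the parser has no intrinsic end marker (None), nothing is checked.
    """
    if parser_end_ids is None:
        return
    if not contains_subsequence(configured_end_ids, parser_end_ids):
        # Hypothetical message; the wording in the PR differs.
        raise ValueError(
            "The configured reasoning end sequence does not contain the "
            "reasoning parser's end marker, so the parser would not "
            "recognize a forced end of reasoning."
        )


# Passes: the configured sequence ends with the parser's marker [2].
validate_reasoning_end([7, 8, 2], [2])

# Fails: the parser's marker [2] is missing from the configured sequence.
try:
    validate_reasoning_end([7, 8, 9], [2])
    rejected = False
except ValueError:
    rejected = True
```

Checking for containment rather than equality allows the configured string to carry a transition phrase before the marker.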

Use reasoning parser config to detect end of reasoning

253a802

Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>

schoennenbeck requested review from 22quinn, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, njhill, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners May 20, 2026 12:44

mergify Bot added v1 bug Something isn't working labels May 20, 2026

gemini-code-assist Bot reviewed May 20, 2026

Comment thread vllm/config/reasoning.py Outdated

Fix typo

8379b30

Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>

schoennenbeck wants to merge 2 commits into vllm-project:main from schoennenbeck:fix/thinking_token_budget
