
Conversation

@mondaylord
Contributor

@mondaylord mondaylord commented Dec 15, 2025

Purpose

This PR refactors the streaming logic in the generation handler to fix an issue where the first token of a tool call could be dropped or set to None during the transition from Reasoning to Tool Calling.

The Problem

Previously, the logic used a mutually exclusive if-else structure:

if not reasoning_end_arr[i]:
    # Handle reasoning...
    if is_end:
        reasoning_end_arr[i] = True  # flag flips here...
else:
    # Handle tool calls... (skipped in the iteration where the flag flips)

In streaming scenarios, when the reasoning-end token (e.g. </think>) appeared in the current iteration, the reasoning_end_arr flag was set to True, but the else block was skipped for that same iteration. As a result, the first token after the reasoning-end marker was generated by the engine but dropped from the streaming response.
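
To make this concrete, here is a small, self-contained toy reproduction of the old control flow (the chunk contents and the split_reasoning helper are invented for illustration; this is not the actual handler code):

END = "</think>"

def split_reasoning(chunk: str):
    """Return (reasoning_text, content_text, is_end) for one streamed chunk."""
    if END in chunk:
        reasoning, _, content = chunk.partition(END)
        return reasoning, content, True
    return chunk, "", False

def old_stream(chunks):
    content_out, reasoning_done = [], False
    for chunk in chunks:
        if not reasoning_done:
            _reasoning, content, is_end = split_reasoning(chunk)
            if is_end:
                reasoning_done = True
                # BUG: `content` is silently dropped -- the else branch below
                # does not run in the iteration where the flag flips.
        else:
            content_out.append(chunk)
    return "".join(content_out)

chunks = ["Let me call the weather tool.", "</think>Ap", "ologies", ", one moment."]
print(old_stream(chunks))  # -> "ologies, one moment."  ("Ap" is lost)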

Evidence (Log Analysis)

The following SSE logs demonstrate the issue before the fix. Observe the second chunk: the model generated the token "Ap" (visible in logprobs), but the delta.content was set to null. The next chunk continues with "ologies". Result: The user receives "ologies" instead of "Apologies".

// 1. Initial chunk
data: {"id": "...", "choices": [{"delta": {"role": "assistant", "content": "", "reasoning_content": null}}]}

// 2. THE BUG: Transition happens here.
// 'token' is "Ap" (logprob exists), but 'delta.content' is null.
data: {"id": "...", "choices": [{"delta": {"content": null, "reasoning_content": null}, "logprobs": {"content": [{"token": "Ap", "logprob": -8.55, ...}]}}]}

// 3. Next chunk continues, missing the start.
data: {"id": "...", "choices": [{"delta": {"content": "ologies", "reasoning_content": null}, "logprobs": {"content": [{"token": "ologies", "logprob": -0.0008, ...}]}}]}

The Fix

The logic now uses sequential if statements instead:

  • Sequential Processing: After checking/processing reasoning, the code immediately checks if reasoning_end_arr[i]:. This ensures that if reasoning finishes in the current step, the tool_parser is invoked in the same iteration to process the remaining tokens (see the sketch after this list).

  • Optimization: The check for prompt_token_ids (reasoning disabled via the prompt) is moved before the comparatively expensive extract_reasoning_streaming call, avoiding unnecessary processing when thinking is disabled.
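
Using the same toy model as above, a minimal sketch of the restructured flow looks roughly like this (again illustrative only, not the actual vLLM handler):

END = "</think>"

def split_reasoning(chunk: str):
    """Return (reasoning_text, content_text, is_end) for one streamed chunk."""
    if END in chunk:
        reasoning, _, content = chunk.partition(END)
        return reasoning, content, True
    return chunk, "", False

def new_stream(chunks):
    content_out, reasoning_done = [], False
    for chunk in chunks:
        pending = chunk
        if not reasoning_done:
            _reasoning, pending, is_end = split_reasoning(chunk)
            if is_end:
                reasoning_done = True
        # Sequential check: runs in the SAME iteration the flag flips, so the
        # first post-reasoning token is forwarded instead of being dropped.
        if reasoning_done and pending:
            content_out.append(pending)
    return "".join(content_out)

chunks = ["Let me call the weather tool.", "</think>Ap", "ologies", ", one moment."]
print(new_stream(chunks))  # -> "Apologies, one moment."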

Test Plan

Directly test the tool-call + reasoning case, e.g. with the request body below (a small replay script follows the body):

'{
  "model": "deepseek-ai/DeepSeek-V3.2",
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston in fahrenheit?"
    }
  ],
  "max_tokens": 65536,
  "temperature": 1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "seed": null,
  "min_p": 0,
  "repetition_penalty": 1,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "celsius",
                "fahrenheit"
              ]
            }
          },
          "additionalProperties": false,
          "required": [
            "location",
            "unit"
          ]
        }
      }
    }
  ]
}'
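
For reference, a minimal way to replay this request and inspect the streamed deltas; the URL assumes a local vLLM OpenAI-compatible server on the default port, and request.json is assumed to hold the body above (without the surrounding shell quotes):

import json

import requests  # any HTTP client works; requests is used here for brevity

with open("request.json") as f:
    payload = json.load(f)

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local vLLM address
    json=payload,
    stream=True,
)
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
        continue
    chunk = json.loads(line[len(b"data: "):])
    for choice in chunk.get("choices") or []:
        # Before the fix, the first post-reasoning delta had content == null
        # even though the corresponding token appeared in logprobs.
        print(choice.get("delta"))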

Test Result

The first token of the post-reasoning output is now streamed normally; it is neither set to None nor dropped.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added the frontend label Dec 15, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request refactors the logic within the chat_completion_stream_generator function, specifically concerning the handling of reasoning extraction and the detection of reasoning completion. The reasoning_parser.extract_reasoning_streaming call, along with its associated delta_message processing, is moved to be conditional, now only executing when reasoning is actively ongoing and not yet marked as ended. This change ensures that current_text is not updated unnecessarily when reasoning ends via prompt token IDs, and clarifies that tool calls are processed only after the reasoning phase has concluded.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small but essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Collaborator

@chaunceyjiang chaunceyjiang left a comment


Please provide a minimal reproducible example so that I can reproduce it on my local environment.

@mondaylord
Contributor Author

Please provide a minimal reproducible example so that I can reproduce it on my local environment.

The deployment config is the same as https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html#launching-deepseek-v32, and the reproducible example is the same request body shown in the Test Plan above.


You can run it multiple times to check whether the first token is missing. In my tests, roughly 90% of runs lost the first token.

@chaunceyjiang
Collaborator

Thanks~ @mondaylord

You need to sign off your commits (DCO).

@mondaylord mondaylord force-pushed the fix_dsv32_ignore_first_token branch from ddd7919 to a7ae345 Compare December 15, 2025 13:54
Collaborator

@chaunceyjiang chaunceyjiang left a comment


Thanks~

@mondaylord mondaylord force-pushed the fix_dsv32_ignore_first_token branch from a7ae345 to 24e0084 Compare December 15, 2025 13:57
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 15, 2025
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) December 15, 2025 13:57
@chaunceyjiang chaunceyjiang self-assigned this Dec 15, 2025
@chaunceyjiang chaunceyjiang merged commit 17fec3a into vllm-project:main Dec 15, 2025
49 checks passed
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request Dec 15, 2025