Reasoning content leaks into content field when tool calls are present (Nemotron) #161

@Thump604

Description

When Nemotron 3 Super generates tool calls with thinking enabled, the <think> block is returned in the content field instead of being separated into reasoning_content. The qwen3 reasoning parser does not extract the reasoning when the response also contains tool calls.

Reproduction

# Server: vllm-mlx serving Nemotron-3-Super-120B-A12B with:
#   --tool-call-parser nemotron --reasoning-parser qwen3 --enable-auto-tool-choice

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nemotron",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
    "max_tokens": 200
  }'

Expected

{
  "content": null,
  "reasoning_content": "The user is asking for the current weather...",
  "tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}

Actual

{
  "content": "<think>The user is asking for the current weather in Paris. I have access to the get_weather function...</think>",
  "reasoning_content": null,
  "tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}
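
For anyone who wants to check a response quickly, here is a minimal sketch of a leak check. It only assumes the message shape shown in the Expected and Actual payloads above; the helper name `reasoning_leaked` is made up for illustration and is not part of vllm-mlx.

```python
def reasoning_leaked(message: dict) -> bool:
    """Return True if <think> markup is present in the content field."""
    # content may be None when the response only carries tool calls
    content = message.get("content") or ""
    return "<think>" in content or "</think>" in content


# Abbreviated copies of the payloads from this issue
expected = {
    "content": None,
    "reasoning_content": "The user is asking for the current weather...",
    "tool_calls": [{"function": {"name": "get_weather",
                                 "arguments": "{\"city\": \"Paris\"}"}}],
}
actual = {
    "content": "<think>The user is asking for the current weather in Paris.</think>",
    "reasoning_content": None,
    "tool_calls": [{"function": {"name": "get_weather",
                                 "arguments": "{\"city\": \"Paris\"}"}}],
}

print("expected leaks:", reasoning_leaked(expected))
print("actual leaks:", reasoning_leaked(actual))
```

Running the same check on `choices[0].message` from the curl response should flag the bug as long as the tags are leaking into content.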

Environment

  • vllm-mlx 0.2.6
  • mlx-lm 0.31.0 (with PRs #988 + #992 applied)
  • Model: Nemotron-3-Super-120B-A12B (4.5-bit MLX)
  • Mac Studio M2 Ultra 128GB

Analysis

The qwen3 reasoning parser extracts reasoning correctly for responses without tool calls. The issue appears to be in the interaction between the reasoning parser and the tool call parser: either the tool call parser consumes the output before the reasoning parser has a chance to extract the <think> block, or the two parsers don't operate on the same response text.

This also affects Qwen 3.5 models in MLLM mode when tool calls are present, so it's not Nemotron-specific.
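
To illustrate the ordering I would expect, here is a rough sketch, not the actual vllm-mlx code. It assumes reasoning is stripped first and the tool call parser only sees the remainder. All function names are hypothetical, and the `<tool_call>` JSON markup is a stand-in I made up for the example; I have not checked what the nemotron tool call parser actually consumes.

```python
import json
import re
from typing import List, Optional, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Hypothetical tool-call markup, used for illustration only
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Pull the first <think> block out of the raw model output."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text
    remainder = text[:match.start()] + text[match.end():]
    return match.group(1).strip(), remainder


def parse_tool_calls(text: str) -> Tuple[List[dict], str]:
    """Parse tool calls from text that no longer contains reasoning."""
    calls = []
    for match in TOOL_RE.finditer(text):
        payload = json.loads(match.group(1))
        calls.append({"function": {
            "name": payload["name"],
            "arguments": json.dumps(payload["arguments"]),
        }})
    return calls, TOOL_RE.sub("", text).strip()


def build_message(raw: str) -> dict:
    # Step 1: reasoning parser runs on the full raw output
    reasoning, remainder = split_reasoning(raw)
    # Step 2: tool call parser runs on what is left
    tool_calls, content = parse_tool_calls(remainder)
    return {
        "content": content or None,
        "reasoning_content": reasoning,
        "tool_calls": tool_calls,
    }


raw = ('<think>The user wants the weather in Paris.</think>\n'
       '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>')
message = build_message(raw)
print(message)
```

With this ordering both parsers work from the same text, and content ends up null when nothing but reasoning and tool calls was generated, which matches the Expected output above.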
