Description
When Nemotron 3 Super generates tool calls with thinking enabled, the <think> content leaks into the content field instead of being separated into reasoning_content. The qwen3 reasoning parser doesn't fully extract reasoning when tool calls are also present in the response.
Reproduction
# Server: vllm-mlx serving Nemotron-3-Super-120B-A12B with:
# --tool-call-parser nemotron --reasoning-parser qwen3 --enable-auto-tool-choice
curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nemotron",
"messages": [{"role": "user", "content": "What is the weather in Paris?"}],
"tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
"max_tokens": 200
}'
Expected
{
"content": null,
"reasoning_content": "The user is asking for the current weather...",
"tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}
Actual
{
"content": "<think>The user is asking for the current weather in Paris. I have access to the get_weather function...</think>",
"reasoning_content": null,
"tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}
Environment
- vllm-mlx 0.2.6
- mlx-lm 0.31.0 (with PRs #988 + #992 applied)
- Model: Nemotron-3-Super-120B-A12B (4.5-bit MLX)
- Mac Studio M2 Ultra 128GB
Analysis
The qwen3 reasoning parser correctly extracts reasoning for responses that contain no tool calls. The issue appears to lie in the interaction between the reasoning parser and the tool call parser: when the tool call parser processes the response, it may consume the output before the reasoning parser has a chance to extract the <think> tags, or the two parsers may not be operating on the same response text.
This also affects Qwen 3.5 models in MLLM mode when tool calls are present, so it's not Nemotron-specific.
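Until the parsers are coordinated server-side, a client can separate the leaked reasoning itself. The following is a minimal sketch of that idea, not vllm-mlx code: `split_reasoning` and `normalize_message` are hypothetical helper names, and the sketch assumes the reasoning arrives as complete `<think>...</think>` blocks inside `content`, as in the Actual output above.

```python
import re
from typing import Optional, Tuple

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) for a raw content string.

    Hypothetical helper: pulls every complete <think> block out of the text
    and returns whatever is left over as the visible content.
    """
    if not text:
        return None, None
    blocks = [block.strip() for block in THINK_RE.findall(text)]
    reasoning = "\n".join(b for b in blocks if b) or None
    remaining = THINK_RE.sub("", text).strip() or None
    return reasoning, remaining


def normalize_message(message: dict) -> dict:
    """Client-side workaround: fill reasoning_content when it leaked into content.

    Messages that already carry reasoning_content, or that contain no
    <think> block, are returned unchanged. tool_calls are never touched.
    """
    if message.get("reasoning_content") is None:
        reasoning, content = split_reasoning(message.get("content"))
        if reasoning is not None:
            return {**message, "reasoning_content": reasoning, "content": content}
    return message
```

Applied to the message in the Actual output, `normalize_message` moves the `<think>` text into `reasoning_content`, sets `content` to `None`, and leaves `tool_calls` as they were, which matches the Expected shape. Running the same extraction on the raw model output before the tool call parser sees it is one possible ordering for a server-side fix, though whether that fits the existing parser pipeline is an assumption.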