Reasoning content leaks into content field when tool calls are present (Nemotron)

## Description

When Nemotron 3 Super generates tool calls with thinking enabled, the `<think>` content leaks into the `content` field instead of being separated into `reasoning_content`. The `qwen3` reasoning parser doesn't fully extract reasoning when tool calls are also present in the response.

## Reproduction

```bash
# Server: vllm-mlx serving Nemotron-3-Super-120B-A12B with:
#   --tool-call-parser nemotron --reasoning-parser qwen3 --enable-auto-tool-choice

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nemotron",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
    "max_tokens": 200
  }'
```

## Expected

```json
{
  "content": null,
  "reasoning_content": "The user is asking for the current weather...",
  "tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}
```

## Actual

```json
{
  "content": "<think>The user is asking for the current weather in Paris. I have access to the get_weather function...</think>",
  "reasoning_content": null,
  "tool_calls": [{"function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
}
```
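
A small check like the following can confirm whether a given response shows the leak. It is only a sketch: `has_reasoning_leak` is a hypothetical helper name, and it assumes the message object has the shape shown in the Expected and Actual blocks above.

```python
def has_reasoning_leak(message):
    """Return True if <think> text sits in `content` while `reasoning_content` is empty.

    `message` is assumed to be the assistant message dict from the response,
    e.g. response["choices"][0]["message"] in the OpenAI-style schema.
    """
    content = message.get("content") or ""
    return "<think>" in content and not message.get("reasoning_content")
```

Run against the two payloads above, this should flag the Actual message and pass the Expected one.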

## Environment

- vllm-mlx 0.2.6
- mlx-lm 0.31.0 (with PRs #988 + #992 applied)
- Model: Nemotron-3-Super-120B-A12B (4.5-bit MLX)
- Mac Studio M2 Ultra 128GB

## Analysis

The `qwen3` reasoning parser correctly extracts reasoning for responses without tool calls. The issue appears to lie in the interaction between the reasoning parser and the tool call parser: when the tool call parser processes the response, it may consume the output before the reasoning parser can extract the `<think>` block, or the two parsers may not operate on the same response text.

This also affects Qwen 3.5 models in MLLM mode when tool calls are present, so it's not Nemotron-specific.
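
To illustrate the ordering that would produce the expected output, here is a minimal sketch, not the actual vllm-mlx code. `split_reasoning` and `build_message` are hypothetical names, and `parse_tool_calls` stands in for whatever the tool call parser exposes; it is assumed to return a `(tool_calls, leftover_text)` pair. The idea is simply to strip the `<think>` block first and hand only the remainder to the tool call parser.

```python
import re

# Hypothetical sketch: names and signatures are assumptions, not vllm-mlx's API.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning_content, remaining_text) for a raw model output."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text
    reasoning = match.group(1).strip() or None
    # Drop the whole <think>...</think> span from what gets parsed next.
    remaining = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, remaining


def build_message(raw_output, parse_tool_calls):
    """Extract reasoning first, then run the tool call parser on what is left.

    `parse_tool_calls` is an assumed callable: text -> (tool_calls, leftover_text).
    """
    reasoning, remaining = split_reasoning(raw_output)
    tool_calls, content = parse_tool_calls(remaining)
    return {
        "content": content or None,
        "reasoning_content": reasoning,
        "tool_calls": tool_calls,
    }
```

With this ordering the tool call parser never sees the `<think>` block, so it cannot end up in `content` regardless of how the tool call parser treats unrecognized text.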
