Description
vllm-mlx version: latest (0.2.6)
Models affected: Qwen2.5-*-Instruct, Llama-3.x-Instruct (all models with tool-calling in their chat template)
Server flags used: none, plain vllm-mlx serve --port 8080
Describe the bug
When sending a /v1/chat/completions request with tool_choice: "none" and either no tools field or tools: [], the model still returns finish_reason: "tool_calls" with tool_call objects whose arguments are always "{}". The model encodes output data (e.g. theme names, analysis results) as tool function names instead of as text content.
Minimal reproduction
```shell
vllm-mlx serve mlx-community/Qwen2.5-14B-Instruct-4bit --port 8080
```

```python
import requests

r = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "mlx-community/Qwen2.5-14B-Instruct-4bit",
    "messages": [{"role": "user", "content": "List 3 fruits as a JSON array"}],
    "tool_choice": "none",
    "tools": [],
    "max_tokens": 200
})
print(r.json()["choices"][0])
# finish_reason: "tool_calls", content: null,
# tool_calls: [{"function": {"name": "Apple", "arguments": "{}"}}]
```

Expected behaviour
finish_reason: "stop", content contains the JSON array, tool_calls is absent.
Root cause
The chat template handler passes tools=[] (or the request's tools list) into apply_chat_template() even when tool_choice="none". Qwen2.5 and Llama 3.x have tool-calling jinja templates that activate when a tools key is present — even an empty list triggers the tool-call token branch in some tokenizer versions. Upstream vLLM handles this with --exclude-tools-when-tool-choice-none (which strips tools from the template context when tool_choice="none"), but this flag was never ported to vllm-mlx.
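The template-branch behaviour can be sketched in isolation. This is an illustrative Jinja pattern, not the actual Qwen2.5 or Llama template: a branch that checks `tools is not none` fires even for an empty list, while passing `None` (or omitting the key) skips it.

```python
from jinja2 import Template

# Minimal stand-in for the tool-calling branch found in some chat
# templates (illustrative only; real templates are far larger).
tmpl = Template(
    "{% if tools is not none %}<tool_prompt>{% endif %}{{ message }}"
)

# An empty list is not None, so the tool-prompt branch still activates:
print(tmpl.render(tools=[], message="hi"))    # "<tool_prompt>hi"

# Passing None keeps the template on the plain-text path:
print(tmpl.render(tools=None, message="hi"))  # "hi"
```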
Proposed fix
In the chat completions handler, before calling apply_chat_template(), check tool_choice: if it is "none", pass tools=None to the template regardless of what the request contained.
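A minimal sketch of that guard, assuming a hypothetical helper name (the actual handler in vllm-mlx may structure this differently):

```python
# Hypothetical helper: decide what "tools" value to hand to
# apply_chat_template() for a given request dict.
def template_tools(request: dict):
    if request.get("tool_choice") == "none":
        return None  # never let the template see tools
    # Normalize an empty tools list to None so it cannot trip
    # tool-calling branches in the chat template either.
    return request.get("tools") or None

# tool_choice="none" strips tools even when the request carried some:
assert template_tools({"tool_choice": "none",
                       "tools": [{"type": "function"}]}) is None
# An empty tools list is treated the same as an absent one:
assert template_tools({"tools": []}) is None
```

Treating `tools: []` as absent mirrors upstream vLLM's --exclude-tools-when-tool-choice-none behaviour described above, while also covering tokenizer versions whose templates activate on a merely present (empty) tools key.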