Commit 1f7bb8f

Your Name and claude committed
fix: streaming crash with --no-thinking (enable_thinking kwarg leak)
stream_chat() passed enable_thinking through **kwargs to stream_generate() → MLXLanguageModel.stream_generate(), which doesn't accept it, causing a TypeError on every streaming request.

The key is now popped from kwargs before passing downstream, matching the non-streaming path, which already did this correctly. The MLLM stream_chat path is also fixed for the same issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d6d0c50 commit 1f7bb8f

File tree

1 file changed (+3, -1 lines)


vllm_mlx/engine/simple.py

Lines changed: 3 additions & 1 deletion
@@ -505,6 +505,8 @@ async def stream_chat(
         token_count = 0

         # Run the synchronous generator in a thread
+        # Pop enable_thinking — MLLM models don't support it
+        kwargs.pop("enable_thinking", None)
         sync_gen = self._model.stream_chat(
             messages=messages,
             max_tokens=max_tokens,
@@ -540,7 +542,7 @@ async def stream_chat(
             return

         # For LLM, apply chat template and stream
-        enable_thinking = kwargs.get("enable_thinking")
+        enable_thinking = kwargs.pop("enable_thinking", None)
         prompt = shared_apply_chat_template(
             self._model.tokenizer,
             messages,
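The failure mode is easy to reproduce in isolation. The sketch below uses hypothetical function names, not the actual vllm_mlx code: a caller forwards its **kwargs verbatim into a backend whose signature lacks one of the keys, so Python's argument binding raises TypeError at call time. Popping the key first, as this commit does, keeps the downstream call clean.

```python
def backend_stream_generate(prompt, max_tokens=256):
    # Stand-in for MLXLanguageModel.stream_generate: note there is
    # no enable_thinking parameter in its signature.
    yield from (prompt, "...")

def stream_chat_buggy(prompt, **kwargs):
    # Bug: enable_thinking leaks through **kwargs to the backend,
    # raising TypeError("unexpected keyword argument ...") on every call.
    return backend_stream_generate(prompt, **kwargs)

def stream_chat_fixed(prompt, **kwargs):
    # Fix: consume the key before forwarding kwargs downstream.
    # (In the real code the value is used when applying the chat template.)
    enable_thinking = kwargs.pop("enable_thinking", None)
    return backend_stream_generate(prompt, **kwargs)

try:
    list(stream_chat_buggy("hi", enable_thinking=False))
except TypeError as exc:
    print("buggy path:", exc)

print(list(stream_chat_fixed("hi", enable_thinking=False)))  # ['hi', '...']
```

Because argument binding happens when the generator function is called, the TypeError fires even before the first token is yielded, which is why every streaming request crashed rather than only requests that iterated the stream.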

0 commit comments
