Description
load_model_with_fallback() in vllm_mlx/utils/tokenizer.py lacks a return statement when mlx_lm.load() succeeds. Every return lives inside the except ValueError branches. When the model loads without error, the function falls off the end and returns None. The caller unpacks None and crashes:
TypeError: cannot unpack non-iterable NoneType object
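The failure mode is a classic fall-through: every return sits inside the exception handler, so the success path implicitly returns None. A minimal sketch (the names here are hypothetical, not the vllm-mlx code):

```python
def fake_load(name):
    """Stand-in for a loader that succeeds on the first try."""
    return (f"{name}-model", f"{name}-tokenizer")

def load_with_fallback(name):
    """All returns live inside the except branch; the success path falls through."""
    try:
        model, tokenizer = fake_load(name)  # succeeds, no exception raised
    except ValueError:
        return ("fallback-model", "fallback-tokenizer")
    # Missing `return model, tokenizer` here: Python implicitly returns None.

result = load_with_fallback("demo")
print(result)  # None, even though fake_load succeeded

try:
    model, tokenizer = result  # caller unpacks None...
except TypeError as e:
    print(e)  # cannot unpack non-iterable NoneType object
```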
Reproduction
Load any model that mlx_lm.load() accepts on the first try — no ValueError, no fallback needed. NexVeridian/Qwen3.5-9B-8bit is one such model: its weights are already clean, so load() succeeds, and the function discards the result.
INFO:vllm_mlx.models.llm:Loading model: NexVeridian/Qwen3.5-9B-8bit
INFO:vllm_mlx.models.llm:Qwen3 detected: setting eos_token to <|im_end|>
ERROR:vllm_mlx.models.llm:Failed to load model: cannot unpack non-iterable NoneType object
Root Cause
```python
# vllm_mlx/utils/tokenizer.py, load_model_with_fallback()
try:
    model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
except ValueError as e:
    if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
        return _load_with_tokenizer_fallback(model_name)
    if "parameters not in model" in str(e):
        return _load_strict_false(model_name, tokenizer_config)
    raise
# nothing here: the function falls through and implicitly returns None
```
Fix
Add the missing return, plus the MTP injection that the _load_strict_false path already performs:
```python
try:
    model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
except ValueError as e:
    if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
        return _load_with_tokenizer_fallback(model_name)
    if "parameters not in model" in str(e):
        return _load_strict_false(model_name, tokenizer_config)
    raise
_try_inject_mtp_post_load(model, model_name)
return model, tokenizer
```
Without _try_inject_mtp_post_load, models whose MTP weights sanitize() stripped during normal loading would silently lose speculative decoding.
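A regression test can pin down both paths without loading a real model. This is a sketch only: the injectable loader parameter and the stub return values are hypothetical, not the actual vllm-mlx signature.

```python
def load_model_with_fallback(model_name, tokenizer_config=None, loader=None):
    """Fixed shape of the function: the success path now returns explicitly."""
    # Hypothetical injection point so tests can substitute a stub loader.
    loader = loader or (lambda name, tokenizer_config: (object(), object()))
    try:
        model, tokenizer = loader(model_name, tokenizer_config=tokenizer_config)
    except ValueError as e:
        if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
            return ("tokenizer-fallback", None)
        if "parameters not in model" in str(e):
            return ("strict-false", None)
        raise
    # _try_inject_mtp_post_load(model, model_name)  # omitted in this stub
    return model, tokenizer

# Success path: must return the loaded pair, never None.
result = load_model_with_fallback("demo")
assert result is not None and len(result) == 2

# Unrecognized ValueError must still propagate to the caller.
def boom(name, tokenizer_config):
    raise ValueError("something else entirely")

try:
    load_model_with_fallback("demo", loader=boom)
    raised = False
except ValueError:
    raised = True
assert raised
print("ok")
```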
Environment
- vllm-mlx @ d235c37 (HEAD, 2026-03-23)
- macOS, Apple Silicon M2 Max
- Python 3.12