load_model_with_fallback never returns on the happy path #212

@rretsiem

Description

load_model_with_fallback() in vllm_mlx/utils/tokenizer.py lacks a return statement when mlx_lm.load() succeeds. Every return lives inside the except ValueError branches. When the model loads without error, the function falls off the end and returns None. The caller unpacks None and crashes:

TypeError: cannot unpack non-iterable NoneType object
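
The failure mode can be reproduced without vllm-mlx or mlx_lm installed. The sketch below uses a hypothetical stand-in function, `load_without_return`, that only mirrors the shape of the buggy control flow; it is not the project's code.

```python
def load_without_return(succeed=True):
    """Hypothetical stand-in with the same shape as the buggy function."""
    try:
        if not succeed:
            raise ValueError("parameters not in model")
        model, tokenizer = "model", "tokenizer"  # loaded, then never returned
    except ValueError:
        return "fallback-model", "fallback-tokenizer"
    # No return on the success path, so Python implicitly returns None.


result = load_without_return()

try:
    model, tokenizer = result  # same unpacking the caller performs
except TypeError as exc:
    error_message = str(exc)
```

Only the failing branch returns a tuple, which is why the bug stays hidden for models that need a fallback.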

Reproduction

Load any model that mlx_lm.load() accepts on the first try — no ValueError, no fallback needed. NexVeridian/Qwen3.5-9B-8bit is one such model: its weights are already clean, so load() succeeds, and the function discards the result.

INFO:vllm_mlx.models.llm:Loading model: NexVeridian/Qwen3.5-9B-8bit
INFO:vllm_mlx.models.llm:Qwen3 detected: setting eos_token to <|im_end|>
ERROR:vllm_mlx.models.llm:Failed to load model: cannot unpack non-iterable NoneType object

Root Cause

# vllm_mlx/utils/tokenizer.py, load_model_with_fallback()

    try:
        model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    except ValueError as e:
        if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
            return _load_with_tokenizer_fallback(model_name)
        if "parameters not in model" in str(e):
            return _load_strict_false(model_name, tokenizer_config)
        raise
    # nothing here — function returns None

Fix

Add the missing return and the MTP injection that the _load_strict_false path already performs:

    try:
        model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    except ValueError as e:
        if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
            return _load_with_tokenizer_fallback(model_name)
        if "parameters not in model" in str(e):
            return _load_strict_false(model_name, tokenizer_config)
        raise

    _try_inject_mtp_post_load(model, model_name)
    return model, tokenizer

Without the _try_inject_mtp_post_load call, models whose MTP weights were stripped by sanitize() during normal loading silently lose speculative decoding.
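
A regression test for the happy path might look roughly like the following. This is a sketch under assumptions: the function is a simplified copy of the fixed control flow, the `load` and `inject_mtp` keyword parameters are hypothetical and exist only so the example runs without mlx_lm, and the two fallback branches are left out.

```python
from unittest import mock


def load_model_with_fallback(model_name, tokenizer_config=None, *, load, inject_mtp):
    """Simplified copy of the fixed control flow.

    `load` and `inject_mtp` are injected here (an assumption of this sketch)
    so it runs standalone; the real function calls them directly.
    """
    try:
        model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    except ValueError:
        # The tokenizer and strict=False fallbacks are omitted in this sketch.
        raise

    inject_mtp(model, model_name)
    return model, tokenizer


# Stub loader that succeeds on the first try, like the model in the reproduction.
fake_load = mock.Mock(return_value=("fake-model", "fake-tokenizer"))
fake_inject = mock.Mock()

result = load_model_with_fallback(
    "org/some-model", load=fake_load, inject_mtp=fake_inject
)
```

A test along these lines would check both halves of the fix: that the loaded pair is returned, and that the MTP hook runs once with the loaded model.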

Environment

  • vllm-mlx @ d235c37 (HEAD, 2026-03-23)
  • macOS, Apple Silicon M2 Max
  • Python 3.12
