
TypeError: cannot unpack non-iterable NoneType object when serving Qwen3-8B-4bit (Silent failure in load_model_with_fallback) #211

@gavmor

Description:
When attempting to serve mlx-community/Qwen3-8B-4bit using vllm-mlx, the server crashes during initialization with a TypeError. The internal load_model_with_fallback function appears to fail silently when loading the model and returns None, which then causes a crash when the code attempts to unpack the model and tokenizer.

Steps to Reproduce:

  1. Install vllm-mlx (installed via uv).
  2. Run the following command:

```
vllm-mlx serve mlx-community/Qwen3-8B-4bit
```

Expected Behavior:
The model should load successfully and start the server, or vllm-mlx should throw a descriptive error explaining why the model configuration is incompatible.
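A descriptive failure could look like the following minimal sketch. Note that `load_primary` and the error text are hypothetical stand-ins, not vllm-mlx internals; the point is only that the loader should surface the root cause rather than letting the caller crash on an unpack:

```python
def load_primary(model_path):
    # Hypothetical stand-in for the real loader; assume it rejects
    # this model's configuration with an informative exception.
    raise ValueError(f"unrecognized config option in {model_path}")

def load_model_with_fallback(model_path):
    # Desired shape: wrap and re-raise with context instead of
    # returning None when all load attempts fail.
    try:
        return load_primary(model_path)
    except Exception as exc:
        raise RuntimeError(
            f"Failed to load {model_path}: {exc}"
        ) from exc
```

With this shape, the server would exit with a message naming the model and the underlying cause instead of an opaque `TypeError`.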

Actual Behavior:
The model downloads completely, but the server crashes with a NoneType unpacking error.

Traceback / Logs:

```
...
INFO:vllm_mlx.server:Loading model with SimpleEngine: mlx-community/Qwen3-8B-4bit
INFO:vllm_mlx.models.llm:Loading model: mlx-community/Qwen3-8B-4bit
INFO:vllm_mlx.models.llm:Qwen3 detected: setting eos_token to <|im_end|>
...
Download complete: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4.61G/4.61G [09:46<00:00, 7.85MB/s]
ERROR:vllm_mlx.models.llm:Failed to load model: cannot unpack non-iterable NoneType object
Traceback (most recent call last):
  ...
  File "/Users/gavin/.local/share/uv/tools/vllm-mlx/lib/python3.12/site-packages/vllm_mlx/engine/simple.py", line 152, in start
    self._model.load()
  File "/Users/gavin/.local/share/uv/tools/vllm-mlx/lib/python3.12/site-packages/vllm_mlx/models/llm.py", line 92, in load
    self.model, self.tokenizer = load_model_with_fallback(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object
```

Isolation Testing / Additional Context:
I verified that the issue is strictly isolated to vllm-mlx. Loading the exact same model directly with the core mlx-lm library works perfectly:

```python
from mlx_lm import load

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")
# Success! The core MLX library reads this model without issue.
```

Because mlx-lm loads the same files successfully, the model weights are not corrupted. The issue appears to be in how vllm-mlx handles the Qwen3 architecture specifically, causing load_model_with_fallback to swallow the underlying error and return None instead of raising.
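The failure mode can be reproduced in isolation. The body of `load_model_with_fallback` below is a hypothetical reconstruction (the real vllm-mlx source is not visible in the traceback), but any broad `except` that logs without re-raising falls through, implicitly returns `None`, and produces exactly this `TypeError` at the unpack site:

```python
import logging

log = logging.getLogger("vllm_mlx.models.llm")

def load_model_with_fallback(model_path):
    # Hypothetical reconstruction of the suspected anti-pattern.
    try:
        # Stand-in for whatever actually fails on Qwen3 configs.
        raise ValueError(f"unsupported config for {model_path}")
    except Exception as exc:
        log.error("Failed to load model: %s", exc)
        # Bug: no re-raise here, so the function falls through
        # and implicitly returns None.

def load(model_path):
    # Mirrors llm.py line 92: unpacking None raises the observed error.
    model, tokenizer = load_model_with_fallback(model_path)
    return model, tokenizer
```

Running `load("mlx-community/Qwen3-8B-4bit")` against this sketch raises `TypeError: cannot unpack non-iterable NoneType object`, matching the traceback above.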

Environment:

  • OS: macOS 15.7.2
  • Installation Method: uv tool
  • Python Version: 3.12.8
  • vllm-mlx Version: v0.2.6
