Description:
When attempting to serve mlx-community/Qwen3-8B-4bit using vllm-mlx, the server crashes during initialization with a TypeError. The internal load_model_with_fallback function appears to fail silently when loading the model and returns None, which then causes a crash when the code attempts to unpack the model and tokenizer.
Steps to Reproduce:
- Install `vllm-mlx` (installed via `uv`).
- Run the following command:

```shell
vllm-mlx serve mlx-community/Qwen3-8B-4bit
```

Expected Behavior:
The model should load successfully and start the server, or vllm-mlx should throw a descriptive error explaining why the model configuration is incompatible.
Actual Behavior:
The model downloads completely, but the server crashes with a NoneType unpacking error.
Traceback / Logs:
```
...
INFO:vllm_mlx.server:Loading model with SimpleEngine: mlx-community/Qwen3-8B-4bit
INFO:vllm_mlx.models.llm:Loading model: mlx-community/Qwen3-8B-4bit
INFO:vllm_mlx.models.llm:Qwen3 detected: setting eos_token to <|im_end|>
...
Download complete: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4.61G/4.61G [09:46<00:00, 7.85MB/s]
ERROR:vllm_mlx.models.llm:Failed to load model: cannot unpack non-iterable NoneType object
Traceback (most recent call last):
...
  File "/Users/gavin/.local/share/uv/tools/vllm-mlx/lib/python3.12/site-packages/vllm_mlx/engine/simple.py", line 152, in start
    self._model.load()
  File "/Users/gavin/.local/share/uv/tools/vllm-mlx/lib/python3.12/site-packages/vllm_mlx/models/llm.py", line 92, in load
    self.model, self.tokenizer = load_model_with_fallback(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object
```
Isolation Testing / Additional Context:
I verified that the issue is strictly isolated to vllm-mlx. Loading the exact same model directly with the core mlx-lm library works perfectly:
```python
from mlx_lm import load

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")
# Success! The core MLX library loads this model without issue.
```

Because mlx-lm works, the model files are not corrupted. The issue seems to be in how vllm-mlx handles Qwen3 architectures specifically, causing load_model_with_fallback to swallow the underlying error and return None.
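I have not traced the vllm-mlx internals, so the following is a hypothetical sketch of the kind of control flow that would produce exactly this symptom; the body of `load_model_with_fallback` and its loader list are assumptions, not the real source:

```python
from mlx_lm import load

def load_model_with_fallback(model_name, loaders=(load,)):
    # Hypothetical reconstruction, NOT the real vllm-mlx source.
    # Each loader is tried in turn; every failure is swallowed.
    for loader in loaders:
        try:
            return loader(model_name)  # (model, tokenizer) on success
        except Exception:
            continue  # the underlying Qwen3 error is discarded here
    # All loaders failed: control falls off the end, so the call
    # implicitly returns None, and the caller's
    # `self.model, self.tokenizer = load_model_with_fallback(...)`
    # fails with the opaque unpacking TypeError seen above.
```

If the real function is anywhere close to this shape, re-raising the last captured exception (or raising an explicit `RuntimeError` naming the model) when every loader fails would turn this crash into the descriptive error requested above.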
Environment:
- OS: macOS 15.7.2
- Installation Method: `uv tool`
- Python Version: 3.12.8
- vllm-mlx Version: v0.2.6