Description
load_model_with_fallback() in vllm_mlx/utils/tokenizer.py lacks a return statement when mlx_lm.load() succeeds. Every return lives inside the except ValueError branches. When the model loads without error, the function falls off the end and returns None. The caller unpacks None and crashes:
TypeError: cannot unpack non-iterable NoneType object
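The failure mode is a classic fall-through: every return sits inside the exception handler, so the success path implicitly returns None. A minimal sketch (the names here are hypothetical, not the vllm-mlx code):

```python
def fake_load(name):
    """Stand-in for a loader that succeeds on the first try."""
    return (f"{name}-model", f"{name}-tokenizer")

def load_with_fallback(name):
    """All returns live inside the except branch; the success path falls through."""
    try:
        model, tokenizer = fake_load(name)  # succeeds, no exception raised
    except ValueError:
        return ("fallback-model", "fallback-tokenizer")
    # Missing `return model, tokenizer` here: Python implicitly returns None.

result = load_with_fallback("demo")
print(result)  # None, even though fake_load succeeded

try:
    model, tokenizer = result  # caller unpacks None...
except TypeError as e:
    print(e)  # cannot unpack non-iterable NoneType object
```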
Reproduction
Load any model that mlx_lm.load() accepts on the first try — no ValueError, no fallback needed. NexVeridian/Qwen3.5-9B-8bit is one such model: its weights are already clean, so load() succeeds, and the function discards the result.
INFO:vllm_mlx.models.llm:Loading model: NexVeridian/Qwen3.5-9B-8bit
INFO:vllm_mlx.models.llm:Qwen3 detected: setting eos_token to <|im_end|>
ERROR:vllm_mlx.models.llm:Failed to load model: cannot unpack non-iterable NoneType object
Root Cause
```python
# vllm_mlx/utils/tokenizer.py, load_model_with_fallback()
try:
    model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
except ValueError as e:
    if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
        return _load_with_tokenizer_fallback(model_name)
    if "parameters not in model" in str(e):
        return _load_strict_false(model_name, tokenizer_config)
    raise
# nothing here: the function falls through and implicitly returns None
```
Fix
Add the missing return, plus the MTP injection that the _load_strict_false path already performs:
```python
try:
    model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
except ValueError as e:
    if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
        return _load_with_tokenizer_fallback(model_name)
    if "parameters not in model" in str(e):
        return _load_strict_false(model_name, tokenizer_config)
    raise
_try_inject_mtp_post_load(model, model_name)
return model, tokenizer
```
Without _try_inject_mtp_post_load, models whose MTP weights sanitize() stripped during normal loading would silently lose speculative decoding.
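A regression test can pin down both paths without loading a real model. This is a sketch only: the injectable loader parameter and the stub return values are hypothetical, not the actual vllm-mlx signature.

```python
def load_model_with_fallback(model_name, tokenizer_config=None, loader=None):
    """Fixed shape of the function: the success path now returns explicitly."""
    # Hypothetical injection point so tests can substitute a stub loader.
    loader = loader or (lambda name, tokenizer_config: (object(), object()))
    try:
        model, tokenizer = loader(model_name, tokenizer_config=tokenizer_config)
    except ValueError as e:
        if "TokenizersBackend" in str(e) or "Tokenizer class" in str(e):
            return ("tokenizer-fallback", None)
        if "parameters not in model" in str(e):
            return ("strict-false", None)
        raise
    # _try_inject_mtp_post_load(model, model_name)  # omitted in this stub
    return model, tokenizer

# Success path: must return the loaded pair, never None.
result = load_model_with_fallback("demo")
assert result is not None and len(result) == 2

# Unrecognized ValueError must still propagate to the caller.
def boom(name, tokenizer_config):
    raise ValueError("something else entirely")

try:
    load_model_with_fallback("demo", loader=boom)
    raised = False
except ValueError:
    raised = True
assert raised
print("ok")
```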
Environment
- vllm-mlx @ d235c37 (HEAD, 2026-03-23)
- macOS, Apple Silicon M2 Max
- Python 3.12