## Problem

`--continuous-batching` + `--mllm` fails at runtime. The MLLM path creates `ArraysCache` (from mlx_vlm), but `MLLMBatchGenerator._run_prefill_batch()` requires `KVCache` (from mlx_lm) for `KVCache.merge()`.

Error:

```
ERROR:vllm_mlx.mllm_scheduler:Error in MLLM process loop: MLLM continuous batching requires standard KVCache but got ArraysCache. Disable --kv-cache-quantization when using multimodal models with --continuous-batching.
```
Location: `mllm_batch_generator.py:675`

```python
from mlx_lm.models.cache import KVCache

sample_cache = per_request_caches[0][0]
if not isinstance(sample_cache, KVCache):
    raise ValueError(
        f"MLLM continuous batching requires standard KVCache but got "
        f"{type(sample_cache).__name__}. ..."
    )
```

## Context
- Model: Qwen3.5-35B-A3B-8bit (has a vision tower, requires `--mllm` to load)
- Without `--mllm`: the model fails to load (`language_model.vision_tower.*` weights rejected by mlx_lm)
- Without `--continuous-batching`: concurrent requests crash the Metal GPU (`_MTLCommandBuffer` assertion failure)
- vllm-mlx version: 0.2.5 (git main as of 2026-03-14)
## Impact

Any multimodal model (or any model carrying vision weights, like Qwen3.5) cannot use continuous batching. This means:
- Concurrent requests crash the server (Metal GPU assertion)
- Multi-user scenarios are broken for MLLM models
- Agents/tools that send parallel requests kill the server
## Suggested Fix

Support `ArraysCache` in `MLLMBatchGenerator._run_prefill_batch()`, by one of:
- Adding an `ArraysCache.merge()` method analogous to `KVCache.merge()`
- Converting `ArraysCache` → `KVCache` before the merge
- Implementing a separate batch-merge path for `ArraysCache`
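The second option (convert, then reuse the existing merge path) could look roughly like the sketch below. The stand-in classes only mimic the *assumed* shapes of mlx_lm's `KVCache` (`keys`/`values`/`offset` attributes) and mlx_vlm's `ArraysCache` (a list holding the key and value tensors); the real attribute names in both libraries may differ, and `arrays_cache_to_kv_cache` is a hypothetical helper, not an existing API.

```python
# Hypothetical sketch: convert an ArraysCache to a KVCache before merging.
# The classes below are minimal stand-ins for mlx_lm.models.cache.KVCache
# and mlx_vlm's ArraysCache; the real implementations differ in detail.

class KVCache:
    """Stand-in for mlx_lm's KVCache (assumed keys/values/offset layout)."""
    def __init__(self):
        self.keys = None
        self.values = None
        self.offset = 0


class ArraysCache:
    """Stand-in for mlx_vlm's ArraysCache; assumed to hold [keys, values]."""
    def __init__(self, keys, values, offset):
        self.cache = [keys, values]
        self.offset = offset


def arrays_cache_to_kv_cache(ac: ArraysCache) -> KVCache:
    """Copy an ArraysCache's key/value tensors into a fresh KVCache so the
    existing KVCache.merge() path in _run_prefill_batch() can be reused."""
    kv = KVCache()
    kv.keys, kv.values = ac.cache[0], ac.cache[1]
    kv.offset = ac.offset
    return kv
```

In `_run_prefill_batch()`, each per-request cache entry would be passed through such a converter before the `isinstance(sample_cache, KVCache)` check at `mllm_batch_generator.py:675`, leaving the merge logic itself untouched. Whether this is cheaper than giving `ArraysCache` its own `merge()` depends on whether the conversion can share storage rather than copy.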
## Environment

- macOS 15.5, Mac Studio M2 Ultra (128 GB)
- MLX, mlx_vlm, mlx_lm (latest)
- Python 3.12