continuous batching incompatible with MLLM (ArraysCache vs KVCache) #159

@Thump604

Description

Problem

Combining --continuous-batching with --mllm fails at runtime. The MLLM path creates ArraysCache (from mlx_vlm), but MLLMBatchGenerator._run_prefill_batch() requires KVCache (from mlx_lm) so that it can call KVCache.merge().

Error:

ERROR:vllm_mlx.mllm_scheduler:Error in MLLM process loop: MLLM continuous batching requires standard KVCache but got ArraysCache. Disable --kv-cache-quantization when using multimodal models with --continuous-batching.

Location: mllm_batch_generator.py:675

from mlx_lm.models.cache import KVCache

sample_cache = per_request_caches[0][0]
if not isinstance(sample_cache, KVCache):
    raise ValueError(
        f"MLLM continuous batching requires standard KVCache but got "
        f"{type(sample_cache).__name__}. ..."
    )
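
The check above looks only at the first layer of the first request. When debugging, it may help to see which cache types appear across all requests and layers. The following is a minimal sketch, not code from vllm-mlx: `KVCache` and `ArraysCache` here are empty stand-in classes so the example runs on its own, and `summarize_cache_types` and `first_unsupported` are hypothetical helper names.

```python
from collections import Counter


class KVCache:
    """Stand-in for mlx_lm.models.cache.KVCache (assumption: only the type matters here)."""


class ArraysCache:
    """Stand-in for the cache type named in the error message."""


def summarize_cache_types(per_request_caches):
    """Count cache class names across every request and every layer."""
    return Counter(
        type(layer_cache).__name__
        for request_caches in per_request_caches
        for layer_cache in request_caches
    )


def first_unsupported(per_request_caches, supported=(KVCache,)):
    """Return (request_index, layer_index, type_name) for the first cache that
    is not an instance of a supported class, or None if every cache is supported."""
    for req_idx, request_caches in enumerate(per_request_caches):
        for layer_idx, layer_cache in enumerate(request_caches):
            if not isinstance(layer_cache, supported):
                return req_idx, layer_idx, type(layer_cache).__name__
    return None


# Example input in the same [request][layer] layout as per_request_caches.
example = [
    [ArraysCache(), KVCache()],
    [ArraysCache(), KVCache()],
]
print(summarize_cache_types(example))
print(first_unsupported(example))
```

With the real classes imported in place of the stand-ins, the same two helpers could be called just before the isinstance check to report whether every layer uses ArraysCache or only some of them.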

Context

  • Model: Qwen3.5-35B-A3B-8bit (has vision tower, requires --mllm to load)
  • Without --mllm: model fails to load (language_model.vision_tower.* weights rejected by mlx_lm)
  • Without --continuous-batching: concurrent requests crash Metal GPU (_MTLCommandBuffer assertion failure)
  • vllm-mlx version: 0.2.5 (git main as of 2026-03-14)

Impact

Any multimodal model (or any model with vision weights, such as Qwen3.5) cannot use continuous batching. This means:

  • Concurrent requests crash the server (Metal GPU assertion)
  • Multi-user scenarios are broken for MLLM models
  • Agents/tools that send parallel requests kill the server

Suggested Fix

Support ArraysCache in MLLMBatchGenerator._run_prefill_batch() by one of the following:

  1. Adding an ArraysCache.merge() method analogous to KVCache.merge()
  2. Converting ArraysCache to KVCache before merge
  3. Implementing a separate batch merge path for ArraysCache
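
Option 3 could take the shape of a per-layer dispatch on cache type. The sketch below is an illustration under loud assumptions, not the vllm-mlx or mlx_lm implementation: both cache classes are simplified stand-ins that store plain Python lists with one row per request, the `merge` signature shown is assumed, and padding for requests of different sequence lengths is ignored entirely. `merge_arrays_caches`, `merge_layer`, and `merge_per_request_caches` are hypothetical names.

```python
class KVCache:
    """Stand-in: keys and values are lists with one row per request."""

    def __init__(self, keys, values):
        self.keys, self.values = keys, values

    @classmethod
    def merge(cls, caches):
        # Concatenate along the batch axis (assumed behaviour, no padding).
        return cls(
            [row for c in caches for row in c.keys],
            [row for c in caches for row in c.values],
        )


class ArraysCache:
    """Stand-in: a fixed number of state slots, each a list with one row per request."""

    def __init__(self, slots):
        self.slots = slots


def merge_arrays_caches(caches):
    """Concatenate each slot along the batch axis."""
    n_slots = len(caches[0].slots)
    if any(len(c.slots) != n_slots for c in caches):
        raise ValueError("ArraysCache instances have different slot counts")
    return ArraysCache(
        [[row for c in caches for row in c.slots[i]] for i in range(n_slots)]
    )


def merge_layer(caches):
    """Merge the caches of one layer, taken from every request in the batch."""
    kinds = {type(c) for c in caches}
    if len(kinds) != 1:
        names = sorted(k.__name__ for k in kinds)
        raise ValueError(f"mixed cache types within one layer: {names}")
    kind = kinds.pop()
    if kind is KVCache:
        return KVCache.merge(caches)
    if kind is ArraysCache:
        return merge_arrays_caches(caches)
    raise ValueError(f"unsupported cache type {kind.__name__}")


def merge_per_request_caches(per_request_caches):
    # zip(*...) transposes [request][layer] into [layer][request].
    return [merge_layer(list(layer)) for layer in zip(*per_request_caches)]
```

Dispatching per layer rather than once per batch means a model whose layers use different cache types would still merge, as long as each layer is consistent across requests. A real implementation would also need to handle the padding and offset bookkeeping that this sketch leaves out.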

Environment

  • macOS 15.5, Mac Studio M2 Ultra (128GB)
  • MLX, mlx_vlm, mlx_lm (latest)
  • Python 3.12
