# Supported Models

vllm-metal currently focuses on text-only language models on Apple Silicon. Multi-modal models (vision or audio input) are not yet supported.

## Legend

| Symbol | Meaning |
| --- | --- |
| ✅ | Supported model/feature |
| 🔵 | Experimental supported model/feature |
| ❌ | Not supported model/feature |
| 🟡 | Not tested or verified |

## Text-Only Language Models

**Automatic Prefix Cache** describes the default behavior when the user does not pass `--enable-prefix-caching`. After #283, unified paged-KV models on Metal can reuse shared prefixes by default. Upstream vLLM still keeps the default off for hybrid/Mamba models, so those rows remain ❌ unless prefix caching is explicitly forced. These values describe the default engine behavior, not exhaustive model-by-model benchmarking on Metal. Qwen3 is explicitly covered by the paged prefix-cache e2e test.
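As background, paged prefix caching works by keying KV blocks with a hash chained over the token prefix they cover, so requests that share a prompt prefix map to the same leading block keys and can reuse already-computed KV. The stdlib-only sketch below illustrates that lookup; the block size, hashing scheme, and function names are illustrative, not vllm-metal's actual implementation:

```python
# Illustrative sketch of block-level prefix caching, NOT vllm-metal's code.
# Each KV block is keyed by a hash chained over all tokens up to the end of
# that block, so two prompts sharing a prefix produce identical leading keys.

BLOCK_SIZE = 4  # illustrative; production engines typically use larger blocks


def block_keys(tokens: list[int]) -> list[int]:
    """Return one cache key per full block, chained over the whole prefix."""
    keys: list[int] = []
    prev = None
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore the partial tail
    for start in range(0, full, BLOCK_SIZE):
        block = tuple(tokens[start:start + BLOCK_SIZE])
        prev = hash((prev, block))  # chaining makes a key cover its prefix
        keys.append(prev)
    return keys


def reusable_blocks(cache: set[int], tokens: list[int]) -> int:
    """Count blocks whose KV is already cached; register the new ones."""
    hits = 0
    for key in block_keys(tokens):
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits


cache: set[int] = set()
first = reusable_blocks(cache, list(range(12)))                     # cold cache
second = reusable_blocks(cache, list(range(8)) + [99, 98, 97, 96])  # shared prefix
```

On the second request, the first two blocks (the shared 8-token prefix) hit the cache, so only the divergent tail needs fresh KV computation; the chained keys guarantee that once a block differs, no later block can falsely match.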

| Model | Support | Attention Kernel | Automatic Prefix Cache | PRs | Notes |
| --- | --- | --- | --- | --- | --- |
| Qwen3 | ✅ | GQA (paged) | ✅ | #232, #237, #283 | Validated by the paged prefix-cache e2e test |
| Qwen3.5 | ✅ | Hybrid SDPA + GDN linear | ❌ | #210, #226, #230, #235, #239, #243, #259, #265, #194 | Upstream keeps automatic prefix caching off for hybrid/Mamba models |
| Qwen3.6 | ✅ | Hybrid SDPA + GDN linear (MoE) | ❌ | | Upstream keeps automatic prefix caching off for hybrid/Mamba models |
| Qwen3-Next | ✅ | Hybrid SDPA + GDN linear | ❌ | #240 | Upstream keeps automatic prefix caching off for hybrid/Mamba models |
| Gemma 4 | 🔵 | GQA + per-layer sliding window + YOCO | ✅ | #251, #260, #269, #275, #277, #278, #282, #276, #279, #281, #283 | Default-on for non-hybrid paged models; overall model support remains experimental |
| Gemma 3 | 🟡 | GQA (paged) | ✅ | #283 | Default-on by upstream policy; model support not separately verified on Metal |
| Llama 3 | ✅ | GQA (paged) | ✅ | #294 | Tested on llama3.2-1B |
| Mistral-Small-24B | 🔵 | GQA (paged) | ✅ | #166, #190, #283 | Default-on for non-hybrid paged models |
| GPT-OSS | 🔵 | Sink attention (paged) | ✅ | #190, #221, #212, #283 | Default-on for non-hybrid paged models |
| GLM-4.5 | 🟡 | MLA (paged latent cache, MLX SDPA; no Metal kernel) | 🟡 | #213, #233 | Automatic prefix caching is not yet verified on the MLX MLA path |
| GLM-4.7-Flash | 🔵 | GQA (paged) | ✅ | #190, #283 | Default-on for non-hybrid paged models |