Supported models

This page summarizes model-family support in the v0.0.27 source tree. The code, not this page, is the source of truth for runtime behavior:

detection: src/models/detection.rs
ModelType enum and module exports: src/models/mod.rs
loading policy: src/model_metadata.rs
VLM loading routes: src/loading/vlm*.rs

As of v0.0.27, ModelType contains 92 variants: 70 text/non-VLM variants and 22 VLM variants. These are architecture/runtime variants, not a guarantee that every checkpoint under a marketing family name is supported.

Text and hybrid model families

Implemented model families include:

Llama-family and Mistral-style dense decoders
Llama 4 text
Qwen 2 / 2.5 / 3 / 3.5, Qwen MoE, Qwen3 Next
Gemma 1 / 2 / 3 / 3n / 4 text variants
Phi, Phi-3, Phi-3 Small, PhiMoE
Mixtral and other MoE families
DeepSeek v1 / v2 / v3 / v3.2
Cohere / Cohere2
InternLM 2 / 3
GLM 4, GLM MoE, GLM MoE DSA
ERNIE 4.5 and ERNIE 4.5 MoE
Hunyuan dense and MoE variants
ExaOne / ExaOne 4 / ExaOne MoE / Solar Open
OLMo / OLMo2 / OLMo3 / OLMoE
StarCoder2, StableLM, SmolLM3, Baichuan, MiniCPM, MiniCPM3, MiniMax, Ministral3, Mistral4, Nemotron, Nemotron-NAS, Step 3.5, MiMo
Mamba, Mamba2, RWKV7, Recurrent Gemma, Jamba, Nemotron-H
Kimi Linear, LongCat Flash, LongCat Flash N-gram
GPT-OSS

Many of these families have checkpoint-specific config or weight-layout requirements. If a checkpoint fails detection or loading, first check the model_type field in its config.json, then compare that value with the strings matched in src/models/detection.rs.
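Here is a hypothetical sketch of that comparison. It maps a model_type string to a variant and applies a text/VLM split like the one reflected in the ModelType counts. The enum `Kind`, its variants, and the string mapping are assumptions made for this example, using a handful of common Hugging Face model_type values. They are not the contents of src/models/detection.rs, which may match different strings or route them differently.

```rust
/// Hypothetical, heavily reduced stand-in for the real ModelType enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Llama,
    Qwen2,
    Gemma3Text,
    GptOss,
    Mamba2,
    Qwen2Vl,
    Gemma3Vl,
}

impl Kind {
    /// Whether this (hypothetical) variant would be routed through a VLM loader.
    fn is_vlm(self) -> bool {
        matches!(self, Kind::Qwen2Vl | Kind::Gemma3Vl)
    }
}

/// Map a config.json `model_type` value to a variant.
/// On failure, the error names the unrecognized value so it can be compared with the detection table.
fn detect(model_type: &str) -> Result<Kind, String> {
    let kind = match model_type.trim().to_ascii_lowercase().as_str() {
        // Assumption for this sketch: Mistral-style dense decoders share the Llama path.
        "llama" | "mistral" => Kind::Llama,
        "qwen2" => Kind::Qwen2,
        "gemma3_text" => Kind::Gemma3Text,
        "gpt_oss" => Kind::GptOss,
        "mamba2" => Kind::Mamba2,
        "qwen2_vl" => Kind::Qwen2Vl,
        // Multimodal Gemma 3 checkpoints use "gemma3"; text-only ones use "gemma3_text".
        "gemma3" => Kind::Gemma3Vl,
        other => {
            return Err(format!(
                "unrecognized model_type {other:?}; compare with src/models/detection.rs"
            ))
        }
    };
    Ok(kind)
}

fn main() {
    for mt in ["llama", "qwen2_vl", "gemma3", "bert"] {
        match detect(mt) {
            Ok(k) => println!("{mt:>10} -> {k:?} (vlm: {})", k.is_vlm()),
            Err(e) => println!("{mt:>10} -> error: {e}"),
        }
    }
}
```

An unrecognized value fails loudly instead of silently falling back to a generic decoder. That is the failure mode to look for when a checkpoint is rejected.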

Vision-language and multimodal variants

Implemented VLM variants include:

Gemma 3 VL, Gemma 3n VL, Gemma 4 VL
Llama 4 VLM
LLaVA and LLaVA-Bunny
Aya Vision and PaliGemma
Pixtral and Mistral 3 VLM wrappers
Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3.5-VL, and Qwen3-VL MoE
Youtu-VL
MiniCPM-O
Moondream 3
Phi-3 Vision, Phi4MM, Phi4 SigLIP VLM
Molmo2 and Molmo-Point
Nemotron-H Nano Omni

Audio/video capability is model-specific. The server request types include image_url, video_url, and input_audio content blocks, but a loaded model must advertise support for the corresponding modality. Video frame extraction uses the system ffmpeg/ffprobe binaries at runtime.
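The following is a minimal sketch of the modality gate described above. It assumes that a request is rejected when any content block needs a modality the loaded model does not advertise. The `ContentBlock` and `Modalities` types and the `check_request` function are hypothetical. They reuse the block names image_url, video_url, and input_audio, but they are not the server's actual request types.

```rust
/// Hypothetical request content blocks, named after the server's block types.
#[allow(dead_code)] // payload fields are carried but not inspected in this sketch
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    ImageUrl(String),
    VideoUrl(String),
    InputAudio { data: String, format: String },
}

/// Hypothetical capability flags a loaded model might advertise.
#[derive(Debug, Clone, Copy, Default)]
struct Modalities {
    image: bool,
    video: bool,
    audio: bool,
}

/// Reject the request if any block needs a modality the model does not advertise.
fn check_request(blocks: &[ContentBlock], caps: Modalities) -> Result<(), String> {
    for (i, block) in blocks.iter().enumerate() {
        let (ok, kind) = match block {
            ContentBlock::Text(_) => (true, "text"),
            ContentBlock::ImageUrl(_) => (caps.image, "image_url"),
            ContentBlock::VideoUrl(_) => (caps.video, "video_url"),
            ContentBlock::InputAudio { .. } => (caps.audio, "input_audio"),
        };
        if !ok {
            return Err(format!(
                "content block {i} ({kind}) is not supported by the loaded model"
            ));
        }
    }
    Ok(())
}

fn main() {
    // An image-only VLM receiving a video block.
    let image_only = Modalities { image: true, ..Default::default() };
    let req = vec![
        ContentBlock::Text("Describe this.".into()),
        ContentBlock::VideoUrl("file:///tmp/clip.mp4".into()),
    ];
    println!("{:?}", check_request(&req, image_only));
}
```

Checking before any decoding work means an unsupported video or audio block fails fast. It does not get silently dropped from the prompt.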

Quantization formats

Format	Status	Notes
FP16 / BF16	supported	BF16 handling is platform/model dependent; Apple Silicon paths commonly convert to FP16 for execution.
4-bit affine MLX checkpoints	supported	Primary path for many `mlx-community` checkpoints. CUDA coverage depends on MLX kernel support for the target GPU.
8-bit affine	supported	Used for weights and/or KV cache depending on path.
NVFP4 / MXFP4 / MXFP8	supported where implemented	Used by specific families such as GPT-OSS and recent quantized checkpoints.

Do not infer quality or speed from the ability to load a quantized checkpoint. Run a smoke test and, for release claims, a benchmark/quality gate.
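To make the affine formats in the table more concrete, here is a generic sketch of per-group affine quantization. Each group of weights stores unsigned codes plus a scale and bias, and dequantizes as scale * q + bias. The sketch illustrates the general technique under assumed parameters: group size 64 and 4 bits in the example. It is not MLX's kernel or checkpoint layout. Real formats pack several codes per storage word and may compute scales differently. Here codes are kept one per byte for clarity.

```rust
/// Quantize one group to `bits`-bit unsigned codes with a per-group scale and bias.
/// Dequantization is w_hat = scale * q + bias.
fn quantize_group(w: &[f32], bits: u32) -> (Vec<u8>, f32, f32) {
    assert!((1..=8).contains(&bits));
    let levels = ((1u32 << bits) - 1) as f32; // e.g. 15 for 4-bit
    let min = w.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = w.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // A constant group would give a zero scale; any nonzero scale reproduces it exactly.
    let scale = if max > min { (max - min) / levels } else { 1.0 };
    let bias = min;
    let q: Vec<u8> = w
        .iter()
        .map(|&x| ((x - bias) / scale).round().clamp(0.0, levels) as u8)
        .collect();
    (q, scale, bias)
}

fn dequantize_group(q: &[u8], scale: f32, bias: f32) -> Vec<f32> {
    q.iter().map(|&c| scale * c as f32 + bias).collect()
}

/// Quantize a row in groups of `group_size`, returning codes plus per-group (scale, bias).
fn quantize_row(row: &[f32], group_size: usize, bits: u32) -> (Vec<u8>, Vec<(f32, f32)>) {
    let mut codes = Vec::with_capacity(row.len());
    let mut params = Vec::new();
    for group in row.chunks(group_size) {
        let (q, s, b) = quantize_group(group, bits);
        codes.extend(q);
        params.push((s, b));
    }
    (codes, params)
}

fn main() {
    let row: Vec<f32> = (0..128).map(|i| ((i as f32) * 0.37).sin()).collect();
    let (codes, params) = quantize_row(&row, 64, 4);
    let mut max_err = 0.0f32;
    for (g, group) in row.chunks(64).enumerate() {
        let (s, b) = params[g];
        let rec = dequantize_group(&codes[g * 64..g * 64 + group.len()], s, b);
        for (x, y) in group.iter().zip(&rec) {
            max_err = max_err.max((x - y).abs());
        }
    }
    println!("groups: {}, max abs error: {max_err:.4}", params.len());
}
```

Each reconstructed weight is off by at most half of its group's scale. How much that rounding affects output quality depends on the model, which is why loadability alone is not evidence of quality.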

Distributed support summary

Capability	Current summary
Tensor parallelism	Advertised for selected dense text families such as Llama, Qwen, Gemma text, ERNIE 4.5, and Hunyuan dense. Validate per model/rank count.
Pipeline parallelism	Best validated for Llama-family text models; stage executors exist for more families, but with less operator coverage.
VLM under TP/PP	Partial. Vision tower / projector partitioning is not uniformly supported.
Disaggregated inference	Infrastructure exists; validate per topology and workload.
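A small single-process simulation can illustrate the two parallelism styles in the table. The sketch assumes two things. For tensor parallelism, one linear layer is sharded along its output dimension: each rank computes a slice of the output, and the slices are then gathered. For pipeline parallelism, layers are naively split into contiguous, evenly sized stages. The function names are hypothetical, and this is not mlxcel's partitioning logic. Real stage assignment may also account for embeddings, the output head, and per-rank memory.

```rust
use std::ops::Range;

/// Split `n_items` into `parts` contiguous ranges as evenly as possible.
/// Used here both for sharding output rows across TP ranks and for assigning layers to PP stages.
fn even_split(n_items: usize, parts: usize) -> Vec<Range<usize>> {
    assert!(parts > 0);
    let base = n_items / parts;
    let extra = n_items % parts; // the first `extra` parts get one more item
    let mut start = 0;
    (0..parts)
        .map(|p| {
            let len = base + usize::from(p < extra);
            let r = start..start + len;
            start += len;
            r
        })
        .collect()
}

/// y = W x for a row-major W with `cols` columns.
fn matvec(w: &[f32], cols: usize, x: &[f32]) -> Vec<f32> {
    w.chunks(cols)
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Simulated output-sharded linear layer: each "rank" holds a contiguous slice of W's rows,
/// computes its slice of y, and the slices are concatenated (an all-gather in a real system).
fn sharded_matvec(w: &[f32], cols: usize, x: &[f32], ranks: usize) -> Vec<f32> {
    let rows = w.len() / cols;
    let mut y = Vec::with_capacity(rows);
    for r in even_split(rows, ranks) {
        let shard = &w[r.start * cols..r.end * cols];
        y.extend(matvec(shard, cols, x));
    }
    y
}

fn main() {
    let (rows, cols) = (6, 4);
    let w: Vec<f32> = (0..rows * cols).map(|i| i as f32 * 0.5).collect();
    let x = vec![1.0, -1.0, 2.0, 0.5];
    let same = matvec(&w, cols, &x) == sharded_matvec(&w, cols, &x, 4);
    println!("sharded == unsharded: {same}");

    // Pipeline parallelism: contiguous layer ranges per stage (30 layers, 4 stages).
    for (stage, layers) in even_split(30, 4).iter().enumerate() {
        println!("stage {stage}: layers {layers:?}");
    }
}
```

The arithmetic is the same either way. What changes is which rank holds which weights and which communication step (gather for TP, activation hand-off for PP) each style needs. This is why support has to be validated separately for each family and rank count.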

Speculative decoding

Drafter	Target families	Notes
MTP	Gemma 4 target paths	Available through shared speculative decoding flags.
DFlash	Qwen 3.5 text/VLM paths	Available through shared speculative decoding flags.

Use auto-detection by default. Override it only when you know the target and drafter checkpoints are compatible.
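MTP and DFlash differ in how draft tokens are produced, but speculative decoding generally verifies drafts against the target model in the same way. The sketch below shows only the simplest greedy (temperature 0) acceptance rule, as a generic illustration. It is not mlxcel's implementation, and sampling-based acceptance uses a different, probabilistic rule. The function name and token IDs are hypothetical.

```rust
/// Greedy draft verification. The target scores the prefix plus all draft tokens in one pass:
/// `target_argmax[i]` is the target's greedy choice at the position of `draft[i]`, and
/// `target_argmax[draft.len()]` is its choice after the last draft token.
/// Returns the accepted draft prefix followed by one token from the target:
/// the correction at the first mismatch, or a bonus token if every draft token matched.
fn verify_greedy(draft: &[u32], target_argmax: &[u32]) -> Vec<u32> {
    assert_eq!(target_argmax.len(), draft.len() + 1);
    let accepted = draft
        .iter()
        .zip(target_argmax)
        .take_while(|(d, t)| d == t)
        .count();
    let mut out = draft[..accepted].to_vec();
    out.push(target_argmax[accepted]);
    out
}

fn main() {
    // Hypothetical token IDs: the drafter's third guess disagrees with the target.
    let draft = [11, 12, 13, 14];
    let target = [11, 12, 99, 14, 15];
    println!("{:?}", verify_greedy(&draft, &target)); // [11, 12, 99]
}
```

Under greedy decoding, the emitted tokens are exactly what the target alone would produce. A mismatched target and drafter pair therefore costs speed through low acceptance but does not change the output. This is one reason to keep auto-detection unless the pairing is known to be compatible.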

Known non-goals / caveats

A supported architecture does not imply every community checkpoint variant is supported.
VLM and video/audio paths require additional runtime dependencies and prompt preparation beyond text-only generation.
TurboQuant, TP, PP, and speculative decoding are not uniformly validated for every family.
The mlxcel list output is a CLI summary and may lag the detailed enum count; the canonical source remains src/models/mod.rs and src/models/detection.rs.

Adding support

See Adding a new model for the registration, loading, and test checklist.
