Performance benchmarks for vllm-mlx on Apple Silicon.
- LLM Benchmarks - Text generation performance
- Image Benchmarks - Image understanding performance
- Video Benchmarks - Video understanding performance
# LLM benchmark
vllm-mlx-bench --model mlx-community/Qwen3-0.6B-8bit
# Image benchmark
vllm-mlx-bench --model mlx-community/Qwen3-VL-8B-Instruct-4bit
# Video benchmark
vllm-mlx-bench --model mlx-community/Qwen3-VL-8B-Instruct-4bit --videoStandalone benchmark test scripts have built-in default models, so you can run:
python tests/test_continuous_batching.py
python tests/test_prefix_cache.pyDefaults:
tests/test_continuous_batching.py→mlx-community/Qwen3-8B-6bittests/test_prefix_cache.py→mlx-community/Qwen3-0.6B-8bit
To test different models, use the optional --model flag:
python tests/test_continuous_batching.py --model mlx-community/Qwen3-0.6B-8bit
python tests/test_prefix_cache.py --model mlx-community/Qwen3-8B-6bitBenchmarks have been collected on the following Apple Silicon configurations:
| Chip | Memory | Python |
|---|---|---|
| Apple M4 Max | 128 GB unified | 3.13 |
| Apple M1 Max | 64 GB unified | 3.12 |
Results will vary on different Apple Silicon chips.
If you have a different Apple Silicon chip, please share your results:
vllm-mlx-bench --model mlx-community/Qwen3-0.6B-8bit --output results.jsonOpen an issue with your results at GitHub Issues.