Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

README.md

Benchmarks

Performance benchmarks for vllm-mlx on Apple Silicon.

Benchmark Types

Quick Commands

# LLM benchmark
vllm-mlx-bench --model mlx-community/Qwen3-0.6B-8bit

# Image benchmark
vllm-mlx-bench --model mlx-community/Qwen3-VL-8B-Instruct-4bit

# Video benchmark
vllm-mlx-bench --model mlx-community/Qwen3-VL-8B-Instruct-4bit --video

Standalone Test Defaults

Standalone benchmark test scripts have built-in default models, so you can run:

python tests/test_continuous_batching.py
python tests/test_prefix_cache.py

Defaults:

  • tests/test_continuous_batching.pymlx-community/Qwen3-8B-6bit
  • tests/test_prefix_cache.pymlx-community/Qwen3-0.6B-8bit

To test different models, use the optional --model flag:

python tests/test_continuous_batching.py --model mlx-community/Qwen3-0.6B-8bit
python tests/test_prefix_cache.py --model mlx-community/Qwen3-8B-6bit

Hardware

Benchmarks have been collected on the following Apple Silicon configurations:

Chip Memory Python
Apple M4 Max 128 GB unified 3.13
Apple M1 Max 64 GB unified 3.12

Results will vary on different Apple Silicon chips.

Contributing Benchmarks

If you have a different Apple Silicon chip, please share your results:

vllm-mlx-bench --model mlx-community/Qwen3-0.6B-8bit --output results.json

Open an issue with your results at GitHub Issues.