- macOS on Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
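A quick way to confirm both requirements from a terminal (assumes `python3` is on your PATH):

```shell
# Verify the two requirements above
uname -m            # should print "arm64" on Apple Silicon
python3 --version   # should report Python 3.10 or newer
```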
Using uv:

```bash
git clone https://github.com/waybarrios/vllm-mlx.git
cd vllm-mlx
uv pip install -e .
```

Or using pip:

```bash
git clone https://github.com/waybarrios/vllm-mlx.git
cd vllm-mlx
pip install -e .
```

For video processing with transformers:

```bash
pip install -e ".[vision]"
```

For Speech-to-Text and Text-to-Speech (optional):

```bash
pip install mlx-audio
```

For text embeddings (optional):

```bash
pip install mlx-embeddings
```

Dependencies:

- `mlx`, `mlx-lm`, `mlx-vlm` - MLX framework and model libraries
- `transformers`, `tokenizers` - HuggingFace libraries
- `opencv-python` - Video processing
- `gradio` - Chat UI
- `psutil` - Resource monitoring
- `mlx-audio` (optional) - Speech-to-Text and Text-to-Speech
- `mlx-embeddings` (optional) - Text embeddings
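To see which of the optional features are actually importable in your environment, a small probe like the following can help. The Python module names are inferred from the package names above (e.g. `mlx_audio` for `mlx-audio`, `cv2` for `opencv-python`), so treat them as assumptions:

```shell
# Probe optional features; each line prints "ok" or "missing".
# Module names are inferred from the package names and may differ.
python3 -c "import mlx, mlx_lm"    >/dev/null 2>&1 && echo "core MLX: ok"   || echo "core MLX: missing"
python3 -c "import mlx_vlm, cv2"   >/dev/null 2>&1 && echo "vision: ok"     || echo "vision: missing"
python3 -c "import mlx_audio"      >/dev/null 2>&1 && echo "audio: ok"      || echo "audio: missing"
python3 -c "import mlx_embeddings" >/dev/null 2>&1 && echo "embeddings: ok" || echo "embeddings: missing"
```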
```bash
# Check CLI commands
vllm-mlx --help
vllm-mlx-bench --help
vllm-mlx-chat --help

# Test with a small model
vllm-mlx-bench --model mlx-community/Llama-3.2-1B-Instruct-4bit --prompts 1
```

Ensure you're on Apple Silicon:
```bash
uname -m  # Should output "arm64"
```

Check your internet connection and HuggingFace access. Some models require authentication:
```bash
huggingface-cli login
```

Use a smaller quantized model:
```bash
vllm-mlx serve mlx-community/Llama-3.2-1B-Instruct-4bit
```
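Once the server is running, you can send it a request. This sketch assumes vllm-mlx exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8000 (vLLM's usual default); adjust the URL if your setup differs:

```shell
# Send a chat completion request to the local server
# (endpoint path and port are assumptions based on vLLM's defaults)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/Llama-3.2-1B-Instruct-4bit",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```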