Download Model from Hugging Face Hub
source .venv/bin/activate
# Download the model from Hugging Face Hub
python model_download.py --repo_id "mlx-community/gemma-2-9b-it-4bit"
# Download the model from Hugging Face Hub with a custom cache directory
python model_download.py --repo_id "mlx-community/gemma-2-9b-it-4bit" --cache_dir "/tmp/huggingface/hub"
# Download the model from Hugging Face Hub with a custom Hugging Face token
python model_download.py --repo_id "mlx-community/gemma-2-9b-it-4bit" --token "YOUR_HUGGING_FACE_API_TOKEN"
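For reference, below is a minimal sketch of what such a download script could look like, assuming it wraps huggingface_hub's snapshot_download and exposes the flags above via argparse; the actual model_download.py may be implemented differently.

```python
# Hypothetical sketch of a downloader built on huggingface_hub; the actual
# model_download.py in this repo may be implemented differently.
import argparse

from huggingface_hub import snapshot_download

parser = argparse.ArgumentParser(description="Download a model from the Hugging Face Hub")
parser.add_argument("--repo_id", type=str, required=True, help="Hub repository id, e.g. mlx-community/gemma-2-9b-it-4bit")
parser.add_argument("--cache_dir", type=str, default=None, help="Custom cache directory (optional)")
parser.add_argument("--token", type=str, default=None, help="Hugging Face API token (optional)")
args = parser.parse_args()

# snapshot_download fetches all files in the repo and returns the local snapshot path
local_path = snapshot_download(repo_id=args.repo_id, cache_dir=args.cache_dir, token=args.token)
print(f"Model downloaded to: {local_path}")
```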
Streaming Inference
| Args | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| -m, --model | str | Required | | Path to the model |
| --prompt | str | Required | | Prompt for the LLM model |
| --max_tokens | int | Optional | 512 | Maximum tokens to generate |
| --verbose | bool | Optional | | Verbose mode |
source .venv/bin/activate
# Run the stream inference with default values
python inference.py
# Run the stream inference with verbose mode
python inference.py --verbose
# Run the stream inference with custom model
python inference.py --model "mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx"
# Run the stream inference with a custom prompt
python inference.py --prompt "What is the capital of France?"
# Run the stream inference with custom max tokens
python inference.py --max_tokens 1024
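A minimal sketch of what the streaming script could look like, assuming it uses mlx_lm's load and stream_generate and mirrors the --model, --prompt, and --max_tokens arguments from the table above (the --verbose flag is omitted here); the actual inference.py may be structured differently.

```python
# Hypothetical sketch of streaming inference with mlx_lm; the actual
# inference.py in this repo may be structured differently.
import argparse

from mlx_lm import load, stream_generate

parser = argparse.ArgumentParser(description="Stream tokens from an MLX model")
parser.add_argument("-m", "--model", type=str, required=True, help="Path to the model")
parser.add_argument("--prompt", type=str, required=True, help="Prompt for the LLM model")
parser.add_argument("--max_tokens", type=int, default=512, help="Maximum tokens to generate")
args = parser.parse_args()

# load() returns the model weights and the matching tokenizer
model, tokenizer = load(args.model)

# stream_generate yields output incrementally; depending on the mlx_lm version
# each chunk is either a plain string or a response object with a .text field
for chunk in stream_generate(model, tokenizer, args.prompt, max_tokens=args.max_tokens):
    print(getattr(chunk, "text", chunk), end="", flush=True)
print()
```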
Convert Hugging Face Model to MLX Model Format
| Args | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| -m, --model | str | Required | | Path to the model |
| --quantize | bool | Required | | Whether to quantize the model |
| --quantize_level | int | Optional | 4 | Quantization level (bits) |
| --verbose | bool | Optional | | Verbose mode |
source .venv/bin/activate
# Convert Hugging Face model to MLX model format
python convert.py --model "google/gemma-2-9b-it"
# Convert Hugging Face model to MLX model format with verbose mode
python convert.py --model "google/gemma-2-9b-it" --verbose
# Convert Hugging Face model to MLX model format with quantization
python convert.py --model "google/gemma-2-9b-it" --quantize
# Convert Hugging Face model to MLX model format with quantization and custom quantize level
python convert.py --model "google/gemma-2-9b-it" --quantize --quantize_level 8
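A minimal sketch of what the conversion script could look like, assuming it wraps mlx_lm's convert helper and maps --quantize_level to its q_bits parameter (the --verbose flag is omitted here); the actual convert.py may differ.

```python
# Hypothetical sketch of Hugging Face -> MLX conversion via mlx_lm's convert
# helper; the actual convert.py in this repo may differ.
import argparse

# Depending on the mlx_lm version, convert may need to be imported from
# mlx_lm.convert instead of the package root.
from mlx_lm import convert

parser = argparse.ArgumentParser(description="Convert a Hugging Face model to MLX format")
parser.add_argument("-m", "--model", type=str, required=True, help="Path to the model on the Hugging Face Hub")
parser.add_argument("--quantize", action="store_true", help="Whether to quantize the model")
parser.add_argument("--quantize_level", type=int, default=4, help="Quantization level (bits)")
args = parser.parse_args()

# convert() downloads the Hugging Face weights, optionally quantizes them,
# and writes an MLX-format model directory ("mlx_model" by default)
convert(hf_path=args.model, quantize=args.quantize, q_bits=args.quantize_level)
```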
About
LLM model inference on Apple Silicon Mac using the Apple MLX Framework.