A high-performance API server that exposes OpenAI-compatible endpoints for MLX models. Built in Python on the FastAPI framework, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally behind an OpenAI-compatible interface.
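Because the endpoints mirror OpenAI's API, any OpenAI-compatible client can target the local server simply by overriding the base URL. A minimal sketch of such a request; the host, port, and model name below are illustrative assumptions, not values fixed by the project:

```python
import json

# Hypothetical local endpoint; host and port depend on how the server is launched.
BASE_URL = "http://localhost:8000/v1"

# A standard OpenAI chat-completions request body; the model name is an example.
payload = {
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# An OpenAI-compatible client would POST this body to f"{BASE_URL}/chat/completions".
print(json.dumps(payload, indent=2))
```

Since the request shape is unchanged, existing OpenAI SDKs work as-is once their base URL points at the local server.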
Wraps local models into an OpenAI-style API (Chat + TTS) on macOS, so any OpenAI-compatible client or SDK can connect to your application directly.
Run large Mixture-of-Experts LLMs that exceed system RAM on Apple Silicon by loading only router-selected experts from SSD with MLX. Includes OpenAI/Anthropic-compatible serving for local agentic coding.