
Releases: scouzi1966/maclocal-api

afm-next (20260303 · 410d7e5)

03 Mar 02:54


Pre-release

Nightly build from main branch.

  • Commit: 410d7e5
  • Date: 20260303
  • Version: 0.9.5-next.410d7e5.20260303

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (cd2941e)

  • Auto-clone homebrew-afm tap repo if missing during nightly publish (410d7e5)
  • Add ownership guard to build-afm-nightly-publish skill (1b71d95)
  • Add build-afm-nightly-publish skill (bc777d6)
  • Fix Jinja crash on nullable tool schemas (closes #32) (f4c80cc)
  • Simplify build-afm Step 3 to report only binary path and version (d96c786)
  • Add fork-first instruction to vibe coding callout (7f58591)
  • Add vibe coding callout to README for non-Swift developers (75af8fc)
  • Reorder build-afm prerequisites by dependency chain (2d498f1)
  • Add prerequisite validation to build-afm skill (1c27148)
  • Add skills, test reports, qwen3_5 registry alias, and bench tooling (50ed40f)
  • Add script paths and commands to report index page (038ba82)
  • Show invocation command in report header for reproducibility (c7cdfd3)
  • Fix index.html links to use htmlpreview for HTML reports (7d8187c)
  • Remove GitHub Actions Pages workflow (Actions disabled on repo) (7b4fff1)
  • Add GitHub Actions workflow for Pages deployment (f63791b)
  • Add GitHub Pages index for test reports (2e3f7bf)
  • Add MLX patch comparison report (3-ref with upstream-only detection) (7d38bf9)
  • Add YAML frontmatter to CLI help for AI agent discovery (9f755d8)
  • Add structured help with YAML frontmatter for AI agent discovery (a31ccd5)
  • Add repeatable MLX patch comparison report generator (8bbec6f)
  • Add cached_tokens to usage response, assertion test suite, and test-macafm skill (037c657)
  • Reorder test report: description above link (f9cff88)
  • Move test report link below title, add description with judge methodology (b8f7728)
  • Add afm-next nightly test report link for Qwen3.5-35B-A3B (90dfc8d)
  • Test harness: template mode, smart scoring fixes, report improvements (aefe39a)
  • Merge pull request #28 from scouzi1966/feature/optimise-metal (4d58e3f)
  • Fix review: QKV mode check, QuantizedKVCache mode, dead perf code (2ca6495)
  • Fix GLM-5 OOM + gemma-3 crash + MoE argPartition optimization (5b8e88c)
  • Auto-detect VLM models + fix model discovery for HF cache dirs (02a7008)
  • Perf: Metal kernel fusions + graph optimizations (107.6→130 tok/s, +21%) (9e9fcf3)
  • Perf: beat Python mlx-lm throughput (95.7→107.6 tok/s, +12%) (b11fa4a)
  • Add assumption for previous afm installation (afb62d6)
  • Update model reference in README.md (eb17f6c)
  • Update README with new features in nightly build (3f4e859)
  • Update README with new features for nightly build (d9c804b)
  • Update README with new API features and parameters (1da2ad3)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm-next (20260226 · cd2941e)

26 Feb 03:24


Pre-release

Nightly build from main branch.

  • Commit: cd2941e
  • Date: 20260226
  • Version: 0.9.5-next.cd2941e.20260226

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (0cfba17)

  • Fix Metal kernel fallback and temp media file cleanup (d84ab02)
  • Add --media flag for VLM single-prompt mode + base64 data URL support (a834638)
  • Qwen3.5 perf: add VLM Metal kernel + default to LLM loading (42→95 tok/s) (ec88798)
  • Add test scripts, reports, and benchmark strict=False fix (cdceabf)
  • Fix afm args quoting with shlex.split, add benchmark script (7905be4)
  • Fix afm: args quoting in test harness (read -ra → eval) (0ddb545)
  • Clean up README: consolidate install sections, add stable/nightly table (c78e5d5)
  • Exclude test report HTML/JSONL from GitHub language stats (a4c9c7d)
  • Fix CLI --stop not passed to Server in MlxCommand (f3607af)
  • Fix stop sequences in thinking models, add CLI --stop flag, fix JSON schema injection (050e836)
  • Fix claude nested session issue and regenerate report with both AI analyses (dd419d8)
  • Add Qwen3.5-35B-A3B-4bit test suite and report (129/132 passed) (b93db37)
  • Move stable install instructions below latest release link (747a67c)
  • Update afm-next heading wording (385e738)
  • Change afm-next heading to 'Available NOW' (697abf0)
  • Update README with Qwen3.5-35B-A3B support and afm-next install instructions (50fcd13)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm-next (20260225 · 0cfba17)

25 Feb 02:54


Pre-release

Nightly build from main branch.

  • Commit: 0cfba17
  • Date: 20260225
  • Version: 0.9.5-next.0cfba17.20260225

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

RUN Qwen3.5-35B-A3B!

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

Changes since last build (199cded)

  • Download *.jinja files and fix missing chat_template fallback (0cfba17)
  • Gate verbose colored logs behind --very-verbose flag (7514638)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm-next (20260225 · 199cded)

25 Feb 00:12


Pre-release

Nightly build from main branch.

  • Commit: 199cded
  • Date: 20260225
  • Version: 0.9.5-next.199cded.20260225

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (04f8b52)

  • Merge pull request #26 from alantmiller/fix/vision-async-dispatch (199cded)
  • Use 'Changes since last build (SHA)' format in release notes (7f4b7e7)
  • Show 'changes since' commit SHA in nightly release notes (62a2a1f)
  • Add --since flag to publish-next.sh for changelog control (5a44659)
  • Expand install/upgrade instructions in nightly release notes (2913362)
  • Include commit SHA in brew version string (969ca10)
  • Add commit field and fullVersion to BuildInfo.swift (586bbea)
  • Preserve nightly release history with unique tags (745e5cd)
  • Add git commit SHA to --version output (e.g. v0.9.5-abc1234) (0e5a8b6)
  • Add changelog and install instructions to local publish script (493c94f)
  • Fix MXFP4 quantization crash, token counting, gemma3n routing, and test harness (d79a1d3)
  • Update README with nightly build installation instructions (f1d2813)
  • fix: vision subcommand dispatches async run() correctly (04566f0)
  • Fix bare JSON tool call detection and add ToolCallFormat.swift patch (f6efa24)
  • Change nightly build to manual trigger only (af238ab)
  • Add nightly build workflow for afm-next (8892329)
  • Merge origin/main into feature/mlx-prompt-caching (146030f)
  • Add tool call parser test results (26/26 pass) (694abbb)
  • Fix review findings: zero-arg JSON, prefix caching default, fallback tag detection (1ee1ede)
  • Add hermes, llama3_json, gemma, and mistral tool call parsers (341f35d)
  • Merge pull request #24 from scouzi1966/feature/structured-outputs (a7b3b38)
  • Address PR review: nullable types, null rejection, guided streaming deltas (e25d3ef)
  • Add structured outputs, --guided-json CLI flag, and comprehensive test suite (3b30fa0)
  • Add incremental streaming tool call arguments and fix parameter name mapping (9b27cac)
  • Update README for v0.9.5 features (519d35f)
  • Update README with new features and MLX support (fe7bd74)
  • Add token-level streaming tool call detection and update CLAUDE.md (ec46e88)
  • Add tool calling, stop sequences, response_format, and real token counts (71e2c68)
  • Save test reports to test-reports/ with JSONL data and add Kimi brief prompt (bee6a72)
  • Add logprobs support, --max-logprobs switch, and dynamic system_fingerprint (b37fdff)
  • Bump version to v0.9.5 and add sampling params test report (4abbdee)
  • Add top_k, min_p, presence_penalty, and seed sampling parameters (ce69ba0)
  • Checkpoint: OpenClaw config, verbose logging, max_completion_tokens, and streaming improvements (b69b9a5)
  • Revise README for clarity on v0.9.4 features (c66764e)
  • Add Qwen3.5-MoE VLM support, reasoning extraction, --raw flag, and stream cancellation (6cf821c)
  • Checkpoint: pre Qwen3.5-397B-A17B-4bit reclassify (8f35675)
  • Update OpenCode usage instructions in README (c72a023)
  • Revise installation methods in README (73c1640)
  • Swap OpenCode setup steps: configure first, then start afm (694f456)
  • Add detailed OpenCode /connect instructions to README (b15801b)
  • Add OpenCode integration guide to README (90af069)
  • Wire all MLX CLI params to server mode, enhance generation logging (1bc007f)
  • Revise installation command formatting in README (38c25d2)
  • Update README with model repo environment variable (51ef4b4)
  • Update README.md (5bc112f)
  • Revise README for v0.9.4 feature announcement (3d6c296)
  • Update README with feature listing and API access (d7b3bfa)
  • Revise README for MLX model support and commands (c89f191)
  • Add MLX excitement and quick install to README hero section (4f9b374)
  • Add MLX models screenshot to README (0d1bd07)
  • Add files via upload (f979ab4)
  • Update README with MLX local model support and new v0.9.4 features (9d86386)
  • Add regression test report: 61/61 passed (680c465)
  • Add MLX model test report: 27/28 passed, Kimi-K2.5 interrupted (fa84cf7)
  • Fix MLX metallib resolution for relocated binaries (5059f1b)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm-next (20260224 · 7f4b7e7)

24 Feb 23:30


Pre-release

Nightly build from main branch.

  • Commit: 7f4b7e7
  • Date: 20260224
  • Version: 0.9.5-next.7f4b7e7.20260224

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (04f8b52)

  • Use 'Changes since last build (SHA)' format in release notes (7f4b7e7)
  • Show 'changes since' commit SHA in nightly release notes (62a2a1f)
  • Add --since flag to publish-next.sh for changelog control (5a44659)
  • Expand install/upgrade instructions in nightly release notes (2913362)
  • Include commit SHA in brew version string (969ca10)
  • Add commit field and fullVersion to BuildInfo.swift (586bbea)
  • Preserve nightly release history with unique tags (745e5cd)
  • Add git commit SHA to --version output (e.g. v0.9.5-abc1234) (0e5a8b6)
  • Add changelog and install instructions to local publish script (493c94f)
  • Fix MXFP4 quantization crash, token counting, gemma3n routing, and test harness (d79a1d3)
  • Update README with nightly build installation instructions (f1d2813)
  • Fix bare JSON tool call detection and add ToolCallFormat.swift patch (f6efa24)
  • Change nightly build to manual trigger only (af238ab)
  • Add nightly build workflow for afm-next (8892329)
  • Merge origin/main into feature/mlx-prompt-caching (146030f)
  • Add tool call parser test results (26/26 pass) (694abbb)
  • Fix review findings: zero-arg JSON, prefix caching default, fallback tag detection (1ee1ede)
  • Add hermes, llama3_json, gemma, and mistral tool call parsers (341f35d)
  • Merge pull request #24 from scouzi1966/feature/structured-outputs (a7b3b38)
  • Address PR review: nullable types, null rejection, guided streaming deltas (e25d3ef)
  • Add structured outputs, --guided-json CLI flag, and comprehensive test suite (3b30fa0)
  • Add incremental streaming tool call arguments and fix parameter name mapping (9b27cac)
  • Update README for v0.9.5 features (519d35f)
  • Update README with new features and MLX support (fe7bd74)
  • Add token-level streaming tool call detection and update CLAUDE.md (ec46e88)
  • Add tool calling, stop sequences, response_format, and real token counts (71e2c68)
  • Save test reports to test-reports/ with JSONL data and add Kimi brief prompt (bee6a72)
  • Add logprobs support, --max-logprobs switch, and dynamic system_fingerprint (b37fdff)
  • Bump version to v0.9.5 and add sampling params test report (4abbdee)
  • Add top_k, min_p, presence_penalty, and seed sampling parameters (ce69ba0)
  • Checkpoint: OpenClaw config, verbose logging, max_completion_tokens, and streaming improvements (b69b9a5)
  • Revise README for clarity on v0.9.4 features (c66764e)
  • Add Qwen3.5-MoE VLM support, reasoning extraction, --raw flag, and stream cancellation (6cf821c)
  • Checkpoint: pre Qwen3.5-397B-A17B-4bit reclassify (8f35675)
  • Update OpenCode usage instructions in README (c72a023)
  • Revise installation methods in README (73c1640)
  • Swap OpenCode setup steps: configure first, then start afm (694f456)
  • Add detailed OpenCode /connect instructions to README (b15801b)
  • Add OpenCode integration guide to README (90af069)
  • Wire all MLX CLI params to server mode, enhance generation logging (1bc007f)
  • Revise installation command formatting in README (38c25d2)
  • Update README with model repo environment variable (51ef4b4)
  • Update README.md (5bc112f)
  • Revise README for v0.9.4 feature announcement (3d6c296)
  • Update README with feature listing and API access (d7b3bfa)
  • Revise README for MLX model support and commands (c89f191)
  • Add MLX excitement and quick install to README hero section (4f9b374)
  • Add MLX models screenshot to README (0d1bd07)
  • Add files via upload (f979ab4)
  • Update README with MLX local model support and new v0.9.4 features (9d86386)
  • Add regression test report: 61/61 passed (680c465)
  • Add MLX model test report: 27/28 passed, Kimi-K2.5 interrupted (fa84cf7)
  • Fix MLX metallib resolution for relocated binaries (5059f1b)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

v0.9.4 — MLX Local Models, All Swift, No Python. WebUI Supported.

20 Feb 04:04


What's New in v0.9.4

MLX Local Model Support

  • Run any Hugging Face MLX model locally — all Swift, no Python dependency
  • Full WebUI support (afm mlx -m <model> -w)
  • Interactive model picker for downloaded models
  • 28 models tested and verified (Qwen3, Gemma 3/3n, GLM-4/5, DeepSeek V3, LFM2, SmolLM3, Llama 3.2, MiniMax M2.5, Nemotron, Kimi-K2.5, and more)
  • Single prompt and pipe mode (afm mlx -m <model> -s "prompt")

API Gateway Mode

  • afm -w -g auto-discovers and proxies Ollama, LM Studio, Jan, and other local backends
  • Single URL, single WebUI, all your models
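In gateway mode every discovered backend sits behind one OpenAI-compatible URL. A minimal sketch of what a client request looks like (assuming the default port 9999 and the standard /v1/chat/completions route; the model name is hypothetical):

```python
import json
from urllib import request

GATEWAY = "http://localhost:9999/v1"  # default afm -w -g port

def chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completion request aimed at the gateway."""
    payload = {
        "model": model,  # any model the gateway discovered (Ollama, LM Studio, Jan, ...)
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{GATEWAY}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("llama3.2", "Say hello")  # hypothetical model name
# request.urlopen(req) would send it once afm -w -g is running
```

The point is that the client never needs to know which backend actually serves the model; the gateway routes by model name.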

Also New

  • Vision OCR subcommand (afm vision)
  • Reasoning model support (Qwen, DeepSeek, gpt-oss)
  • WebUI auto-selects the right model on startup
  • max_tokens limit raised to 5000
  • Model load timeout raised to 360s for large models

Install

pip install macafm
# or
brew install scouzi1966/afm/afm

v0.9.3

29 Jan 17:32


What's New in v0.9.3

API Gateway Mode (afm -w -g)

  • Auto-discovers and proxies to Ollama, LM Studio, Jan, llama.cpp, and other local LLM backends
  • Unified model selector — all backend models appear in a single dropdown
  • Model info strip — shows backend name, capabilities (Vision, Tools), and context window size
  • LM Studio loaded state detection — correctly identifies loaded vs unloaded models

Reasoning Model Support

  • GPT-OSS / DeepSeek / Qwen reasoning — normalizes reasoning field to reasoning_content for WebUI compatibility
  • <think> tag extraction — extracts reasoning from <think>...</think> blocks in streaming responses
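The <think> extraction can be pictured with a small sketch (an illustration of the idea only, not afm's actual Swift implementation):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def normalize_reasoning(content: str) -> dict:
    """Split <think>...</think> blocks out of a model response and expose
    them under reasoning_content, as the WebUI expects."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(content))
    visible = THINK_RE.sub("", content).strip()
    return {"content": visible, "reasoning_content": reasoning}

msg = normalize_reasoning("<think>User wants a greeting.</think>Hello!")
# msg["content"] is "Hello!"; the reasoning text moves to msg["reasoning_content"]
```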

WebUI Enhancements

  • Apple Intelligence branding in single-model mode with SF Symbol icon
  • Startup flash fix — page hidden until branding applied, no more llama.cpp → AFM transition flicker
  • Single-model mode fix — model selector dropdown no longer pops up when only Foundation model is available
  • llama.cpp webui subtitle — shows "llama.cpp webui" under AFM heading

Streaming & Stats

  • stream_options.include_usage sent to all backends (LM Studio, Ollama, Jan) for real token counts
  • Estimated token counting fallback for backends without usage data
  • Native streaming for Apple Foundation Model

Other

  • pip install — pip install macafm now available as an alternative to Homebrew
  • Pre-warm — model pre-warmed on server startup for faster first response
  • Multimodal OCR — vision support for Foundation Model
  • Fix: WebUI bundled in pip package — afm -w now correctly opens the browser when installed via pip

Installation

# Homebrew (fresh install)
brew tap scouzi1966/afm
brew install afm

# Homebrew (upgrade, if previously installed with brew)
brew update
brew upgrade afm

# pip (fresh install)
pip install macafm

# pip (upgrade, if previously installed with pip)
pip install --upgrade macafm

Quick Start

afm           # API server only
afm -w        # API server + WebUI
afm -w -g     # WebUI + gateway (auto-discovers all local backends)

AFM v0.9.1 with WebUI!

23 Jan 21:04


What's New

Bug Fixes

  • Improved error handling for context-window-exceeded errors: when the conversation exceeds Apple Foundation Models' 4096-token limit, the WebUI now displays a clear, formatted error message in the chat instead of silently stopping
  • Errors are shown as visible assistant messages with emoji formatting
  • Proper stream termination ensures the WebUI exits its waiting state

Changes

  • Added contextWindowExceeded error type with token count extraction
  • Error messages include specific token counts when available
  • OpenAI-compatible context_length_exceeded error type for API responses
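An OpenAI-compatible error body for this case might look like the following (field layout follows the OpenAI error convention; the exact strings afm emits may differ):

```python
import json

def context_length_error(used: int, limit: int = 4096) -> dict:
    """Build an OpenAI-style error response for an exceeded context window."""
    return {
        "error": {
            "message": f"Context window exceeded: {used} tokens > {limit} token limit.",
            "type": "context_length_exceeded",
            "code": "context_length_exceeded",
        }
    }

body = context_length_error(5210)
print(json.dumps(body, indent=2))
```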

Installation

Homebrew (recommended)

brew tap scouzi1966/afm
brew install afm

Manual

tar -xzf afm-v0.9.1-arm64.tar.gz
cd afm-v0.9.1-arm64
./install.sh

Requirements

  • macOS 26+ with Apple Intelligence enabled
  • Apple Silicon Mac (M1/M2/M3/M4 series)

v0.9.0 - llama.cpp WebUI Integration

23 Jan 03:10 · b9d38ad


What's New

llama.cpp WebUI Integration

  • Add -w/--webui flag to enable llama.cpp webui and open browser automatically
  • WebUI provides a chat interface at http://localhost:9999 when enabled
  • Rebranded as "Apple Foundation Models" in the UI

API Improvements

  • Add /props endpoint for webui compatibility
  • Make model field optional in chat completion requests
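With the model field optional, the server-side fallback can be as simple as this sketch ("foundation" as the default model name is an assumption borrowed from the API examples elsewhere in these notes):

```python
def resolve_model(request_body: dict, default: str = "foundation") -> str:
    """Fall back to the Foundation model when a chat completion request
    omits (or leaves empty) the 'model' field."""
    model = request_body.get("model")
    return model if model else default

assert resolve_model({"messages": []}) == "foundation"
assert resolve_model({"model": "qwen3", "messages": []}) == "qwen3"
```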

Build System

  • Add llama.cpp as git submodule (pinned to specific commit for reproducibility)
  • Add Makefile targets: submodules, webui, build-with-webui
  • WebUI included in distribution packages

Usage

# Start server with webui
afm -w
afm --webui

# Start on custom port
afm -w --port 8080

Requirements

  • macOS 26+ with Apple Intelligence enabled
  • Apple Silicon Mac (M1/M2/M3/M4 series)
  • Node.js (only for building webui from source)

Installation

Homebrew

brew tap scouzi1966/afm
brew install afm

Manual

Download afm-v0.9.0-arm64.tar.gz, extract, and run ./install.sh

v0.8.0 - Permissive Guardrails

18 Oct 01:28


🚀 AFM v0.8.0 - Permissive Guardrails by Dan Fabulich (https://github.com/dfabulich)

This release introduces permissive guardrails support, allowing you to process potentially unsafe content for legitimate use cases like content moderation and inspection.

✨ What's New

🛡️ Permissive Guardrails Support

Enable Apple's .permissiveContentTransformations API to allow processing of content that would normally be blocked by default safety guardrails:

# Enable permissive guardrails mode (the following will flag unsafe content without permissive guardrails option)
afm --permissive-guardrails -s "I want to be a porn star"
afm -P -s "I want to be a porn star"

# Use in server mode
afm -P --port 9999

# Combine with other parameters
afm -P -t 0.7 -r "random:top-p=0.9" -s "Analysis task"

Use Cases:

  • Content moderation systems
  • Safety research and testing
  • Content inspection and classification
  • Educational/research applications

⚠️ Important: Only use permissive guardrails for legitimate use cases. This mode bypasses Apple's default safety transformations.

🐛 Bug Fixes

  • Fixed server startup in background/daemon mode
  • Fixed missing default parameter values for ArgumentParser
  • Improved stdin detection for pipe mode
  • Better error handling and validation

🧪 Comprehensive Test Suite

New test-all-features.sh script with 60+ automated tests:

  • CLI parameter validation
  • Single-prompt mode testing
  • Temperature and randomness parameters
  • Permissive guardrails behavior
  • Server mode and API endpoints
  • Realistic API test scenarios

📋 Usage Examples

Basic Permissive Guardrails

# Single prompt with permissive mode
afm -P -s "I want to be a porn star"

# Server with permissive guardrails
afm -P --port 9999

API Usage

# Chat completion with permissive guardrails (server must be started with -P)
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Analyze this content"}]
  }'

Combined Parameters

# All features together
afm -P -t 0.5 -r "random:top-p=0.9:seed=42" -a "model.fmadapter" --port 9999

🔧 Technical Improvements

  • Permissive Guardrails: Integrated throughout the entire stack (CLI → Server → Controller → Service)
  • Enhanced Splash Screen: ANSI color codes for beautiful terminal output
  • Better Validation: Parameter validation at CLI parsing level
  • Improved Testing: Comprehensive automated test coverage
  • Code Quality: Bug fixes for production reliability

📦 Installation

Homebrew (Recommended)

brew tap scouzi1966/afm
brew install afm

Manual Installation

# Download the binary
curl -L https://github.com/scouzi1966/maclocal-api/releases/download/v0.8.0/afm-v0.8.0-arm64.tar.gz -o afm.tar.gz

# Extract
tar -xzf afm.tar.gz

# Make executable and move to PATH
chmod +x afm
sudo mv afm /usr/local/bin/

📌 Requirements

  • macOS 26+ (macOS Sequoia or later)
  • Apple Intelligence enabled
  • Apple Silicon Mac (M1/M2/M3/M4 series)

🔄 Upgrade from v0.7.0

# Homebrew
brew update
brew upgrade afm

# Manual
# Download new version and replace binary

Breaking Changes: None - fully backward compatible with v0.7.0


🙏 Acknowledgments

Special thanks to @dfabulich for contributing the permissive guardrails feature via PR #11!


Full Changelog: v0.7.0...v0.8.0