
Releases: scouzi1966/maclocal-api

afm 0.9.8

29 Mar 00:53
392de4a

afm 0.9.8

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Changes since v0.9.7

Batch dispatch & concurrency:

  • OpenAI-compatible /v1/batches and /v1/files endpoints
  • SSE multiplex endpoint /v1/batch/completions
  • Batched prefill for concurrent requests (B=N single forward pass)
  • Auto-promotion/teardown lifecycle for batch mode
  • Multi-slot reservation, cancellation, and active-count tracking in BatchScheduler
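
Since these endpoints advertise OpenAI compatibility, a batch input file should follow the OpenAI Batch JSONL shape: one request envelope per line. A minimal sketch — the model name, custom_ids, and prompts are illustrative placeholders, not values required by afm:

```python
import json

# One OpenAI-style batch request envelope per JSONL line.
# "local-model" and the custom_ids are placeholders.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "local-model",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello", "What is MLX?"])
]

batch_jsonl = "\n".join(json.dumps(r) for r in requests)
```

Mirroring the OpenAI workflow, such a file would be uploaded via /v1/files and then referenced when creating a batch via /v1/batches.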

GPU profiling:

  • --gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw modes
  • Per-request X-AFM-Profile API
  • Native IOReport GPU power monitoring (no mactop dependency)
  • Auto-detect shader-enabled Instruments template

Tool calling improvements:

  • Mistral [ARGS] fallback parser, gemma3_text format
  • Tool call parsing improvements for ToolCall-15 benchmark
  • Batched tool call runtime and constrained tooling parity

Stability & performance:

  • Release original weights after fusion to save 11 GB GPU memory
  • Fix relocated binary crash (pip install / Homebrew)
  • Fix SSM Metal kernel group index for batch_size > 1
  • Grammar-constrained decoding via OpenAI strict: true
  • RadixTreeCache always created on model load
  • truncateToOffset() for efficient KV cache management
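
The strict wiring follows OpenAI's function-calling schema, where strict: true on a function definition requests schema-exact arguments. A hedged sketch of such a request body — the function name, model, and prompt are illustrative:

```python
import json

# An OpenAI-style tool definition with "strict": true; per the notes above,
# afm wires this to grammar-constrained decoding. Names are illustrative.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Weather in Montreal?"}],
    "tools": [tool],
}
body = json.dumps(payload)
```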

Install / Upgrade via Homebrew

Fresh install:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Install via PyPI

pip install macafm==0.9.8

afm-next (20260327 · 62395ab)

27 Mar 18:58

Pre-release

Nightly build from main branch.

  • Commit: 62395ab
  • Date: 20260327
  • Version: 0.9.8-next.62395ab.20260327

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (a0371cc)

  • Add promptfoo agentic evals to build and test skills (62395ab)
  • fix: SSM Metal kernel group index for batch_size > 1 (c3ad99a)
  • fix: Mistral [ARGS] fallback parser, gemma3_text format, tool parser logging (f6bcfe3)
  • fix: gemma3_text tool format, tool name resolution, controller fallback (d5d5acf)
  • fix: tool call fallback parsing in batch scheduler non-streaming path (ab71925)
  • fix: CacheList serial generation, zero-width KV, bare XML function fallback (64811b6)
  • fix: SSM segsum mask broadcast crash in batch mode for hybrid models (b63ab84)
  • fix: tool call parsing improvements for ToolCall-15 benchmark integration (a6cd3b8)
  • feat: batched prefill for concurrent requests (#63) (0007f91)
  • test: add unit tests for PR review fixes (e5f2d6d)
  • fix: address PR review — cancel, race condition, JSONL validation (687197c)
  • fix: batch all non-multimodal requests + guard path existence check (96abf34)
  • feat: batched prefill for concurrent requests (B=N single forward pass) (c4fa145)
  • docs: update test inventory and skill with post-processing parity tests (7c6fc96)
  • test(batch): add unit tests for post-processing parity (b3462f5)
  • fix(batch): add post-processing parity to batch controllers (ffbd692)
  • fix: add batch protocol stubs to FakeMLXChatService test helper (83c44f3)
  • feat(batch): add Section 15 batch dispatch tests to assertion harness (5fb4349)
  • feat(batch): register batch API and SSE multiplex routes (8b6c53f)
  • feat(batch): add SSE multiplex endpoint /v1/batch/completions (d13d78d)
  • feat(batch): add OpenAI-compatible /v1/batches and /v1/files endpoints (1b90d0b)
  • fix: route non-streaming requests through BatchScheduler when concurrent mode active (365f57b)
  • feat(batch): add auto-promotion/teardown lifecycle for batch mode (701c39e)
  • feat(batch): add BatchStore actor for in-memory file and batch state (2dd672e)
  • feat(batch): add multi-slot reservation, cancellation, and active count to BatchScheduler (e8a89b1)
  • feat(batch): add request and response types for batch dispatch API (cb6df75)
  • Add batch dispatch implementation plan (2fc9904)
  • Address spec review findings: race conditions, slot reservation, cancellation (97ff926)
  • Add batch dispatch API design spec (3503f16)
  • fix: release original weights after fusion to save 11 GB GPU memory (19af692)
  • cleanup: remove LLMModel.swift from patch system (no changes from upstream) (e20f92e)
  • revert: remove ineffective Memory.clearCache() and sync eval changes (8d189c8)
  • fix: add Memory.clearCache() after prefill, add LLMModel.swift to patch system (38f7c31)
  • fix: skip vision_tower weights when loading VLM safetensors as LLM (db64fd9)
  • feat: add peak_memory_gib to usage, reset per request, fix serial prefix caching (7632199)
  • Wire OpenAI strict: true to grammar-constrained decoding (v2) (7aed2ef)
  • Update benchmark config for high-concurrency testing (B=180) (5f08f1f)
  • feat: X-AFM-Profile API for per-request GPU profiling (#60) (e0e9bb6)
  • Merge pull request #58 from scouzi1966/codex/feature/codex-batched-tooling (4410ad9)
  • Fix batched tooling review issues (98868c3)
  • Fix graceful shutdown crash and improve benchmark harness (8805fb0)
  • feat: native IOReport GPU power monitoring (no mactop dependency) (92dab29)
  • Add --afm-only mode, --afm CLI flag, and smart graph titles to benchmark harness (4b696ef)
  • Add scheduler-native constrained tooling parity (0cdf0c7)
  • docs: add GPU shader profiling to test-macafm skill and test inventory (6dafee0)
  • feat: add gpu-profile-report.py harness and command line in profile output (ded3029)
  • docs: update CLAUDE.md GPU profiling section with measured results and shader template (4c9a950)
  • feat: auto-detect shader-enabled Instruments template for per-kernel GPU profiling (c656e10)
  • feat: add GPU shader profiling tools (--gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw) (c787f77)
  • Add batched tool call runtime (30b64ba)
  • Merge pull request #57 from scouzi1966/codex/feature/codex-promptfoo-suite (8682af7)
  • Remove promptfoo report artifacts from PR (498c63f)
  • Add Sourcery PR review workflow (b7c9bf0)
  • Add batched tooling feasibility note (77396d2)
  • Add paged attention feasibility note (59d52e9)
  • Update promptfoo output defaults (ce9034a)
  • Add primary-source agent framework suites (f5b0132)
  • Add promptfoo agentic reports for codex next 0.9.8 (8f0dcc2)
  • Fix promptfoo suite runner exit handling (7783783)
  • Add Promptfoo agentic eval suite (cd77762)
  • Document exact replay investigation (30e33f0)
  • Update cache validation logs (b583c07)
  • Add cache save logs (4c6a9d8)
  • Add cache replay diagnostics (6c06167)
  • Fix cache profiling export (0314c52)
  • Update nightly release link to 20260320-a0371cc (0b25850)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm-next (20260320 · a0371cc)

20 Mar 01:52

Pre-release

Nightly build from main branch.

  • Commit: a0371cc
  • Date: 20260320
  • Version: 0.9.8-next.a0371cc.20260320

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (072340b)

  • Update nightly release link to 20260320-90693a7 (a0371cc)
  • Bump version to v0.9.8 (90693a7)
  • test: add comprehensive test suite, roadmap docs, and test reports (8e3148f)
  • test: add performance baseline comparison tests (7b4907c)
  • perf: replace state=state round-trip with truncateToOffset() (8526aa8)
  • feat: always create RadixTreeCache on model load (b55e458)
  • feat: add truncateToOffset() to BaseKVCache and KVCacheSimple (76a25d0)
  • Add AFM vs mlx-lm concurrency benchmark script and results (a80484a)
  • Update README to reflect MLX LLM terminology (d73620a)
  • Note that stable and nightly are currently at the same level (8b0d70a)
  • Release v0.9.7: promote nightly to stable (107ccbd)
  • Add mandatory clean-slate install testing to promote skill (414c144)
  • Make WebUI mandatory in all release skills (0d1889d)
  • Test no-reply attribution (718c918)
  • Add repo-local Codex skills (6e50f55)
  • Bump nightly wheel version to 0.9.7.dev20260316 (0b7e5ff)
  • Update nightly release link to 20260316-072340b (1d982a6)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm 0.9.7

17 Mar 12:16

afm 0.9.7

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Highlights

  • Concurrent batch decoding — pipelined batch decode with round-robin interleaving and shared prefix cache (--concurrent N)
  • Telegram bridge — remote chat via Telegram bot (replaces iMessage bridge)
  • XGrammar structured output — native C++ grammar constraints for tool calls (EBNF-first, enabled by default)
  • Radix tree prefix cache — multi-slot prefix caching replaces single-slot PromptCacheBox
  • --help-json — AI capability cards for tool-using agents (model/feature discovery)
  • New model support — Nemotron H latent MoE, Qwen3.5 MoE/dense, GLM4/5 MoE
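
As a conceptual illustration of the round-robin interleaving named above — a toy model, not afm's actual scheduler — each live sequence decodes one token per turn, so short requests finish early while long ones keep cycling:

```python
from collections import deque

def round_robin_decode(remaining_tokens):
    """Toy round-robin decode: one token per sequence per turn."""
    queue = deque(enumerate(remaining_tokens))
    order = []  # which sequence decodes at each step
    while queue:
        seq, left = queue.popleft()
        order.append(seq)                   # decode one token for this sequence
        if left > 1:
            queue.append((seq, left - 1))   # still has tokens to go
    return order

# Sequences needing 3, 1, and 2 tokens interleave as 0,1,2,0,2,0:
print(round_robin_decode([3, 1, 2]))  # → [0, 1, 2, 0, 2, 0]
```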

Bug Fixes

  • Fix XML tool call params serialized as strings instead of arrays/objects (#36, #37)
  • Fix qwen3_5 dense model auto-detection for Qwen3.5-9B
  • Fix qwen3_5_moe tool call format detection for Qwen3.5-35B-A3B
  • Fix VLM prefix cache crash: reshape suffix tokens and fix hybrid cache offset (#41)
  • Fix SmallVector crash on sequential MLX requests
  • Fix prefix cache broadcast_shapes crash (#47)
  • Fix 503 rejection: move capacity check from middleware to controller
  • Fix streaming tool call arg leak, grammar reset, and hybrid XML parser
  • Fix WebUI path resolution for external launches

Testing & Quality

  • Multi-model assertion test runner with XML tool call deep validation (Section 11)
  • Grammar constraint tests (Section 13) and unit test tier
  • Pipelined batch decode benchmarks
  • 7 flaky assertion test fixes

Install / Upgrade

Homebrew:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
# or upgrade:
brew upgrade afm

PyPI:

pip install macafm==0.9.7

afm-next (20260316 · 072340b)

16 Mar 13:28

Pre-release

Nightly build from main branch.

  • Commit: 072340b
  • Date: 20260316
  • Version: 0.9.7-next.072340b.20260316

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (a49c207)

  • Fix sampling params test: replace log file check with API validation (072340b)
  • Fix test scripts: handle empty-choices usage chunk and thinking models (4f7e3c1)
  • Cancel in-flight Telegram requests on reset (6522279)
  • Improve Telegram empty-response diagnostics (731e469)
  • Address Telegram PR review feedback (9711017)
  • Harden Telegram state storage (4867f86)
  • Make Telegram the sole remote bridge (ab4d199)
  • Add experimental iMessage bridge (e2c8184)
  • Fix review: pass logprobs in empty-text stop-sequence chunk (7b030ca)
  • Fix 7 flaky assertion tests: stoppedBySequence signal + test robustness (2cb4848)
  • Merge pull request #50 from scouzi1966/feature/mlx-concurrent-batch (151bc0d)
  • Merge pull request #49 from scouzi1966/feature/codex-optimize-api (e0e2390)
  • Fix 503 rejection: move capacity check from broken middleware to controller (ecb2377)
  • Disable thinking for guided JSON (95a964f)
  • Fix --concurrent help text: remove misleading "default 4" (eaadfab)
  • Validate guided JSON before MLX startup (c3e2958)
  • Address review feedback on evals and CLI output (437ba48)
  • Add --concurrent N safeguards: max concurrency limit, 503 rejection, serial fallback (49ef55b)
  • Fix WebUI path resolution for external launches (a6b2548)
  • Support Nemotron H latent MoE variant (3469eb5)
  • Improve API compatibility evals and finish reasons (4b49ef6)
  • checkpoint: pre-deferred-batch-promotion (33e42dc)
  • Update benchmark results with pipelined decode numbers (a25b5b4)
  • Pipelined batch decode: dispatch previous step's tokens while computing next (d070fad)
  • Optimize batch decode: lazy eval + reduced actor yield (27e2df3)
  • Phase 2: dense batched decoding with BatchKVCacheSimple (20fb2e1)
  • Phase 1: concurrent generation with round-robin interleaving and shared prefix cache (9b5fd1b)
  • Add xgrammar v0.1.32 constexpr linker fix to patch system (239f369)
  • Add repository contributor guide (8b0ddff)
  • Add roadmap: incremental delta.tool_calls argument streaming (80e84e6)
  • Add pip install method to release notes and nightly publish skill (f25a2f5)
  • Add nightly wheel distribution via pip from kruks.ai (ae833d2)
  • Add changelog filtering and README update step to nightly publish skill (aef5232)
  • Update nightly release link to 20260312-a49c207 (44ca769)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly

afm-next (20260312 · a49c207)

12 Mar 13:59

Pre-release

Nightly build from main branch.

  • Commit: a49c207
  • Date: 20260312
  • Version: 0.9.7-next.a49c207.20260312

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (61ba012)

  • Fix prefix cache save path: add state round-trip after trim (#47) (a49c207)
  • Restore JSON object/array parsing in XML params, accept object arguments in multi-turn (dee050b)
  • Fix streaming tool call arg leak, grammar reset, and hybrid XML parser (912ee29)
  • Add SSE-level tool call logging, fix JSON-as-object bug, add max_context_length to /v1/models (a8cd4ef)
  • Fix prefix cache broadcast_shapes crash (#47), add cache HIT/MISS logging, fix log formatting (31a55af)
  • Add grammar constraints visibility, prefix cache fixes, toolcall matrix testing, and realworld workload generator (b144a4a)
  • Add changelog baseline selection to nightly publish skill (c036fe0)
  • Add post-build verification, true clean build, and test/fix/rebuild loop to nightly publish skill (cc5807e)
  • Add unit test tier, grammar constraint tests (Section 13), and test index/coverage badges (3d71b40)
  • Make grammar constraints opt-in, add decodeJSONEscapes for model pre-escaping (ac340b3)
  • Add --vv trace logging, EBNF named required params, fix AnyCodableValue cast bug (9d89e30)
  • Fix OpenCode log flags: quote DEBUG in --log-level "DEBUG" --print-logs (6efa346)
  • Add OpenCode log location docs and XML entity decoding regression test (057255d)
  • Skip NSXMLParser for tool call parsing — use regex-only to fix bare < and & in code content (8848584)
  • Clean up log formatting: remove blank lines, compact log handler, reorder tool-call-parser help (b26a96e)
  • Add grammar diagnostic logging, coercion logging, and XMLParser body preview (1b762ca)
  • Dynamic think tags, EBNF-first grammar, incremental type coercion (37a42f7)
  • Add fuzzy tool name correction for hallucinated names in fallback parsers (563a098)
  • Add xgrammar StructuralTag constraint + vLLM-style reasoner gating for tool calls (42578ff)
  • Fix xgrammar stop-token warning, array/object coercion, add type coercion tests (ba6bd14)
  • Change default prefill step size to 1024, add multi-turn benchmark (edf6935)
  • Wire --enable-prefix-caching flag, add prefix cache benchmark script (ce5bed9)
  • Enable xgrammar tool constraint by default, gate with DISABLE flag (76f71d1)
  • Gate xgrammar tool constraint behind compile flag ENABLE_XGRAMMAR_TOOL_CONSTRAINT (7412265)
  • Rename llamacpp_tool_parser to afm_adaptive_xml (b7baada)
  • Add llamacpp_tool_parser: JSON-in-XML fallback, type coercion, tool_choice=none (8e77b69)
  • cleanup: remove XGrammar Python subprocess bridge (d624f50)
  • feat: wire XGrammarService into generation pipeline (66eb8c4)
  • feat: add XGrammarService with native C++ grammar matching (051ff54)
  • feat: add CXGrammar SPM target with C wrapper around xgrammar C++ (e5d1d4b)
  • vendor: add xgrammar C++ library as submodule (v0.1.17) (e180429)
  • docs: add XGrammar C++ interop implementation plan (ca21613)
  • docs: add XGrammar C++ interop design (d33cd3a)
  • Add inference optimizations testing design document (6b962b2)
  • fix: address code review findings (85e4248)
  • feat: add RequestScheduler for fair request scheduling (5238e9b)
  • test: add KV cache eviction test suite (6fc67dc)
  • feat: add --kv-eviction streaming for StreamingLLM-style context handling (f571a61)
  • test: add json_schema constrained decoding test (52d70ad)
  • feat: add Swift XGrammarBridge client for subprocess communication (be6cab5)
  • feat: add XGrammar Python bridge for structured output (88d70d8)
  • fix: allow radix cache hits on partial edge matches (d716c7c)
  • test: add prefix cache multi-hit test to assertions suite (731c667)
  • feat: replace single-slot PromptCacheBox with RadixTreeCache (c3df386)
  • feat: add RadixTreeCache data structure for multi-slot prefix caching (5bdd6fd)
  • Add detailed implementation plan for inference optimizations (5d3c11d)
  • Add configuration & flags section to optimization design (aa8e78d)
  • Replace custom FSM with XGrammar for structured output (02c518e)
  • Remove speculative decoding from optimization plan (93277f2)
  • Add inference optimizations design document (7af6f00)
  • Fix VLM prefix cache crash: reshape suffix tokens to [1,N] and fix hybrid cache offset (#41) (2269dde)
  • Fix: Resolve SmallVector crash on sequential MLX requests (004e33c)
  • Update README: nightly v0.9.7-next release notes and link (5e03179)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly


afm-next (20260307 · 61ba012)

07 Mar 01:34

Pre-release

Nightly build from main branch.

  • Commit: 61ba012
  • Date: 20260307
  • Version: 0.9.7-next.61ba012.20260307

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (9e978c5)

  • Add nightly test reports for 2026-03-06: multi-model assertions + comprehensive suite (61ba012)
  • Add XML tool call deep validation, fix qwen3_5 dense auto-detection, multi-model test runner (c85b53f)
  • Fix --models discovery parsing after list-models.sh size column addition (c83d238)
  • Increase TIMEOUT_LOAD from 6min to 15min in mlx-model-test.sh (5714b9f)
  • v0.9.7: Add --help-json AI capability cards, fix model picker, add PR regression tests (4b07fb3)
  • Add missing nightly report mlx-model-report-20260306_134309.html (6a69ff9)
  • Fix qwen3_5_moe tool call detection, update test suite and skill, add nightly reports (12f75ec)
  • Fix XML tool call params serialized as strings instead of arrays/objects (closes #36) (#37) (d4132df)
  • Update README with package deployment note (94466e7)
  • Reset "What's new in afm-next" after v0.9.6 stable release (58152b1)
  • Update promote skill: build from main HEAD or nightly, add smoke tests (312b979)
  • Release v0.9.6: update README versions, add smoke tests to promote skill (07c905a)
  • Add rollback procedure to promote-nightly skill (3013567)
  • Add safeguard: preserve nightly release when promoting to stable (1e991ca)
  • Add afm-build-promote-nightly skill for promoting nightly to stable (f6d9aaf)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm 0.9.6

05 Mar 01:32

afm 0.9.6

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Changes since v0.9.5

  • Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
  • Add real MLX performance stats to API responses and console logging (20f175b)
  • Bump version to 0.9.6 (495363f)
  • Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
  • Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
  • Update test-macafm skill with Kwargs section and checklist items (af4c10c)
  • Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
  • Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
  • Update README with local experimentation instructions (809d3a0)
  • Move legacy scripts and release artifacts to archive/ (d73074b)
  • Bump version to 0.9.5, add publish-stable script (5e57324)
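
The chat_template_kwargs parameter (issue #34) forwards extra keyword arguments to the model's chat template. A sketch of such a request body — enable_thinking is a kwarg used by some templates and is shown only as an example; accepted keys depend on the model:

```python
import json

# Extra kwargs are passed through to the chat template.
# "enable_thinking" is illustrative; valid keys depend on the template.
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hi"}],
    "chat_template_kwargs": {"enable_thinking": False},
}
body = json.dumps(payload)
```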

Install / Upgrade via Homebrew

Fresh install:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Install via PyPI

pip install macafm==0.9.6

afm-next (20260304 · 9e978c5)

04 Mar 00:50

Pre-release

Nightly build from main branch.

  • Commit: 9e978c5
  • Date: 20260304
  • Version: 0.9.6-next.9e978c5.20260304

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (410d7e5)

  • Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
  • Add real MLX performance stats to API responses and console logging (20f175b)
  • Bump version to 0.9.6 (495363f)
  • Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
  • Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
  • Update test-macafm skill with Kwargs section and checklist items (af4c10c)
  • Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
  • Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
  • Update README with local experimentation instructions (809d3a0)
  • Move legacy scripts and release artifacts to archive/ (d73074b)
  • Bump version to 0.9.5, add publish-stable script (5e57324)
  • Enhance README with Vibe coding details (9d1cc38)

Install / Upgrade via Homebrew

Fresh install (first time):

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next

Upgrade (already installed):

brew upgrade afm-next

If you have stable afm installed, unlink it first:

brew unlink afm
brew install scouzi1966/afm/afm-next

Switch back to stable:

brew unlink afm-next
brew link afm

Force reinstall (same version, new build):

brew reinstall afm-next

afm 0.9.5

03 Mar 03:09
9d1cc38

afm 0.9.5

Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.

Changes since v0.9.4

  • Auto-clone homebrew-afm tap repo if missing during nightly publish (410d7e5)
  • Add ownership guard to build-afm-nightly-publish skill (1b71d95)
  • Add build-afm-nightly-publish skill (bc777d6)
  • Fix Jinja crash on nullable tool schemas (closes #32) (f4c80cc)
  • Simplify build-afm Step 3 to report only binary path and version (d96c786)
  • Add fork-first instruction to vibe coding callout (7f58591)
  • Add vibe coding callout to README for non-Swift developers (75af8fc)
  • Reorder build-afm prerequisites by dependency chain (2d498f1)
  • Add prerequisite validation to build-afm skill (1c27148)
  • Add skills, test reports, qwen3_5 registry alias, and bench tooling (50ed40f)
  • Add script paths and commands to report index page (038ba82)
  • Show invocation command in report header for reproducibility (c7cdfd3)
  • Fix index.html links to use htmlpreview for HTML reports (7d8187c)
  • Remove GitHub Actions Pages workflow (Actions disabled on repo) (7b4fff1)
  • Add GitHub Actions workflow for Pages deployment (f63791b)
  • Add GitHub Pages index for test reports (2e3f7bf)
  • Add MLX patch comparison report (3-ref with upstream-only detection) (7d38bf9)
  • Add YAML frontmatter to CLI help for AI agent discovery (9f755d8)
  • Add structured help with YAML frontmatter for AI agent discovery (a31ccd5)
  • Add repeatable MLX patch comparison report generator (8bbec6f)
  • Add cached_tokens to usage response, assertion test suite, and test-macafm skill (037c657)
  • Reorder test report: description above link (f9cff88)
  • Move test report link below title, add description with judge methodology (b8f7728)
  • Add afm-next nightly test report link for Qwen3.5-35B-A3B (90dfc8d)
  • Test harness: template mode, smart scoring fixes, report improvements (aefe39a)
  • Merge pull request #28 from scouzi1966/feature/optimise-metal (4d58e3f)
  • Fix review: QKV mode check, QuantizedKVCache mode, dead perf code (2ca6495)
  • Fix GLM-5 OOM + gemma-3 crash + MoE argPartition optimization (5b8e88c)
  • Auto-detect VLM models + fix model discovery for HF cache dirs (02a7008)
  • Perf: Metal kernel fusions + graph optimizations (107.6→130 tok/s, +21%) (9e9fcf3)
  • Perf: beat Python mlx-lm throughput (95.7→107.6 tok/s, +12%) (b11fa4a)
  • Add assumption for previous afm installation (afb62d6)
  • Update model reference in README.md (eb17f6c)
  • Update README with new features in nightly build (3f4e859)
  • Update README with new features for nightly build (d9c804b)
  • Update README with new API features and parameters (1da2ad3)
  • Fix Metal kernel fallback and temp media file cleanup (d84ab02)
  • Add --media flag for VLM single-prompt mode + base64 data URL support (a834638)
  • Qwen3.5 perf: add VLM Metal kernel + default to LLM loading (42→95 tok/s) (ec88798)
  • Add test scripts, reports, and benchmark strict=False fix (cdceabf)
  • Fix afm args quoting with shlex.split, add benchmark script (7905be4)
  • Fix afm: args quoting in test harness (read -ra → eval) (0ddb545)
  • Clean up README: consolidate install sections, add stable/nightly table (c78e5d5)
  • Exclude test report HTML/JSONL from GitHub language stats (a4c9c7d)
  • Fix CLI --stop not passed to Server in MlxCommand (f3607af)
  • Fix stop sequences in thinking models, add CLI --stop flag, fix JSON schema injection (050e836)
  • Fix claude nested session issue and regenerate report with both AI analyses (dd419d8)
  • Add Qwen3.5-35B-A3B-4bit test suite and report (129/132 passed) (b93db37)
  • Move stable install instructions below latest release link (747a67c)
  • Update afm-next heading wording (385e738)
  • Change afm-next heading to 'Available NOW' (697abf0)
  • Update README with Qwen3.5-35B-A3B support and afm-next install instructions (50fcd13)
  • Download *.jinja files and fix missing chat_template fallback (0cfba17)
  • Gate verbose colored logs behind --very-verbose flag (7514638)
  • Merge pull request #26 from alantmiller/fix/vision-async-dispatch (199cded)
  • Use 'Changes since last build (SHA)' format in release notes (7f4b7e7)
  • Show 'changes since' commit SHA in nightly release notes (62a2a1f)
  • Add --since flag to publish-next.sh for changelog control (5a44659)
  • Expand install/upgrade instructions in nightly release notes (2913362)
  • Include commit SHA in brew version string (969ca10)
  • Add commit field and fullVersion to BuildInfo.swift (586bbea)
  • Preserve nightly release history with unique tags (745e5cd)
  • Add git commit SHA to --version output (e.g. v0.9.5-abc1234) (0e5a8b6)
  • Add changelog and install instructions to local publish script (493c94f)
  • Fix MXFP4 quantization crash, token counting, gemma3n routing, and test harness (d79a1d3)
  • Update README with nightly build installation instructions (f1d2813)
  • fix: vision subcommand dispatches async run() correctly (04566f0)
  • Fix bare JSON tool call detection and add ToolCallFormat.swift patch (f6efa24)
  • Change nightly build to manual trigger only (af238ab)
  • Add nightly build workflow for afm-next (8892329)
  • Merge origin/main into feature/mlx-prompt-caching (146030f)
  • Add tool call parser test results (26/26 pass) (694abbb)
  • Fix review findings: zero-arg JSON, prefix caching default, fallback tag detection (1ee1ede)
  • Add hermes, llama3_json, gemma, and mistral tool call parsers (341f35d)
  • Merge pull request #24 from scouzi1966/feature/structured-outputs (a7b3b38)
  • Address PR review: nullable types, null rejection, guided streaming deltas (e25d3ef)
  • Add structured outputs, --guided-json CLI flag, and comprehensive test suite (3b30fa0)
  • Add incremental streaming tool call arguments and fix parameter name mapping (9b27cac)
  • Update README for v0.9.5 features (519d35f)
  • Update README with new features and MLX support (fe7bd74)
  • Add token-level streaming tool call detection and update CLAUDE.md (ec46e88)
  • Add tool calling, stop sequences, response_format, and real token counts (71e2c68)
  • Save test reports to test-reports/ with JSONL data and add Kimi brief prompt (bee6a72)
  • Add logprobs support, --max-logprobs switch, and dynamic system_fingerprint (b37fdff)
  • Bump version to v0.9.5 and add sampling params test report (4abbdee)
  • Add top_k, min_p, presence_penalty, and seed sampling parameters (ce69ba0)
  • Checkpoint: OpenClaw config, verbose logging, max_completion_tokens, and streaming improvements (b69b9a5)
  • Revise README for clarity on v0.9.4 features (c66764e)
  • Add Qwen3.5-MoE VLM support, reasoning extraction, --raw flag, and stream cancellation (6cf821c)
  • Checkpoint: pre Qwen3.5-397B-A17B-4bit reclassify (8f35675)
  • Update OpenCode usage instructions in README (c72a023)
  • Revise installation methods in README (73c1640)
  • Swap OpenCode setup steps: configure first, then start afm (694f456)
  • Add detailed OpenCode /connect instructions to README (b15801b)
  • Add OpenCode integration guide to README (90af069)
  • Wire all MLX CLI params to server mode, enhance generation logging (1bc007f)
  • Revise installation command formatting in README (38c25d2)
  • Update README with model repo environment variable (51ef4b4)
  • Update README.md (5bc112f)
  • Revise README for v0.9.4 feature announcement (3d6c296)
  • Update README with feature listing and API access (d7b3bfa)
  • Revise README for MLX model support and commands (c89f191)
  • Add MLX excitement and quick install to README hero section (4f9b374)
  • Add MLX models screenshot to README (0d1bd07)
  • Add files via upload (f979ab4)
  • Update README with MLX local model support and new v0.9.4 features (9d86386)
  • Add regression test report: 61/61 passed (680c465)
  • Add MLX model test report: 27/28 passed, Kimi-K2.5 interrupted (fa84cf7)
  • Fix MLX metallib resolution for relocated binaries (5059f1b)
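
Several commits above add OpenAI-style sampling parameters (top_k, min_p, presence_penalty, seed). A sketch of a request body using them — the values and model name are illustrative:

```python
import json

# The new sampling parameters from this release in one request body.
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Write a haiku."}],
    "top_k": 40,              # keep only the 40 most likely tokens
    "min_p": 0.05,            # drop tokens below 5% of the top probability
    "presence_penalty": 0.5,  # discourage tokens already present
    "seed": 42,               # reproducible sampling
}
body = json.dumps(payload)
```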

Install / Upgrade via Homebrew

Fresh install:

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm

Upgrade:

brew upgrade afm

Install via PyPI

pip install macafm==0.9.5