Releases: scouzi1966/maclocal-api
afm 0.9.8
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.7
Batch dispatch & concurrency:
- OpenAI-compatible /v1/batches and /v1/files endpoints
- SSE multiplex endpoint /v1/batch/completions
- Batched prefill for concurrent requests (B=N single forward pass)
- Auto-promotion/teardown lifecycle for batch mode
- Multi-slot reservation, cancellation, and active count
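The batch endpoints above are described as OpenAI-compatible, so a batch input file should follow the standard OpenAI batch JSONL shape: one request object per line with a `custom_id`, `method`, `url`, and `body`. A minimal sketch (the model name is a placeholder, not a real afm identifier):

```python
import json

# Build a batch input file in the OpenAI JSONL format that /v1/files and
# /v1/batches consume: one JSON request object per line.
def make_batch_line(custom_id: str, prompt: str) -> str:
    request = {
        "custom_id": custom_id,            # caller-chosen id echoed in results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-9b",         # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(request)

lines = [make_batch_line(f"req-{i}", p) for i, p in enumerate(["hi", "bye"])]
batch_jsonl = "\n".join(lines)
```

The resulting text would be uploaded via /v1/files and referenced when creating a batch via /v1/batches.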
GPU profiling:
- --gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw modes
- Per-request X-AFM-Profile API
- Native IOReport GPU power monitoring (no mactop dependency)
- Auto-detect shader-enabled Instruments template
Tool calling improvements:
- Mistral [ARGS] fallback parser, gemma3_text format
- Tool call parsing improvements for ToolCall-15 benchmark
- Batched tool call runtime and constrained tooling parity
Stability & performance:
- Release original weights after fusion to save 11 GB GPU memory
- Fix relocated binary crash (pip install / Homebrew)
- Fix SSM Metal kernel group index for batch_size > 1
- Grammar-constrained decoding via OpenAI strict: true
- RadixTreeCache always created on model load
- truncateToOffset() for efficient KV cache management
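For the strict: true wiring above, a request would carry an OpenAI-style tool definition with the strict flag set so arguments are decoded under the schema's grammar. A hedged sketch — the tool name and schema are illustrative, not part of afm:

```python
import json

# OpenAI-style tool definition with "strict": true, the flag the release
# notes say is now wired to grammar-constrained decoding.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",              # illustrative tool
        "strict": True,                     # request schema-exact arguments
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

body = json.dumps({
    "model": "some-local-model",            # placeholder
    "tools": [tool],
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
})
```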
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.8
afm-next (20260327 · 62395ab)
Nightly build from main branch.
- Commit: 62395ab
- Date: 20260327
- Version: 0.9.8-next.62395ab.20260327
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (a0371cc)
- Add promptfoo agentic evals to build and test skills (62395ab)
- fix: SSM Metal kernel group index for batch_size > 1 (c3ad99a)
- fix: Mistral [ARGS] fallback parser, gemma3_text format, tool parser logging (f6bcfe3)
- fix: gemma3_text tool format, tool name resolution, controller fallback (d5d5acf)
- fix: tool call fallback parsing in batch scheduler non-streaming path (ab71925)
- fix: CacheList serial generation, zero-width KV, bare XML function fallback (64811b6)
- fix: SSM segsum mask broadcast crash in batch mode for hybrid models (b63ab84)
- fix: tool call parsing improvements for ToolCall-15 benchmark integration (a6cd3b8)
- feat: batched prefill for concurrent requests (#63) (0007f91)
- test: add unit tests for PR review fixes (e5f2d6d)
- fix: address PR review — cancel, race condition, JSONL validation (687197c)
- fix: batch all non-multimodal requests + guard path existence check (96abf34)
- feat: batched prefill for concurrent requests (B=N single forward pass) (c4fa145)
- docs: update test inventory and skill with post-processing parity tests (7c6fc96)
- test(batch): add unit tests for post-processing parity (b3462f5)
- fix(batch): add post-processing parity to batch controllers (ffbd692)
- fix: add batch protocol stubs to FakeMLXChatService test helper (83c44f3)
- feat(batch): add Section 15 batch dispatch tests to assertion harness (5fb4349)
- feat(batch): register batch API and SSE multiplex routes (8b6c53f)
- feat(batch): add SSE multiplex endpoint /v1/batch/completions (d13d78d)
- feat(batch): add OpenAI-compatible /v1/batches and /v1/files endpoints (1b90d0b)
- fix: route non-streaming requests through BatchScheduler when concurrent mode active (365f57b)
- feat(batch): add auto-promotion/teardown lifecycle for batch mode (701c39e)
- feat(batch): add BatchStore actor for in-memory file and batch state (2dd672e)
- feat(batch): add multi-slot reservation, cancellation, and active count to BatchScheduler (e8a89b1)
- feat(batch): add request and response types for batch dispatch API (cb6df75)
- Add batch dispatch implementation plan (2fc9904)
- Address spec review findings: race conditions, slot reservation, cancellation (97ff926)
- Add batch dispatch API design spec (3503f16)
- fix: release original weights after fusion to save 11 GB GPU memory (19af692)
- cleanup: remove LLMModel.swift from patch system (no changes from upstream) (e20f92e)
- revert: remove ineffective Memory.clearCache() and sync eval changes (8d189c8)
- fix: add Memory.clearCache() after prefill, add LLMModel.swift to patch system (38f7c31)
- fix: skip vision_tower weights when loading VLM safetensors as LLM (db64fd9)
- feat: add peak_memory_gib to usage, reset per request, fix serial prefix caching (7632199)
- Wire OpenAI strict: true to grammar-constrained decoding (v2) (7aed2ef)
- Update benchmark config for high-concurrency testing (B=180) (5f08f1f)
- feat: X-AFM-Profile API for per-request GPU profiling (#60) (e0e9bb6)
- Merge pull request #58 from scouzi1966/codex/feature/codex-batched-tooling (4410ad9)
- Fix batched tooling review issues (98868c3)
- Fix graceful shutdown crash and improve benchmark harness (8805fb0)
- feat: native IOReport GPU power monitoring (no mactop dependency) (92dab29)
- Add --afm-only mode, --afm CLI flag, and smart graph titles to benchmark harness (4b696ef)
- Add scheduler-native constrained tooling parity (0cdf0c7)
- docs: add GPU shader profiling to test-macafm skill and test inventory (6dafee0)
- feat: add gpu-profile-report.py harness and command line in profile output (ded3029)
- docs: update CLAUDE.md GPU profiling section with measured results and shader template (4c9a950)
- feat: auto-detect shader-enabled Instruments template for per-kernel GPU profiling (c656e10)
- feat: add GPU shader profiling tools (--gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw) (c787f77)
- Add batched tool call runtime (30b64ba)
- Merge pull request #57 from scouzi1966/codex/feature/codex-promptfoo-suite (8682af7)
- Remove promptfoo report artifacts from PR (498c63f)
- Add Sourcery PR review workflow (b7c9bf0)
- Add batched tooling feasibility note (77396d2)
- Add paged attention feasibility note (59d52e9)
- Update promptfoo output defaults (ce9034a)
- Add primary-source agent framework suites (f5b0132)
- Add promptfoo agentic reports for codex next 0.9.8 (8f0dcc2)
- Fix promptfoo suite runner exit handling (7783783)
- Add Promptfoo agentic eval suite (cd77762)
- Document exact replay investigation (30e33f0)
- Update cache validation logs (b583c07)
- Add cache save logs (4c6a9d8)
- Add cache replay diagnostics (6c06167)
- Fix cache profiling export (0314c52)
- Update nightly release link to 20260320-a0371cc (0b25850)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly

afm-next (20260320 · a0371cc)
Nightly build from main branch.
- Commit: a0371cc
- Date: 20260320
- Version: 0.9.8-next.a0371cc.20260320
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (072340b)
- Update nightly release link to 20260320-90693a7 (a0371cc)
- Bump version to v0.9.8 (90693a7)
- test: add comprehensive test suite, roadmap docs, and test reports (8e3148f)
- test: add performance baseline comparison tests (7b4907c)
- perf: replace state=state round-trip with truncateToOffset() (8526aa8)
- feat: always create RadixTreeCache on model load (b55e458)
- feat: add truncateToOffset() to BaseKVCache and KVCacheSimple (76a25d0)
- Add AFM vs mlx-lm concurrency benchmark script and results (a80484a)
- Update README to reflect MLX LLM terminology (d73620a)
- Note that stable and nightly are currently at the same level (8b0d70a)
- Release v0.9.7: promote nightly to stable (107ccbd)
- Add mandatory clean-slate install testing to promote skill (414c144)
- Make WebUI mandatory in all release skills (0d1889d)
- Test no-reply attribution (718c918)
- Add repo-local Codex skills (6e50f55)
- Bump nightly wheel version to 0.9.7.dev20260316 (0b7e5ff)
- Update nightly release link to 20260316-072340b (1d982a6)
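The truncateToOffset() commits above replace a full state round-trip with an in-place trim of the KV cache to a matched-prefix offset. The real API lives in the Swift BaseKVCache/KVCacheSimple classes; this toy list-backed cache is only a conceptual sketch of the idea:

```python
# Conceptual sketch: trim cached KV entries back to a matched-prefix offset
# so only the non-shared suffix of a new prompt needs a fresh prefill.
class ToyKVCache:
    def __init__(self):
        self.entries = []  # one (key, value) pair per cached token position

    def append(self, kv):
        self.entries.append(kv)

    def truncate_to_offset(self, offset: int):
        # Keep positions [0, offset); later positions will be recomputed.
        del self.entries[offset:]

cache = ToyKVCache()
for t in range(8):
    cache.append((f"k{t}", f"v{t}"))
cache.truncate_to_offset(5)  # new prompt shares only the first 5 tokens
```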
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly
afm 0.9.7
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Highlights
- Concurrent batch decoding — pipelined batch decode with round-robin interleaving and shared prefix cache (--concurrent N)
- Telegram bridge — remote chat via Telegram bot (replaces iMessage bridge)
- XGrammar structured output — native C++ grammar constraints for tool calls (EBNF-first, enabled by default)
- Radix tree prefix cache — multi-slot prefix caching replaces single-slot PromptCacheBox
- --help-json — AI capability cards for tool-using agents (model/feature discovery)
- New model support — Nemotron H latent MoE, Qwen3.5 MoE/dense, GLM4/5 MoE
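The client side of --concurrent N is just several in-flight requests at once; the server interleaves their decode steps. A minimal sketch — send_request here is a stand-in for a real HTTP POST to the OpenAI-compatible endpoint, not an afm API:

```python
from concurrent.futures import ThreadPoolExecutor

# Fire several chat requests concurrently and let a --concurrent N server
# batch-decode them. Replace send_request with a real client call to
# /v1/chat/completions.
def send_request(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder for the HTTP round trip

prompts = [f"question {i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    replies = list(pool.map(send_request, prompts))  # order-preserving
```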
Bug Fixes
- Fix XML tool call params serialized as strings instead of arrays/objects (#36, #37)
- Fix qwen3_5 dense model auto-detection for Qwen3.5-9B
- Fix qwen3_5_moe tool call format detection for Qwen3.5-35B-A3B
- Fix VLM prefix cache crash: reshape suffix tokens and fix hybrid cache offset (#41)
- Fix SmallVector crash on sequential MLX requests
- Fix prefix cache broadcast_shapes crash (#47)
- Fix 503 rejection: move capacity check from middleware to controller
- Fix streaming tool call arg leak, grammar reset, and hybrid XML parser
- Fix WebUI path resolution for external launches
Testing & Quality
- Multi-model assertion test runner with XML tool call deep validation (Section 11)
- Grammar constraint tests (Section 13) and unit test tier
- Pipelined batch decode benchmarks
- 7 flaky assertion test fixes
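The radix tree prefix cache highlighted above stores token sequences from many conversations and, for each new prompt, reuses the longest cached prefix so only the suffix needs prefilling. A toy trie sketch of the concept (the Swift RadixTreeCache compresses edges; this plain trie only illustrates the lookup):

```python
# Toy multi-slot prefix cache: store token sequences in a trie and find the
# longest cached prefix for a new prompt. Concept sketch only, not the Swift
# implementation.
class TrieNode:
    def __init__(self):
        self.children = {}

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def longest_prefix(self, tokens):
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            matched += 1
        return matched  # number of leading tokens already cached

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                # slot A: one conversation's prompt
cache.insert([1, 2, 9])                   # slot B: another conversation
hit = cache.longest_prefix([1, 2, 3, 7])  # shares 3 leading tokens with A
```

Unlike the old single-slot PromptCacheBox, multiple inserted sequences coexist, so concurrent conversations each get prefix hits.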
Install / Upgrade
Homebrew:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
# or upgrade:
brew upgrade afm
PyPI:
pip install macafm==0.9.7
afm-next (20260316 · 072340b)
Nightly build from main branch.
- Commit: 072340b
- Date: 20260316
- Version: 0.9.7-next.072340b.20260316
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (a49c207)
- Fix sampling params test: replace log file check with API validation (072340b)
- Fix test scripts: handle empty-choices usage chunk and thinking models (4f7e3c1)
- Cancel in-flight Telegram requests on reset (6522279)
- Improve Telegram empty-response diagnostics (731e469)
- Address Telegram PR review feedback (9711017)
- Harden Telegram state storage (4867f86)
- Make Telegram the sole remote bridge (ab4d199)
- Add experimental iMessage bridge (e2c8184)
- Fix review: pass logprobs in empty-text stop-sequence chunk (7b030ca)
- Fix 7 flaky assertion tests: stoppedBySequence signal + test robustness (2cb4848)
- Merge pull request #50 from scouzi1966/feature/mlx-concurrent-batch (151bc0d)
- Merge pull request #49 from scouzi1966/feature/codex-optimize-api (e0e2390)
- Fix 503 rejection: move capacity check from broken middleware to controller (ecb2377)
- Disable thinking for guided JSON (95a964f)
- Fix --concurrent help text: remove misleading "default 4" (eaadfab)
- Validate guided JSON before MLX startup (c3e2958)
- Address review feedback on evals and CLI output (437ba48)
- Add --concurrent N safeguards: max concurrency limit, 503 rejection, serial fallback (49ef55b)
- Fix WebUI path resolution for external launches (a6b2548)
- Support Nemotron H latent MoE variant (3469eb5)
- Improve API compatibility evals and finish reasons (4b49ef6)
- checkpoint: pre-deferred-batch-promotion (33e42dc)
- Update benchmark results with pipelined decode numbers (a25b5b4)
- Pipelined batch decode: dispatch previous step's tokens while computing next (d070fad)
- Optimize batch decode: lazy eval + reduced actor yield (27e2df3)
- Phase 2: dense batched decoding with BatchKVCacheSimple (20fb2e1)
- Phase 1: concurrent generation with round-robin interleaving and shared prefix cache (9b5fd1b)
- Add xgrammar v0.1.32 constexpr linker fix to patch system (239f369)
- Add repository contributor guide (8b0ddff)
- Add roadmap: incremental delta.tool_calls argument streaming (80e84e6)
- Add pip install method to release notes and nightly publish skill (f25a2f5)
- Add nightly wheel distribution via pip from kruks.ai (ae833d2)
- Add changelog filtering and README update step to nightly publish skill (aef5232)
- Update nightly release link to 20260312-a49c207 (44ca769)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly

afm-next (20260312 · a49c207)
Nightly build from main branch.
- Commit: a49c207
- Date: 20260312
- Version: 0.9.7-next.a49c207.20260312
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (61ba012)
- Fix prefix cache save path: add state round-trip after trim (#47) (a49c207)
- Restore JSON object/array parsing in XML params, accept object arguments in multi-turn (dee050b)
- Fix streaming tool call arg leak, grammar reset, and hybrid XML parser (912ee29)
- Add SSE-level tool call logging, fix JSON-as-object bug, add max_context_length to /v1/models (a8cd4ef)
- Fix prefix cache broadcast_shapes crash (#47), add cache HIT/MISS logging, fix log formatting (31a55af)
- Add grammar constraints visibility, prefix cache fixes, toolcall matrix testing, and realworld workload generator (b144a4a)
- Add changelog baseline selection to nightly publish skill (c036fe0)
- Add post-build verification, true clean build, and test/fix/rebuild loop to nightly publish skill (cc5807e)
- Add unit test tier, grammar constraint tests (Section 13), and test index/coverage badges (3d71b40)
- Make grammar constraints opt-in, add decodeJSONEscapes for model pre-escaping (ac340b3)
- Add --vv trace logging, EBNF named required params, fix AnyCodableValue cast bug (9d89e30)
- Fix OpenCode log flags: quote DEBUG in --log-level "DEBUG" --print-logs (6efa346)
- Add OpenCode log location docs and XML entity decoding regression test (057255d)
- Skip NSXMLParser for tool call parsing — use regex-only to fix bare < and & in code content (8848584)
- Clean up log formatting: remove blank lines, compact log handler, reorder tool-call-parser help (b26a96e)
- Add grammar diagnostic logging, coercion logging, and XMLParser body preview (1b762ca)
- Dynamic think tags, EBNF-first grammar, incremental type coercion (37a42f7)
- Add fuzzy tool name correction for hallucinated names in fallback parsers (563a098)
- Add xgrammar StructuralTag constraint + vLLM-style reasoner gating for tool calls (42578ff)
- Fix xgrammar stop-token warning, array/object coercion, add type coercion tests (ba6bd14)
- Change default prefill step size to 1024, add multi-turn benchmark (edf6935)
- Wire --enable-prefix-caching flag, add prefix cache benchmark script (ce5bed9)
- Enable xgrammar tool constraint by default, gate with DISABLE flag (76f71d1)
- Gate xgrammar tool constraint behind compile flag ENABLE_XGRAMMAR_TOOL_CONSTRAINT (7412265)
- Rename llamacpp_tool_parser to afm_adaptive_xml (b7baada)
- Add llamacpp_tool_parser: JSON-in-XML fallback, type coercion, tool_choice=none (8e77b69)
- cleanup: remove XGrammar Python subprocess bridge (d624f50)
- feat: wire XGrammarService into generation pipeline (66eb8c4)
- feat: add XGrammarService with native C++ grammar matching (051ff54)
- feat: add CXGrammar SPM target with C wrapper around xgrammar C++ (e5d1d4b)
- vendor: add xgrammar C++ library as submodule (v0.1.17) (e180429)
- docs: add XGrammar C++ interop implementation plan (ca21613)
- docs: add XGrammar C++ interop design (d33cd3a)
- Add inference optimizations testing design document (6b962b2)
- fix: address code review findings (85e4248)
- feat: add RequestScheduler for fair request scheduling (5238e9b)
- test: add KV cache eviction test suite (6fc67dc)
- feat: add --kv-eviction streaming for StreamingLLM-style context handling (f571a61)
- test: add json_schema constrained decoding test (52d70ad)
- feat: add Swift XGrammarBridge client for subprocess communication (be6cab5)
- feat: add XGrammar Python bridge for structured output (88d70d8)
- fix: allow radix cache hits on partial edge matches (d716c7c)
- test: add prefix cache multi-hit test to assertions suite (731c667)
- feat: replace single-slot PromptCacheBox with RadixTreeCache (c3df386)
- feat: add RadixTreeCache data structure for multi-slot prefix caching (5bdd6fd)
- Add detailed implementation plan for inference optimizations (5d3c11d)
- Add configuration & flags section to optimization design (aa8e78d)
- Replace custom FSM with XGrammar for structured output (02c518e)
- Remove speculative decoding from optimization plan (93277f2)
- Add inference optimizations design document (7af6f00)
- Fix VLM prefix cache crash: reshape suffix tokens to [1,N] and fix hybrid cache offset (#41) (2269dde)
- Fix: Resolve SmallVector crash on sequential MLX requests (004e33c)
- Update README: nightly v0.9.7-next release notes and link (5e03179)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly
afm-next (20260307 · 61ba012)
Nightly build from main branch.
- Commit: 61ba012
- Date: 20260307
- Version: 0.9.7-next.61ba012.20260307
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (9e978c5)
- Add nightly test reports for 2026-03-06: multi-model assertions + comprehensive suite (61ba012)
- Add XML tool call deep validation, fix qwen3_5 dense auto-detection, multi-model test runner (c85b53f)
- Fix --models discovery parsing after list-models.sh size column addition (c83d238)
- Increase TIMEOUT_LOAD from 6min to 15min in mlx-model-test.sh (5714b9f)
- v0.9.7: Add --help-json AI capability cards, fix model picker, add PR regression tests (4b07fb3)
- Add missing nightly report mlx-model-report-20260306_134309.html (6a69ff9)
- Fix qwen3_5_moe tool call detection, update test suite and skill, add nightly reports (12f75ec)
- Fix XML tool call params serialized as strings instead of arrays/objects (closes #36) (#37) (d4132df)
- Update README with package deployment note (94466e7)
- Reset "What's new in afm-next" after v0.9.6 stable release (58152b1)
- Update promote skill: build from main HEAD or nightly, add smoke tests (312b979)
- Release v0.9.6: update README versions, add smoke tests to promote skill (07c905a)
- Add rollback procedure to promote-nightly skill (3013567)
- Add safeguard: preserve nightly release when promoting to stable (1e991ca)
- Add afm-build-promote-nightly skill for promoting nightly to stable (f6d9aaf)
Install / Upgrade via Homebrew
Fresh install (first time):
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next
Upgrade (already installed):
brew upgrade afm-next
If you have stable afm installed, unlink it first:
brew unlink afm
brew install scouzi1966/afm/afm-next
Switch back to stable:
brew unlink afm-next
brew link afm
Force reinstall (same version, new build):
brew reinstall afm-next
afm 0.9.6
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.5
- Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
- Add real MLX performance stats to API responses and console logging (20f175b)
- Bump version to 0.9.6 (495363f)
- Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
- Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
- Update test-macafm skill with Kwargs section and checklist items (af4c10c)
- Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
- Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
- Update README with local experimentation instructions (809d3a0)
- Move legacy scripts and release artifacts to archive/ (d73074b)
- Bump version to 0.9.5, add publish-stable script (5e57324)
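The chat_template_kwargs API parameter (#34) lets a request pass extra keyword arguments through to the model's chat template. A hedged sketch of a request body — the enable_thinking key is a common template switch used here for illustration; supported keys depend on the model's own chat template:

```python
import json

# Request body carrying chat_template_kwargs, per the #34 change above.
payload = {
    "model": "some-local-model",  # placeholder
    "messages": [{"role": "user", "content": "hello"}],
    # Forwarded to the chat template; key shown is illustrative.
    "chat_template_kwargs": {"enable_thinking": False},
}
body = json.dumps(payload)
```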
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.6
afm-next (20260304 · 9e978c5)
Nightly build from main branch.
- Commit: 9e978c5
- Date: 20260304
- Version: 0.9.6-next.9e978c5.20260304
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (410d7e5)
- Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
- Add real MLX performance stats to API responses and console logging (20f175b)
- Bump version to 0.9.6 (495363f)
- Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
- Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
- Update test-macafm skill with Kwargs section and checklist items (af4c10c)
- Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
- Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
- Update README with local experimentation instructions (809d3a0)
- Move legacy scripts and release artifacts to archive/ (d73074b)
- Bump version to 0.9.5, add publish-stable script (5e57324)
- Enhance README with Vibe coding details (9d1cc38)
Install / Upgrade via Homebrew
Fresh install (first time):
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next
Upgrade (already installed):
brew upgrade afm-next
If you have stable afm installed, unlink it first:
brew unlink afm
brew install scouzi1966/afm/afm-next
Switch back to stable:
brew unlink afm-next
brew link afm
Force reinstall (same version, new build):
brew reinstall afm-next
afm 0.9.5
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.4
- Auto-clone homebrew-afm tap repo if missing during nightly publish (410d7e5)
- Add ownership guard to build-afm-nightly-publish skill (1b71d95)
- Add build-afm-nightly-publish skill (bc777d6)
- Fix Jinja crash on nullable tool schemas (closes #32) (f4c80cc)
- Simplify build-afm Step 3 to report only binary path and version (d96c786)
- Add fork-first instruction to vibe coding callout (7f58591)
- Add vibe coding callout to README for non-Swift developers (75af8fc)
- Reorder build-afm prerequisites by dependency chain (2d498f1)
- Add prerequisite validation to build-afm skill (1c27148)
- Add skills, test reports, qwen3_5 registry alias, and bench tooling (50ed40f)
- Add script paths and commands to report index page (038ba82)
- Show invocation command in report header for reproducibility (c7cdfd3)
- Fix index.html links to use htmlpreview for HTML reports (7d8187c)
- Remove GitHub Actions Pages workflow (Actions disabled on repo) (7b4fff1)
- Add GitHub Actions workflow for Pages deployment (f63791b)
- Add GitHub Pages index for test reports (2e3f7bf)
- Add MLX patch comparison report (3-ref with upstream-only detection) (7d38bf9)
- Add YAML frontmatter to CLI help for AI agent discovery (9f755d8)
- Add structured help with YAML frontmatter for AI agent discovery (a31ccd5)
- Add repeatable MLX patch comparison report generator (8bbec6f)
- Add cached_tokens to usage response, assertion test suite, and test-macafm skill (037c657)
- Reorder test report: description above link (f9cff88)
- Move test report link below title, add description with judge methodology (b8f7728)
- Add afm-next nightly test report link for Qwen3.5-35B-A3B (90dfc8d)
- Test harness: template mode, smart scoring fixes, report improvements (aefe39a)
- Merge pull request #28 from scouzi1966/feature/optimise-metal (4d58e3f)
- Fix review: QKV mode check, QuantizedKVCache mode, dead perf code (2ca6495)
- Fix GLM-5 OOM + gemma-3 crash + MoE argPartition optimization (5b8e88c)
- Auto-detect VLM models + fix model discovery for HF cache dirs (02a7008)
- Perf: Metal kernel fusions + graph optimizations (107.6→130 tok/s, +21%) (9e9fcf3)
- Perf: beat Python mlx-lm throughput (95.7→107.6 tok/s, +12%) (b11fa4a)
- Add assumption for previous afm installation (afb62d6)
- Update model reference in README.md (eb17f6c)
- Update README with new features in nightly build (3f4e859)
- Update README with new features for nightly build (d9c804b)
- Update README with new API features and parameters (1da2ad3)
- Fix Metal kernel fallback and temp media file cleanup (d84ab02)
- Add --media flag for VLM single-prompt mode + base64 data URL support (a834638)
- Qwen3.5 perf: add VLM Metal kernel + default to LLM loading (42→95 tok/s) (ec88798)
- Add test scripts, reports, and benchmark strict=False fix (cdceabf)
- Fix afm args quoting with shlex.split, add benchmark script (7905be4)
- Fix afm: args quoting in test harness (read -ra → eval) (0ddb545)
- Clean up README: consolidate install sections, add stable/nightly table (c78e5d5)
- Exclude test report HTML/JSONL from GitHub language stats (a4c9c7d)
- Fix CLI --stop not passed to Server in MlxCommand (f3607af)
- Fix stop sequences in thinking models, add CLI --stop flag, fix JSON schema injection (050e836)
- Fix claude nested session issue and regenerate report with both AI analyses (dd419d8)
- Add Qwen3.5-35B-A3B-4bit test suite and report (129/132 passed) (b93db37)
- Move stable install instructions below latest release link (747a67c)
- Update afm-next heading wording (385e738)
- Change afm-next heading to 'Available NOW' (697abf0)
- Update README with Qwen3.5-35B-A3B support and afm-next install instructions (50fcd13)
- Download *.jinja files and fix missing chat_template fallback (0cfba17)
- Gate verbose colored logs behind --very-verbose flag (7514638)
- Merge pull request #26 from alantmiller/fix/vision-async-dispatch (199cded)
- Use 'Changes since last build (SHA)' format in release notes (7f4b7e7)
- Show 'changes since' commit SHA in nightly release notes (62a2a1f)
- Add --since flag to publish-next.sh for changelog control (5a44659)
- Expand install/upgrade instructions in nightly release notes (2913362)
- Include commit SHA in brew version string (969ca10)
- Add commit field and fullVersion to BuildInfo.swift (586bbea)
- Preserve nightly release history with unique tags (745e5cd)
- Add git commit SHA to --version output (e.g. v0.9.5-abc1234) (0e5a8b6)
- Add changelog and install instructions to local publish script (493c94f)
- Fix MXFP4 quantization crash, token counting, gemma3n routing, and test harness (d79a1d3)
- Update README with nightly build installation instructions (f1d2813)
- fix: vision subcommand dispatches async run() correctly (04566f0)
- Fix bare JSON tool call detection and add ToolCallFormat.swift patch (f6efa24)
- Change nightly build to manual trigger only (af238ab)
- Add nightly build workflow for afm-next (8892329)
- Merge origin/main into feature/mlx-prompt-caching (146030f)
- Add tool call parser test results (26/26 pass) (694abbb)
- Fix review findings: zero-arg JSON, prefix caching default, fallback tag detection (1ee1ede)
- Add hermes, llama3_json, gemma, and mistral tool call parsers (341f35d)
- Merge pull request #24 from scouzi1966/feature/structured-outputs (a7b3b38)
- Address PR review: nullable types, null rejection, guided streaming deltas (e25d3ef)
- Add structured outputs, --guided-json CLI flag, and comprehensive test suite (3b30fa0)
- Add incremental streaming tool call arguments and fix parameter name mapping (9b27cac)
- Update README for v0.9.5 features (519d35f)
- Update README with new features and MLX support (fe7bd74)
- Add token-level streaming tool call detection and update CLAUDE.md (ec46e88)
- Add tool calling, stop sequences, response_format, and real token counts (71e2c68)
- Save test reports to test-reports/ with JSONL data and add Kimi brief prompt (bee6a72)
- Add logprobs support, --max-logprobs switch, and dynamic system_fingerprint (b37fdff)
- Bump version to v0.9.5 and add sampling params test report (4abbdee)
- Add top_k, min_p, presence_penalty, and seed sampling parameters (ce69ba0)
- Checkpoint: OpenClaw config, verbose logging, max_completion_tokens, and streaming improvements (b69b9a5)
- Revise README for clarity on v0.9.4 features (c66764e)
- Add Qwen3.5-MoE VLM support, reasoning extraction, --raw flag, and stream cancellation (6cf821c)
- Checkpoint: pre Qwen3.5-397B-A17B-4bit reclassify (8f35675)
- Update OpenCode usage instructions in README (c72a023)
- Revise installation methods in README (73c1640)
- Swap OpenCode setup steps: configure first, then start afm (694f456)
- Add detailed OpenCode /connect instructions to README (b15801b)
- Add OpenCode integration guide to README (90af069)
- Wire all MLX CLI params to server mode, enhance generation logging (1bc007f)
- Revise installation command formatting in README (38c25d2)
- Update README with model repo environment variable (51ef4b4)
- Update README.md (5bc112f)
- Revise README for v0.9.4 feature announcement (3d6c296)
- Update README with feature listing and API access (d7b3bfa)
- Revise README for MLX model support and commands (c89f191)
- Add MLX excitement and quick install to README hero section (4f9b374)
- Add MLX models screenshot to README (0d1bd07)
- Add files via upload (f979ab4)
- Update README with MLX local model support and new v0.9.4 features (9d86386)
- Add regression test report: 61/61 passed (680c465)
- Add MLX model test report: 27/28 passed, Kimi-K2.5 interrupted (fa84cf7)
- Fix MLX metallib resolution for relocated binaries (5059f1b)
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.5