Releases: scouzi1966/maclocal-api
afm 0.9.8
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.7
Batch dispatch & concurrency:
- OpenAI-compatible /v1/batches and /v1/files endpoints
- SSE multiplex endpoint /v1/batch/completions
- Batched prefill for concurrent requests (B=N single forward pass)
- Auto-promotion/teardown lifecycle for batch mode
- Multi-slot reservation, cancellation, and active count
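The batch endpoints above are described as OpenAI-compatible, so a batch input file should follow the standard OpenAI batch JSONL shape: one request object per line with a `custom_id`, `method`, `url`, and `body`. A minimal sketch (the model name is a placeholder, not a real afm identifier):

```python
import json

# Build a batch input file in the OpenAI JSONL format that /v1/files and
# /v1/batches consume: one JSON request object per line.
def make_batch_line(custom_id: str, prompt: str) -> str:
    request = {
        "custom_id": custom_id,            # caller-chosen id echoed in results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen3.5-9b",         # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(request)

lines = [make_batch_line(f"req-{i}", p) for i, p in enumerate(["hi", "bye"])]
batch_jsonl = "\n".join(lines)
```

The resulting text would be uploaded via /v1/files and referenced when creating a batch via /v1/batches.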
GPU profiling:
- --gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw modes
- Per-request X-AFM-Profile API
- Native IOReport GPU power monitoring (no mactop dependency)
- Auto-detect shader-enabled Instruments template
Tool calling improvements:
- Mistral [ARGS] fallback parser, gemma3_text format
- Tool call parsing improvements for ToolCall-15 benchmark
- Batched tool call runtime and constrained tooling parity
Stability & performance:
- Release original weights after fusion to save 11 GB GPU memory
- Fix relocated binary crash (pip install / Homebrew)
- Fix SSM Metal kernel group index for batch_size > 1
- Grammar-constrained decoding via OpenAI strict: true
- RadixTreeCache always created on model load
- truncateToOffset() for efficient KV cache management
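For the strict: true wiring above, a request would carry an OpenAI-style tool definition with the strict flag set so arguments are decoded under the schema's grammar. A hedged sketch — the tool name and schema are illustrative, not part of afm:

```python
import json

# OpenAI-style tool definition with "strict": true, the flag the release
# notes say is now wired to grammar-constrained decoding.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",              # illustrative tool
        "strict": True,                     # request schema-exact arguments
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}

body = json.dumps({
    "model": "some-local-model",            # placeholder
    "tools": [tool],
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
})
```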
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.8
afm-next (20260327 · 62395ab)
Nightly build from main branch.
- Commit: 62395ab
- Date: 20260327
- Version: 0.9.8-next.62395ab.20260327
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (a0371cc)
- Add promptfoo agentic evals to build and test skills (62395ab)
- fix: SSM Metal kernel group index for batch_size > 1 (c3ad99a)
- fix: Mistral [ARGS] fallback parser, gemma3_text format, tool parser logging (f6bcfe3)
- fix: gemma3_text tool format, tool name resolution, controller fallback (d5d5acf)
- fix: tool call fallback parsing in batch scheduler non-streaming path (ab71925)
- fix: CacheList serial generation, zero-width KV, bare XML function fallback (64811b6)
- fix: SSM segsum mask broadcast crash in batch mode for hybrid models (b63ab84)
- fix: tool call parsing improvements for ToolCall-15 benchmark integration (a6cd3b8)
- feat: batched prefill for concurrent requests (#63) (0007f91)
- test: add unit tests for PR review fixes (e5f2d6d)
- fix: address PR review — cancel, race condition, JSONL validation (687197c)
- fix: batch all non-multimodal requests + guard path existence check (96abf34)
- feat: batched prefill for concurrent requests (B=N single forward pass) (c4fa145)
- docs: update test inventory and skill with post-processing parity tests (7c6fc96)
- test(batch): add unit tests for post-processing parity (b3462f5)
- fix(batch): add post-processing parity to batch controllers (ffbd692)
- fix: add batch protocol stubs to FakeMLXChatService test helper (83c44f3)
- feat(batch): add Section 15 batch dispatch tests to assertion harness (5fb4349)
- feat(batch): register batch API and SSE multiplex routes (8b6c53f)
- feat(batch): add SSE multiplex endpoint /v1/batch/completions (d13d78d)
- feat(batch): add OpenAI-compatible /v1/batches and /v1/files endpoints (1b90d0b)
- fix: route non-streaming requests through BatchScheduler when concurrent mode active (365f57b)
- feat(batch): add auto-promotion/teardown lifecycle for batch mode (701c39e)
- feat(batch): add BatchStore actor for in-memory file and batch state (2dd672e)
- feat(batch): add multi-slot reservation, cancellation, and active count to BatchScheduler (e8a89b1)
- feat(batch): add request and response types for batch dispatch API (cb6df75)
- Add batch dispatch implementation plan (2fc9904)
- Address spec review findings: race conditions, slot reservation, cancellation (97ff926)
- Add batch dispatch API design spec (3503f16)
- fix: release original weights after fusion to save 11 GB GPU memory (19af692)
- cleanup: remove LLMModel.swift from patch system (no changes from upstream) (e20f92e)
- revert: remove ineffective Memory.clearCache() and sync eval changes (8d189c8)
- fix: add Memory.clearCache() after prefill, add LLMModel.swift to patch system (38f7c31)
- fix: skip vision_tower weights when loading VLM safetensors as LLM (db64fd9)
- feat: add peak_memory_gib to usage, reset per request, fix serial prefix caching (7632199)
- Wire OpenAI strict: true to grammar-constrained decoding (v2) (7aed2ef)
- Update benchmark config for high-concurrency testing (B=180) (5f08f1f)
- feat: X-AFM-Profile API for per-request GPU profiling (#60) (e0e9bb6)
- Merge pull request #58 from scouzi1966/codex/feature/codex-batched-tooling (4410ad9)
- Fix batched tooling review issues (98868c3)
- Fix graceful shutdown crash and improve benchmark harness (8805fb0)
- feat: native IOReport GPU power monitoring (no mactop dependency) (92dab29)
- Add --afm-only mode, --afm CLI flag, and smart graph titles to benchmark harness (4b696ef)
- Add scheduler-native constrained tooling parity (0cdf0c7)
- docs: add GPU shader profiling to test-macafm skill and test inventory (6dafee0)
- feat: add gpu-profile-report.py harness and command line in profile output (ded3029)
- docs: update CLAUDE.md GPU profiling section with measured results and shader template (4c9a950)
- feat: auto-detect shader-enabled Instruments template for per-kernel GPU profiling (c656e10)
- feat: add GPU shader profiling tools (--gpu-profile, --gpu-trace, --gpu-capture, --gpu-profile-bw) (c787f77)
- Add batched tool call runtime (30b64ba)
- Merge pull request #57 from scouzi1966/codex/feature/codex-promptfoo-suite (8682af7)
- Remove promptfoo report artifacts from PR (498c63f)
- Add Sourcery PR review workflow (b7c9bf0)
- Add batched tooling feasibility note (77396d2)
- Add paged attention feasibility note (59d52e9)
- Update promptfoo output defaults (ce9034a)
- Add primary-source agent framework suites (f5b0132)
- Add promptfoo agentic reports for codex next 0.9.8 (8f0dcc2)
- Fix promptfoo suite runner exit handling (7783783)
- Add Promptfoo agentic eval suite (cd77762)
- Document exact replay investigation (30e33f0)
- Update cache validation logs (b583c07)
- Add cache save logs (4c6a9d8)
- Add cache replay diagnostics (6c06167)
- Fix cache profiling export (0314c52)
- Update nightly release link to 20260320-a0371cc (0b25850)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly

afm-next (20260320 · a0371cc)
Nightly build from main branch.
- Commit: a0371cc
- Date: 20260320
- Version: 0.9.8-next.a0371cc.20260320
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (072340b)
- Update nightly release link to 20260320-90693a7 (a0371cc)
- Bump version to v0.9.8 (90693a7)
- test: add comprehensive test suite, roadmap docs, and test reports (8e3148f)
- test: add performance baseline comparison tests (7b4907c)
- perf: replace state=state round-trip with truncateToOffset() (8526aa8)
- feat: always create RadixTreeCache on model load (b55e458)
- feat: add truncateToOffset() to BaseKVCache and KVCacheSimple (76a25d0)
- Add AFM vs mlx-lm concurrency benchmark script and results (a80484a)
- Update README to reflect MLX LLM terminology (d73620a)
- Note that stable and nightly are currently at the same level (8b0d70a)
- Release v0.9.7: promote nightly to stable (107ccbd)
- Add mandatory clean-slate install testing to promote skill (414c144)
- Make WebUI mandatory in all release skills (0d1889d)
- Test no-reply attribution (718c918)
- Add repo-local Codex skills (6e50f55)
- Bump nightly wheel version to 0.9.7.dev20260316 (0b7e5ff)
- Update nightly release link to 20260316-072340b (1d982a6)
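The truncateToOffset() commits above replace a full state round-trip with an in-place trim of the KV cache to a matched-prefix offset. The real API lives in the Swift BaseKVCache/KVCacheSimple classes; this toy list-backed cache is only a conceptual sketch of the idea:

```python
# Conceptual sketch: trim cached KV entries back to a matched-prefix offset
# so only the non-shared suffix of a new prompt needs a fresh prefill.
class ToyKVCache:
    def __init__(self):
        self.entries = []  # one (key, value) pair per cached token position

    def append(self, kv):
        self.entries.append(kv)

    def truncate_to_offset(self, offset: int):
        # Keep positions [0, offset); later positions will be recomputed.
        del self.entries[offset:]

cache = ToyKVCache()
for t in range(8):
    cache.append((f"k{t}", f"v{t}"))
cache.truncate_to_offset(5)  # new prompt shares only the first 5 tokens
```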
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly
afm 0.9.7
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Highlights
- Concurrent batch decoding — pipelined batch decode with round-robin interleaving and shared prefix cache (--concurrent N)
- Telegram bridge — remote chat via Telegram bot (replaces iMessage bridge)
- XGrammar structured output — native C++ grammar constraints for tool calls (EBNF-first, enabled by default)
- Radix tree prefix cache — multi-slot prefix caching replaces single-slot PromptCacheBox
- --help-json — AI capability cards for tool-using agents (model/feature discovery)
- New model support — Nemotron H latent MoE, Qwen3.5 MoE/dense, GLM4/5 MoE
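The client side of --concurrent N is just several in-flight requests at once; the server interleaves their decode steps. A minimal sketch — send_request here is a stand-in for a real HTTP POST to the OpenAI-compatible endpoint, not an afm API:

```python
from concurrent.futures import ThreadPoolExecutor

# Fire several chat requests concurrently and let a --concurrent N server
# batch-decode them. Replace send_request with a real client call to
# /v1/chat/completions.
def send_request(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder for the HTTP round trip

prompts = [f"question {i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    replies = list(pool.map(send_request, prompts))  # order-preserving
```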
Bug Fixes
- Fix XML tool call params serialized as strings instead of arrays/objects (#36, #37)
- Fix qwen3_5 dense model auto-detection for Qwen3.5-9B
- Fix qwen3_5_moe tool call format detection for Qwen3.5-35B-A3B
- Fix VLM prefix cache crash: reshape suffix tokens and fix hybrid cache offset (#41)
- Fix SmallVector crash on sequential MLX requests
- Fix prefix cache broadcast_shapes crash (#47)
- Fix 503 rejection: move capacity check from middleware to controller
- Fix streaming tool call arg leak, grammar reset, and hybrid XML parser
- Fix WebUI path resolution for external launches
Testing & Quality
- Multi-model assertion test runner with XML tool call deep validation (Section 11)
- Grammar constraint tests (Section 13) and unit test tier
- Pipelined batch decode benchmarks
- 7 flaky assertion test fixes
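The radix tree prefix cache highlighted above stores token sequences from many conversations and, for each new prompt, reuses the longest cached prefix so only the suffix needs prefilling. A toy trie sketch of the concept (the Swift RadixTreeCache compresses edges; this plain trie only illustrates the lookup):

```python
# Toy multi-slot prefix cache: store token sequences in a trie and find the
# longest cached prefix for a new prompt. Concept sketch only, not the Swift
# implementation.
class TrieNode:
    def __init__(self):
        self.children = {}

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def longest_prefix(self, tokens):
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            matched += 1
        return matched  # number of leading tokens already cached

cache = PrefixCache()
cache.insert([1, 2, 3, 4])                # slot A: one conversation's prompt
cache.insert([1, 2, 9])                   # slot B: another conversation
hit = cache.longest_prefix([1, 2, 3, 7])  # shares 3 leading tokens with A
```

Unlike the old single-slot PromptCacheBox, multiple inserted sequences coexist, so concurrent conversations each get prefix hits.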
Install / Upgrade
Homebrew:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
# or upgrade:
brew upgrade afm
PyPI:
pip install macafm==0.9.7
afm-next (20260316 · 072340b)
Nightly build from main branch.
- Commit: 072340b
- Date: 20260316
- Version: 0.9.7-next.072340b.20260316
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (a49c207)
- Fix sampling params test: replace log file check with API validation (072340b)
- Fix test scripts: handle empty-choices usage chunk and thinking models (4f7e3c1)
- Cancel in-flight Telegram requests on reset (6522279)
- Improve Telegram empty-response diagnostics (731e469)
- Address Telegram PR review feedback (9711017)
- Harden Telegram state storage (4867f86)
- Make Telegram the sole remote bridge (ab4d199)
- Add experimental iMessage bridge (e2c8184)
- Fix review: pass logprobs in empty-text stop-sequence chunk (7b030ca)
- Fix 7 flaky assertion tests: stoppedBySequence signal + test robustness (2cb4848)
- Merge pull request #50 from scouzi1966/feature/mlx-concurrent-batch (151bc0d)
- Merge pull request #49 from scouzi1966/feature/codex-optimize-api (e0e2390)
- Fix 503 rejection: move capacity check from broken middleware to controller (ecb2377)
- Disable thinking for guided JSON (95a964f)
- Fix --concurrent help text: remove misleading "default 4" (eaadfab)
- Validate guided JSON before MLX startup (c3e2958)
- Address review feedback on evals and CLI output (437ba48)
- Add --concurrent N safeguards: max concurrency limit, 503 rejection, serial fallback (49ef55b)
- Fix WebUI path resolution for external launches (a6b2548)
- Support Nemotron H latent MoE variant (3469eb5)
- Improve API compatibility evals and finish reasons (4b49ef6)
- checkpoint: pre-deferred-batch-promotion (33e42dc)
- Update benchmark results with pipelined decode numbers (a25b5b4)
- Pipelined batch decode: dispatch previous step's tokens while computing next (d070fad)
- Optimize batch decode: lazy eval + reduced actor yield (27e2df3)
- Phase 2: dense batched decoding with BatchKVCacheSimple (20fb2e1)
- Phase 1: concurrent generation with round-robin interleaving and shared prefix cache (9b5fd1b)
- Add xgrammar v0.1.32 constexpr linker fix to patch system (239f369)
- Add repository contributor guide (8b0ddff)
- Add roadmap: incremental delta.tool_calls argument streaming (80e84e6)
- Add pip install method to release notes and nightly publish skill (f25a2f5)
- Add nightly wheel distribution via pip from kruks.ai (ae833d2)
- Add changelog filtering and README update step to nightly publish skill (aef5232)
- Update nightly release link to 20260312-a49c207 (44ca769)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly

afm-next (20260312 · a49c207)
Nightly build from main branch.
- Commit: a49c207
- Date: 20260312
- Version: 0.9.7-next.a49c207.20260312
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (61ba012)
- Fix prefix cache save path: add state round-trip after trim (#47) (a49c207)
- Restore JSON object/array parsing in XML params, accept object arguments in multi-turn (dee050b)
- Fix streaming tool call arg leak, grammar reset, and hybrid XML parser (912ee29)
- Add SSE-level tool call logging, fix JSON-as-object bug, add max_context_length to /v1/models (a8cd4ef)
- Fix prefix cache broadcast_shapes crash (#47), add cache HIT/MISS logging, fix log formatting (31a55af)
- Add grammar constraints visibility, prefix cache fixes, toolcall matrix testing, and realworld workload generator (b144a4a)
- Add changelog baseline selection to nightly publish skill (c036fe0)
- Add post-build verification, true clean build, and test/fix/rebuild loop to nightly publish skill (cc5807e)
- Add unit test tier, grammar constraint tests (Section 13), and test index/coverage badges (3d71b40)
- Make grammar constraints opt-in, add decodeJSONEscapes for model pre-escaping (ac340b3)
- Add --vv trace logging, EBNF named required params, fix AnyCodableValue cast bug (9d89e30)
- Fix OpenCode log flags: quote DEBUG in --log-level "DEBUG" --print-logs (6efa346)
- Add OpenCode log location docs and XML entity decoding regression test (057255d)
- Skip NSXMLParser for tool call parsing — use regex-only to fix bare < and & in code content (8848584)
- Clean up log formatting: remove blank lines, compact log handler, reorder tool-call-parser help (b26a96e)
- Add grammar diagnostic logging, coercion logging, and XMLParser body preview (1b762ca)
- Dynamic think tags, EBNF-first grammar, incremental type coercion (37a42f7)
- Add fuzzy tool name correction for hallucinated names in fallback parsers (563a098)
- Add xgrammar StructuralTag constraint + vLLM-style reasoner gating for tool calls (42578ff)
- Fix xgrammar stop-token warning, array/object coercion, add type coercion tests (ba6bd14)
- Change default prefill step size to 1024, add multi-turn benchmark (edf6935)
- Wire --enable-prefix-caching flag, add prefix cache benchmark script (ce5bed9)
- Enable xgrammar tool constraint by default, gate with DISABLE flag (76f71d1)
- Gate xgrammar tool constraint behind compile flag ENABLE_XGRAMMAR_TOOL_CONSTRAINT (7412265)
- Rename llamacpp_tool_parser to afm_adaptive_xml (b7baada)
- Add llamacpp_tool_parser: JSON-in-XML fallback, type coercion, tool_choice=none (8e77b69)
- cleanup: remove XGrammar Python subprocess bridge (d624f50)
- feat: wire XGrammarService into generation pipeline (66eb8c4)
- feat: add XGrammarService with native C++ grammar matching (051ff54)
- feat: add CXGrammar SPM target with C wrapper around xgrammar C++ (e5d1d4b)
- vendor: add xgrammar C++ library as submodule (v0.1.17) (e180429)
- docs: add XGrammar C++ interop implementation plan (ca21613)
- docs: add XGrammar C++ interop design (d33cd3a)
- Add inference optimizations testing design document (6b962b2)
- fix: address code review findings (85e4248)
- feat: add RequestScheduler for fair request scheduling (5238e9b)
- test: add KV cache eviction test suite (6fc67dc)
- feat: add --kv-eviction streaming for StreamingLLM-style context handling (f571a61)
- test: add json_schema constrained decoding test (52d70ad)
- feat: add Swift XGrammarBridge client for subprocess communication (be6cab5)
- feat: add XGrammar Python bridge for structured output (88d70d8)
- fix: allow radix cache hits on partial edge matches (d716c7c)
- test: add prefix cache multi-hit test to assertions suite (731c667)
- feat: replace single-slot PromptCacheBox with RadixTreeCache (c3df386)
- feat: add RadixTreeCache data structure for multi-slot prefix caching (5bdd6fd)
- Add detailed implementation plan for inference optimizations (5d3c11d)
- Add configuration & flags section to optimization design (aa8e78d)
- Replace custom FSM with XGrammar for structured output (02c518e)
- Remove speculative decoding from optimization plan (93277f2)
- Add inference optimizations design document (7af6f00)
- Fix VLM prefix cache crash: reshape suffix tokens to [1,N] and fix hybrid cache offset (#41) (2269dde)
- Fix: Resolve SmallVector crash on sequential MLX requests (004e33c)
- Update README: nightly v0.9.7-next release notes and link (5e03179)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly
afm-next (20260307 · 61ba012)
Nightly build from main branch.
- Commit: 61ba012
- Date: 20260307
- Version: 0.9.7-next.61ba012.20260307
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (9e978c5)
- Add nightly test reports for 2026-03-06: multi-model assertions + comprehensive suite (61ba012)
- Add XML tool call deep validation, fix qwen3_5 dense auto-detection, multi-model test runner (c85b53f)
- Fix --models discovery parsing after list-models.sh size column addition (c83d238)
- Increase TIMEOUT_LOAD from 6min to 15min in mlx-model-test.sh (5714b9f)
- v0.9.7: Add --help-json AI capability cards, fix model picker, add PR regression tests (4b07fb3)
- Add missing nightly report mlx-model-report-20260306_134309.html (6a69ff9)
- Fix qwen3_5_moe tool call detection, update test suite and skill, add nightly reports (12f75ec)
- Fix XML tool call params serialized as strings instead of arrays/objects (closes #36) (#37) (d4132df)
- Update README with package deployment note (94466e7)
- Reset "What's new in afm-next" after v0.9.6 stable release (58152b1)
- Update promote skill: build from main HEAD or nightly, add smoke tests (312b979)
- Release v0.9.6: update README versions, add smoke tests to promote skill (07c905a)
- Add rollback procedure to promote-nightly skill (3013567)
- Add safeguard: preserve nightly release when promoting to stable (1e991ca)
- Add afm-build-promote-nightly skill for promoting nightly to stable (f6d9aaf)
Install / Upgrade via Homebrew
Fresh install (first time):
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next
Upgrade (already installed):
brew upgrade afm-next
If you have stable afm installed, unlink it first:
brew unlink afm
brew install scouzi1966/afm/afm-next
Switch back to stable:
brew unlink afm-next
brew link afm
Force reinstall (same version, new build):
brew reinstall afm-next
afm 0.9.6
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.5
- Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
- Add real MLX performance stats to API responses and console logging (20f175b)
- Bump version to 0.9.6 (495363f)
- Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
- Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
- Update test-macafm skill with Kwargs section and checklist items (af4c10c)
- Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
- Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
- Update README with local experimentation instructions (809d3a0)
- Move legacy scripts and release artifacts to archive/ (d73074b)
- Bump version to 0.9.5, add publish-stable script (5e57324)
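The chat_template_kwargs API parameter (#34) lets a request pass extra keyword arguments through to the model's chat template. A hedged sketch of a request body — the enable_thinking key is a common template switch used here for illustration; supported keys depend on the model's own chat template:

```python
import json

# Request body carrying chat_template_kwargs, per the #34 change above.
payload = {
    "model": "some-local-model",  # placeholder
    "messages": [{"role": "user", "content": "hello"}],
    # Forwarded to the chat template; key shown is illustrative.
    "chat_template_kwargs": {"enable_thinking": False},
}
body = json.dumps(payload)
```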
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.6
afm-next (20260304 · 9e978c5)
Nightly build from main branch.
- Commit: 9e978c5
- Date: 20260304
- Version: 0.9.6-next.9e978c5.20260304
This is an unstable development build. For the latest stable release, use
brew install scouzi1966/afm/afm.
Changes since last build (410d7e5)
- Read nightly version from BuildInfo.swift instead of hardcoding (9e978c5)
- Add real MLX performance stats to API responses and console logging (20f175b)
- Bump version to 0.9.6 (495363f)
- Fix broken PyPI package: add missing cli.py, stage assets in publish script (8c95fcd)
- Merge pull request #35 from scouzi1966/fix/chat-template-kwargs-issue-34 (9e6a073)
- Update test-macafm skill with Kwargs section and checklist items (af4c10c)
- Address code review: fail on invalid --default-chat-template-kwargs JSON (c59af14)
- Support chat_template_kwargs API parameter (fixes #34) (acc7b61)
- Update README with local experimentation instructions (809d3a0)
- Move legacy scripts and release artifacts to archive/ (d73074b)
- Bump version to 0.9.5, add publish-stable script (5e57324)
- Enhance README with Vibe coding details (9d1cc38)
Install / Upgrade via Homebrew
Fresh install (first time):
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next
Upgrade (already installed):
brew upgrade afm-next
If you have stable afm installed, unlink it first:
brew unlink afm
brew install scouzi1966/afm/afm-next
Switch back to stable:
brew unlink afm-next
brew link afm
Force reinstall (same version, new build):
brew reinstall afm-next
afm 0.9.5
Apple Foundation Models + MLX local models — OpenAI-compatible API, WebUI, all Swift.
Changes since v0.9.4
- Auto-clone homebrew-afm tap repo if missing during nightly publish (410d7e5)
- Add ownership guard to build-afm-nightly-publish skill (1b71d95)
- Add build-afm-nightly-publish skill (bc777d6)
- Fix Jinja crash on nullable tool schemas (closes #32) (f4c80cc)
- Simplify build-afm Step 3 to report only binary path and version (d96c786)
- Add fork-first instruction to vibe coding callout (7f58591)
- Add vibe coding callout to README for non-Swift developers (75af8fc)
- Reorder build-afm prerequisites by dependency chain (2d498f1)
- Add prerequisite validation to build-afm skill (1c27148)
- Add skills, test reports, qwen3_5 registry alias, and bench tooling (50ed40f)
- Add script paths and commands to report index page (038ba82)
- Show invocation command in report header for reproducibility (c7cdfd3)
- Fix index.html links to use htmlpreview for HTML reports (7d8187c)
- Remove GitHub Actions Pages workflow (Actions disabled on repo) (7b4fff1)
- Add GitHub Actions workflow for Pages deployment (f63791b)
- Add GitHub Pages index for test reports (2e3f7bf)
- Add MLX patch comparison report (3-ref with upstream-only detection) (7d38bf9)
- Add YAML frontmatter to CLI help for AI agent discovery (9f755d8)
- Add structured help with YAML frontmatter for AI agent discovery (a31ccd5)
- Add repeatable MLX patch comparison report generator (8bbec6f)
- Add cached_tokens to usage response, assertion test suite, and test-macafm skill (037c657)
- Reorder test report: description above link (f9cff88)
- Move test report link below title, add description with judge methodology (b8f7728)
- Add afm-next nightly test report link for Qwen3.5-35B-A3B (90dfc8d)
- Test harness: template mode, smart scoring fixes, report improvements (aefe39a)
- Merge pull request #28 from scouzi1966/feature/optimise-metal (4d58e3f)
- Fix review: QKV mode check, QuantizedKVCache mode, dead perf code (2ca6495)
- Fix GLM-5 OOM + gemma-3 crash + MoE argPartition optimization (5b8e88c)
- Auto-detect VLM models + fix model discovery for HF cache dirs (02a7008)
- Perf: Metal kernel fusions + graph optimizations (107.6→130 tok/s, +21%) (9e9fcf3)
- Perf: beat Python mlx-lm throughput (95.7→107.6 tok/s, +12%) (b11fa4a)
- Add assumption for previous afm installation (afb62d6)
- Update model reference in README.md (eb17f6c)
- Update README with new features in nightly build (3f4e859)
- Update README with new features for nightly build (d9c804b)
- Update README with new API features and parameters (1da2ad3)
- Fix Metal kernel fallback and temp media file cleanup (d84ab02)
- Add --media flag for VLM single-prompt mode + base64 data URL support (a834638)
- Qwen3.5 perf: add VLM Metal kernel + default to LLM loading (42→95 tok/s) (ec88798)
- Add test scripts, reports, and benchmark strict=False fix (cdceabf)
- Fix afm args quoting with shlex.split, add benchmark script (7905be4)
- Fix afm: args quoting in test harness (read -ra → eval) (0ddb545)
- Clean up README: consolidate install sections, add stable/nightly table (c78e5d5)
- Exclude test report HTML/JSONL from GitHub language stats (a4c9c7d)
- Fix CLI --stop not passed to Server in MlxCommand (f3607af)
- Fix stop sequences in thinking models, add CLI --stop flag, fix JSON schema injection (050e836)
- Fix claude nested session issue and regenerate report with both AI analyses (dd419d8)
- Add Qwen3.5-35B-A3B-4bit test suite and report (129/132 passed) (b93db37)
- Move stable install instructions below latest release link (747a67c)
- Update afm-next heading wording (385e738)
- Change afm-next heading to 'Available NOW' (697abf0)
- Update README with Qwen3.5-35B-A3B support and afm-next install instructions (50fcd13)
- Download *.jinja files and fix missing chat_template fallback (0cfba17)
- Gate verbose colored logs behind --very-verbose flag (7514638)
- Merge pull request #26 from alantmiller/fix/vision-async-dispatch (199cded)
- Use 'Changes since last build (SHA)' format in release notes (7f4b7e7)
- Show 'changes since' commit SHA in nightly release notes (62a2a1f)
- Add --since flag to publish-next.sh for changelog control (5a44659)
- Expand install/upgrade instructions in nightly release notes (2913362)
- Include commit SHA in brew version string (969ca10)
- Add commit field and fullVersion to BuildInfo.swift (586bbea)
- Preserve nightly release history with unique tags (745e5cd)
- Add git commit SHA to --version output (e.g. v0.9.5-abc1234) (0e5a8b6)
- Add changelog and install instructions to local publish script (493c94f)
- Fix MXFP4 quantization crash, token counting, gemma3n routing, and test harness (d79a1d3)
- Update README with nightly build installation instructions (f1d2813)
- fix: vision subcommand dispatches async run() correctly (04566f0)
- Fix bare JSON tool call detection and add ToolCallFormat.swift patch (f6efa24)
- Change nightly build to manual trigger only (af238ab)
- Add nightly build workflow for afm-next (8892329)
- Merge origin/main into feature/mlx-prompt-caching (146030f)
- Add tool call parser test results (26/26 pass) (694abbb)
- Fix review findings: zero-arg JSON, prefix caching default, fallback tag detection (1ee1ede)
- Add hermes, llama3_json, gemma, and mistral tool call parsers (341f35d)
- Merge pull request #24 from scouzi1966/feature/structured-outputs (a7b3b38)
- Address PR review: nullable types, null rejection, guided streaming deltas (e25d3ef)
- Add structured outputs, --guided-json CLI flag, and comprehensive test suite (3b30fa0)
- Add incremental streaming tool call arguments and fix parameter name mapping (9b27cac)
- Update README for v0.9.5 features (519d35f)
- Update README with new features and MLX support (fe7bd74)
- Add token-level streaming tool call detection and update CLAUDE.md (ec46e88)
- Add tool calling, stop sequences, response_format, and real token counts (71e2c68)
- Save test reports to test-reports/ with JSONL data and add Kimi brief prompt (bee6a72)
- Add logprobs support, --max-logprobs switch, and dynamic system_fingerprint (b37fdff)
- Bump version to v0.9.5 and add sampling params test report (4abbdee)
- Add top_k, min_p, presence_penalty, and seed sampling parameters (ce69ba0)
- Checkpoint: OpenClaw config, verbose logging, max_completion_tokens, and streaming improvements (b69b9a5)
- Revise README for clarity on v0.9.4 features (c66764e)
- Add Qwen3.5-MoE VLM support, reasoning extraction, --raw flag, and stream cancellation (6cf821c)
- Checkpoint: pre Qwen3.5-397B-A17B-4bit reclassify (8f35675)
- Update OpenCode usage instructions in README (c72a023)
- Revise installation methods in README (73c1640)
- Swap OpenCode setup steps: configure first, then start afm (694f456)
- Add detailed OpenCode /connect instructions to README (b15801b)
- Add OpenCode integration guide to README (90af069)
- Wire all MLX CLI params to server mode, enhance generation logging (1bc007f)
- Revise installation command formatting in README (38c25d2)
- Update README with model repo environment variable (51ef4b4)
- Update README.md (5bc112f)
- Revise README for v0.9.4 feature announcement (3d6c296)
- Update README with feature listing and API access (d7b3bfa)
- Revise README for MLX model support and commands (c89f191)
- Add MLX excitement and quick install to README hero section (4f9b374)
- Add MLX models screenshot to README (0d1bd07)
- Add files via upload (f979ab4)
- Update README with MLX local model support and new v0.9.4 features (9d86386)
- Add regression test report: 61/61 passed (680c465)
- Add MLX model test report: 27/28 passed, Kimi-K2.5 interrupted (fa84cf7)
- Fix MLX metallib resolution for relocated binaries (5059f1b)
Install / Upgrade via Homebrew
Fresh install:
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm
Upgrade:
brew upgrade afm
Install via PyPI
pip install macafm==0.9.5