afm-next (20260613 · 5aad36d)
Pre-release
Nightly build from the main branch.
- Commit: 5aad36d
- Date: 20260613
- Version: 0.9.13-next.5aad36d.20260613
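The nightly version string appears to combine the next base version, the channel, the short commit hash and the build date listed above. The sketch below splits such a string with Bash parameter expansion. It assumes every nightly follows the `<base>-next.<commit>.<yyyymmdd>` pattern seen in this one release; the function name `parse_afm_version` is hypothetical and is not part of afm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with afm): split a nightly version string.
# Assumption: format is <base>-next.<commit>.<yyyymmdd>, inferred from this release only.
parse_afm_version() {
  local v="$1"
  local base="${v%%-*}"        # everything before the first '-'  -> 0.9.13
  local rest="${v#*-}"         # everything after the first '-'   -> next.5aad36d.20260613
  local channel="${rest%%.*}"  # first dot-separated field        -> next
  rest="${rest#*.}"            # drop the channel                 -> 5aad36d.20260613
  local commit="${rest%%.*}"   # short commit hash                -> 5aad36d
  local date="${rest#*.}"      # build date (yyyymmdd)            -> 20260613
  printf 'base=%s channel=%s commit=%s date=%s\n' "$base" "$channel" "$commit" "$date"
}

parse_afm_version "0.9.13-next.5aad36d.20260613"
```

For this build it prints `base=0.9.13 channel=next commit=5aad36d date=20260613`, matching the Commit, Date and Version fields listed above.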
This is an unstable development build. For the latest stable release, install with:
brew install scouzi1966/afm/afm
Changes since last build (4bd0ec62a2cc39e4d69463f5ff8f1c119a3759ed)
- Merge speculative-decoding (MTP/EAGLE3 + streaming + benchmarks) into main (5aad36d)
- docs: add "which flag for which model" decision section (4300baf)
- merge afm-opt: PR #134 build/metallib + whitespace review fixes (0b3d120)
- fix(build): address PR #134 review — metallib install guard, debug symlink, probe, whitespace (248a8b5)
- fix(spec-stream): address PR #135 review — think injection, error propagation, cancellation (dea202c)
- docs(bench): streaming spec-decode retest results (MTP + EAGLE3) (689cb83)
- feat(spec): streaming support for MTP and EAGLE3 fast paths (e88011d)
- docs: decode-optimization feature guide + release notes / social copy (e0935fd)
- perf(eagle3)+bench: lossless bs=2 fast path; afm vs mlx-vlm verify-fidelity (f518a25)
- bench(eagle3): afm vs mlx-vlm EAGLE3 head-to-head on dense Gemma4-31B (f535e21)
- bench(qwen36): 2026-06-06 MLX engine re-run (latest) + MTP head-to-head (f5a080a)
- feat(eagle3): P2/P3 — --eagle3 CLI + service routing, +22% decode on Gemma4-31B (81bd262)
- feat(eagle3): P1 greedy speculative loop — output identical to greedy AR (7226736)
- fix(gemma4): proportional RoPE on full-attention layers (was stock RoPE) (5a81b9b)
- docs: document /v1/embeddings API and list embed/speech in help card (#131) (8f6a999)
- feat(eagle3): P0 — Swift Gemma4Eagle3Drafter, bit-exact vs Python reference (e3f082c)
- feat(eagle3): P0 reference-capture for the dense Gemma4-31B EAGLE3 port (4f8c9a4)
- docs(eagle3): phased afm/Swift port plan for dense Gemma4-31B EAGLE3 (+25% validated) (80ea9b5)
- bench(gemma4): dense 31B flips it — EAGLE3 +25% (vs MoE all-negative) (551dbb8)
- bench(gemma4): spec-decode validation — all 3 methods SLOWER than AR on MoE (negative) (09720b0)
- docs(mtp): record the +52% win; note --mtp-depth now vestigial (ade3ae2)
- perf(mtp): rewrite loop after mlx-lm PR #990 — +52% decode vs AR (was +6%) (9d86fea)
- docs(mtp): record final implementation result (runnable, +6.5% at depth 1) (b6dc626)
- perf(mtp): depth-1 default beats AR (+6.5%); vectorized acceptance + instrumentation (05e288f)
- feat(mtp): P2 runnable in afm via --mtp — correct, perf WIP (b628326)
- feat(mtp): P2 — MTP self-speculative generator, output identical to greedy AR (11da477)
- feat(mtp): P1 — GatedDeltaNet cache rollback, bit-exact (the make-or-break gate) (4474222)
- chore(mtp): point P0 test + capture at the cache-root model location (9eeaf77)
- feat(mtp): P0 — Swift Qwen3_5MTPHead, bit-exact vs Python reference (d2ab0af)
- feat(mtp): P0 reference-capture harness for the Swift MTP head port (ad86bfe)
- docs(mtp): phased afm/Swift port plan for MTP self-speculative decoding (b7042b8)
- bench(ollama): use qwen3.6:27b-mlx (MLX tag), not the failing GGUF default (999ccb4)
- bench(qwen36): rerun full 7-engine cross-engine suite + refresh plots/results (e72db8e)
- fix(metallib): include random (RNG) kernel — was crashing sampled generation (401484c)
- wip(bench): SDPA backport report/plot + metallib RNG-guard fix; harness fixes (9ef222f)
- perf(mlx/sdpa): backport 0.31.3 adaptive-block 2-pass SDPA — decode@16k ~+10% (7d180f8)
- docs(deps): reconfirm mlx-swift 0.30.3 pin — 0.31.3 still has long-context SDPA regression (ddc2c97)
- perf(stream): eager think-tag emission — cut reasoning TTFT ~610ms -> ~346ms (f1343a6)
- perf(mlx): prewarm Metal kernels on server startup (faster cold first-token) (33c247d)
- test: Qwen3.6-27B local-engine performance benchmark (afm vs 6 engines) (259b5f0)
- fix(swift6): box command+error across the CFRunLoop Task (Swift 6.3.2) (e569a75)
- build: migrate to Swift 6 language mode (#130) (1a6ffc1)
- feat(build): add --install flag + verify binary paths (7144459)
- docs: prominent one-command build-from-source section in README (ac3d303)
- build: add root-level build.sh entry point for clone-and-build (781cfb8)
- fix(toolcall): prevent server crash on scalar JSON arg value (#128) (#129) (c55d54c)
- test: post-merge validation reports for ab90ac1 (proof for #127) (cfb2676)
- feat(agent): T1.4-T1.7 — cancel + tokenize + OpenAPI (rerun, stacked-merge correction) (#126) (ab90ac1)
- feat(metrics): vLLM-namespaced /metrics + Grafana dashboard (#122) (a8dbffa)
- feat(agent): Tier-0 promotion + Tier-1 quick wins (request id, stream usage, parallel_tool_calls) (#123) (ed68189)
- docs(claude): require proof before labeling test failures pre-existing (46e2698)
- Bump version to 0.9.13 for next dev cycle (1cbd60f)
- README: move Install section above the fold (86429a8)
- Release v0.9.12: promote nightly to stable (7dfc4f6)
Install / Upgrade
Homebrew
brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next # fresh install
brew upgrade afm-next # upgrade existing
brew reinstall afm-next # force reinstall (same version, new build)
pip
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next
Switching between stable and nightly
# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# pip
pip install macafm # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next # nightly