afm-next (20260613 · 5aad36d)

Pre-release

scouzi1966 released this 13 Jun 11:18
· 24 commits to main since this release

Nightly build from main branch.

  • Commit: 5aad36d
  • Date: 20260613
  • Version: 0.9.13-next.5aad36d.20260613
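The version string above appears to combine the base version, the commit, and the build date as base-next.commit.date. That layout is inferred from this one release rather than documented, so treat the following as a minimal sketch for scripts that need the parts separately. The variable names are illustrative.

```shell
# Version string copied from this release.
# ASSUMPTION: the layout <base>-next.<commit>.<date> is inferred, not documented.
version="0.9.13-next.5aad36d.20260613"

base="${version%%-next.*}"   # everything before "-next."
rest="${version#*-next.}"    # everything after "-next."
commit="${rest%%.*}"         # short commit hash (up to the first dot)
build_date="${rest#*.}"      # build date (after the first dot)

printf 'base=%s commit=%s date=%s\n' "$base" "$commit" "$build_date"
```

This uses only POSIX parameter expansion, so it should run under sh, bash, or zsh without extra tools.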

This is an unstable development build. For the latest stable release, use brew install scouzi1966/afm/afm.

Changes since last build (4bd0ec62a2cc39e4d69463f5ff8f1c119a3759ed)

  • Merge speculative-decoding (MTP/EAGLE3 + streaming + benchmarks) into main (5aad36d)
  • docs: add "which flag for which model" decision section (4300baf)
  • merge afm-opt: PR #134 build/metallib + whitespace review fixes (0b3d120)
  • fix(build): address PR #134 review — metallib install guard, debug symlink, probe, whitespace (248a8b5)
  • fix(spec-stream): address PR #135 review — think injection, error propagation, cancellation (dea202c)
  • docs(bench): streaming spec-decode retest results (MTP + EAGLE3) (689cb83)
  • feat(spec): streaming support for MTP and EAGLE3 fast paths (e88011d)
  • docs: decode-optimization feature guide + release notes / social copy (e0935fd)
  • perf(eagle3)+bench: lossless bs=2 fast path; afm vs mlx-vlm verify-fidelity (f518a25)
  • bench(eagle3): afm vs mlx-vlm EAGLE3 head-to-head on dense Gemma4-31B (f535e21)
  • bench(qwen36): 2026-06-06 MLX engine re-run (latest) + MTP head-to-head (f5a080a)
  • feat(eagle3): P2/P3 — --eagle3 CLI + service routing, +22% decode on Gemma4-31B (81bd262)
  • feat(eagle3): P1 greedy speculative loop — output identical to greedy AR (7226736)
  • fix(gemma4): proportional RoPE on full-attention layers (was stock RoPE) (5a81b9b)
  • docs: document /v1/embeddings API and list embed/speech in help card (#131) (8f6a999)
  • feat(eagle3): P0 — Swift Gemma4Eagle3Drafter, bit-exact vs Python reference (e3f082c)
  • feat(eagle3): P0 reference-capture for the dense Gemma4-31B EAGLE3 port (4f8c9a4)
  • docs(eagle3): phased afm/Swift port plan for dense Gemma4-31B EAGLE3 (+25% validated) (80ea9b5)
  • bench(gemma4): dense 31B flips it — EAGLE3 +25% (vs MoE all-negative) (551dbb8)
  • bench(gemma4): spec-decode validation — all 3 methods SLOWER than AR on MoE (negative) (09720b0)
  • docs(mtp): record the +52% win; note --mtp-depth now vestigial (ade3ae2)
  • perf(mtp): rewrite loop after mlx-lm PR #990 — +52% decode vs AR (was +6%) (9d86fea)
  • docs(mtp): record final implementation result (runnable, +6.5% at depth 1) (b6dc626)
  • perf(mtp): depth-1 default beats AR (+6.5%); vectorized acceptance + instrumentation (05e288f)
  • feat(mtp): P2 runnable in afm via --mtp — correct, perf WIP (b628326)
  • feat(mtp): P2 — MTP self-speculative generator, output identical to greedy AR (11da477)
  • feat(mtp): P1 — GatedDeltaNet cache rollback, bit-exact (the make-or-break gate) (4474222)
  • chore(mtp): point P0 test + capture at the cache-root model location (9eeaf77)
  • feat(mtp): P0 — Swift Qwen3_5MTPHead, bit-exact vs Python reference (d2ab0af)
  • feat(mtp): P0 reference-capture harness for the Swift MTP head port (ad86bfe)
  • docs(mtp): phased afm/Swift port plan for MTP self-speculative decoding (b7042b8)
  • bench(ollama): use qwen3.6:27b-mlx (MLX tag), not the failing GGUF default (999ccb4)
  • bench(qwen36): rerun full 7-engine cross-engine suite + refresh plots/results (e72db8e)
  • fix(metallib): include random (RNG) kernel — was crashing sampled generation (401484c)
  • wip(bench): SDPA backport report/plot + metallib RNG-guard fix; harness fixes (9ef222f)
  • perf(mlx/sdpa): backport 0.31.3 adaptive-block 2-pass SDPA — decode@16k ~+10% (7d180f8)
  • docs(deps): reconfirm mlx-swift 0.30.3 pin — 0.31.3 still has long-context SDPA regression (ddc2c97)
  • perf(stream): eager think-tag emission — cut reasoning TTFT ~610ms -> ~346ms (f1343a6)
  • perf(mlx): prewarm Metal kernels on server startup (faster cold first-token) (33c247d)
  • test: Qwen3.6-27B local-engine performance benchmark (afm vs 6 engines) (259b5f0)
  • fix(swift6): box command+error across the CFRunLoop Task (Swift 6.3.2) (e569a75)
  • build: migrate to Swift 6 language mode (#130) (1a6ffc1)
  • feat(build): add --install flag + verify binary paths (7144459)
  • docs: prominent one-command build-from-source section in README (ac3d303)
  • build: add root-level build.sh entry point for clone-and-build (781cfb8)
  • fix(toolcall): prevent server crash on scalar JSON arg value (#128) (#129) (c55d54c)
  • test: post-merge validation reports for ab90ac1 (proof for #127) (cfb2676)
  • feat(agent): T1.4-T1.7 — cancel + tokenize + OpenAPI (rerun, stacked-merge correction) (#126) (ab90ac1)
  • feat(metrics): vLLM-namespaced /metrics + Grafana dashboard (#122) (a8dbffa)
  • feat(agent): Tier-0 promotion + Tier-1 quick wins (request id, stream usage, parallel_tool_calls) (#123) (ed68189)
  • docs(claude): require proof before labeling test failures pre-existing (46e2698)
  • Bump version to 0.9.13 for next dev cycle (1cbd60f)
  • README: move Install section above the fold (86429a8)
  • Release v0.9.12: promote nightly to stable (7dfc4f6)

Install / Upgrade

Homebrew

brew tap scouzi1966/afm
brew install scouzi1966/afm/afm-next    # fresh install
brew upgrade afm-next                    # upgrade existing
brew reinstall afm-next                  # force reinstall (same version, new build)

pip

pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next

Switching between stable and nightly

# Homebrew
brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable

# pip
pip install macafm          # stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next   # nightly