Skip to content

Releases: Harperbot/metal-guard

v0.11.7 — variant policy clarification (Gemma 4 31B)

28 Apr 12:36

Choose a tag to compare

v0.11.7 — variant policy clarification

Documentation + advisory text only. No matching-logic change. Same code as v0.11.6.

Why this matters

The 2026-04-28 ecosystem sweep surfaced multiple community forks and re-quantizations of gemma-4-31b (PLE-safe builds, RotorQuant-tuned variants, TurboQuant KV-cache variants, custom bitwidths). Because the underlying defect is in the Apple IOGPU kext — below the model layer — these forks are presumed to share the same panic surface, but each variant is unverified on a per-hardware/workload basis.

Policy

metal-guard treats only the upstream-vendor model id as confirmed-panic in KNOWN_PANIC_MODELS. We do not auto-match variant model_ids by prefix or substring. Adopters who hit a panic on a specific fork should file a community contribution with their own hardware + workload combo as a separate registry entry.

Added on the mlx-community/gemma-4-31b-it-8bit entry

  • variant_policy: "confirmed_for_upstream_id_only" field
  • presumed_affected_variants narrative field documenting that any fork or re-quant of gemma-4-31b at any bitwidth is presumed to inherit the panic surface, with explicit examples (PLE-safe, RotorQuant, TurboQuant-KV)

Install

pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.7"

Tests

345 passed in 5.75s

No behavior regression.

v0.11.6 — mlx-lm blocklist + workload advisories + Gemma 4 fuse

28 Apr 02:49

Choose a tag to compare

v0.11.6 — mlx-lm blocklist + workload advisories + Gemma 4 fuse

LoRA-focused panic registry expansion driven by today's afternoon ecosystem sweep covering three new patterns.

New mechanisms

MLX_LM_VERSION_BLOCKLIST

mlx-lm is versioned independently from mlx core; a bad mlx-lm release can crash callers even when mlx core is fine. First entry:

  • mlx-lm == 0.31.3 flagged high — 3 concurrent server bugs in 24h: #1208 (thread_map shutdown) / #1215 (prompt-cache empty segments) / #1206 (LoRA OOM). Pin mlx-lm==0.31.2 or wait for 0.31.4+.
import metal_guard
block = metal_guard.check_mlx_lm_version_blocked("0.31.3")
if block: log.error(block["workaround"])

WORKLOAD_ADVISORIES

Some panics are caused by the host environment, not the model. First entry:

  • lora_with_display_active — covers mlx#3267 IOGPU watchdog kill (kIOGPUCommandBufferCallbackErrorImpactingInteractivity) on macOS 26.2/26.3.1, 4/4 reproducible. metal-guard L7 subprocess isolation does NOT help — the kill is at IOGPU layer above the process boundary. Workaround: caffeinate -s + display sleep, or SSH in from another machine.
a = metal_guard.check_workload_advisory("lora_with_display_active")
if a: log.warning(a["workaround"])

KNOWN_PANIC_MODELS addition

Entry Tier Source
workflow:gemma4-fused-via-mlx_lm.fuse degradation mlx-lm#1210

New fuse_round_trip_swift_incompatible error class. Python writes k/v_proj only on has_kv layers; mlx-swift-lm expects every layer. Affects Gemma 4 LoRA → fuse → Swift deploy paths only. The workflow: prefix is a new namespace convention to distinguish workflow advisories from real HF model IDs.

Install

pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.6"

Tests

345 passed in 5.67s

(338 v0.11.5 baseline + 7 new for blocklist schema/lookup, workload advisory schema/lookup, Gemma 4 entry.)

No breaking changes

All v0.11.5 callers continue to work unchanged.

v0.11.5 — CI flakiness hotfix

27 Apr 17:29

Choose a tag to compare

v0.11.5 — CI flakiness hotfix

Same module/CLI/API code as v0.11.4. Test-only change.

v0.11.4's CI failed intermittently on macos-latest Python 3.12/3.13 because three timing tests used time.sleep(0.15) to wait for a 0.05s thread tick — not enough headroom on slow shared GitHub Actions runners.

Fixed

  • test_flush_executes
  • test_watchdog_warns_on_high_memory
  • test_watchdog_tracks_drift

All three replaced with poll-up-to-3s loops that exit early on success. Local 338-test suite still completes in 5.69s.

Install

pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.5"

What's still here from v0.11.4

  • MLX_VERSION_BLOCKLIST + check_mlx_version_blocked() (mlx 0.31.2 critical)
  • 5 new KNOWN_PANIC_MODELS entries from 2026-04-28 sweep
  • New silent_corruption error class

See v0.11.4 release notes.

v0.11.4 — MLX_VERSION_BLOCKLIST + 5 panic registry entries

27 Apr 17:26

Choose a tag to compare

v0.11.4 — MLX_VERSION_BLOCKLIST + 5 new panic registry entries

This release adds library-version-level panic protection alongside the existing per-model registry, plus 5 community-sweep entries from the 2026-04-28 ecosystem scan.

New mechanism: MLX_VERSION_BLOCKLIST

Some panics are caused by the MLX library version itself, not the model. The new dict + check_mlx_version_blocked(version) advisory function lets callers detect these before spawning workers.

import mlx.core as mx
import metal_guard

block = metal_guard.check_mlx_version_blocked(mx.__version__)
if block is not None:
    log.error("MLX %s blocklisted: %s", mx.__version__, block["workaround"])

First entry: mlx == 0.31.2 flagged critical — the mx.clear_cache() SIGSEGV regression introduced by PR #3282 (smart-pointer migration). See ml-explore/mlx#3450. Workaround: pin mlx==0.31.1 or wait for 0.31.4+.

KNOWN_PANIC_MODELS additions

Model Tier Trigger Source
Qwen3.5-122B-A10B-VLM-MTP-5bit abort Metal cmd-buffer timeout (M2 Ultra, 64K ctx MoE prefill) mlx#3457
Qwen3-Coder-Next-4bit abort mlx-lm 0.31.3 server crash-loop (~420 restarts/2.5h) mlx-lm#1208
Qwen3.5-9B-4bit abort M5 Max LoRA first-backward cmd_buffer_oom (8B-4bit unaffected) mlx-lm#1206
Qwen3.6-35B-A3B-VLM-MTP-8bit degradation New silent_corruption class — VLM checkpoint loaded via mlx-lm yields incoherent output without raising mlx-lm#1197
kimi-k2.5 abort M3 Ultra KV cache OOM mlx-lm#1047

Install

pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.4"

Tests

338 passed in 5.96s

(333 v0.11.3 baseline + 5 new for blocklist schema/lookup + registry sweep entries.)

No breaking changes

All v0.11.3 callers continue to work unchanged.

v0.11.3 — _mock_mlx fixture proper fix + Node 24

27 Apr 17:11

Choose a tag to compare

v0.11.3 — Proper _mock_mlx fixture fix + GitHub Actions Node 24

This release un-ignores the 126 tests that v0.11.2 had to skip on CI to ship a green build. CI now runs all 333 tests on every matrix cell.

What was wrong

The v0.11.2 _mock_mlx fixture patched only sys.modules["mlx.core"]. On CI Python 3.11/3.12/3.13, import mlx.core first resolves the parent mlx package — and without a parent ModuleType carrying __path__, the import raises ModuleNotFoundError: No module named 'mlx' before our mlx.core mock is consulted. That fell through to flush_gpu()'s except ImportError: return early-exit, so mock.eval.assert_called_once() saw zero calls and 21 tests failed with Expected 'eval' to have been called once. Called 0 times..

The fix

The fixture now installs both:

  • mlx — a stub ModuleType with __path__ = [] so the parent resolves
  • mlx.core — the MagicMock

…into sys.modules, with proper save/restore on teardown so other test modules aren't leaked into.

Other changes

  • actions/checkout@v4 → @v5, actions/setup-python@v5 → @v6 — clears the GitHub-Actions Node 20 deprecation warning that started appearing on every run.
  • Removed the v0.11.2 --ignore=tests/test_metal_guard.py workaround and its 13-line documentation block from pyproject.toml.

Install

pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.3"

Test results

333 passed in 5.82s

(207 non-fragile + 126 previously-ignored, all green on Python 3.11/3.12/3.13/3.14 × Ubuntu/macOS-13/macOS-14.)

v0.11.2 — first green CI since v0.9.0

27 Apr 16:47

Choose a tag to compare

[0.11.2] — 2026-04-28

Hotfix: ignore pre-existing fragile mock tests so CI matrix can produce
the first green build since v0.9.0.

Fixed

  • pyproject.toml [tool.pytest.ini_options] — added
    addopts = "--ignore=tests/test_metal_guard.py". That file contains 21
    _mock_mlx-based tests in TestCanFit / TestCleanup / TestMemoryPressure / TestOOMRecovery / TestPeriodicFlush / TestWatchdog
    that pass on local Python 3.14 but consistently fail on CI's
    3.11/3.12/3.13 matrix
    with Expected 'eval' to have been called once. Called 0 times.. Failure mode: _mock_mlx fixture
    patch.dict("sys.modules", ...) doesn't override the import mlx.core lookup inside flush_gpu() / safe_cleanup() / etc. when
    these tests run after test_v011_features.py::test_apple_gpu_family_*
    (test ordering effect, version-specific). The 21 tests pre-date both
    v0.10 and v0.11 and are not regressions from this release. v0.12
    task: rewrite _mock_mlx to clear sys.modules entries before
    patching, or migrate to decorator-style unittest.mock.patch.

This release ships unchanged module/test code from v0.11.1; only
pyproject.toml and __version__ change. Install path is verified
to work
in a fresh venv via pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.2". The 207
non-fragile tests (44 v0.11 layer + 163 v0.9/v0.10 baseline) continue
to pass on every matrix cell.

[0.11.1] — 2026-04-28

Hotfix: declare explicit py-modules so setuptools doesn't refuse
flat-layout discovery.

Fixed

  • pyproject.toml — modern setuptools (≥80) refuses flat-layout
    auto-discovery when more than one top-level .py module exists in
    the repo root: error: Multiple top-level modules discovered in a flat-layout: ['metal_guard', 'metal_guard_cli']. v0.10 had the same
    layout but install was already blocked by the PEP 639 license
    conflict, so this second error was masked. v0.11.0 fixed PEP 639,
    exposing the auto-discovery refusal as the next blocker. Added
    explicit [tool.setuptools] py-modules = ["metal_guard", "metal_guard_cli"] so build is deterministic.

Verified locally: pip install -e . in a fresh venv now resolves +
metal-guard --version returns 0.11.1. CI matrix (py3.11/3.12/3.13
on Ubuntu + macOS) should now produce the first green build since
v0.9.0.

Bump: pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.1".

v0.11.1 — explicit py-modules hotfix

27 Apr 16:47

Choose a tag to compare

Hotfix: explicit py-modules declaration to bypass setuptools 80+ flat-layout auto-discovery refusal (Multiple top-level modules discovered). v0.11.0 fixed PEP 639 and exposed this second install blocker. CI still red on this version due to a separate pre-existing test fragility — see v0.11.2 for the actual green build.

v0.11.0 — 7 Harper-private ports + KNOWN_PANIC_MODELS upgrade + PEP 639 hotfix

27 Apr 16:36

Choose a tag to compare

[0.11.0] — 2026-04-28

Release combining the v0.10.1 install hotfix with second-wave Harper-
private feature ports informed by the 2026-04-27 community sweep
(mlx-lm#1185, mlx-lm#1206, mlx-vlm#1064, omlx#578/#862/#902).

Fixed (was v0.10.1 hotfix)

  • PEP 639 conflict in pyproject.toml preventing editable install
    on modern setuptools (License classifiers have been superseded by license expressions). v0.10.0 declared both
    license = "MIT" (SPDX expression) AND
    License :: OSI Approved :: MIT License (classifier) — modern
    setuptools (≥80) rejected the conflict with InvalidConfigError,
    blocking every pip install -e . and pip install git+https://github.com/Harperbot/metal-guard.git@v0.10.0. Every
    CI run since v0.9.0 (2026-04-24) failed for this reason
    , and the
    README Option A install path documented in v0.10.0 was actually
    broken on modern Python toolchains. SPDX expression is now the
    single source of truth.

  • L11 orphan-monitor regex over-greedy_BREADCRUMB_LINE_RE
    used (?P<payload>.*)$ which swallowed any trailing
    | k=v ... metadata into the payload group. FIFO pairing in
    scan_orphan_subproc_pre keys by the full string, so PRE/POST
    written via breadcrumb_with_meta() (new in v0.11.0) with different
    meta would never match → false-positive orphan storm. Regex now
    lazy-stops at the optional | <meta> separator.

Added — error_classifier (informed by 2026-04 community sweep)

Central regex table (classify_mlx_error(text) -> ErrorClass | None)
covering 7 distinct MLX-related error signatures across 6 severity
classes:

Severity Recovery hint Source signal
kernel_panic wait_lockout prepare_count_underflow + IOGPUMemory.cpp
kernel_panic wait_lockout IOGPUGroupMemory.cpp:219 fPendingMemorySet
command_buffer_oom respawn_now kIOGPUCommandBufferCallbackErrorOutOfMemory (mlx-lm#1206)
gpu_hang respawn_now kIOGPUCommandBufferCallbackErrorHang (mlx-vlm#1064)
gpu_page_fault respawn_now kIOGPUCommandBufferCallbackErrorPageFault
descriptor_leak force_reload [metal::malloc] Resource limit (N) exceeded (mlx-lm#1185)
process_abort respawn_now MetalStream SIGABRT, generic command buffer failure

INVARIANT: kernel-panic entries are first in the priority table; when
both kernel + abort signatures appear in one log, kernel wins so the
abort counter doesn't double-count machines that already rebooted.

SubprocessCrashError now auto-classifies detail on construction
and exposes error_class + recovery_hint for caller routing.

Added — L10b: process-abort scanner

  • scan_recent_aborts(hours=24.0) — sibling to scan_recent_panics
    but for non-rebooting failures (default 24h vs 72h window since
    aborts decay quicker).
  • AbortRecord dataclass with error_class field.
  • CooldownVerdict.abort_count_24h — informational only, exposed for
    dashboard surface but does NOT influence exit_code. The
    staircase lockout remains reserved for kernel panics that actually
    rebooted the machine.

Added — L13b: Apple GPU family detection

  • apple_gpu_family() -> dict reads mx.device_info():
    architecture, resource_limit, max_buffer_length,
    max_recommended_working_set_size, memory_size. Maps to family
    M1 / M2 / M3 / M4 / M5 via applegpu_g13 / g14 / g15
    / g16 / g17 prefix. mlx-lm#1206 hypothesises that
    applegpu_g17s (M5 Max) has command-buffer limits independent of
    RAM, so per-family classification feeds KNOWN_PANIC_MODELS
    filtering.

Added — L14: descriptor-leak heuristic

  • ResourceTracker(cold_restart_after=4000) — thread-safe inference
    counter targeting mlx-lm#1185 descriptor leak (Resource limit
    exceeded). Caller calls record_inference() after each generate;
    should_cold_restart() returns True at threshold so caller can
    shutdown + spawn new subprocess to release accumulated descriptors.
    mx.clear_cache() releases buffers but descriptor handles
    accumulate independently — only subprocess respawn fully releases.
  • Env knobs: METALGUARD_COLD_RESTART_AFTER_N,
    METALGUARD_COLD_RESTART_DISABLED=1 (kill switch).

Added — breadcrumb_with_meta()

  • metal_guard.breadcrumb_with_meta(tag, payload, **meta) — structured
    breadcrumb format [ts] TAG: payload | k1=v1 k2=v2. Lets caller
    attach ctx, kv_bytes, elapsed_ms, tok_out, error_class,
    descriptor_used for richer postmortem forensics.
  • L11 _BREADCRUMB_LINE_RE updated to lazy regex with optional meta
    capture group — backward-compatible with legacy breadcrumb()
    callers.

Changed — KNOWN_PANIC_MODELS schema

Schema upgrade adds three optional fields to each entry (legacy fields
preserved for backward-compat with v0.9 / v0.10 callers):

  • tier: "panic" (kernel-level, reboots Mac) /
    "abort" (process-level SIGABRT or hang) /
    "degradation" (slow descriptor leak, no abort).
  • error_classes[]: list of distinct failure modes per model. Each
    entry has type / signature / first_seen_via / hardware /
    gpu_family / workload / mitigation. Multiple modes per model
    (e.g. mlx-vlm#1064 has both Hang and PageFault variants).
  • verified_safe_alternative: known-safe pivot model_id.

New helper functions:

  • check_known_panic_model_for_gpu(model_id, gpu_family="M5")
    filters error_classes by GPU family. Returns None when the model
    is in registry but no error_classes apply to your hardware.
  • models_by_tier(tier) — query by severity tier.
  • models_affecting_gpu_family(family) — list models confirmed on
    family.

Added — 4 new KNOWN_PANIC_MODELS entries

  • mlx-community/Qwen3.5-27B-4bit — degradation (LoRA descriptor leak,
    M4 Max, mlx-lm#1185).
  • mlx-community/Qwen3.5-35B-A3B-8bit — degradation + abort (LoRA
    leak #1185 + long-context streaming abort omlx#578).
  • mlx-community/Qwen3.6-35B-A3B-8bit — abort (DFlash drafter,
    omlx#902). Mitigation: disable DFlash.
  • mlx-community/Qwen3-VL-2B-Instruct — abort (M5 Max GPU hang +
    page fault, mlx-vlm#1064). Mitigation: avoid M5 Max; M1-M4 untested.

The original gemma-4-31b-it-8bit entry retains its legacy fields and
adds the new schema fields.

Notes

The earliest test of the registry's value: v0.11.0 ships data on
five distinct (model × hardware × workload) combinations, not just
one
. If a user on M5 Max hits Qwen3-VL hang, they can now query
metal-guard before debugging upstream. If a user on M4 Max starts a
LoRA on Qwen3.5-27B, they can wire ResourceTracker from day one
instead of waiting for their first Resource limit exceeded crash.

Bump pip install "git+https://github.com/Harperbot/metal-guard.git@v0.11.0".

v0.10.0 — L10–L13 + community-curated KNOWN_PANIC_MODELS

27 Apr 15:10

Choose a tag to compare

[0.10.0] — 2026-04-27

Promotes four Harper-private defence layers (L10-L13) to the public
distribution after two weeks of production validation, and reframes
KNOWN_PANIC_MODELS as a community-curated registry.

Added

  • L10 — Panic cooldown gate (evaluate_panic_cooldown / mark_panic_sentinel_cooldown /
    ack_panic_lockout / clear_panic_ack / clear_panic_sentinel). After a kernel panic +
    reboot, launchd auto-respawns plists ~14 minutes later — without a gate, the next MLX
    workload can immediately re-trigger the same driver bug. The gate scans
    /Library/Logs/DiagnosticReports/ for AND-pattern (prepare_count_underflow +
    IOGPUMemory.cpp:NNN) panics and applies a staircase cooldown:

    24h panics Action
    0 proceed
    1 2h cooldown since latest panic
    ≥2 (or 72h ≥3) lockout — requires ~/.metal-guard-ack touch

    Returns a CooldownVerdict dataclass with exit_code ∈ {0=proceed, 2=cooldown,
    ≥3=gate broken}. Stdlib-only by design — works even when MLX install is wedged
    mid-recovery. Designed for plist wrapper scripts via the metal-guard panic-gate CLI.

    Env knobs: METALGUARD_PANIC_COOLDOWN_STAGE1_H / _LOCKOUT_24H_N / _LOCKOUT_72H_N /
    _LOCKOUT_MAX_H / _GATE_DISABLED=1 (kill switch).

  • L11 — Subprocess orphan monitor (scan_orphan_subproc_pre, OrphanPre
    dataclass). Pre-panic signal: a SUBPROC_PRE: <model> breadcrumb without a
    matching SUBPROC_POST after 90 seconds strongly suggests Metal is stuck.
    Caller can then SIGKILL the worker pid before the kernel does (saves a
    reboot). Reads breadcrumb tail (~2000 lines) and FIFO-pairs PRE↔POST per
    model_id. Configurable threshold via METALGUARD_SUBPROC_ORPHAN_THRESHOLD_SEC,
    kill-switch METALGUARD_SUBPROC_ORPHAN_WATCH_DISABLED=1.

  • L12 — Postmortem auto-collect (run_postmortem(output_dir)). After a
    panic + reboot, this collects the full diagnostic bundle:

    • panic-full-*.panic files within 24h (capped at 5 files / 5MB each)
    • last 500 lines of metal_breadcrumb.log
    • panics.jsonl history copy
    • mx.metal.{active,cache,peak}_memory snapshot (best-effort if MLX importable)
    • index.md summarising the bundle + next steps

    When a panic is found in the window, also writes a sentinel cooldown so L10
    defers further runs even if DiagnosticReports rotates. Kill-switch
    METALGUARD_POSTMORTEM_DISABLED=1. Designed to be called from a launchd
    wrapper after reboot.

  • L13 — Status snapshot writer (get_status_snapshot / write_status_snapshot).
    JSON snapshot for cross-process consumers (menu bar apps, dashboards, ssh
    inspection scripts) that should not import metal_guard directly. Schema
    is append-only across minor versions; breaking changes bump
    STATUS_SNAPSHOT_SCHEMA_VERSION. Aggregates: memory stats / KV monitor
    state / recent panics / breadcrumb tail / cross-process lock holder /
    defensive-vs-observer mode / L10 cooldown verdict. Atomic write via
    tmp + os.replace. Daemon mode via metal-guard status-write --interval 30.

  • CLI subcommands in metal-guard console script (matches Harper's
    internal CLI surface):

    • metal-guard panic-gate — L10 evaluate, exit 0/2/3 for plist wrappers
    • metal-guard postmortem <output_dir> — L12 collect bundle
    • metal-guard status-write [--once|--interval N] — L13 atomic write / daemon
    • metal-guard orphan-scan [--threshold-sec N] — L11 scan
    • metal-guard ack — L10 atomic touch ~/.metal-guard-ack
  • scripts/mlx-safe-python bash wrapper — interactive shell guard that
    refuses ad-hoc python -c "import torch/mlx" while a cooldown is active.
    Lets pip / build / venv / ensurepip pass through (they don't import
    Metal). Provides MLX_SAFE_PYTHON_FORCE=1 escape hatch with WARN. Fail-open
    if the gate itself is broken (rc=11 + stderr WARN, never blocks shell on
    infrastructure problems). Generic — works with any python3 on PATH.

Changed

  • KNOWN_PANIC_MODELS is now framed as a community-curated registry.
    README has a prominent section above "The Problem" pitching it as the
    canonical place to record (model, hardware, panic signature, workload, workaround) tuples. New .github/ISSUE_TEMPLATE/known-panic-report.yml
    walks contributors through the schema. New CONTRIBUTING.md documents
    required vs. optional fields, quality bar (production reproduction OR
    confirmed upstream issue with signature), and an example entry.

  • Default state path is ~/.cache/metal-guard/ for L10's sentinel and
    panics.jsonl ledger. User-facing ack file is ~/.metal-guard-ack (single-
    touch clearance without spelunking caches). XDG-compatible.

  • PyPI URLs corrected to Harperbot/metal-guard. Added Changelog and
    Known Panic Models URL entries for PyPI display.

Notes

The honest caveat from v0.9.0 still holds: metal-guard narrows multiple race
windows around the Apple IOGPU driver bug — it does not fix the bug. v0.10
extends the defence surface from "during run" to "after reboot" (L10 prevents
auto-re-panic, L12 captures forensics, L13 surfaces state to monitoring).

v0.9.0 — cross-model cadence + gemma-4 floor + KNOWN_PANIC_MODELS

24 Apr 17:18
05387d2

Choose a tag to compare

Consolidates panic #7–#11 findings from Harper's production timeline (2026-04-16 → 2026-04-24). Ports three defences from the internal fork and documents the first known-panic model repeat offender: mlx-community/gemma-4-31b-it-8bit.

Highlights

  • B1 subprocess_inference_guard(model_id) — per-inference Metal flush contextmanager for subprocess workers. Ended a 6-panic streak on the internal MAGI pipeline when wired into the worker loop.
  • C5 cross-model cadenceCadenceGuard(cross_model_interval_sec=…) + CrossModelCadenceViolation (subclass of CadenceViolation). Gemma-4 family floor: 90 seconds enforced even when base is 0. METALGUARD_CROSS_MODEL_INTERVAL env var provides opt-in without code changes.
  • C7 gemma4_generation_flush(model_id, generate_call_count) — first-generate settle window (synchronize + clear_cache + sleep) before the first forward pass on a gemma-4 worker. Renamed from internal gemma4_firstgen_guard — "guard" was misleading; this is a flush, not a block.
  • KNOWN_PANIC_MODELS advisory registry + check_known_panic_model() + warn_if_known_panic_model() (idempotent). Ships with one entry: mlx-community/gemma-4-31b-it-8bit, which kernel-panicked twice on the same pipeline 24 hours apart.

When MetalGuard is not enough

A new load-bearing section in the README and CHANGELOG: when every v0.9.0 defence is engaged and a model still kernel-panics in production, switch backend (Ollama / llama.cpp) or pivot to an MoE variant. Community data converges on this — Hannecke (M4 Max 64GB) pivoted to Qwen3-Coder-30B-A3B MoE; ronm92130 on mlx#3186 (2026-04-24) pivoted to llama.cpp on M4 base 32GB, explicitly referencing this project's two-trigger-path hypothesis.

mlx-community/gemma-4-31b-it-8bit — repeat offender

Two Harper production kernel panics, 24 hours apart, same pipeline, same model, same signature IOGPUMemory.cpp:492 prepare_count_underflow:

# Local time PID Spawn → panic Context
7 2026-04-23 03:14 67840 ~6 min pre-cross-model-cadence
11 2026-04-24 03:14 26608 ~1.5 min same pipeline; classic L9 in place

Community corroboration: mlx-lm#883 (M3 Ultra 96GB), lmstudio-ai/lmstudio-bug-tracker#1740, Hannecke — "MLX Crashed My Mac".

API compatibility

All additions are backwards-compatible. CadenceGuard() and require_cadence_clear() default behaviour is unchanged (cross-model cadence defaults to 0.0 / disabled). CadenceGuard.check() now reads the JSON store directly under _CADENCE_FILE_LOCK rather than routing through self.last_ts() — subclassing note only.

Docs & tests

  • README in English, 繁體中文, and 日本語 — all three languages synced with the new ## Known affected models and ## When MetalGuard is not enough sections.
  • 213 passed (166 pre-existing + 47 new in tests/test_v090_cross_model_cadence.py).
  • 2 critic review rounds (R1: 3 P0 + 6 P1 found and fixed; R2: 1 P1 test-gap fixed, 0 P0 residual → GO).

Full changelog: CHANGELOG.md