microsoft · codemonkeychris · May 30, 2026
@@ -1530,7 +1530,7 @@ ARM64 stable-AC re-capture on `LAPTOP-4MEP83VI` remains deferred for the §14 ra
 - **New regressions vs close-out:** M8 +21.8% (+2.9pp — Lazy*Stack base-derived registration's added is-check in the Update path), M12 +30.7% (+12.2pp — Cloud-PC volatile; M12 has trended ±15pp across the last three captures and should be confirmed on stable AC).
 - **Net headline:** no bench exceeds the §13 Q1 reopen threshold. The structural wins (dispatch consolidation, single `IItemsBinderStrategy` arm) are in place; the absolute Cloud-PC numbers track the close-out baseline.
 
-**ARM64 stable-AC ratification gate** — **still pending; first capture attempt was inconclusive.** An ARM64-native 3×5 capture on `LAPTOP-4MEP83VI` (the Phase 0/2 baseline machine) landed under [`docs/specs/047/phase3-results/LAPTOP-4MEP83VI/2026-05-28-phase3-completion-3x5-stableac/`](047/phase3-results/LAPTOP-4MEP83VI/2026-05-28-phase3-completion-3x5-stableac/README.md) but **does not ratify the gate**: the fixed variant-ordering run drifted under sustained load (suspected thermal throttling — `ReactorDescriptors` always runs last and so against the hottest core), inflating long-bench deltas (M2 +23.4%, M3 +175.3%, M12 +44.2% vs Today). A controlled **order-swap re-run** (Descriptors first/cold) proves the contamination: M2's Descriptors-vs-Today delta flips from +23.4% to −30.5% (a 54pp position swing), and Descriptors-vs-ReactorV2 collapses from +36.1% to +1.1% — i.e. no real M2 regression. The thermally-insensitive fast benches confirm descriptors ≈ hand-coded V1 (M1/M7/M8/M11/M13 within ±5% vs ReactorV2), and M1's order-robust +30% vs Today is the known V1-protocol-vs-legacy mount overhead, not descriptor-specific. **A thermally-clean ARM64 re-run** (randomized/interleaved variant order, cooldowns, and/or CPU-clock telemetry) is still required to close the gate; until then it remains pending with a named owner + date to be appended. See the capture README for the full drift evidence and reproduction steps.
+**ARM64 stable-AC ratification gate** — **still pending; first capture attempt was inconclusive.** An ARM64-native 3×5 capture on `LAPTOP-4MEP83VI` (the Phase 0/2 baseline machine) landed under [`docs/specs/047/phase3-results/LAPTOP-4MEP83VI/2026-05-28-phase3-completion-3x5-stableac/`](047/phase3-results/LAPTOP-4MEP83VI/2026-05-28-phase3-completion-3x5-stableac/README.md) but **does not ratify the gate**: the fixed variant-ordering run drifted under sustained load (suspected thermal throttling — `ReactorDescriptors` always runs last and so against the hottest core), inflating long-bench deltas (M2 +23.4%, M3 +175.3%, M12 +44.2% vs Today). A controlled **order-swap re-run** (Descriptors first/cold) proves the contamination: M2's Descriptors-vs-Today delta flips from +23.4% to −30.5% (a 54pp position swing), and Descriptors-vs-ReactorV2 collapses from +36.1% to +1.1% — i.e. no real M2 regression. The thermally-insensitive fast benches confirm descriptors ≈ hand-coded V1 (M1/M7/M8/M11/M13 within ±5% vs ReactorV2), and M1's order-robust +30% vs Today is the known V1-protocol-vs-legacy mount overhead, not descriptor-specific. **A thermally-clean ARM64 re-run** (randomized/interleaved variant order, cooldowns, and/or CPU-clock telemetry) is still required to close the gate; until then it remains pending with a named owner + date to be appended. See the capture README for the full drift evidence and reproduction steps. **Phase-4 update (PR #465):** a post-Phase-4 capture landed under [`docs/specs/047/phase4-results/LAPTOP-4MEP83VI/2026-05-29-arm64/`](047/phase4-results/LAPTOP-4MEP83VI/2026-05-29-arm64/RESULTS.md); it **still does not close the gate** (same gap — fixed ordering, no §15.5 isolation, so the timing axis is throttled and the macro suite is unrunnable post-Phase-4). Its value is the deterministic **allocation** axis: most benches held/improved vs the 2026-05-25 baseline (M9 −41%), but **M1 regressed +20%** (3.2× over its 407 B gate) and **M12 +17%** — so the M1 leaf-alloc work (KD-3 fold + bucketing-regression investigation) is now confirmed as required, ahead of the thermally-clean re-run.
 
 **Carry-forward known defects from Phase 1:**
 - **KD-3** — dispatch fast-path for the ported built-ins (M4 was +88.9% V1 vs Today at Phase 1; final advisory shows M4 −21.2% / M5 −24.3% at amortized scope — KD-3 has materially closed at the batch-11 registration set).
@@ -1541,7 +1541,14 @@ ARM64 stable-AC re-capture on `LAPTOP-4MEP83VI` remains deferred for the §14 ra
 **Status: code-complete — migration closed; V1 is the unconditional production
 path.** The only outstanding items are baseline-machine-only (ARM64
 `LAPTOP-4MEP83VI`): the stable-AC perf ratification and the §11.6 hard byte-gate
-*measurement/enforcement*. See the close-out tracker
+*measurement/enforcement*. An **indicative ARM64 capture has landed** (PR #465,
+`047/phase4-results/LAPTOP-4MEP83VI/2026-05-29-arm64/`): the deterministic
+**allocation** axis is measured — M2/M3 meet the §15.6 "≤ Today" budget, **M1
+regressed +20%** (and M1/M2 miss the absolute 407/1,520 B gates; M3 passes), plus
+an **M12 +17%** pool-reuse regression. The **timing** axis (no §15.5 isolation)
+and the **macro suite** (its projects were deleted in Phase 4) remain unratified,
+so the gate is **not yet closed** — it needs an isolated stable-AC re-capture and
+the M1/M12 alloc fix. See the close-out tracker
 [`tasks/047-extensible-control-model-phase4-implementation.md`](tasks/047-extensible-control-model-phase4-implementation.md).
 
 - ✅ Delete the private switch. *(Done §4.5 — dispatch is V1 registry →
@@ -1560,8 +1567,11 @@ path.** The only outstanding items are baseline-machine-only (ARM64
   for no-callback / one-callback / three-callback; the stale `≤100 / ≤320 / ≤500`
   estimates predate the Phase-0 baseline capture). *(Code-complete: the bucketed
   `Element` base (§11.7, `ElementExtras`) ships and the target constants are
-  landed (`PerformanceBudgets.cs`); the gate **measurement/enforcement** is
-  ARM64-baseline-blocked — §4.4/§4.9 handoff.)*
+  landed (`PerformanceBudgets.cs`); the gate has now been **MEASURED** on
+  `LAPTOP-4MEP83VI` ARM64 (PR #465): **M1 1,289 B (FAIL, 3.2×), M2 3,687 B
+  (FAIL, 2.4×), M3 8,530 B (PASS)** per-render. The gates do **not** pass for
+  M1/M2 — enforcement stays open pending the M1 leaf-alloc fix + an isolated
+  re-capture. §4.4/§4.9 handoff.)*
 - ✅ Document the final author-facing surface in `docs/guide/`. *(Done §4.8.)*
 
 ### Future: source generation (deferred, no committed timeline)

@@ -0,0 +1,108 @@
+# Spec-047 Post-Phase-4 Perf Capture — vs 2026-05-25 ARM64 baseline
+
+**Machine:** `LAPTOP-4MEP83VI` (Qualcomm ARMv8, the spec-047 §4.9 baseline box)
+**Arch/Runtime:** ARM64-native, Release, .NET 10.0.8 — identical to baseline
+**Date:** 2026-05-30 (UTC) · **Branch:** `main` (all of spec-047 incl. Phase 4 merged)
+**Suite:** micro M1–M13 (`PerfBench.ControlModel`), reps=5, iters matched to baseline
+(M1–M8 @5000, M9 @2000, M10–M13 @1000). 195 rows, 0 errors.
+
+> ⚠️ **Scope caveat — this is an INDICATIVE capture, not the formal §4.9 ratification.**
+> The §15.5 environment-isolation requirements (AC power, High-Performance plan, DRR
+> off, foreground non-occluded window) could **not** be enforced from this automated
+> run, and the harness does not yet implement the §4.9-required randomized/interleaved
+> variant ordering + CPU-clock telemetry. **Consequence: the timing (ns) numbers are
+> environment-contaminated and must be disregarded** for cross-baseline comparison —
+> the `Direct` variant (pure WinUI, *zero* Reactor code) is itself inflated +60–140%
+> vs the baseline run, which points to an environmental cause (most likely thermal/power throttling). **The allocation
+> (bytes) numbers ARE valid**: managed allocation is deterministic and
+> environment-independent — confirmed by `Direct` alloc matching the baseline
+> byte-for-byte (M1 Direct = 3,771,824 B in both runs).
+
+---
+
+## Headline findings (allocation — the valid, deterministic axis)
+
+The macro suite (L1–L14: TTFF / working-set / FPS / GC) is **not runnable** — Phase 4
+deleted its projects (`StressPerf.ReactorV2`, `BlankReactorV2`). So only the §15.6
+micro budgets (per-element alloc M1–M3, dispatch M4–M6, update M7–M8) are covered here.
+
+### 1. §15.6 "M1–M3 per-element alloc must improve/equal Today" — **M1 FAILS**
+
+| Bench | Reactor (new) B/render | Today (base) B/render | Δ vs Today | Verdict |
+|---|---:|---:|---:|:--|
+| **M1** TextBlock, no callback | **1,289** | 1,071 | **+20.3%** | ❌ **regressed** |
+| M2 ToggleSwitch, 1 callback | 3,687 | 3,884 | −5.1% | ✅ improved |
+| M3 Button + 2 pointer mods | 8,530 | 9,075 | −6.0% | ✅ improved |
+
+### 2. Phase-4 refactor impact: current `Reactor` vs **baseline `ReactorV2`** (same V1 lineage)
+
+This isolates what the post-baseline Phase-4 work (`ElementExtras` bucketing §4.4,
+EHS split §4.3, echo hybrid §4.2) did to the V1 path's allocation:
+
+| Bench | new B/render | base-V2 B/render | Δ | Note |
+|---|---:|---:|---:|:--|
+| **M1** | **1,289** | 1,077 | **+19.6%** | ❌ leanest leaf got **heavier** |
+| M2 | 3,687 | 3,864 | −4.6% | ✅ |
+| M3 | 8,530 | 8,633 | −1.2% | ≈ flat |
+| M4 | 1,941 | 1,998 | −2.8% | ✅ |
+| M5 | 1,948 | 2,212 | −11.9% | ✅ |
+| M6 | 888 | 941 | −5.6% | ✅ |
+| M7 | 252 | 156 | +61.4% | tiny absolute (+96 B) |
+| M8 | 362 | 425 | −14.9% | ✅ |
+| **M9** | 184,431 | 312,246 | **−40.9%** | ✅ big win (keyed list) |
+| M10 | 3,411 | 3,949 | −13.6% | ✅ |
+| M11 | 1,641 | 1,670 | −1.7% | ✅ (per-element state) |
+| **M12** | 1,273 | 1,088 | **+17.0%** | ❌ pool-reuse regressed |
+| M13 | 29 | 29 | −0.4% | ≈ flat |
+
+The M1 regression is **deterministic, not noise**: every new rep (6.34–6.51 MB)
+sits uniformly above every baseline rep (5.25–5.42 MB) — a consistent ~+235 B/render.
+Likely sources to investigate: the added `Element.Extensions` slot on every element,
+the §4.3 EHS-split, or the `ReactorState.PendingEchoMatch` slot on the mount path.
+M12 (pool rent/return) similarly regressed +17%.
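The "deterministic, not noise" argument above rests on the two rep ranges not overlapping. A minimal reviewer-side sketch of that check follows. It assumes the quoted MB figures are decimal megabytes of total allocation per rep and that M1 ran 5000 iterations, as the capture header states. Only the range endpoints appear in the write-up, so the sketch works from those and not from individual reps. The function names are illustrative and are not taken from `analyze.py`.

```python
# Rep ranges quoted in the text for M1, in MB of total allocation per rep.
# Assumption: MB means 10**6 bytes; only min/max endpoints are available.
NEW_REPS_MB = (6.34, 6.51)       # current Reactor: lowest and highest rep
BASELINE_REPS_MB = (5.25, 5.42)  # baseline ReactorV2: lowest and highest rep
ITERS = 5000                     # M1 iteration count from the capture header


def ranges_separated(new, base):
    """True when the lowest new rep sits above the highest baseline rep."""
    return min(new) > max(base)


def per_render_gap_bounds(new, base, iters):
    """Smallest and largest per-render gap (bytes) consistent with both ranges."""
    smallest = (min(new) - max(base)) * 1_000_000 / iters
    largest = (max(new) - min(base)) * 1_000_000 / iters
    return smallest, largest


if __name__ == "__main__":
    print(ranges_separated(NEW_REPS_MB, BASELINE_REPS_MB))
    print(per_render_gap_bounds(NEW_REPS_MB, BASELINE_REPS_MB, ITERS))
```

Under these assumptions the ranges are fully separated, and the per-render gap is bounded between roughly 184 B and 252 B, which brackets the quoted ~+235 B/render figure.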
+
+### 3. §11.6 absolute byte-gate (`PerformanceBudgets.cs`) — **M1, M2 FAIL**
+
+| Bench | Target | Reactor (new) B/render | Pass? |
+|---|---:|---:|:---:|
+| M1 | ≤ 407 | 1,289 | ❌ (3.2×) |
+| M2 | ≤ 1,520 | 3,687 | ❌ (2.4×) |
+| M3 | ≤ 19,200 | 8,530 | ✅ |
+
+Note the gate targets were defined as `baseline × 0.4`, while the *measured* ARM64
+baselines were ~1,077 / 3,864 / 8,633 B/render. Reaching 407/1,520 therefore needed a
+~60% cut for M1/M2, which was never realistic without the deferred KD-3 binder-check
+fold + further leaf-alloc work; M3's 19,200 target, by contrast, was already cleared
+at baseline. **The byte gates as written are
+not met for M1/M2.** This directly confirms the spec's own KD-3 trigger condition
+("fold the M1 leading-`if` binder check … if M1 is still above budget after §4.3/§4.4")
+— M1 *is* over budget, so that follow-up is now warranted.
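The pass/fail column and the "3.2×" / "2.4×" multiples in the byte-gate table can be reproduced with a few lines. This is only a sketch using the targets and measurements quoted above. The dictionary and function names are hypothetical and do not reflect how `PerformanceBudgets.cs` or `analyze.py` is actually organised.

```python
# Targets and measured Reactor B/render, copied from the section 3 table.
GATES = {"M1": 407, "M2": 1520, "M3": 19200}
MEASURED = {"M1": 1289, "M2": 3687, "M3": 8530}


def check_gates(gates, measured):
    """Return {bench: (passed, measured/target ratio rounded to 1 decimal)}."""
    results = {}
    for bench, target in gates.items():
        value = measured[bench]
        results[bench] = (value <= target, round(value / target, 1))
    return results


if __name__ == "__main__":
    for bench, (passed, ratio) in check_gates(GATES, MEASURED).items():
        verdict = "PASS" if passed else "FAIL"
        print(f"{bench}: {verdict} ({ratio}x of target)")
```

A ratio above 1.0 means the measurement exceeds its gate. Under this check M1 and M2 fail at 3.2× and 2.4× their targets, while M3 passes at about 0.4×.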
+
+---
+
+## Timing (ns) — captured but NOT comparable cross-baseline
+
+Disregard for ratification. Evidence of environment contamination (identical `Direct`
+code, new vs baseline ns): M3 +139%, M4 +130%, M5 +60%, M7 +940µs absolute swing.
+Within-run `Reactor`-vs-`Direct` overhead is directionally consistent with baseline
+(Reactor adds dispatch cost on M1–M6, wins big on M7/M9 via pooling) but the absolute
+numbers are throttled and should be re-captured under §15.5 isolation before any
+timing-budget sign-off.
+
+---
+
+## Bottom line for §4.9
+
+- ✅ **Build + capture reproducible on the actual ARM64 baseline box**; allocation is
+  deterministic, and `Direct` allocation matches the baseline byte-for-byte.
+- ✅ **Most of the V1 path held or improved** vs the captured baseline on allocation
+  (M2/M3/M4/M5/M6/M8/M9/M10/M11), with a standout **−41% on M9** (keyed list).
+- ❌ **Two allocation regressions to fix before claiming the byte-gate pass:**
+  **M1 +20%** (and 3.2× over its 407 B gate) and **M12 +17%**.
+- ⛔ **Not a ratification sign-off:** timing axis is environment-throttled, the
+  §4.9-mandated randomized/interleaved ordering + CPU-clock telemetry isn't wired,
+  and the macro suite (L1–L14) can't run (projects deleted). A real §4.9 close needs
+  an isolated stable-AC re-capture (and the macro suite rebuilt against the single
+  `Reactor` variant).
+
+_Raw data: `perfbench-controlmodel-{m1-m8,m9,m10-m13}.jsonl` in this folder.
+Analysis: `analyze.py`. Baseline: `docs/specs/047/baseline-results/LAPTOP-4MEP83VI/2026-05-25-arm64/`._
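The bottom line notes that the randomized/interleaved variant ordering is not yet wired into the harness. The sketch below shows one possible shape for such a schedule: within every (rep, bench) cell the variant order is reshuffled, so no variant is pinned to the last and hottest slot. It is an assumption about what such ordering could look like, not the harness's design. `interleaved_schedule`, the seed parameter, and the tuple layout are all hypothetical, and cooldowns and CPU-clock telemetry are out of scope here.

```python
import random

# Variant names as they appear in the result tables.
VARIANTS = ["Direct", "Today", "Reactor"]


def interleaved_schedule(benches, variants, reps, seed):
    """Build a run order where variant order is reshuffled per (rep, bench).

    A fixed seed keeps the schedule reproducible, so a capture can be
    re-run with the same ordering when investigating drift.
    """
    rng = random.Random(seed)
    schedule = []
    for rep in range(reps):
        for bench in benches:
            order = list(variants)
            rng.shuffle(order)  # fresh random order for this cell
            for variant in order:
                schedule.append((rep, bench, variant))
    return schedule


if __name__ == "__main__":
    benches = [f"M{i}" for i in range(1, 14)]
    for rep, bench, variant in interleaved_schedule(benches, VARIANTS, reps=5, seed=47)[:9]:
        print(rep, bench, variant)
```

Every (bench, variant) pair still runs exactly `reps` times. Only the position within each cell changes, which is the property the order-swap experiment in the spec showed to matter.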
@@ -0,0 +1,19 @@
+# Spec 047 §15.6 (a) — Absolute Comparison
+
+Mean ns per op and alloc bytes, per variant. Cells show a dash when a variant has fewer than min-reps repetitions. Architecture column distinguishes ARM64-native from x64-emulated runs (spec §15.5 — non-comparable across architectures).
+
+| Bench | Arch | Direct ns | Today ns | Reactor ns | Direct alloc | Today alloc | Reactor alloc |
+|---|---|---:|---:|---:|---:|---:|---:|
+| M1 | Arm64 | 35795.2 | 43113.7 | 43820.5 | 3771877 | 6410214 | 6442971 |
+| M10 | Arm64 | 33549.5 | 44664.7 | 43450.0 | 2958312 | 3558654 | 3410949 |
+| M11 | Arm64 | 33.9 | 38605.7 | 34239.4 | 40 | 1714131 | 1641088 |
+| M12 | Arm64 | 25807.4 | 33245.4 | 34545.0 | 760114 | 1306086 | 1273350 |
+| M13 | Arm64 | 35.5 | 106.8 | 220.6 | 24040 | 29373 | 29320 |
+| M2 | Arm64 | 50476.4 | 112278.4 | 98428.2 | 13425966 | 18317597 | 18436637 |
+| M3 | Arm64 | 422189.5 | 378451.1 | 390740.2 | 28890936 | 41535106 | 42649842 |
+| M4 | Arm64 | 76426.5 | 144403.7 | 139003.5 | 4674357 | 10708440 | 9707059 |
+| M5 | Arm64 | 34110.0 | 157595.6 | 134839.9 | 4674357 | 10736699 | 9741629 |
+| M6 | Arm64 | 43099.6 | 59261.6 | 55663.5 | 3869357 | 4181165 | 4438270 |
+| M7 | Arm64 | 1737075.3 | 22028.2 | 21858.9 | 122599664 | 996032 | 1258197 |
+| M8 | Arm64 | 7617.1 | 9601.4 | 9258.0 | 915536 | 1807872 | 1807872 |
+| M9 | Arm64 | 911654.9 | 3102404.2 | 2483238.8 | 96669803 | 368877754 | 368861339 |
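The alloc columns above appear to be total bytes per repetition, while the README reports bytes per render. Assuming the per-render figure is simply total bytes divided by the iteration count from the capture header (M1–M8 @5000, M9 @2000, M10–M13 @1000), the conversion can be sketched as follows. The division rule is an inference from the numbers, not something the report states, and the names are illustrative.

```python
# Iteration counts per bench, from the capture header.
ITERS = {f"M{i}": 5000 for i in range(1, 9)}
ITERS["M9"] = 2000
ITERS.update({f"M{i}": 1000 for i in range(10, 14)})

# A few Reactor alloc totals copied from the table above.
REACTOR_ALLOC = {"M1": 6442971, "M9": 368861339, "M12": 1273350}


def bytes_per_render(bench, total_alloc):
    """Convert a per-rep allocation total into bytes per render (assumed rule)."""
    return total_alloc / ITERS[bench]


if __name__ == "__main__":
    for bench, total in REACTOR_ALLOC.items():
        print(bench, round(bytes_per_render(bench, total)))
```

Under that assumption the three rows round to 1,289, 184,431 and 1,273 B/render, which agrees with the README's M1, M9 and M12 figures.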
@@ -0,0 +1,19 @@
+# Spec 047 §15.6 (b) — Reactor Delta (Reactor vs Today)
+
+Positive % = Reactor slower / larger than Today. Negative = improvement. One row per (bench, architecture).
+
+| Bench | Arch | ns delta % | ns 95% CI half-width | alloc delta % |
+|---|---|---:|---:|---:|
+| M1 | Arm64 | +1.6% | ±5.1% | +0.5% |
+| M10 | Arm64 | -2.7% | ±4.0% | -4.2% |
+| M11 | Arm64 | -11.3% | ±4.4% | -4.3% |
+| M12 | Arm64 | +3.9% | ±8.5% | -2.5% |
+| M13 | Arm64 | +106.6% | ±220.6% | -0.2% |
+| M2 | Arm64 | -12.3% | ±6.1% | +0.6% |
+| M3 | Arm64 | +3.2% | ±23.9% | +2.7% |
+| M4 | Arm64 | -3.7% | ±8.8% | -9.4% |
+| M5 | Arm64 | -14.4% | ±16.3% | -9.3% |
+| M6 | Arm64 | -6.1% | ±9.2% | +6.1% |
+| M7 | Arm64 | -0.8% | ±4.8% | +26.3% |
+| M8 | Arm64 | -3.6% | ±4.1% | 0.0% |
+| M9 | Arm64 | -20.0% | ±23.3% | 0.0% |
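When reading the delta table, one reasonable heuristic is to treat a timing delta as distinguishable from noise only when its magnitude exceeds the 95% CI half-width. The sketch below recomputes a few deltas from the absolute-comparison values and applies that rule. The rule is an assumption for illustration: the report does not say how the CI is computed or how it should be interpreted, and the function names are hypothetical.

```python
def delta_pct(reactor, today):
    """Percentage change of Reactor relative to Today (positive = slower/larger)."""
    return (reactor - today) / today * 100.0


def is_distinguishable(delta, ci_half_width):
    """Heuristic: the delta stands out only if it exceeds the CI half-width."""
    return abs(delta) > ci_half_width


if __name__ == "__main__":
    # (bench, Reactor ns, Today ns, CI half-width %) taken from the tables.
    rows = [
        ("M1", 43820.5, 43113.7, 5.1),
        ("M2", 98428.2, 112278.4, 6.1),
        ("M13", 220.6, 106.8, 220.6),
    ]
    for bench, reactor_ns, today_ns, ci in rows:
        d = delta_pct(reactor_ns, today_ns)
        print(f"{bench}: {d:+.1f}% (CI ±{ci}%) distinguishable={is_distinguishable(d, ci)}")
```

By this reading M2's −12.3% is outside its ±6.1% band, whereas M1's +1.6% and M13's +106.6% (±220.6%) are not, so the large M13 percentage should not be read as a regression on this evidence alone.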
@@ -0,0 +1,19 @@
+# Spec 047 §15.6 (c) — WinUI Gap (Reactor vs Direct)
+
+Absolute overhead Reactor still adds on top of raw WinUI. One row per (bench, architecture).
+
+| Bench | Arch | Reactor ns | Direct ns | Reactor - Direct ns | Reactor alloc - Direct alloc |
+|---|---|---:|---:|---:|---:|
+| M1 | Arm64 | 43820.5 | 35795.2 | +8025.4 | +2671094 |
+| M10 | Arm64 | 43450.0 | 33549.5 | +9900.5 | +452637 |
+| M11 | Arm64 | 34239.4 | 33.9 | +34205.5 | +1641048 |
+| M12 | Arm64 | 34545.0 | 25807.4 | +8737.6 | +513237 |
+| M13 | Arm64 | 220.6 | 35.5 | +185.1 | +5280 |
+| M2 | Arm64 | 98428.2 | 50476.4 | +47951.8 | +5010670 |
+| M3 | Arm64 | 390740.2 | 422189.5 | -31449.3 | +13758906 |
+| M4 | Arm64 | 139003.5 | 76426.5 | +62577.0 | +5032702 |
+| M5 | Arm64 | 134839.9 | 34110.0 | +100729.9 | +5067272 |
+| M6 | Arm64 | 55663.5 | 43099.6 | +12563.8 | +568914 |
+| M7 | Arm64 | 21858.9 | 1737075.3 | -1715216.4 | -121341467 |
+| M8 | Arm64 | 9258.0 | 7617.1 | +1640.8 | +892336 |
+| M9 | Arm64 | 2483238.8 | 911654.9 | +1571583.9 | +272191536 |