Machine: LAPTOP-4MEP83VI (Qualcomm ARMv8, the spec-047 §4.9 baseline box)
Arch/Runtime: ARM64-native, Release, .NET 10.0.8 — identical to baseline
Date: 2026-05-30 (UTC) · Branch: main (all of spec-047 incl. Phase 4 merged)
Suite: micro M1–M13 (PerfBench.ControlModel), reps=5, iters matched to baseline
(M1–M8 @5000, M9 @2000, M10–M13 @1000). 195 rows, 0 errors.
⚠️ Scope caveat — this is an INDICATIVE capture, not the formal §4.9 ratification. The §15.5 environment-isolation requirements (AC power, High-Performance plan, DRR off, foreground non-occluded window) could not be enforced from this automated run, and the harness does not yet implement the §4.9-required randomized/interleaved variant ordering + CPU-clock telemetry. Consequence: the timing (ns) numbers are environment-contaminated and must be disregarded for cross-baseline comparison — theDirectvariant (pure WinUI, zero Reactor code) is itself inflated +60–140% vs the baseline run, which can only be thermal/power throttling. The allocation (bytes) numbers ARE valid: managed allocation is deterministic and environment-independent — confirmed byDirectalloc matching the baseline byte-for-byte (M1 Direct = 3,771,824 B in both runs).
The macro suite (L1–L14: TTFF / working-set / FPS / GC) is not runnable — Phase 4
deleted its projects (StressPerf.ReactorV2, BlankReactorV2). So only the §15.6
micro budgets (per-element alloc M1–M3, dispatch M4–M6, update M7–M8) are covered here.
| Bench | Reactor (new) B/render | Today (base) B/render | Δ vs Today | Verdict |
|---|---|---|---|---|
| M1 TextBlock, no callback | 1,289 | 1,071 | +20.3% | ❌ regressed |
| M2 ToggleSwitch, 1 callback | 3,687 | 3,884 | −5.1% | ✅ improved |
| M3 Button + 2 pointer mods | 8,530 | 9,075 | −6.0% | ✅ improved |
This isolates what the post-baseline Phase-4 work (ElementExtras bucketing §4.4,
EHS split §4.3, echo hybrid §4.2) did to the V1 path's allocation:
| Bench | new B/render | base-V2 B/render | Δ | Note |
|---|---|---|---|---|
| M1 | 1,289 | 1,077 | +19.6% | ❌ leanest leaf got heavier |
| M2 | 3,687 | 3,864 | −4.6% | ✅ |
| M3 | 8,530 | 8,633 | −1.2% | ≈ flat |
| M4 | 1,941 | 1,998 | −2.8% | ✅ |
| M5 | 1,948 | 2,212 | −11.9% | ✅ |
| M6 | 888 | 941 | −5.6% | ✅ |
| M7 | 252 | 156 | +61.4% | tiny absolute (+96 B) |
| M8 | 362 | 425 | −14.9% | ✅ |
| M9 | 184,431 | 312,246 | −40.9% | ✅ big win (keyed list) |
| M10 | 3,411 | 3,949 | −13.6% | ✅ |
| M11 | 1,641 | 1,670 | −1.7% | ✅ (per-element state) |
| M12 | 1,273 | 1,088 | +17.0% | ❌ pool-reuse regressed |
| M13 | 29 | 29 | −0.4% | ≈ flat |
The M1 regression is deterministic, not noise: every new rep (6.34–6.51 MB)
sits uniformly above every baseline rep (5.25–5.42 MB) — a consistent ~+235 B/render.
Likely sources to investigate: the added Element.Extensions slot on every element,
the §4.3 EHS-split, or the ReactorState.PendingEchoMatch slot on the mount path.
M12 (pool rent/return) similarly regressed +17%.
| Bench | Target | Reactor (new) B/render | Pass? |
|---|---|---|---|
| M1 | ≤ 407 | 1,289 | ❌ (3.2×) |
| M2 | ≤ 1,520 | 3,687 | ❌ (2.4×) |
| M3 | ≤ 19,200 | 8,530 | ✅ |
Note the gate targets were defined as baseline × 0.4, but the measured ARM64
baselines were ~1,077 / 3,864 / 8,633 — so M1/M2 never had a realistic path to
407/1,520 without the deferred KD-3 binder-check fold + further leaf-alloc work, and
M3's 19,200 target was already cleared at baseline. The byte gates as written are
not met for M1/M2. This directly confirms the spec's own KD-3 trigger condition
("fold the M1 leading-if binder check … if M1 is still above budget after §4.3/§4.4")
— M1 is over budget, so that follow-up is now warranted.
Disregard for ratification. Evidence of environment contamination (identical Direct
code, new vs baseline ns): M3 +139%, M4 +130%, M5 +60%, M7 +940µs absolute swing.
Within-run Reactor-vs-Direct overhead is directionally consistent with baseline
(Reactor adds dispatch cost on M1–M6, wins big on M7/M9 via pooling) but the absolute
numbers are throttled and should be re-captured under §15.5 isolation before any
timing-budget sign-off.
- ✅ Build + capture reproducible on the actual ARM64 baseline box; allocation is
deterministic and matches baseline
Directbyte-for-byte. - ✅ Most of the V1 path held or improved vs the captured baseline on allocation (M2/M3/M4/M5/M6/M8/M9/M10/M11), with a standout −41% on M9 (keyed list).
- ❌ Two allocation regressions to fix before claiming the byte-gate pass: M1 +20% (and 3.2× over its 407 B gate) and M12 +17%.
- ⛔ Not a ratification sign-off: timing axis is environment-throttled, the
§4.9-mandated randomized/interleaved ordering + CPU-clock telemetry isn't wired,
and the macro suite (L1–L14) can't run (projects deleted). A real §4.9 close needs
an isolated stable-AC re-capture (and the macro suite rebuilt against the single
Reactorvariant).
Raw data: perfbench-controlmodel-{m1-m8,m9,m10-m13}.jsonl in this folder.
Analysis: analyze.py. Baseline: docs/specs/047/baseline-results/LAPTOP-4MEP83VI/2026-05-25-arm64/.