Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
# Spec-047 Post-Phase-4 Perf Capture — vs 2026-05-25 ARM64 baseline

**Machine:** `LAPTOP-4MEP83VI` (Qualcomm ARMv8, the spec-047 §4.9 baseline box)
**Arch/Runtime:** ARM64-native, Release, .NET 10.0.8 — identical to baseline
**Date:** 2026-05-30 (UTC) · **Branch:** `main` (all of spec-047 incl. Phase 4 merged)
**Suite:** micro M1–M13 (`PerfBench.ControlModel`), reps=5, iters matched to baseline
(M1–M8 @5000, M9 @2000, M10–M13 @1000). 195 rows, 0 errors.

> ⚠️ **Scope caveat — this is an INDICATIVE capture, not the formal §4.9 ratification.**
> The §15.5 environment-isolation requirements (AC power, High-Performance plan, DRR
> off, foreground non-occluded window) could **not** be enforced from this automated
> run, and the harness does not yet implement the §4.9-required randomized/interleaved
> variant ordering + CPU-clock telemetry. **Consequence: the timing (ns) numbers are
> environment-contaminated and must be disregarded** for cross-baseline comparison —
> the `Direct` variant (pure WinUI, *zero* Reactor code) is itself inflated +60–140%
> vs the baseline run, which can only be thermal/power throttling. **The allocation
> (bytes) numbers ARE valid**: managed allocation is deterministic and
> environment-independent — confirmed by `Direct` alloc matching the baseline
> byte-for-byte (M1 Direct = 3,771,824 B in both runs).

---

## Headline findings (allocation — the valid, deterministic axis)

The macro suite (L1–L14: TTFF / working-set / FPS / GC) is **not runnable** — Phase 4
deleted its projects (`StressPerf.ReactorV2`, `BlankReactorV2`). So only the §15.6
micro budgets (per-element alloc M1–M3, dispatch M4–M6, update M7–M8) are covered here.

### 1. §15.6 "M1–M3 per-element alloc must improve/equal Today" — **M1 FAILS**

| Bench | Reactor (new) B/render | Today (base) B/render | Δ vs Today | Verdict |
|---|---:|---:|---:|:--|
| **M1** TextBlock, no callback | **1,289** | 1,071 | **+20.3%** | ❌ **regressed** |
| M2 ToggleSwitch, 1 callback | 3,687 | 3,884 | −5.1% | ✅ improved |
| M3 Button + 2 pointer mods | 8,530 | 9,075 | −6.0% | ✅ improved |

### 2. Phase-4 refactor impact: current `Reactor` vs **baseline `ReactorV2`** (same V1 lineage)

This isolates what the post-baseline Phase-4 work (`ElementExtras` bucketing §4.4,
EHS split §4.3, echo hybrid §4.2) did to the V1 path's allocation:

| Bench | new B/render | base-V2 B/render | Δ | Note |
|---|---:|---:|---:|:--|
| **M1** | **1,289** | 1,077 | **+19.6%** | ❌ leanest leaf got **heavier** |
| M2 | 3,687 | 3,864 | −4.6% | ✅ |
| M3 | 8,530 | 8,633 | −1.2% | ≈ flat |
| M4 | 1,941 | 1,998 | −2.8% | ✅ |
| M5 | 1,948 | 2,212 | −11.9% | ✅ |
| M6 | 888 | 941 | −5.6% | ✅ |
| M7 | 252 | 156 | +61.4% | tiny absolute (+96 B) |
| M8 | 362 | 425 | −14.9% | ✅ |
| **M9** | 184,431 | 312,246 | **−40.9%** | ✅ big win (keyed list) |
| M10 | 3,411 | 3,949 | −13.6% | ✅ |
| M11 | 1,641 | 1,670 | −1.7% | ✅ (per-element state) |
| **M12** | 1,273 | 1,088 | **+17.0%** | ❌ pool-reuse regressed |
| M13 | 29 | 29 | −0.4% | ≈ flat |

The M1 regression is **deterministic, not noise**: every new rep (6.34–6.51 MB)
sits uniformly above every baseline rep (5.25–5.42 MB) — a consistent ~+235 B/render.
Likely sources to investigate: the added `Element.Extensions` slot on every element,
the §4.3 EHS-split, or the `ReactorState.PendingEchoMatch` slot on the mount path.
M12 (pool rent/return) similarly regressed +17%.

### 3. §11.6 absolute byte-gate (`PerformanceBudgets.cs`) — **M1, M2 FAIL**

| Bench | Target | Reactor (new) B/render | Pass? |
|---|---:|---:|:---:|
| M1 | ≤ 407 | 1,289 | ❌ (3.2×) |
| M2 | ≤ 1,520 | 3,687 | ❌ (2.4×) |
| M3 | ≤ 19,200 | 8,530 | ✅ |

Note the gate targets were defined as `baseline × 0.4`, but the *measured* ARM64
baselines were ~1,077 / 3,864 / 8,633 — so M1/M2 never had a realistic path to
407/1,520 without the deferred KD-3 binder-check fold + further leaf-alloc work, and
M3's 19,200 target was already cleared at baseline. **The byte gates as written are
not met for M1/M2.** This directly confirms the spec's own KD-3 trigger condition
("fold the M1 leading-`if` binder check … if M1 is still above budget after §4.3/§4.4")
— M1 *is* over budget, so that follow-up is now warranted.

---

## Timing (ns) — captured but NOT comparable cross-baseline

Disregard for ratification. Evidence of environment contamination (identical `Direct`
code, new vs baseline ns): M3 +139%, M4 +130%, M5 +60%, M7 +940µs absolute swing.
Within-run `Reactor`-vs-`Direct` overhead is directionally consistent with baseline
(Reactor adds dispatch cost on M1–M6, wins big on M7/M9 via pooling) but the absolute
numbers are throttled and should be re-captured under §15.5 isolation before any
timing-budget sign-off.

---

## Bottom line for §4.9

- ✅ **Build + capture reproducible on the actual ARM64 baseline box**; allocation is
deterministic and matches baseline `Direct` byte-for-byte.
- ✅ **Most of the V1 path held or improved** vs the captured baseline on allocation
(M2/M3/M4/M5/M6/M8/M9/M10/M11), with a standout **−41% on M9** (keyed list).
- ❌ **Two allocation regressions to fix before claiming the byte-gate pass:**
**M1 +20%** (and 3.2× over its 407 B gate) and **M12 +17%**.
- ⛔ **Not a ratification sign-off:** timing axis is environment-throttled, the
§4.9-mandated randomized/interleaved ordering + CPU-clock telemetry isn't wired,
and the macro suite (L1–L14) can't run (projects deleted). A real §4.9 close needs
an isolated stable-AC re-capture (and the macro suite rebuilt against the single
`Reactor` variant).

_Raw data: `perfbench-controlmodel-{m1-m8,m9,m10-m13}.jsonl` in this folder.
Analysis: `analyze.py`. Baseline: `docs/specs/047/baseline-results/LAPTOP-4MEP83VI/2026-05-25-arm64/`._
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Spec 047 §15.6 (a) — Absolute Comparison

Mean ns per op + alloc bytes, per variant. Columns are dashes when a variant has < min-reps repetitions. Architecture column distinguishes ARM64-native from x64-emulated runs (spec §15.5 — non-comparable across architectures).

| Bench | Arch | Direct ns | Today ns | Reactor ns | Direct alloc | Today alloc | Reactor alloc |
|---|---|---:|---:|---:|---:|---:|---:|
| M1 | Arm64 | 35795.2 | 43113.7 | 43820.5 | 3771877 | 6410214 | 6442971 |
| M10 | Arm64 | 33549.5 | 44664.7 | 43450.0 | 2958312 | 3558654 | 3410949 |
| M11 | Arm64 | 33.9 | 38605.7 | 34239.4 | 40 | 1714131 | 1641088 |
| M12 | Arm64 | 25807.4 | 33245.4 | 34545.0 | 760114 | 1306086 | 1273350 |
| M13 | Arm64 | 35.5 | 106.8 | 220.6 | 24040 | 29373 | 29320 |
| M2 | Arm64 | 50476.4 | 112278.4 | 98428.2 | 13425966 | 18317597 | 18436637 |
| M3 | Arm64 | 422189.5 | 378451.1 | 390740.2 | 28890936 | 41535106 | 42649842 |
| M4 | Arm64 | 76426.5 | 144403.7 | 139003.5 | 4674357 | 10708440 | 9707059 |
| M5 | Arm64 | 34110.0 | 157595.6 | 134839.9 | 4674357 | 10736699 | 9741629 |
| M6 | Arm64 | 43099.6 | 59261.6 | 55663.5 | 3869357 | 4181165 | 4438270 |
| M7 | Arm64 | 1737075.3 | 22028.2 | 21858.9 | 122599664 | 996032 | 1258197 |
| M8 | Arm64 | 7617.1 | 9601.4 | 9258.0 | 915536 | 1807872 | 1807872 |
| M9 | Arm64 | 911654.9 | 3102404.2 | 2483238.8 | 96669803 | 368877754 | 368861339 |
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Spec 047 §15.6 (b) — Reactor Delta (Reactor vs Today)

Positive % = Reactor slower / larger than Today. Negative = improvement. One row per (bench, architecture).

| Bench | Arch | ns delta % | ns 95% CI half-width | alloc delta % |
|---|---|---:|---:|---:|
| M1 | Arm64 | +1.6% | ±5.1% | +0.5% |
| M10 | Arm64 | -2.7% | ±4.0% | -4.2% |
| M11 | Arm64 | -11.3% | ±4.4% | -4.3% |
| M12 | Arm64 | +3.9% | ±8.5% | -2.5% |
| M13 | Arm64 | +106.6% | ±220.6% | -0.2% |
| M2 | Arm64 | -12.3% | ±6.1% | +0.6% |
| M3 | Arm64 | +3.2% | ±23.9% | +2.7% |
| M4 | Arm64 | -3.7% | ±8.8% | -9.4% |
| M5 | Arm64 | -14.4% | ±16.3% | -9.3% |
| M6 | Arm64 | -6.1% | ±9.2% | +6.1% |
| M7 | Arm64 | -0.8% | ±4.8% | +26.3% |
| M8 | Arm64 | -3.6% | ±4.1% | 0.0% |
| M9 | Arm64 | -20.0% | ±23.3% | 0.0% |
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# Spec 047 §15.6 (c) — WinUI Gap (Reactor vs Direct)

Absolute overhead Reactor still adds on top of raw WinUI. One row per (bench, architecture).

| Bench | Arch | Reactor ns | Direct ns | Reactor - Direct ns | Reactor alloc - Direct alloc |
|---|---|---:|---:|---:|---:|
| M1 | Arm64 | 43820.5 | 35795.2 | +8025.4 | +2671094 |
| M10 | Arm64 | 43450.0 | 33549.5 | +9900.5 | +452637 |
| M11 | Arm64 | 34239.4 | 33.9 | +34205.5 | +1641048 |
| M12 | Arm64 | 34545.0 | 25807.4 | +8737.6 | +513237 |
| M13 | Arm64 | 220.6 | 35.5 | +185.1 | +5280 |
| M2 | Arm64 | 98428.2 | 50476.4 | +47951.8 | +5010670 |
| M3 | Arm64 | 390740.2 | 422189.5 | -31449.3 | +13758906 |
| M4 | Arm64 | 139003.5 | 76426.5 | +62577.0 | +5032702 |
| M5 | Arm64 | 134839.9 | 34110.0 | +100729.9 | +5067272 |
| M6 | Arm64 | 55663.5 | 43099.6 | +12563.8 | +568914 |
| M7 | Arm64 | 21858.9 | 1737075.3 | -1715216.4 | -121341467 |
| M8 | Arm64 | 9258.0 | 7617.1 | +1640.8 | +892336 |
| M9 | Arm64 | 2483238.8 | 911654.9 | +1571583.9 | +272191536 |
Loading
Loading