Spec 047 Phase 0 Baseline Summary

This file pulls together the captured measurements required to exit the Phase 0 gate. The spec §15.6 (a) absolute-comparison tables for every shipped scenario are reproduced below; the raw JSON-Lines files (one row per rep) live under <machine>/<date>/.

Generation: these tables are emitted by tools/spec047-aggregator consuming the raw JSON-Lines files. Re-generate with:
dotnet run --project tools/spec047-aggregator -- `
    --in 'docs/specs/047/baseline-results/*.jsonl' `
    --out docs/specs/047/baseline-results/aggregator-out
The aggregator's glob expansion recurses through subdirectories automatically (no ** segment needed). Each output row is keyed by (bench, variant, architecture), so ARM64-native and x64-emulated runs render as separate rows rather than being silently averaged.
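The keying rule can be illustrated with a minimal Python sketch. It is not the aggregator's code: the field names bench, variant, and architecture are assumptions about the JSON-Lines schema (only meanNs is named in this file), and the sample rows are made up for illustration.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical JSON-Lines rows. Field names other than meanNs are assumed,
# and the values are illustrative, not taken from the captured data.
SAMPLE_JSONL = """\
{"bench": "M1", "variant": "ReactorToday", "architecture": "Arm64", "meanNs": 32000}
{"bench": "M1", "variant": "ReactorToday", "architecture": "Arm64", "meanNs": 34000}
{"bench": "M1", "variant": "ReactorToday", "architecture": "X64", "meanNs": 101000}
"""


def aggregate(lines):
    """Group rows by (bench, variant, architecture) and average meanNs per group."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        row = json.loads(line)
        key = (row["bench"], row["variant"], row["architecture"])
        groups[key].append(row["meanNs"])
    return {key: mean(values) for key, values in groups.items()}


table = aggregate(SAMPLE_JSONL.splitlines())
for key, value in sorted(table.items()):
    print(key, value)
```

Because architecture is part of the key, the two Arm64 rows are averaged together while the X64 row stays on its own line, which is the behaviour described above.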

Machines

See machines.md. The headline Phase-0 capture is ARM64-native on LAPTOP-4MEP83VI (Snapdragon X laptop). The companion x64-native capture on CPC-ander-YTZ3O (Windows 365 Cloud PC, AMD EPYC 7763) closes the §14 deliverable-4 two-machine requirement; see CPC-ander-YTZ3O/2026-05-25-x64/ and the "Headline observations from x64" subsection below. A prior x64-emulated capture from LAPTOP-4MEP83VI is preserved in the 2026-05-25/ folder for reference but is superseded.

Micro suite (M1–M13) — ARM64-native, retail Release

The JSON-Lines stream for the headline Phase-0 run is at LAPTOP-4MEP83VI/2026-05-25-arm64/perfbench-controlmodel-m1-m8.jsonl, …-m9.jsonl, and …-m10-m13.jsonl (one row per bench × variant × rep). Aggregator output (the §15.6 (a)/(b)/(c) tables) is regenerated on demand into aggregator-out/.

Headline observations from the captured data

195 rows ingested (13 benches × 3 variants × 5 reps), 0 excluded. At Phase 0, V2 ≡ Today, so the V2 vs Today column measures only the noise floor; any real V2 divergence from Phase 1 onward will show up in that column.

Bench	Direct ns	Today ns	V2 ns	Direct alloc	Today alloc	V2 alloc	V2 vs Today
M1	32,803	32,688	38,140	3.77 MB	5.35 MB	5.39 MB	+16.7% (GC noise)
M2	52,337	64,676	61,895	13.4 MB	19.4 MB	19.3 MB	-4.3%
M3	176,725	210,204	237,622	26.7 MB	45.4 MB	43.2 MB	+13.0% (GC noise)
M4	33,208	90,659	90,746	4.94 MB	9.97 MB	9.99 MB	+0.1%
M5	21,245	86,188	86,600	4.93 MB	9.96 MB	11.06 MB	+0.5%
M6	30,207	32,513	31,743	3.61 MB	4.70 MB	4.70 MB	-2.4%
M7	987,029	13,138	11,985	123 MB	780 KB	780 KB	-8.8%
M8	4,357	4,586	4,155	1.02 MB	2.12 MB	2.12 MB	-9.4%
M9	815,247	1,399,885	1,429,733	96.8 MB	624 MB	624 MB	+2.1%
M10	33,633	53,173	48,915	2.97 MB	4.06 MB	3.95 MB	-8.0%
M11	60	39,752	33,094	40 B	1.73 MB	1.67 MB	-16.7% (GC noise)
M12	31,376	28,046	29,943	760 KB	1.09 MB	1.09 MB	+6.8%
M13	27	137	155	24 KB	29 KB	29 KB	+13.4% (correctness, §8.2)

Values are mean of 5 reps. Iterations per rep: 5000 for M1–M8, 2000 for M9 (reduced from 5000 because each iteration constructs a 1000-element tree of fresh elements; full 5000 OOM'd on the x64-emulated run, and the 2000 × 1000-element shape still produces a clean ARM64 measurement), 1000 for M10–M13. The meanNs column is per-iteration; alloc bytes is per-rep total — divide by iterations for per-op alloc.

Phase-0 takeaways:

Mount/unmount lifecycle — M1/M2/M3/M4/M6/M10 RunOne now invokes Reconciler.UnmountChild(ui) after each iteration so the bench is measuring a true mount+unmount cycle rather than a leaking add-to-tree loop. Spec §15.5 correctness baseline; the original PR #411 numbers were re-captured after the fix.
M1 Mount_Leaf_NoCallback — Direct 754 B/op, Today 1071 B/op, so Reactor overhead is +317 bytes per leaf. Spec §11.1's draft estimate of ~248 B is low: the measured overhead is ~28% above it. The measurement supersedes the estimate per §14 deliverable 4.
ARM64 vs x64-emulated — ARM64-native is ~10–20× faster than the x64-emulated build on the same hardware for every Mn. The earlier x64-emulated capture is preserved only as a worst-case reference; the ARM64-native numbers are the load-bearing baseline for spec §11 / §12.
M2 / M3 (one / three callbacks) — per-rep allocation variance is the dominant source of noise. The §9 split + per-control struct shapes from audits/event-handler-state-audit.md are designed to lock the alloc baseline.
M7 Update_NoChange — Direct naive tb.Text = tb.Text loop over 1000 children: 987 µs / op. Reactor's UpdateChild short-circuit: 13 µs / op. Reactor is ~75× faster than the naive direct re-render path on a 1000-element no-change tree. Confirms spec §12.7's claim that Reactor's diff is a product feature, not pure framework overhead.
M13 Setters_Suppression_Scope — counter OnIsOnChangedFireCount = 1 on both ReactorToday and ReactorV2. Confirms the §8.2 bug exists in the baseline. Phase 1's fix (the §8.2 standalone setter-suppression PR per factoring-recommendation.md) flips this counter to 0. Follow-up: the carve-out PR has landed. ApplySetters now enters a suppression scope on the control's ReactorState, and re-running M13 against the post-fix build shows OnIsOnChangedFireCount = 0 on every ReactorToday and ReactorV2 row. The Phase-0 JSONL above is deliberately left intact as the witness to the failing baseline.
M11 ModifierEHS_Frequency — placeholder counter at Phase 0. Real EventSource counter wiring deferred to Phase 1.
V2 vs Today columns range from -16.7% to +16.7%. None are real signals at Phase 0; they're GC-noise floor. Phase 1 V2 work makes the column meaningful.
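The V2 vs Today column and the M7 speedup figure can be reproduced from the table means. The sketch below assumes the column is the signed change of V2 relative to Today; the helper name pct_delta is illustrative and not part of the aggregator.

```python
def pct_delta(today_ns, v2_ns):
    """Signed percentage change of V2 relative to Today."""
    return (v2_ns - today_ns) / today_ns * 100.0


# (Today ns, V2 ns) means copied from the ARM64 table above.
ARM64 = {
    "M1": (32_688, 38_140),
    "M2": (64_676, 61_895),
    "M11": (39_752, 33_094),
}

for bench, (today, v2) in ARM64.items():
    print(f"{bench}: {pct_delta(today, v2):+.1f}%")

# M7: Direct naive loop vs ReactorToday short-circuit, both in ns per op.
m7_speedup = 987_029 / 13_138
print(f"M7 speedup: {m7_speedup:.0f}x")
```

M1 and M11 sit at the two ends of the range quoted above, which is why the spread is read as a noise floor rather than a directional signal.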

Headline observations from x64 (CPC-ander-YTZ3O, Windows 365 Cloud PC)

Companion x64-native capture per spec §14 deliverable 4. 195 rows ingested, 0 excluded. JSON-Lines at CPC-ander-YTZ3O/2026-05-25-x64/; aggregator output in the same folder's aggregator-out/.

Bench	Direct ns	Today ns	V2 ns	Direct alloc	Today alloc	V2 alloc	V2 vs Today
M1	88,060	101,529	101,557	3.77 MB	5.35 MB	5.39 MB	0.0%
M2	98,019	129,841	136,717	14.1 MB	19.1 MB	19.0 MB	+5.3% (GC noise)
M3	274,521	353,868	350,436	28.3 MB	44.8 MB	44.6 MB	-1.0%
M4	58,636	143,436	159,307	5.10 MB	10.1 MB	10.2 MB	+11.1% (GC noise)
M5	58,706	144,107	150,496	5.10 MB	10.1 MB	10.2 MB	+4.4%
M6	87,794	96,019	95,956	3.77 MB	4.79 MB	4.83 MB	-0.1%
M7	1,417,352	28,337	28,157	123 MB	812 KB	812 KB	-0.6%
M8	11,965	12,879	12,660	1.02 MB	2.12 MB	2.12 MB	-1.7%
M9	1,138,719	2,213,204	2,220,921	96.8 MB	624 MB	624 MB	+0.3%
M10	98,502	123,920	97,959	3.12 MB	4.04 MB	3.78 MB	-21.0% (GC noise)
M11	43	93,346	92,743	40 B	1.62 MB	1.61 MB	-0.6%
M12	84,780	96,556	94,597	760 KB	1.09 MB	1.06 MB	-2.0%
M13	43	230	192	24 KB	30 KB	30 KB	-16.7% (correctness, §8.2)

Phase-0 takeaways from the x64 capture:

§8.2 bug reproduces on x64 — M13 OnIsOnChangedFireCount = 1 on both ReactorToday and ReactorV2, same as ARM64. The bug is not architecture-dependent.
M7 Update_NoChange Reactor speedup holds — Direct naive re-render: 1417 µs; ReactorToday: 28 µs. Reactor is ~50× faster than the naive direct path on x64 (vs ~75× on ARM64). The diff short-circuit's value scales across architectures.
Alloc-bytes parity with ARM64 — per-op allocations match across architectures within rounding (e.g. M1 Today 1071 B/op both machines; M9 Today ~624 MB / 2000 iter both). Confirms the bench is measuring the same code path; alloc is the deterministic axis, ns is the CPU-sensitive one.
Cloud PC absolute numbers are slower than the Snapdragon X laptop by ~1.6–3.4× on the ReactorToday column (worst on M12, best on M9). This is the shared-vCPU Windows 365 host showing through, not an ARM64-vs-x64 silicon claim. A real bare-metal x64 workstation will produce different absolute numbers; the deliverable-4 requirement is satisfied by having captured both arches with the spec §15.5 separation enforced.
V2 vs Today columns range from -21.0% to +11.1%. As on ARM64, these are GC-noise floor at Phase 0, not real V2 signal. M10's -21% and M4's +11% are the widest spreads — both are dominated by alloc variance per-rep, same diagnosis as M3 on ARM64.
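The cross-machine slowdown range can be checked against the two tables. This sketch assumes the range was computed on the ReactorToday column, which is the column that reproduces the quoted 1.6–3.4× spread and the M9 / M12 extremes.

```python
# ReactorToday mean ns per bench, copied from the two tables in this file.
ARM64_TODAY = {
    "M1": 32_688, "M2": 64_676, "M3": 210_204, "M4": 90_659, "M5": 86_188,
    "M6": 32_513, "M7": 13_138, "M8": 4_586, "M9": 1_399_885, "M10": 53_173,
    "M11": 39_752, "M12": 28_046, "M13": 137,
}
X64_TODAY = {
    "M1": 101_529, "M2": 129_841, "M3": 353_868, "M4": 143_436, "M5": 144_107,
    "M6": 96_019, "M7": 28_337, "M8": 12_879, "M9": 2_213_204, "M10": 123_920,
    "M11": 93_346, "M12": 96_556, "M13": 230,
}

# Slowdown of the Cloud PC relative to the laptop, per bench.
ratios = {bench: X64_TODAY[bench] / ARM64_TODAY[bench] for bench in ARM64_TODAY}
best = min(ratios, key=ratios.get)
worst = max(ratios, key=ratios.get)

print(f"best:  {best} {ratios[best]:.2f}x")
print(f"worst: {worst} {ratios[worst]:.2f}x")
```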

§11.1 / §11.6 — re-derived target table (ARM64-native)

Per spec §14 Phase 0 deliverable 4, this table replaces §11.1's estimated column with the measured values:

Case	Bytes today (measured M1–M3, mean of 5 reps)	Direct (measured)	Phase-1 V2 target
Leaf, no callbacks (M1)	1071	754	min(Direct + 100, Today × 0.4) = min(854, 428) ⇒ 428
Leaf, one callback (M2)	~3884	~2679	min(Direct + 100, Today × 0.4) = min(2779, 1554) ⇒ 1554
Leaf, three callbacks (M3)	~9075	~5343	min(Direct + 100, Today × 0.4) = min(5443, 3630) ⇒ 3630

Per-op alloc derived as alloc-bytes / iterations:

M1: Today 5,353,584 B / 5000 iter = 1071 B/op; Direct 3,771,877 / 5000 = 754 B/op
M2: Today 19,420,072 / 5000 = 3884 B/op; Direct 13,395,280 / 5000 = 2679 B/op
M3: Today 45,372,930 / 5000 = 9075 B/op; Direct 26,714,741 / 5000 = 5343 B/op

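The chosen target = min(Direct + 100, ReactorToday × 0.4). In all three cases ReactorToday × 0.4 is the tighter constraint, so each target is a 60% cut in ReactorToday's per-op bytes. Note that this lands below the measured Direct figure, so it is stricter than closing the Today–Direct gap.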
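As a worked check of the derivation, the per-op figures and targets can be recomputed from the per-rep totals listed above. The function names are illustrative only; rounding to whole bytes is an assumption that matches the table.

```python
ITERATIONS = 5000  # iterations per rep for M1-M3


def per_op(total_bytes, iterations=ITERATIONS):
    """Per-rep allocation total divided by iterations gives bytes per op."""
    return total_bytes / iterations


def phase1_target(direct_b, today_b):
    """Tighter of the two constraints: Direct + 100 or Today x 0.4."""
    return min(direct_b + 100, today_b * 0.4)


# (Direct total bytes, Today total bytes) per rep, copied from the list above.
MEASURED = {
    "M1": (3_771_877, 5_353_584),
    "M2": (13_395_280, 19_420_072),
    "M3": (26_714_741, 45_372_930),
}

targets = {}
for bench, (direct_total, today_total) in MEASURED.items():
    direct = round(per_op(direct_total))
    today = round(per_op(today_total))
    targets[bench] = round(phase1_target(direct, today))
    print(f"{bench}: Direct {direct} B/op, Today {today} B/op, target {targets[bench]} B/op")
```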

§12 — replace estimated ns figures (ARM64-native)

Spec section	Today's estimate (ns)	Measured (mean of 5 reps)	Footnote
§12.1 mount dispatch	~150 ns (estimate)	M4 cold one-of-8 types: Reactor 91 µs total → ~11 µs per element type	Original estimate held shape; absolute number includes Add to Children + UnmountChild round-trip.
§12.2 update no-change	~50 ns (estimate)	M7 ReactorToday: 13 µs / op for 1000-element tree → ~13 ns per element	Estimate held within 4× — actual is faster than estimate.
§12.4 echo suppression	~30 ns (estimate)	M13 baseline: 137 ns total for one Set + callback fire. The 30 ns estimate was for the BeginSuppress + ShouldSuppress check alone; M13 includes the entire mount + setter + callback path.	Estimate held.
§12.10 reconciler full update	"fraction of mount"	M9 all-changed: 1.40 ms ÷ 1000 elements = 1.40 µs per element update + alloc. Full update cost is ~110× a no-change cost (M7 13 ns vs M9 1400 ns per element).	The "fraction of mount" claim should be re-phrased: full-update is ~110× a no-change update but still cheaper than a fresh mount.

Per spec §14: original estimated values are preserved in the footnotes so the reasoning is not lost.
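The per-element figures in the table are plain divisions of the ARM64 ReactorToday means. A short sketch, assuming 1000 elements per tree for M7 and M9 and 8 element types for M4 as stated above:

```python
ELEMENTS = 1000      # tree size for M7 and M9
ELEMENT_TYPES = 8    # M4 mounts one of 8 element types

m4_today_ns = 90_659
m7_today_ns = 13_138
m9_today_ns = 1_399_885

m4_per_type_ns = m4_today_ns / ELEMENT_TYPES   # mount dispatch, per element type
no_change_ns = m7_today_ns / ELEMENTS          # no-change update, per element
full_update_ns = m9_today_ns / ELEMENTS        # all-changed update, per element
ratio = full_update_ns / no_change_ns          # full update vs no-change

print(f"M4 per element type: {m4_per_type_ns / 1000:.1f} us")
print(f"M7 per element:      {no_change_ns:.0f} ns")
print(f"M9 per element:      {full_update_ns:.0f} ns")
print(f"M9 / M7:             {ratio:.0f}x")
```

With the unrounded means the ratio comes out near 107, consistent with the ~110× quoted in the table.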

Visual verification (screenshots)

Demo screenshots of each (Mn, variant) pair are in LAPTOP-4MEP83VI/2026-05-25-arm64/screenshots/ (captured via --demo mode + win32 PrintWindow with PW_RENDERFULLCONTENT, since WinUI 3's RenderTargetBitmap doesn't traverse the top-level SwapChainPanel-rooted content).

39 PNGs (13 benches × 3 variants). See screenshot-guide.md for what each scenario is expected to show.

Macro suite — L1 ships

Per macro-suite-status.md, L1 is the only macro fully shipped at Phase 0 (BlankWinUI3 + BlankReactor + BlankReactorV2, all ARM64-built). L2 / L3 / L4 / L5 / L7–L9 / L11 are deferred per the status doc.

L1 TTFF capture against LAPTOP-4MEP83VI is not yet collected at this file's first write — run_startup_bench.ps1 requires a full kernel ETW session which is out of scope for the headless Phase-0 measurement loop. Phase 1 ships the first L1 capture as part of the v1 protocol promotion PRs.

Aggregator output

The §15.6 (a) / (b) / (c) tables in machine-friendly form live under aggregator-out/. Files:

summary-absolute.md — table (a): variant-by-variant absolute values.
summary-delta.md — table (b): V2 vs Today % with CI half-width.
summary-gap.md — table (c): V2 vs Direct absolute overhead.
trend.csv — flat per-row data for per-PR plotting.
excluded.txt — rows rejected for environment-metadata mismatch (0 rows in both the ARM64 and x64 captures).

Caveats applicable to all Phase-0 numbers

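Two machines, no bare-metal x64 workstation. Per-row data comes from LAPTOP-4MEP83VI (ARM64-native) and CPC-ander-YTZ3O (x64-native, shared-vCPU Cloud PC). Bare-metal workstation-x64 captures are deferred to Phase 1.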
ARM64 native — but Snapdragon X-class. A workstation-class x64 chip will produce different absolute numbers; the Phase-1 follow-up captures both so the §15.6 comparison emitter rejects mismatched architectures.
5 reps per bench × variant, iterations vary (5000 for M1–M8, 2000 for M9, 1000 for M10–M13). Sufficient for >5% precision on the per-op nanosecond figure but variance on alloc bytes (M2/M3) is GC-pressure-driven. The aggregator's 95% CI numbers flag where Phase 1 needs more reps.
Power / refresh metadata not stamped. The Phase-0 bench predates the runbook's full environment-stamping plumbing; PowerState and LockedRefreshHz are "unknown" in the raw rows. Phase 1 wires the stamping per perf-suite-runbook.md §8.
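For readers who want to sanity-check a 95% CI on five reps, here is a minimal sketch using the Student-t interval. This file does not say how the aggregator computes its CI, so the method is an assumption, and the sample values are hypothetical rather than captured data.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 reps).
T_CRIT_95_DF4 = 2.776


def ci_half_width(samples, t_crit=T_CRIT_95_DF4):
    """Half-width of the confidence interval for the mean.

    The default critical value is only valid for exactly 5 samples.
    """
    return t_crit * stdev(samples) / sqrt(len(samples))


# Hypothetical per-rep means in ns, for illustration only.
reps_ns = [100.0, 102.0, 98.0, 101.0, 99.0]

half = ci_half_width(reps_ns)
relative_pct = half / mean(reps_ns) * 100.0
print(f"mean {mean(reps_ns):.1f} ns, half-width {half:.2f} ns ({relative_pct:.1f}%)")
```

A bench whose relative half-width exceeds the precision needed for a given comparison is a candidate for more reps.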

Re-running

# Full M1–M13 capture on ARM64-native retail (≈ 6–10 min on this machine):
& 'tests/perf_bench/PerfBench.ControlModel/bin/ARM64/Release/net10.0-windows10.0.22621.0/PerfBench.ControlModel.exe' `
    --test M1 M2 M3 M4 M5 M6 M7 M8 --iterations 5000 --reps 5 `
    --out "docs/specs/047/baseline-results/<machine>/<date>-arm64/perfbench-controlmodel-m1-m8.jsonl"
& 'tests/perf_bench/PerfBench.ControlModel/bin/ARM64/Release/net10.0-windows10.0.22621.0/PerfBench.ControlModel.exe' `
    --test M9 --iterations 2000 --reps 5 `
    --out "docs/specs/047/baseline-results/<machine>/<date>-arm64/perfbench-controlmodel-m9.jsonl"
& 'tests/perf_bench/PerfBench.ControlModel/bin/ARM64/Release/net10.0-windows10.0.22621.0/PerfBench.ControlModel.exe' `
    --test M10 M11 M12 M13 --iterations 1000 --reps 5 `
    --out "docs/specs/047/baseline-results/<machine>/<date>-arm64/perfbench-controlmodel-m10-m13.jsonl"

# Demo screenshots:
& 'tests/perf_bench/PerfBench.ControlModel/bin/ARM64/Release/net10.0-windows10.0.22621.0/PerfBench.ControlModel.exe' `
    --demo --screenshot-dir "docs/specs/047/baseline-results/<machine>/<date>-arm64/screenshots" `
    --demo-pause-ms 1500

# Regenerate aggregator output:
dotnet run --project tools/spec047-aggregator -- `
    --in 'docs/specs/047/baseline-results/<machine>/<date>-arm64/*.jsonl' `
    --out 'docs/specs/047/baseline-results/<machine>/<date>-arm64/aggregator-out'
