@@ -0,0 +1,131 @@
# Spec 047 §14 Phase 3 (3.0.3) — TextBox descriptor proof, x64 advisory

**This is an advisory x64 capture, NOT authoritative.** The Phase 2 Q1
verdict was ratified on `LAPTOP-4MEP83VI` (ARM64, stable AC, dedicated
hardware). This capture was run on a Cloud PC (`CPC-ander-YTZ3O`, AMD
EPYC 7763, x64) and inherits Cloud PC noise characteristics — co-tenant
load, virtualized scheduling, no AC/foreground control. **Do not cite
these numbers in §13 or §14 spec text.** Use them as a directional read
on whether the Phase 3 prereq 3.0.2 (TextBox descriptor with the new
`HandCodedControlled` + `HandCodedEvent` builders) regresses the bench
matrix. A real ARM64 stable-AC re-capture on `LAPTOP-4MEP83VI` should
land before §14 Phase 3 is closed.

## Why this advisory capture exists

Phase 3 prereq 3.0.2 ships the first descriptor port that uses
`HandCodedControlled` / `HandCodedEvent` — the escape-hatch builders
needed for multi-event controls. TextBox is the proof point (2 events:
`TextChanged` round-tripping `Text`, plus fire-only `SelectionChanged`).
The `DescriptorVariantFactory` now registers `TextBoxDescriptor` alongside
the Q1 head-to-head trio (ToggleSwitch / Slider / Border).

Phase 2 §13 Q1 pre-committed to the bench matrix as the validation gate
for "does the descriptor model add only a bounded tax?" With a new
descriptor shape entering the matrix, the gate must be re-run before bulk
Phase 3 port work begins. The §9.2.1 thesis is that the hand-coded-shape
descriptor (i.e. `HandCodedControlled` with a user-supplied native
trampoline) matches the hand-coded handler within ±3% on M2 / M10.

## Capture environment

`CPC-ander-YTZ3O`, x64 (AMD EPYC 7763 64-Core Processor), Release, .NET
10.0.x, Windows 11 26200. **Cloud PC — not on AC/dedicated hardware**.
3 process launches × 5 reps = 15 measurements per (bench, variant) cell;
5 benches × 3 variants × 15 = 225 rows total across `launch-1.jsonl` +
`launch-2.jsonl` + `launch-3.jsonl`.
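A quick sanity check on that arithmetic before aggregating: the sketch below counts `status == "ok"` rows per (`benchId`, `variant`) cell across the launch files and flags any cell that is not at 15. The field names are taken from `aggregate.py`; `check_capture` is a hypothetical helper, not part of the checked-in tooling.

```python
import glob
import json
from collections import Counter


def check_capture(pattern="launch-*.jsonl", per_cell=15):
    """Count status == "ok" rows per (benchId, variant) cell across all files
    matching `pattern`. Returns (total_ok_rows, {cell: count} for cells whose
    count differs from `per_cell`)."""
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines, as aggregate.py does
                row = json.loads(line)
                if row.get("status") == "ok":
                    counts[(row["benchId"], row["variant"])] += 1
    off = {cell: n for cell, n in counts.items() if n != per_cell}
    return sum(counts.values()), off


if __name__ == "__main__":
    total, off = check_capture()
    print(f"{total} ok rows (expected 5 benches x 3 variants x 15 = 225)")
    for (bench, variant), n in sorted(off.items()):
        print(f"  {bench} / {variant}: {n} rows (expected 15)")
```

A cell that is missing entirely does not appear in the returned dict; it only shows up as a short total.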

## Headline result

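ReactorDescriptors mean-time deltas (positive = slower than the baseline variant):

| Bench | Δ ns vs ReactorV2 | Δ ns vs ReactorToday | Phase 2 ARM64 Δ vs ReactorV2 | Band vs ReactorV2 (this capture) |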
|---|---:|---:|---:|---|
| M1 Mount_Leaf_NoCallback | -0.9% | -2.6% | -1.0% | ≤5% |
| M2 Mount_Leaf_OneCallback | -2.2% | +2.9% | **+9.6%** | ≤5% |
| M5 Dispatch_Switch_Warm | +1.0% | +4.2% | -2.3% | ≤5% |
| M7 Update_NoChange | -1.4% | -0.5% | +8.1% | ≤5% |
| M10 EventHandlerState_Alloc | +1.1% | +9.5% | +19.3% | ≤5% |

**No bench exceeds ±5% vs ReactorV2 on this capture.** Adding the
`HandCodedControlled` + `HandCodedEvent` TextBox descriptor to the
registered set does not regress the matrix at this measurement
resolution.

The +9.6% M2 and +19.3% M10 numbers from the Phase 2 ARM64 stable-AC
capture do not reproduce on this x64 Cloud PC capture. Possible
explanations (none ratified):

1. **Architecture-dependent codegen.** Virtual `PropEntry.Mount`
dispatch + delegate invocation cost may differ between RyuJIT-x64
and RyuJIT-arm64 codegen paths. The Phase 2 README attributes the
residual cost to "virtual `PropEntry.Mount` dispatch + getter/setter
delegate invocations vs the hand-coded handler's inlined property
writes" — that cost is JIT-implementation-sensitive.
2. **Cache hierarchy / memory bandwidth.** AMD EPYC 7763 has different
L1/L2/L3 sizes and inclusivity than Snapdragon X. The descriptor model
touches more memory per mount (entry list + per-entry delegates) and
may fit better in this cache hierarchy.
3. **Cloud PC virtualization noise.** Cloud PC runs in a virtualized
environment with co-tenants competing for resources. The 95% CI
half-widths in `summary.md` are wide (on M2 the ReactorDescriptors and
ReactorV2 CIs are ±15-16k ns on a ~190k ns mean, i.e. ±8%), so the -2.2%
delta is well within noise.
4. **TextBox descriptor registration side effects.** Adding
`TextBoxDescriptor` could change the registration table or method-dispatch
table shape in a way that incidentally benefits the descriptor variant on
these benches. (Unlikely, since none of M1/M2/M5/M7/M10 mounts TextBox,
but worth flagging for the ARM64 re-capture.)
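Two per-variant CIs do not directly give the uncertainty of the delta between them. A rough way to check whether a delta sits inside noise is to combine the two half-widths in quadrature, assuming independent samples and the same critical value for both (as in `aggregate.py`). The inputs below are illustrative values shaped like the M2 description in point 3, not numbers read from `summary.md`; the helper names are hypothetical.

```python
import math


def diff_ci_half(ci_a: float, ci_b: float) -> float:
    """Half-width of the CI on (mean_a - mean_b), assuming independent samples
    and that both half-widths used the same critical value (z = 1.96 in
    aggregate.py)."""
    return math.hypot(ci_a, ci_b)


def delta_within_noise(mean_a: float, ci_a: float, mean_b: float, ci_b: float) -> bool:
    """True if |mean_a - mean_b| lies inside the combined CI half-width."""
    return abs(mean_a - mean_b) <= diff_ci_half(ci_a, ci_b)


# Illustrative inputs (not summary.md values): ~190k ns means,
# ~±15.5k ns half-widths, ReactorDescriptors 2.2% faster than ReactorV2.
v2_mean, v2_ci = 190_000.0, 15_500.0
desc_mean, desc_ci = v2_mean * (1 - 0.022), 15_500.0

half = diff_ci_half(desc_ci, v2_ci)
print(f"delta = {desc_mean - v2_mean:+,.0f} ns, combined CI = ±{half:,.0f} ns "
      f"({half / v2_mean:.1%} of the V2 mean)")
print("within noise:", delta_within_noise(desc_mean, desc_ci, v2_mean, v2_ci))
```

With those inputs the combined half-width is about ±21.9k ns (≈11.5% of the V2 mean), so a delta of roughly -4.2k ns (-2.2%) is well inside it.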

## §9.2.1 thesis check (TextBox HandCoded shape)

The §9.2.1 thesis: a hand-coded-shape descriptor (using
`HandCodedControlled` + native trampoline) matches the hand-coded
handler within ±3% on M2 / M10. On this advisory x64 capture:

| Bench | ReactorDescriptors vs ReactorV2 | Within ±3%? |
|---|---:|---|
| M2 | **-2.2%** | ✅ yes |
| M10 | **+1.1%** | ✅ yes |

**On this capture the thesis holds.** ARM64 re-capture should confirm.

## Q1 matrix application — for completeness

Per §13 Q1's pre-committed decision matrix, on the Q1 head-to-head
gating benches (M1 / M2 / M5 / M7):

| Band | Verdict | Triggered on this capture? |
|---|---|---|
| ≤5% on all M1/M2/M5/M7 | Ship descriptors as primary | **Yes** |
| 5-15% on any gating bench | Judgment call | No |
| >15% on any gating bench | Ship hand-coded as primary | No |

On this advisory capture the matrix lands in the "ship descriptors as
primary" band on all four gating benches. This is **more favorable**
than the Phase 2 ARM64 capture (which landed in the judgment-call band
on M2 / M7) — consistent with the noise / arch explanations above.

The Phase 2 ARM64 verdict (judgment-call band, recommendation =
descriptors as primary at Phase 3 scope) stands as the authoritative
verdict. This capture does not move it.
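The matrix is a worst-of rule over the four gating benches, while `aggregate.py` only prints a band per bench. A minimal sketch of the roll-up, assuming the bands compare the absolute ns delta vs ReactorV2 (as `aggregate.py` does with `abs()`); `q1_verdict` and `GATING_BENCHES` are hypothetical names, not part of the spec tooling.

```python
# Gating benches per §13 Q1 (M10 is reported but does not gate).
GATING_BENCHES = ("M1", "M2", "M5", "M7")


def q1_verdict(deltas_vs_v2: dict[str, float]) -> str:
    """Map ReactorDescriptors-vs-ReactorV2 ns deltas (percent) to a Q1 verdict.

    Assumes bands compare |delta|, mirroring aggregate.py's abs() check.
    """
    worst = max(abs(deltas_vs_v2[b]) for b in GATING_BENCHES)
    if worst <= 5:
        return "ship descriptors as primary"
    if worst <= 15:
        return "judgment call"
    return "ship hand-coded as primary"


# Deltas copied from the headline table (ReactorDescriptors vs ReactorV2).
this_capture = {"M1": -0.9, "M2": -2.2, "M5": 1.0, "M7": -1.4, "M10": 1.1}
phase2_arm64 = {"M1": -1.0, "M2": 9.6, "M5": -2.3, "M7": 8.1, "M10": 19.3}

print("x64 advisory:", q1_verdict(this_capture))
print("Phase 2 ARM64:", q1_verdict(phase2_arm64))
```

The x64 advisory deltas land in "ship descriptors as primary" and the Phase 2 ARM64 deltas in "judgment call" (worst gating delta is M2 at +9.6%). M10's +19.3% does not affect the verdict because M10 is not a gating bench.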

## Files

- `launch-1.jsonl` / `launch-2.jsonl` / `launch-3.jsonl` — raw bench
output. 225 rows total.
- `aggregate.py` — copy of the Phase 2 aggregator. Run with no args
from this directory.
- `summary.md` — aggregator output.

## Next step

A stable-AC ARM64 re-capture on `LAPTOP-4MEP83VI` mirroring the Phase 2
methodology should land before §14 Phase 3 is closed:

- Same 3×5 capture pattern (`--reps 5`, 3 process launches).
- Same benches (M1 / M2 / M5 / M7 / M10).
- Same variant set (ReactorToday / ReactorV2 / ReactorDescriptors).
- AC power, foreground window, other apps closed.
- Output to `docs/specs/047/phase3-results/LAPTOP-4MEP83VI/<date>-textbox-proof-3x5/`.

The ARM64 capture is what ratifies the §9.2.1 thesis for the multi-event
descriptor shape. This advisory capture only confirms the new code
compiles, runs, and doesn't blow up the matrix at coarse resolution.
@@ -0,0 +1,122 @@
"""Spec 047 §14 Phase 2 (Q1 spike) — aggregate launch-N.jsonl into a means
+ 95% CI table per (bench, variant), and emit the Q1 decision-matrix deltas
(ReactorDescriptors vs ReactorV2, ReactorDescriptors vs ReactorToday).

Usage: python aggregate.py # reads launch-*.jsonl in CWD
"""
import glob
import json
import math
import statistics
from collections import defaultdict


def main():
    rows = []
    for path in sorted(glob.glob("launch-*.jsonl")):
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                row = json.loads(line)
                if row.get("status") != "ok":
                    continue
                rows.append(row)

    # Group by (benchId, variant).
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r["benchId"], r["variant"])].append(r)

    benches = sorted({b for (b, _) in buckets}, key=_bench_key)
    variants = ["ReactorToday", "ReactorV2", "ReactorDescriptors"]

    def summarize(rs, key):
        """Return (mean, 95% CI half-width, n) of r[key] over rows rs."""
        vals = [r[key] for r in rs]
        if not vals:
            return (math.nan, math.nan, 0)
        mean = statistics.mean(vals)
        if len(vals) > 1:
            stdev = statistics.stdev(vals)
            # 95% CI half-width. The exact critical value comes from the
            # t-distribution (n=15, dof=14: t ≈ 2.145); the normal z = 1.96 is
            # used instead, which understates the half-width by ~9% at n=15.
            ci_half = 1.96 * stdev / math.sqrt(len(vals))
        else:
            ci_half = math.nan
        return mean, ci_half, len(vals)

    def pct(a, base):
        """Percent change of a relative to base; NaN if either is missing."""
        if base and not math.isnan(base) and not math.isnan(a):
            return (a - base) / base * 100.0
        return math.nan

    # ── Per-(bench, variant) summary table. ──
    print("# Per-(bench, variant) means")
    print()
    print("| Bench | Variant | n | Mean ns | 95% CI ±ns | Mean alloc B | 95% CI ±B |")
    print("|---|---|---:|---:|---:|---:|---:|")
    for b in benches:
        for v in variants:
            rs = buckets.get((b, v), [])
            mean_ns, ci_ns, n = summarize(rs, "meanNs")
            mean_b, ci_b, _ = summarize(rs, "allocBytes")
            if n == 0:
                print(f"| {b} | {v} | 0 | — | — | — | — |")
            else:
                print(
                    f"| {b} | {v} | {n} | {mean_ns:,.0f} | {ci_ns:,.0f} "
                    f"| {mean_b:,.0f} | {ci_b:,.0f} |"
                )
        # Blank separator row between benches.
        print("| | | | | | | |")

    # ── Q1 decision-matrix deltas. ──
    print()
    print("# Q1 head-to-head — ReactorDescriptors deltas")
    print()
    print(
        "| Bench | vs ReactorV2 ns | vs ReactorV2 alloc | vs ReactorToday ns | vs ReactorToday alloc | Q1 band |"
    )
    print("|---|---:|---:|---:|---:|---|")
    for b in benches:
        ds = buckets.get((b, "ReactorDescriptors"), [])
        v2 = buckets.get((b, "ReactorV2"), [])
        today = buckets.get((b, "ReactorToday"), [])
        d_ns, _, _ = summarize(ds, "meanNs")
        d_b, _, _ = summarize(ds, "allocBytes")
        v_ns, _, _ = summarize(v2, "meanNs")
        v_b, _, _ = summarize(v2, "allocBytes")
        t_ns, _, _ = summarize(today, "meanNs")
        t_b, _, _ = summarize(today, "allocBytes")

        vs_v2_ns = pct(d_ns, v_ns)
        vs_v2_b = pct(d_b, v_b)
        vs_t_ns = pct(d_ns, t_ns)
        vs_t_b = pct(d_b, t_b)

        # Per-bench §13 Q1 band, keyed off |ns delta vs ReactorV2|. The matrix
        # verdict takes the worst band across the gating benches M1/M2/M5/M7.
        delta = vs_v2_ns
        if math.isnan(delta):
            band = "-"
        elif abs(delta) <= 5:
            band = "<=5%: ship descriptors"
        elif abs(delta) <= 15:
            band = "5-15%: judgment call"
        else:
            band = ">15%: ship hand-coded"

        print(
            f"| {b} | {vs_v2_ns:+.1f}% | {vs_v2_b:+.1f}% | {vs_t_ns:+.1f}% | {vs_t_b:+.1f}% | {band} |"
        )


def _bench_key(s):
    # M1, M2, ..., M13 — sort numerically.
    try:
        return int(s.lstrip("M"))
    except ValueError:
        return 999


if __name__ == "__main__":
    main()