microsoft
diff --git a/docs/specs/047/phase3-results/CPC-ander-YTZ3O-x64-advisory/2026-05-27-textbox-proof-3x5/README.md b/docs/specs/047/phase3-results/CPC-ander-YTZ3O-x64-advisory/2026-05-27-textbox-proof-3x5/README.md
Lines changed: 131 additions & 0 deletions
diff --git a/docs/specs/047/phase3-results/CPC-ander-YTZ3O-x64-advisory/2026-05-27-textbox-proof-3x5/aggregate.py b/docs/specs/047/phase3-results/CPC-ander-YTZ3O-x64-advisory/2026-05-27-textbox-proof-3x5/aggregate.py
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,131 @@
+# Spec 047 §14 Phase 3 (3.0.3) — TextBox descriptor proof, x64 advisory
+
+**This is an advisory x64 capture, NOT authoritative.** The Phase 2 Q1
+verdict was ratified on `LAPTOP-4MEP83VI` (ARM64, stable AC, dedicated
+hardware). This capture was run on a Cloud PC (`CPC-ander-YTZ3O`, AMD
+EPYC 7763, x64) and inherits Cloud PC noise characteristics — co-tenant
+load, virtualized scheduling, no AC/foreground control. **Do not cite
+these numbers in §13 or §14 spec text.** Use them as a directional read
+on whether the Phase 3 prereq 3.0.2 (TextBox descriptor with the new
+`HandCodedControlled` + `HandCodedEvent` builders) regresses the bench
+matrix. A real ARM64 stable-AC re-capture on `LAPTOP-4MEP83VI` should
+land before §14 Phase 3 is closed.
+
+## Why this advisory capture exists
+
+Phase 3 prereq 3.0.2 ships the first descriptor port that uses
+`HandCodedControlled` / `HandCodedEvent` — the escape-hatch builders
+needed for multi-event controls. TextBox is the proof point (2 events:
+`TextChanged` round-tripping `Text`, plus fire-only `SelectionChanged`).
+The `DescriptorVariantFactory` now registers TextBoxDescriptor alongside
+the Q1 head-to-head trio (ToggleSwitch / Slider / Border).
+
+Phase 2 §13 Q1 pre-committed to the bench matrix as the validation gate
+for "does the descriptor model add bounded tax." With a new descriptor
+shape entering the matrix, the gate must re-run before bulk Phase 3 port
+work begins. The §9.2.1 thesis is that the hand-coded-shape descriptor
+(i.e. `HandCodedControlled` with a user-supplied native trampoline)
+matches the hand-coded handler within ±3% on M2 / M10.
+
+## Capture environment
+
+`CPC-ander-YTZ3O`, x64 (AMD EPYC 7763 64-Core Processor), Release, .NET
+10.0.x, Windows 11 26200. **Cloud PC — not on AC/dedicated hardware**.
+3 process launches × 5 reps = 15 measurements per (bench, variant) cell.
+225 rows total across `launch-1.jsonl` + `launch-2.jsonl` +
+`launch-3.jsonl`.
+
+## Headline result
+
+| Bench | vs ReactorV2 ns | vs ReactorToday ns | Phase 2 ARM64 (vs V2) | Band on this capture |
+|---|---:|---:|---:|---|
+| M1 Mount_Leaf_NoCallback | -0.9% | -2.6% | -1.0% | ≤5% |
+| M2 Mount_Leaf_OneCallback | -2.2% | +2.9% | **+9.6%** | ≤5% |
+| M5 Dispatch_Switch_Warm | +1.0% | +4.2% | -2.3% | ≤5% |
+| M7 Update_NoChange | -1.4% | -0.5% | +8.1% | ≤5% |
+| M10 EventHandlerState_Alloc | +1.1% | +9.5% | +19.3% | ≤5% |
+
+**No bench exceeds ±5% vs ReactorV2 on this capture.** Adding the
+`HandCodedControlled` + `HandCodedEvent` TextBox descriptor to the
+registered set does not regress the matrix at this measurement
+resolution.
+
+The +9.6% M2 and +19.3% M10 numbers from the Phase 2 ARM64 stable-AC
+capture do not reproduce on this x64 Cloud PC capture. Possible
+explanations (none ratified):
+
+1. **Architecture-dependent codegen.** Virtual `PropEntry.Mount`
+   dispatch + delegate invocation cost may differ between RyuJIT-x64
+   and RyuJIT-arm64 codegen paths. The Phase 2 README attributes the
+   residual cost to "virtual `PropEntry.Mount` dispatch + getter/setter
+   delegate invocations vs the hand-coded handler's inlined property
+   writes" — that cost is JIT-implementation-sensitive.
+2. **Cache hierarchy / memory bandwidth.** AMD EPYC 7763 has different
+   L1/L2/L3 sizes and inclusivity vs Snapdragon X. The descriptor model
+   touches more memory per mount (entry list + per-entry delegates) and
+   may sit better in this cache hierarchy.
+3. **Cloud PC virtualization noise.** Cloud PC runs in a virtualized
+   environment with co-tenants competing for resources. The 95% CI
+   half-widths in `summary.md` are wide (M2 vs V2 CI is ±15-16k ns on
+   a ~190k ns mean ⇒ ≈±8%), so the -2.2% delta sits well inside the
+   noise band and cannot be distinguished from zero.
+4. **TextBox descriptor offsetting something.** Adding TextBoxDescriptor
+   could change the registration table or method-dispatch table shape
+   in a way that incidentally benefits the descriptor variant on these
+   benches. (Unlikely — benches don't mount TextBox in M1/M2/M5/M7/M10
+   — but worth flagging for the ARM64 re-capture.)
+
+## §9.2.1 thesis check (TextBox HandCoded shape)
+
+The §9.2.1 thesis: a hand-coded-shape descriptor (using
+`HandCodedControlled` + native trampoline) matches the hand-coded
+handler within ±3% on M2 / M10. On this advisory x64 capture:
+
+| Bench | ReactorDescriptors vs ReactorV2 | Within ±3%? |
+|---|---:|---|
+| M2 | **-2.2%** | ✅ yes |
+| M10 | **+1.1%** | ✅ yes |
+
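+**On this capture the point estimates fall within the ±3% band.** The
+M2 CI half-width (≈±8%, see the noise explanation above) is wider than
+that band, so this capture cannot confirm the thesis on its own. The
+ARM64 re-capture should.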
+
+## Q1 matrix application — for completeness
+
+Per §13 Q1's pre-committed decision matrix, on the Q1 head-to-head
+gating benches (M1 / M2 / M5 / M7):
+
+| Band | Verdict | Triggered on this capture? |
+|---|---|---|
+| ≤5% on all M1/M2/M5/M7 | Ship descriptors as primary | **Yes** |
+| 5-15% on any gating bench | Judgment call | No |
+| >15% on any gating bench | Ship hand-coded as primary | No |
+
+On this advisory capture the matrix lands in the "ship descriptors as
+primary" band on all four gating benches. This is **more favorable**
+than the Phase 2 ARM64 capture (which landed in the judgment-call band
+on M2 / M7) — consistent with the noise / arch explanations above.
+
+The Phase 2 ARM64 verdict (judgment-call band, recommendation =
+descriptors as primary at Phase 3 scope) stands as the authoritative
+verdict. This capture does not move it.
+
+## Files
+
+- `launch-1.jsonl` / `launch-2.jsonl` / `launch-3.jsonl` — raw bench
+  output. 225 rows total.
+- `aggregate.py` — copy of the Phase 2 aggregator. Run with no args
+  from this directory.
+- `summary.md` — aggregator output.
+
+## Next step
+
+A stable-AC ARM64 re-capture on `LAPTOP-4MEP83VI` mirroring the Phase 2
+methodology should land before §14 Phase 3 is closed:
+
+- Same 3×5 capture pattern (`--reps 5`, 3 process launches).
+- Same benches (M1 / M2 / M5 / M7 / M10).
+- Same variant set (ReactorToday / ReactorV2 / ReactorDescriptors).
+- AC power, foreground window, other apps closed.
+- Output to `docs/specs/047/phase3-results/LAPTOP-4MEP83VI/<date>-textbox-proof-3x5/`.
+
+The ARM64 capture is what ratifies the §9.2.1 thesis for the multi-event
+descriptor shape. This advisory capture only confirms the new code
+compiles, runs, and doesn't blow up the matrix at coarse resolution.
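
Reviewer note: the band, thesis, and row-count claims in the README above can be re-derived from its own headline numbers. The sketch below is illustrative only and is not part of the PR. The deltas are copied from the headline table, the thresholds mirror those in `aggregate.py`, and the 15.5k ns CI figure is an assumed midpoint of the README's "±15-16k ns". The names `q1_band` and `rows_expected` are hypothetical helpers.

```python
# Headline deltas (ReactorDescriptors vs ReactorV2, in percent), copied from the README table.
DELTAS_VS_V2 = {"M1": -0.9, "M2": -2.2, "M5": 1.0, "M7": -1.4, "M10": 1.1}
GATING = ("M1", "M2", "M5", "M7")  # §13 Q1 gating benches
THESIS = ("M2", "M10")             # §9.2.1 thesis benches


def q1_band(delta_pct):
    """Map a percent delta to a §13 Q1 band, using the thresholds from aggregate.py."""
    magnitude = abs(delta_pct)
    if magnitude <= 5:
        return "<=5%: ship descriptors"
    if magnitude <= 15:
        return "5-15%: judgment call"
    return ">15%: ship hand-coded"


def rows_expected(launches, reps, benches, variants):
    """Rows in the launch-*.jsonl files if every (bench, variant) cell is fully populated."""
    return launches * reps * benches * variants


gating_bands = {b: q1_band(DELTAS_VS_V2[b]) for b in GATING}
thesis_holds = all(abs(DELTAS_VS_V2[b]) <= 3 for b in THESIS)
total_rows = rows_expected(launches=3, reps=5, benches=5, variants=3)

# Relative CI half-width for M2: assumed midpoint of "±15-16k ns" over the ~190k ns mean.
m2_ci_pct = 15_500 / 190_000 * 100

print(gating_bands)
print(thesis_holds, total_rows, round(m2_ci_pct, 1))
```

Under these inputs every gating bench lands in the first band and the row count matches the README's 225, while the M2 relative CI comes out near 8%, wider than the ±3% thesis band.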
@@ -0,0 +1,122 @@
+"""Spec 047 §14 Phase 2 (Q1 spike) — aggregate launch-N.jsonl into a means
++ 95% CI table per (bench, variant), and emit the Q1 decision-matrix deltas
+(ReactorDescriptors vs ReactorV2, ReactorDescriptors vs ReactorToday).
+
+Usage:  python aggregate.py    # reads launch-*.jsonl in CWD
+"""
+import glob
+import json
+import math
+import statistics
+from collections import defaultdict
+
+
+def main():
+    rows = []
+    for path in sorted(glob.glob("launch-*.jsonl")):
+        with open(path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if not line:
+                    continue
+                row = json.loads(line)
+                if row.get("status") != "ok":
+                    continue
+                rows.append(row)
+
+    # Group by (benchId, variant).
+    buckets = defaultdict(list)
+    for r in rows:
+        buckets[(r["benchId"], r["variant"])].append(r)
+
+    benches = sorted({b for (b, _) in buckets}, key=_bench_key)
+    variants = ["ReactorToday", "ReactorV2", "ReactorDescriptors"]
+
+    def summarize(rs, key):
+        vals = [r[key] for r in rs]
+        if not vals:
+            return (math.nan, math.nan, 0)
+        mean = statistics.mean(vals)
+        if len(vals) > 1:
+            stdev = statistics.stdev(vals)
+            # 95% CI half-width via the normal approximation (z = 1.96). At n=15
+            # (dof=14) the t value is ≈ 2.145, so this understates it by ~9%.
+            ci_half = 1.96 * stdev / math.sqrt(len(vals))
+        else:
+            ci_half = math.nan
+        return mean, ci_half, len(vals)
+
+    # ── Per-(bench, variant) summary table. ──
+    print("# Per-(bench, variant) means")
+    print()
+    print("| Bench | Variant | n | Mean ns | 95% CI ±ns | Mean alloc B | 95% CI ±B |")
+    print("|---|---|---:|---:|---:|---:|---:|")
+    for b in benches:
+        for v in variants:
+            rs = buckets.get((b, v), [])
+            mean_ns, ci_ns, n = summarize(rs, "meanNs")
+            mean_b, ci_b, _ = summarize(rs, "allocBytes")
+            if n == 0:
+                print(f"| {b} | {v} | 0 | — | — | — | — |")
+            else:
+                print(
+                    f"| {b} | {v} | {n} | {mean_ns:,.0f} | {ci_ns:,.0f} "
+                    f"| {mean_b:,.0f} | {ci_b:,.0f} |"
+                )
+        print("| | | | | | | |")
+
+    # ── Q1 decision-matrix deltas. ──
+    print()
+    print("# Q1 head-to-head — ReactorDescriptors deltas")
+    print()
+    print(
+        "| Bench | vs ReactorV2 ns | vs ReactorV2 alloc | vs ReactorToday ns | vs ReactorToday alloc | Q1 band |"
+    )
+    print("|---|---:|---:|---:|---:|---|")
+    for b in benches:
+        ds = buckets.get((b, "ReactorDescriptors"), [])
+        v2 = buckets.get((b, "ReactorV2"), [])
+        today = buckets.get((b, "ReactorToday"), [])
+        d_ns, _, _ = summarize(ds, "meanNs")
+        d_b, _, _ = summarize(ds, "allocBytes")
+        v_ns, _, _ = summarize(v2, "meanNs")
+        v_b, _, _ = summarize(v2, "allocBytes")
+        t_ns, _, _ = summarize(today, "meanNs")
+        t_b, _, _ = summarize(today, "allocBytes")
+
+        def pct(a, base):
+            if base and not math.isnan(base) and not math.isnan(a):
+                return (a - base) / base * 100.0
+            return math.nan
+
+        vs_v2_ns = pct(d_ns, v_ns)
+        vs_v2_b = pct(d_b, v_b)
+        vs_t_ns = pct(d_ns, t_ns)
+        vs_t_b = pct(d_b, t_b)
+
+        # §13 Q1 matrix bands are keyed off the ns delta vs ReactorV2 alone, by magnitude.
+        worst = vs_v2_ns
+        if math.isnan(worst):
+            band = "-"
+        elif abs(worst) <= 5:
+            band = "<=5%: ship descriptors"
+        elif abs(worst) <= 15:
+            band = "5-15%: judgment call"
+        else:
+            band = ">15%: ship hand-coded"
+
+        print(
+            f"| {b} | {vs_v2_ns:+.1f}% | {vs_v2_b:+.1f}% | {vs_t_ns:+.1f}% | {vs_t_b:+.1f}% | {band} |"
+        )
+
+
+def _bench_key(s):
+    # M1, M2, ..., M13 — sort numerically.
+    try:
+        return int(s.lstrip("M"))
+    except ValueError:
+        return 999
+
+
+if __name__ == "__main__":
+    main()
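
Reviewer note: `summarize` uses 1.96 as the critical value even though its own comment quotes t ≈ 2.145 for dof=14. The sketch below, which is not part of the PR, shows how much wider the t-based half-width would be. The sample values are hypothetical, and `ci_half_width` is an illustrative helper rather than a function in `aggregate.py`.

```python
import math
import statistics

T_CRIT_DOF14 = 2.145  # two-sided 95% t critical value for dof=14, as quoted in aggregate.py
Z_CRIT = 1.96         # normal approximation that aggregate.py actually applies


def ci_half_width(values, critical):
    """Half-width of a confidence interval for the mean: critical * s / sqrt(n)."""
    return critical * statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical meanNs samples for one (bench, variant) cell.
# 15 values, mirroring 3 launches x 5 reps.
sample = [190_000 + 1_000 * k for k in range(-7, 8)]

z_half = ci_half_width(sample, Z_CRIT)
t_half = ci_half_width(sample, T_CRIT_DOF14)

# The ratio does not depend on the sample; it is just 2.145 / 1.96.
widening = t_half / z_half
print(f"z: ±{z_half:,.0f} ns  t: ±{t_half:,.0f} ns  ratio: {widening:.3f}")
```

If the aggregator were switched to the t value, every CI column in `summary.md` would widen by the same factor, so the README's noise argument would only get stronger.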