diff --git a/docs/reports/stress-perf-stocks-grid.md b/docs/reports/stress-perf-stocks-grid.md
Lines changed: 23 additions & 0 deletions
diff --git a/tests/stress_perf/METHODOLOGY.md b/tests/stress_perf/METHODOLOGY.md
Lines changed: 70 additions & 1 deletion
diff --git a/tests/stress_perf/run_full_matrix.ps1 b/tests/stress_perf/run_full_matrix.ps1
Lines changed: 16 additions & 2 deletions
diff --git a/tests/stress_perf/run_stocks_grid_baseline.ps1 b/tests/stress_perf/run_stocks_grid_baseline.ps1
Lines changed: 14 additions & 3 deletions
diff --git a/tests/stress_perf_rn/StocksGrid/App.tsx b/tests/stress_perf_rn/StocksGrid/App.tsx
Lines changed: 22 additions & 14 deletions
@@ -149,6 +149,18 @@ Wpf         950     ◀── MILCore + render-thread state
 RN-Fabric 1,156     ◀── Hermes + JS bundle + Yoga + Fabric shadow tree
 ```
 
+All numbers are `WorkingSet64` (process RSS) sampled externally by the
+harness, **not** `performance.memory.usedJSHeapSize`. A JS-heap-only figure
+would massively under-report RN because it excludes Hermes, JSI, the Fabric
+reconciler, Yoga, and text-shaping caches.
+
+**Don't read RN's number as per-cell overhead.** A large fraction of the
+1,156 MB is RN-fixed cost — engine + bundle + reconciler infrastructure
+RN pays before the first cell exists. To attribute per-cell cost, run an
+empty-tree baseline of the same .exe and subtract; the delta is what
+scales with content. We haven't captured that baseline yet (open
+question — see below).
+
 Memory ranking is essentially identical battery and AC. Power state
 doesn't change architectural memory footprints.
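The engine-baseline subtraction proposed above is plain arithmetic. Below is a minimal TypeScript sketch. It assumes both inputs are harness `PeakRssMB` values, which are MiB because the scripts divide by PowerShell's `1MB`, taken from a loaded run and a zero-cell run of the same .exe. `splitMemory` and `MemorySplit` are hypothetical names. No baseline has been measured yet, so no real figures are filled in.

```typescript
// Hypothetical helper (not part of the harness): splits a loaded-run
// PeakRssMB into the fixed engine baseline and the part that scales with
// content. Both inputs are assumed to be harness PeakRssMB values (MiB).
interface MemorySplit {
  engineBaselineMB: number; // fixed cost paid before the first cell exists
  perContentMB: number;     // loaded minus baseline: what scales with content
  perCellKB: number;        // perContentMB spread evenly over cellCount cells
}

function splitMemory(
  loadedPeakRssMB: number,
  emptyTreePeakRssMB: number,
  cellCount: number,
): MemorySplit {
  if (!(cellCount > 0)) {
    throw new RangeError("cellCount must be positive");
  }
  if (emptyTreePeakRssMB > loadedPeakRssMB) {
    // Run-to-run noise can invert small deltas; surface it rather than
    // publishing a negative per-cell cost.
    throw new RangeError("empty-tree baseline exceeds the loaded run");
  }
  const perContentMB = loadedPeakRssMB - emptyTreePeakRssMB;
  return {
    engineBaselineMB: emptyTreePeakRssMB,
    perContentMB,
    perCellKB: (perContentMB * 1024) / cellCount,
  };
}
```

With a measured baseline, `splitMemory(1156, baselineMB, 4900)` would give the RN row's per-content and per-cell figures. Here `baselineMB` stands for the not-yet-measured zero-cell number. The per-cell average assumes memory grows roughly linearly with cell count, and runs at a second grid size would be needed to check that.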
 
@@ -227,6 +239,17 @@ perception was a battery-specific artifact.
 - Reactor's tree-build cost (22 ms steady) is the dominant reconcile
   phase. Investigation candidate: element allocation pooling, cached
   text formatters.
+- RN engine-baseline vs per-cell memory split. Measure RN's fixed cost with a
+  zero-cell (empty-tree) variant of the data source; the delta from the 1,156 MB
+  loaded number is per-content cost. `StocksGrid.exe --headless --percent 0 --duration 5`
+  alone is not enough: `--percent` only sets how many cells mutate per tick, so
+  all 4,900 stay mounted. Until we have that baseline, the RN row in the memory
+  table is apples-to-bowling-balls vs the C# variants.
+- True JS-to-pixel mount time. `Avg Mount` is currently a pure-JS
+  rAF-after-commit proxy and excludes Fabric work that lands after
+  the first rAF. Hook the native side per
+  [RNW Fabric perf wiki, Part 2](https://github.com/microsoft/react-native-windows/wiki/Performance-tests-Fabric#part-2--native-perf-tests)
+  to get real mount-to-pixel timing.
 
 ## How to reproduce
 
 
@@ -131,12 +131,81 @@ say one wins.
 - Never directly diff battery numbers against AC numbers; treat them as
   separate baselines.
 
+## Per-tick latency: `Avg Update` (C#) vs `Avg Mount` (RN)
+
+The synchronous variants (Direct / Bound / Wpf / DirectX / Reactor) bracket
+the tick handler with a stopwatch (`BeginUpdate` before the property patch
+and reconcile, `EndUpdate` after) and report `Avg Update` in ms. Because the
+work runs synchronously on the UI thread, the bracket captures all UI-thread
+work: data mutation, framework reconciliation, and any commit-to-tree work.
+
+**RN-Fabric can't use that pattern.** `setState` only schedules the update and
+returns; React render and the Fabric commit (including Yoga layout) run later on
+the JS thread, and the mount into Composition follows on the UI thread. A JS
+stopwatch around `setSnapshot` measures only dispatch, undercounting per-tick cost by a large factor.
+
+For RN we report `Avg Mount` instead. The tracker stamps T0 just before
+`setSnapshot` and records `(rAF-now − T0)` from a single
+`requestAnimationFrame` scheduled inside a `useLayoutEffect` that depends on
+the dispatched state. By the time the rAF callback runs:
+
+- React has finished its commit phase (`useLayoutEffect` ran)
+- Fabric has had a chance to apply the commit to the host tree
+- One display frame has been scheduled
+
+It's a **pure-JS proxy**, not pixel-accurate. It excludes any Fabric work
+that lands after the rAF tick (e.g. layout follow-ups in subsequent
+frames). For true JS-to-pixel mount time, hook the native side per the
+[RNW Fabric perf wiki, Part 2](https://github.com/microsoft/react-native-windows/wiki/Performance-tests-Fabric#part-2--native-perf-tests).
+
+**Don't diff `Avg Update` against `Avg Mount`.** They bracket different
+work. The harness reports them in separate columns (`InAppAvgUpdateMs`
+for C#, `InAppAvgMountMs` for RN) for that reason.
+
+## Memory: in-app `usedJSHeapSize` vs harness `WorkingSet64`
+
+The C# variants' PerfTrackers can read process memory in-process, but the
+only in-process memory API exposed to RN/Hermes is
+`performance.memory.usedJSHeapSize`, which covers the **JS heap only**. It excludes:
+
+- Hermes engine
+- JSI layer
+- Fabric reconciler + shadow tree
+- Yoga
+- TypeLayout / text-shaping caches
+
+These add up to tens to hundreds of MB of fixed cost that RN pays before any
+cells exist. A C# variant's `WorkingSet64` already includes its own runtime and
+framework fixed cost, so comparing RN's `usedJSHeapSize` against it would
+massively under-report RN.
+
+Because of this, **the harness samples `WorkingSet64` externally for every
+variant** (see `run_stocks_grid_baseline.ps1`'s polling loop) and that's
+the figure published as `PeakRssMB`. RN's PerfTracker still emits a
+per-second JS-heap series into its samples CSV under a `JsHeap_MB` column
+header, but the human-readable report omits it — the only authoritative
+memory column is the harness's `PeakRssMB`.
+
+When citing RN memory numbers, separate **engine-baseline** from
+**per-cell**: a 0-cell (or empty-tree) run gives the fixed cost; the
+delta from the loaded run is per-content cost. The published baseline
+report's RN row mostly reflects engine-baseline — note that explicitly
+when comparing.
+
 ## Don'ts (so we don't redo this analysis)
 
 1. **Don't trust `CompositionTarget.Rendering` for "FPS."** It's UI-thread-
    idle-vsync, not present-rate. Always 2× too high under load.
 2. **Don't trust `requestAnimationFrame` for "FPS" in RN.** It's JS-thread
-   tick rate. Under at light load, bursty at saturation.
+   tick rate. Under-reports at light load, bursty at saturation.
+   - **2a. Don't bracket `setState` with a JS stopwatch and call it
+     "update time" in RN.** The dispatch only schedules the update; render,
+     commit, layout, and mount happen afterwards. Use the rAF-after-commit
+     `Avg Mount` proxy or hook native per the RNW Fabric perf wiki (see above).
+   - **2b. Don't read `performance.memory.usedJSHeapSize` and compare it to
+     a C# variant's working set.** JS heap excludes Hermes, JSI, Fabric,
+     Yoga, and text caches: tens to hundreds of MB of RN fixed cost. Use
+     `WorkingSet64` from the harness for any cross-framework number.
 3. **Don't trust DwmCore VSync events filtered by PID.** Vsyncs are global;
    the per-PID attribution is heuristic and only fires when our app's
    swap chain is the signal target. For "OS still presents at 60Hz when
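The `Avg Mount` bracket described in METHODOLOGY.md can be sketched as a small tracker. This is a minimal TypeScript sketch under stated assumptions, not the actual RN PerfTracker. Only `beginMount` and `recordMountCommit` come from App.tsx. Everything else is hypothetical:

- the `MountTracker` class
- the injected clock and frame scheduler, which let the logic run without a renderer
- the `avgMountMs` and `sampleCount` accessors
- the policy of dropping a tick whose commit lands while a frame is still pending

```typescript
// Hypothetical sketch of a rAF-after-commit mount tracker. In RN the clock
// would be performance.now and the scheduler requestAnimationFrame; both are
// injected here so the logic can run (and be tested) without a renderer.
type Clock = () => number;
type FrameScheduler = (callback: () => void) => void;

class MountTracker {
  private readonly now: Clock;
  private readonly requestFrame: FrameScheduler;
  private t0: number | null = null; // stamp from the most recent dispatch
  private framePending = false;     // at most one rAF in flight
  private totalMs = 0;
  private samples = 0;
  lastMountMs = NaN;

  constructor(now: Clock, requestFrame: FrameScheduler) {
    this.now = now;
    this.requestFrame = requestFrame;
  }

  /** Call immediately before dispatching the state update (setSnapshot). */
  beginMount(): void {
    this.t0 = this.now();
  }

  /** Call from a useLayoutEffect that depends on the dispatched state. */
  recordMountCommit(): void {
    const t0 = this.t0;
    this.t0 = null; // each stamp is consumed at most once
    // No stamp (e.g. the initial mount) or a frame already pending: skip,
    // so one sample never spans two ticks.
    if (t0 === null || this.framePending) return;
    this.framePending = true;
    this.requestFrame(() => {
      // Sample = dispatch stamp to the first frame callback after commit.
      const sample = this.now() - t0;
      this.framePending = false;
      this.lastMountMs = sample;
      this.totalMs += sample;
      this.samples += 1;
    });
  }

  get sampleCount(): number {
    return this.samples;
  }

  /** Mean of recorded samples; NaN when nothing has been recorded. */
  get avgMountMs(): number {
    return this.samples === 0 ? NaN : this.totalMs / this.samples;
  }
}
```

In the app, the tracker would presumably be built as `new MountTracker(() => performance.now(), requestAnimationFrame)`. `beginMount()` would run in the interval callback and `recordMountCommit()` in the `useLayoutEffect`, matching where App.tsx calls them. Dropping overlapping ticks keeps each sample to a single tick. Under saturation, though, it likely skips the slowest ticks, so the real tracker's overlap policy is worth stating next to published numbers.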
 
@@ -284,8 +284,11 @@ function Parse-TracerCsvForGlobalVsync {
 
 function Median {
   param([double[]]$Values)
-  if ($Values.Count -eq 0) { return 0 }
-  $sorted = $Values | Sort-Object
+  # Filter NaN — Avg Update is missing on RN rows, Avg Mount is missing
+  # on C# rows, both come back as NaN from Parse-FloatField.
+  $clean = @($Values | Where-Object { -not [double]::IsNaN($_) })
+  if ($clean.Count -eq 0) { return [double]::NaN }
+  $sorted = $clean | Sort-Object
   $n = $sorted.Count
   if ($n % 2 -eq 1) { return [double]$sorted[[int]([math]::Floor($n / 2))] }
   return ([double]$sorted[$n/2 - 1] + [double]$sorted[$n/2]) / 2.0
@@ -416,8 +419,17 @@ foreach ($v in $variants) {
       InAppFps               = Parse-FloatField $report 'Avg FPS'
       InAppTotalRenders      = Parse-IntField   $report 'Total Renders'
       InAppRendersPerSec     = $rendersPerSec
+      # `Avg Update` is the synchronous UI-thread span (meaningful for Direct/
+      # Bound/Wpf/DirectX/Reactor). `Avg Mount` is RN's rAF-after-commit
+      # mount-time proxy. Different brackets, separate columns; see
+      # METHODOLOGY.md.
       InAppAvgUpdateMs       = Parse-FloatField $report 'Avg Update'
+      InAppAvgMountMs        = Parse-FloatField $report 'Avg Mount'
       InAppAvgReconcileMs    = Parse-FloatField $report 'Avg Reconcile'
+      # `Avg Memory` / `Peak Memory` come from C# variants only. The RN
+      # variant omits them because performance.memory.usedJSHeapSize leaves
+      # out Hermes/Fabric/Yoga/text caches and would mislead. PeakRssMB
+      # below is the cross-framework number.
       InAppAvgMemoryMB       = Parse-FloatField $report 'Avg Memory'
       InAppPeakMemoryMB      = Parse-FloatField $report 'Peak Memory'
       PeakRssMB              = [math]::Round($peakRss / 1MB, 1)
@@ -439,6 +451,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
   $etw   = [double[]]@($rows | ForEach-Object { [double]$_.EtwPresentPerSec })
   $rps   = [double[]]@($rows | ForEach-Object { [double]$_.InAppRendersPerSec })
   $upd   = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgUpdateMs })
+  $mnt   = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgMountMs })
   $recon = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgReconcileMs })
   $rss   = [double[]]@($rows | ForEach-Object { [double]$_.PeakRssMB })
   $vsync = [double[]]@($rows | ForEach-Object { [double]$_.GlobalVsyncPerSec })
@@ -459,6 +472,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
     EtwPresent_Max                = [math]::Round(($etw | Measure-Object -Maximum).Maximum, 2)
     InAppRendersPerSec_Med        = [math]::Round((Median $rps), 2)
     InAppAvgUpdateMs_Med          = [math]::Round((Median $upd), 2)
+    InAppAvgMountMs_Med           = [math]::Round((Median $mnt), 2)
     InAppAvgReconcileMs_Med       = [math]::Round((Median $recon), 2)
     PeakRssMB_Med                 = [math]::Round((Median $rss), 1)
   }
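Both scripts now share the same NaN-aware `Median`. The TypeScript transliteration below makes the intended semantics explicit and adds the per-(Variant, Percent) grouping. It is a sketch: `RunRow`, `SummaryRow`, `median`, and `summarize` are hypothetical names, only the column names come from the scripts, and the `Map` grouping stands in for `Group-Object -Property Variant, Percent`. Filtering means a run that failed to report a field no longer drags NaN into the sort. An input with no real values now yields NaN rather than 0, so a missing measurement cannot read as 0 ms.

```typescript
// Hypothetical transliteration of the harness's NaN-aware Median plus the
// per-(Variant, Percent) summary. NaN means "field not emitted by this
// variant": Avg Update on RN rows, Avg Mount on C# rows.
interface RunRow {
  Variant: string;
  Percent: number;
  InAppAvgUpdateMs: number;
  InAppAvgMountMs: number;
}

interface SummaryRow {
  Variant: string;
  Percent: number;
  InAppAvgUpdateMs_Med: number;
  InAppAvgMountMs_Med: number;
}

/** Median that ignores NaN; returns NaN (not 0) when nothing is left. */
function median(values: readonly number[]): number {
  const sorted = values.filter((v) => !Number.isNaN(v)).sort((a, b) => a - b);
  const n = sorted.length;
  if (n === 0) return NaN; // "not measured", distinct from a real 0 ms
  const mid = Math.floor(n / 2);
  return n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarize(rows: readonly RunRow[]): SummaryRow[] {
  // Group by (Variant, Percent), preserving first-seen order.
  const groups = new Map<string, RunRow[]>();
  for (const row of rows) {
    const key = `${row.Variant}|${row.Percent}`;
    const group = groups.get(key);
    if (group) group.push(row);
    else groups.set(key, [row]);
  }
  return Array.from(groups.values()).map((group) => ({
    Variant: group[0].Variant,
    Percent: group[0].Percent,
    // The two latency columns stay separate; neither falls back to the other.
    InAppAvgUpdateMs_Med: median(group.map((r) => r.InAppAvgUpdateMs)),
    InAppAvgMountMs_Med: median(group.map((r) => r.InAppAvgMountMs)),
  }));
}
```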
 
@@ -209,6 +209,11 @@ foreach ($v in $variants) {
     } catch {}
     $powerState = if ($onBattery) { 'battery' } else { 'ac' }
 
+    # `Avg Update` is the synchronous UI-thread span (meaningful for Direct/
+    # Bound/Wpf/DirectX/Reactor). `Avg Mount` is RN's rAF-after-commit
+    # mount-time proxy (only emitted by the RN variants; see
+    # METHODOLOGY.md). We report them in separate columns so nobody
+    # mistakes one for the other: they bracket different work.
     $row = [pscustomobject]@{
       Run                  = $run
       Variant              = $v.Name
@@ -224,6 +229,7 @@ foreach ($v in $variants) {
       InAppTotalRenders    = Parse-IntField   $report 'Total Renders'
       InAppRendersPerSec   = $rendersPerSec
       InAppAvgUpdateMs     = Parse-FloatField $report 'Avg Update'
+      InAppAvgMountMs      = Parse-FloatField $report 'Avg Mount'
       PeakRssMB            = [math]::Round($peakRss / 1MB, 1)
     }
     $results += $row
@@ -234,11 +240,14 @@ foreach ($v in $variants) {
 # Per-run rows.
 $results | Export-Csv -Path $CsvPath -NoTypeInformation
 
-# Per-(variant,percent) summary stats across all repeats.
+# Per-(variant,percent) summary stats across all repeats. Filters NaN —
+# Avg Update is missing on RN rows, Avg Mount is missing on C# rows, both
+# come back as NaN from Parse-FloatField.
 function Median {
   param([double[]]$Values)
-  if ($Values.Count -eq 0) { return 0 }
-  $sorted = $Values | Sort-Object
+  $clean = @($Values | Where-Object { -not [double]::IsNaN($_) })
+  if ($clean.Count -eq 0) { return [double]::NaN }
+  $sorted = $clean | Sort-Object
   $n = $sorted.Count
   if ($n % 2 -eq 1) { return [double]$sorted[[int]([math]::Floor($n / 2))] }
   return ([double]$sorted[$n/2 - 1] + [double]$sorted[$n/2]) / 2.0
@@ -252,6 +261,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
   $etw     = [double[]]@($rows | ForEach-Object { [double]$_.EtwPresentPerSec })
   $rps     = [double[]]@($rows | ForEach-Object { [double]$_.InAppRendersPerSec })
   $upd     = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgUpdateMs })
+  $mnt     = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgMountMs })
   $rss     = [double[]]@($rows | ForEach-Object { [double]$_.PeakRssMB })
   $vsync   = [double[]]@($rows | ForEach-Object { [double]$_.GlobalVsyncPerSec })
   # Mode of bottleneck across runs.
@@ -272,6 +282,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
     EtwPresent_Max                = [math]::Round(($etw | Measure-Object -Maximum).Maximum, 2)
     InAppRendersPerSec_Med        = [math]::Round((Median $rps), 2)
     InAppAvgUpdateMs_Med          = [math]::Round((Median $upd), 2)
+    InAppAvgMountMs_Med           = [math]::Round((Median $mnt), 2)
     PeakRssMB_Med                 = [math]::Round((Median $rss), 1)
   }
 }
 
@@ -4,15 +4,17 @@
 //
 // Layout: 70 cols × 70 rows = 4,900 cells, 64×18 each, FontSize 8.
 // Update loop: setInterval @ 33 ms; each tick mutates N% of cells (slider).
-// Stats: FPS via requestAnimationFrame, update time via stopwatch, memory
-// via performance.memory (Hermes exposes usedJSHeapSize on RN-Windows).
+// Stats: FPS via requestAnimationFrame; mount time via a beginMount() stamp
+// just before setSnapshot, closed by recordMountCommit() in a useLayoutEffect
+// (one rAF after commit); JS heap via performance.memory (diagnostic only;
+// RSS is captured externally by the harness).
 //
 // Match policy: behaviorally identical to the Reactor app — same data
 // generation algorithm (StockDataSource.ts), same tick rate, same sample
 // schedule.  PerfTracker writes the same flavor of report file.
 
 import * as React from 'react';
-import { useEffect, useMemo, useRef, useState } from 'react';
+import { useEffect, useLayoutEffect, useMemo, useRef, useState } from 'react';
 import {
   Button,
   ScrollView,
@@ -109,8 +111,8 @@ export default function App(props: AppProps) {
   const [percent, setPercent] = useState<number>(initialPercent);
   const [running, setRunning] = useState<boolean>(false);
   const [fpsLabel, setFpsLabel] = useState('FPS: --');
-  const [updateLabel, setUpdateLabel] = useState('Update: -- ms');
-  const [memLabel, setMemLabel] = useState('Mem: -- MB');
+  const [mountLabel, setMountLabel] = useState('Mount: -- ms');
+  const [memLabel, setMemLabel] = useState('JS Heap: -- MB');
   // Surfaces the final headless report in a single TextBlock so we can read
   // it back via UI Automation (WinUI .exe apps have no stdout by default).
   const [report, setReport] = useState<string>('');
@@ -129,16 +131,16 @@ export default function App(props: AppProps) {
     return stop;
   }, []);
 
-  // Update loop. Mirrors Reactor variant's UseEffect on (running, percent).
+  // Update loop. Stamps T0 immediately before setSnapshot; the
+  // useLayoutEffect below records a mount-time sample once React commits.
+  // Bracketing setSnapshot with a synchronous begin/end span (the C# pattern)
+  // would only measure JS dispatch — Fabric's commit pipeline is async.
   useEffect(() => {
     if (!running) return;
     const perf = perfRef.current;
     const handle = setInterval(() => {
-      perf.beginUpdate();
       const changed = source.update(percent);
-      // Build a new snapshot — but only patch the changed indices, like the
-      // Direct variant.  Allocates a new outer array (so React detects it)
-      // but reuses unchanged Cell objects (so React.memo skips them).
+      perf.beginMount();
       setSnapshot(prev => {
         const next = prev.slice();
         for (const idx of changed) {
@@ -151,15 +153,21 @@ export default function App(props: AppProps) {
         }
         return next;
       });
-      perf.endUpdate();
 
       setFpsLabel(`FPS: ${perf.fps.toFixed(0)}`);
-      setUpdateLabel(`Update: ${perf.updateMs.toFixed(1)} ms`);
-      setMemLabel(`Mem: ${perf.memoryMB} MB`);
+      setMountLabel(`Mount: ${perf.mountMs.toFixed(1)} ms`);
+      setMemLabel(`JS Heap: ${perf.jsHeapMB} MB`);
     }, TICK_MS);
     return () => clearInterval(handle);
   }, [running, percent, source]);
 
+  // Records a mount-time sample after each commit. useLayoutEffect runs
+  // post-commit on the JS thread; the tracker schedules a single rAF inside
+  // so the sample brackets through to the next display frame.
+  useLayoutEffect(() => {
+    perfRef.current.recordMountCommit();
+  }, [snapshot]);
+
   // Headless auto-start mirrors the Reactor variant's CliOpts.Headless path.
   useEffect(() => {
     if (!headless) return;
@@ -220,7 +228,7 @@ export default function App(props: AppProps) {
         <Button title="50%" onPress={() => setPercent(50)} disabled={percent === 50} />
         <Button title="100%" onPress={() => setPercent(100)} disabled={percent === 100} />
         <Text style={[styles.toolbarText, styles.fixedW90]}>{fpsLabel}</Text>
-        <Text style={[styles.toolbarText, styles.fixedW120]}>{updateLabel}</Text>
+        <Text style={[styles.toolbarText, styles.fixedW120]}>{mountLabel}</Text>
         <Text style={[styles.toolbarText, styles.fixedW120]}>{memLabel}</Text>
       </View>
       {!!report && (