
Commit f91176e

perf(stress_perf_rn): replace JS-dispatch timer with rAF-after-commit mount time (#137)
* perf(stress_perf_rn): replace JS-dispatch timer with rAF-after-commit mount time

  Cross-team review of the StocksGrid perf numbers flagged that RN-Fabric's in-app `Avg Update` and `Avg/Peak Memory` were measuring the wrong things and were not comparable to the C# variants:

  - `setSnapshot` returns immediately on Fabric while reconcile → Fabric → Yoga → Composition continues across other threads, so a JS-side begin/end stopwatch only captured dispatch. Confirmed empirically: the new mount metric reports 348–1046 ms across 10–100% workloads, vs. 0.5–6 ms from the old timer (170–700× undercount).
  - `performance.memory.usedJSHeapSize` excludes Hermes, JSI, Fabric shadow tree, Yoga, and text caches: tens-to-hundreds of MB of RN-fixed cost. The harness already samples `WorkingSet64` externally for every variant, so the in-app memory lines were redundant *and* misleading.

  Changes:

  - PerfTracker.ts: drop `beginUpdate/endUpdate` + `Avg Update`. Add `beginMount` (stamp T0 before setState) + `recordMountCommit` (called from a useLayoutEffect on the dispatched state, schedules one rAF and records `(rAF-now − T0)` as a sample). Drop `Avg/Peak Memory` from the human-readable report; rename in-app heap reading to `jsHeapMB` and the samples-CSV column to `JsHeap_MB` to disambiguate it from process RSS.
  - App.tsx: wire `beginMount` ahead of `setSnapshot` and add a useLayoutEffect on `snapshot` that calls `recordMountCommit`. Toolbar labels updated.
  - VirtualList sibling: relabel in-app memory as `jsHeapMB` for consistency (no `Avg Update` to remove there; it benchmarks scroll P50/P95/P99 from rAF deltas, which doesn't have the async problem).
  - run_stocks_grid_baseline.ps1 / run_full_matrix.ps1: parse `Avg Mount` into a new `InAppAvgMountMs(_Med)` column. Median filters NaN so C# rows (no Mount) and RN rows (no Update) don't pollute each other's medians.
  - METHODOLOGY.md: new sections explaining `Avg Update` (C#) vs `Avg Mount` (RN) and JS heap vs `WorkingSet64`. Added Don'ts 2a/2b to keep this from regressing.
  - docs/reports/stress-perf-stocks-grid.md: stronger memory caveat noting the RN engine baseline isn't decomposed; new open questions for engine-baseline measurement and true JS-to-pixel mount via the RNW Fabric perf wiki.

* perf(stress_perf_rn): queue mount-start timestamps; fix grammar nit

  Address PR review feedback:

  - PerfTracker.ts: replace single `lastMountStart` with a `pendingMountStarts` queue. Multiple `beginMount()` calls before a commit (React batches state updates under load) all queue; `recordMountCommit()` consumes the oldest (worst-case user-perceived latency, not the optimistic latest), clears the rest (a batched commit reflects all queued dispatches), and an empty queue means no sample (avoids attributing stale stamps to unrelated later commits, e.g. a re-render not driven by a `setSnapshot` tick).
  - METHODOLOGY.md: "Under at light load" → "Under-reports at light load" (parallel structure with "bursty at saturation").
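The PerfTracker.ts diff is not shown on this page, so the queueing rules described in the commit message can only be sketched here. The following is a minimal illustration, not the actual implementation. The class name `MountTimer`, the injected `now` and `scheduleFrame` parameters, and the `avgMountMs` / `sampleCount` accessors are hypothetical; only `beginMount`, `recordMountCommit`, and `pendingMountStarts` are names taken from the message. In the app, `now` and `scheduleFrame` would presumably be `performance.now` and `requestAnimationFrame`.

```typescript
// Hypothetical sketch of the mount-time queue; not the real PerfTracker.ts.
type Clock = () => number;
type FrameScheduler = (cb: () => void) => void;

class MountTimer {
  private pendingMountStarts: number[] = [];
  private samples: number[] = [];
  private readonly now: Clock;
  private readonly scheduleFrame: FrameScheduler;

  constructor(now: Clock, scheduleFrame: FrameScheduler) {
    this.now = now;
    this.scheduleFrame = scheduleFrame;
  }

  // Stamp T0 just before the state dispatch (e.g. before setSnapshot).
  beginMount(): void {
    this.pendingMountStarts.push(this.now());
  }

  // Intended to be called from a useLayoutEffect on the dispatched state.
  recordMountCommit(): void {
    // Empty queue: this commit was not driven by a tick, so take no sample.
    if (this.pendingMountStarts.length === 0) return;
    // Oldest stamp = worst-case latency for a batched commit.
    const oldest = this.pendingMountStarts[0];
    // A batched commit reflects every queued dispatch, so clear them all.
    this.pendingMountStarts = [];
    // Record the sample one frame after commit.
    this.scheduleFrame(() => {
      this.samples.push(this.now() - oldest);
    });
  }

  get sampleCount(): number {
    return this.samples.length;
  }

  get avgMountMs(): number {
    if (this.samples.length === 0) return NaN;
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }
}

// Usage with a fake clock and a manual frame queue (arbitrary timestamps).
let t = 0;
const frames: Array<() => void> = [];
const timer = new MountTimer(() => t, (cb) => frames.push(cb));

t = 100; timer.beginMount();        // first tick dispatches
t = 133; timer.beginMount();        // second tick, batched into the same commit
t = 400; timer.recordMountCommit(); // commit: schedules exactly one frame callback
t = 416;
const fire = frames.shift();
if (fire) fire();                   // frame fires: sample = 416 - 100
timer.recordMountCommit();          // unrelated re-render: queue empty, no sample
```

Injecting the clock and the frame scheduler is a choice made for this sketch so the rules can be exercised without a React runtime.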
1 parent dbbc404 commit f91176e

8 files changed

Lines changed: 227 additions & 68 deletions

File tree

docs/reports/stress-perf-stocks-grid.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -149,6 +149,18 @@ Wpf 950 ◀── MILCore + render-thread state
149149
RN-Fabric 1,156 ◀── Hermes + JS bundle + Yoga + Fabric shadow tree
150150
```
151151

152+
All numbers are `WorkingSet64` (process RSS) sampled externally by the
153+
harness, **not** `performance.memory.usedJSHeapSize`. JS-heap-only would
154+
massively under-report RN by excluding Hermes, JSI, Fabric reconciler,
155+
Yoga, and text-shaping caches.
156+
157+
**Don't read RN's number as per-cell overhead.** A large fraction of the
158+
1,156 MB is RN-fixed cost — engine + bundle + reconciler infrastructure
159+
RN pays before the first cell exists. To attribute per-cell cost, run an
160+
empty-tree baseline of the same .exe and subtract; the delta is what
161+
scales with content. We haven't captured that baseline yet (open
162+
question — see below).
163+
152164
Memory ranking is essentially identical battery and AC. Power state
153165
doesn't change architectural memory footprints.
154166

@@ -227,6 +239,17 @@ perception was a battery-specific artifact.
227239
- Reactor's tree-build cost (22 ms steady) is the dominant reconcile
228240
phase. Investigation candidate: element allocation pooling, cached
229241
text formatters.
242+
- RN engine-baseline vs per-cell memory split. Run
243+
`StocksGrid.exe --headless --percent 0 --duration 5` (or with a
244+
zero-cell variant of the data source) to measure RN's fixed cost;
245+
delta from the 1,156 MB loaded number is per-content cost. Until
246+
we have that baseline, the RN row in the memory table is
247+
apples-to-bowling-balls vs the C# variants.
248+
- True JS-to-pixel mount time. `Avg Mount` is currently a pure-JS
249+
rAF-after-commit proxy and excludes Fabric work that lands after
250+
the first rAF. Hook the native side per
251+
[RNW Fabric perf wiki, Part 2](https://github.com/microsoft/react-native-windows/wiki/Performance-tests-Fabric#part-2--native-perf-tests)
252+
to get real mount-to-pixel timing.
230253

231254
## How to reproduce
232255

tests/stress_perf/METHODOLOGY.md

Lines changed: 70 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -131,12 +131,81 @@ say one wins.
131131
- Never directly diff battery numbers against AC numbers; treat them as
132132
separate baselines.
133133

134+
## Per-tick latency: `Avg Update` (C#) vs `Avg Mount` (RN)
135+
136+
The synchronous variants (Direct / Bound / Wpf / DirectX / Reactor) bracket
137+
the tick handler with a stopwatch — `BeginUpdate` before the property patch
138+
+ reconcile, `EndUpdate` after — and report `Avg Update` ms. Because the
139+
work runs on the UI thread synchronously, the bracket captures all of it:
140+
data mutation, framework reconciliation, and any commit-to-tree work.
141+
142+
**RN-Fabric can't use that pattern.** `setState` returns immediately while
143+
React reconcile → Fabric commit → Yoga → Composition continues across
144+
other threads. A JS-side stopwatch around `setSnapshot` measures only JS
145+
dispatch and undercounts the per-tick cost by a large factor.
146+
147+
For RN we report `Avg Mount` instead. The tracker stamps T0 just before
148+
`setSnapshot` and records `(rAF-now − T0)` from a single
149+
`requestAnimationFrame` scheduled inside a `useLayoutEffect` on the
150+
dispatched state. By the time the rAF callback runs:
151+
152+
- React has finished its commit phase (useLayoutEffect ran)
153+
- Fabric has had a chance to apply the commit to the host tree
154+
- One display frame has been scheduled
155+
156+
It's a **pure-JS proxy**, not pixel-accurate. It excludes any Fabric work
157+
that lands after the rAF tick (e.g. layout follow-ups in subsequent
158+
frames). For true JS-to-pixel mount time, hook the native side per the
159+
[RNW Fabric perf wiki, Part 2](https://github.com/microsoft/react-native-windows/wiki/Performance-tests-Fabric#part-2--native-perf-tests).
160+
161+
**Don't diff `Avg Update` against `Avg Mount`.** They bracket different
162+
work. The harness reports them in separate columns (`InAppAvgUpdateMs`
163+
for C#, `InAppAvgMountMs` for RN) for that reason.
164+
165+
## Memory: in-app `usedJSHeapSize` vs harness `WorkingSet64`
166+
167+
Each variant's PerfTracker can read process memory locally, but the only
168+
in-process API exposed to RN/Hermes is `performance.memory.usedJSHeapSize`
169+
(**JS heap only**). It excludes:
170+
171+
- Hermes engine
172+
- JSI bridge
173+
- Fabric reconciler + shadow tree
174+
- Yoga
175+
- TypeLayout / text-shaping caches
176+
177+
These are tens-to-hundreds of MB of fixed cost RN pays before any cells
178+
exist. C# variants don't have an equivalent fixed cost. Reading
179+
`usedJSHeapSize` and comparing it to a C# variant's `WorkingSet64` would
180+
massively under-report RN.
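To make the scope of the JS-heap reading concrete, here is a small sketch of how a `jsHeapMB`-style value could be derived. The helper name `readJsHeapMB` and the `HeapSource` interface are assumptions made for illustration; `performance.memory` is a non-standard property, so the sketch treats it as optional and returns NaN when it is absent.

```typescript
// Hypothetical helper; the real PerfTracker reading may differ.
interface HeapSource {
  memory?: { usedJSHeapSize?: number };
}

// Returns the JS heap in MB (one decimal), or NaN if the runtime does not
// expose performance.memory. This covers the JS heap only, not process RSS.
function readJsHeapMB(perf: HeapSource): number {
  const bytes = perf.memory?.usedJSHeapSize;
  if (typeof bytes !== "number" || !Number.isFinite(bytes)) return NaN;
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

// Example with synthetic inputs (not measured values).
const withHeap = readJsHeapMB({ memory: { usedJSHeapSize: 50 * 1024 * 1024 } });
const withoutHeap = readJsHeapMB({});
```

In an app this would be called as `readJsHeapMB(performance as HeapSource)`, since the standard TypeScript DOM typings do not declare `memory` on `Performance`.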
181+
182+
Because of this, **the harness samples `WorkingSet64` externally for every
183+
variant** (see `run_stocks_grid_baseline.ps1`'s polling loop) and that's
184+
the figure published as `PeakRssMB`. RN's PerfTracker still emits a
185+
per-second JS-heap series into its samples CSV under a `JsHeap_MB` column
186+
header, but the human-readable report omits it — the only authoritative
187+
memory column is the harness's `PeakRssMB`.
188+
189+
When citing RN memory numbers, separate **engine-baseline** from
190+
**per-cell**: a 0-cell (or empty-tree) run gives the fixed cost; the
191+
delta from the loaded run is per-content cost. The published baseline
192+
report's RN row mostly reflects engine-baseline — note that explicitly
193+
when comparing.
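The engine-baseline versus per-cell split is a plain subtraction, sketched below. The function name `splitRss` and its result shape are hypothetical, and the inputs in the example are arbitrary placeholders: no empty-tree baseline has been captured yet, so nothing here is a measured value.

```typescript
// Hypothetical helper illustrating the baseline subtraction.
interface MemorySplit {
  engineBaselineMB: number; // fixed cost from the empty-tree run
  perContentMB: number;     // what scales with content
  perCellKB: number;        // per-content cost divided across cells
}

function splitRss(
  loadedRssMB: number,
  emptyTreeRssMB: number,
  cellCount: number,
): MemorySplit {
  const perContentMB = loadedRssMB - emptyTreeRssMB;
  return {
    engineBaselineMB: emptyTreeRssMB,
    perContentMB,
    perCellKB: cellCount > 0 ? (perContentMB * 1024) / cellCount : NaN,
  };
}

// Placeholder inputs only: 500 MB loaded, 400 MB empty tree, 100 cells.
const split = splitRss(500, 400, 100);
```

Both inputs should be `WorkingSet64` figures from the harness for the same .exe, so that the delta isolates content cost.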
194+
134195
## Don'ts (so we don't redo this analysis)
135196

136197
1. **Don't trust `CompositionTarget.Rendering` for "FPS."** It's UI-thread-
137198
idle-vsync, not present-rate. Always 2× too high under load.
138199
2. **Don't trust `requestAnimationFrame` for "FPS" in RN.** It's JS-thread
139-
tick rate. Under at light load, bursty at saturation.
200+
tick rate. Under-reports at light load, bursty at saturation.
201+
2a. **Don't bracket `setState` with a JS stopwatch and call it "update
202+
time" in RN.** The dispatch returns immediately; the commit pipeline
203+
continues across other threads. Use the rAF-after-commit `Avg Mount`
204+
proxy or hook native per the RNW Fabric perf wiki. See above.
205+
2b. **Don't read `performance.memory.usedJSHeapSize` and compare it to
206+
a C# variant's working set.** JS heap excludes Hermes, JSI, Fabric,
207+
Yoga, and text caches — tens-to-hundreds of MB of RN-fixed cost. Use
208+
`WorkingSet64` from the harness for any cross-framework number.
140209
3. **Don't trust DwmCore VSync events filtered by PID.** Vsyncs are global;
141210
the per-PID attribution is heuristic and only fires when our app's
142211
swap chain is the signal target. For "OS still presents at 60Hz when

tests/stress_perf/run_full_matrix.ps1

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -284,8 +284,11 @@ function Parse-TracerCsvForGlobalVsync {
284284

285285
function Median {
286286
param([double[]]$Values)
287-
if ($Values.Count -eq 0) { return 0 }
288-
$sorted = $Values | Sort-Object
287+
# Filter NaN — Avg Update is missing on RN rows, Avg Mount is missing
288+
# on C# rows, both come back as NaN from Parse-FloatField.
289+
$clean = @($Values | Where-Object { -not [double]::IsNaN($_) })
290+
if ($clean.Count -eq 0) { return [double]::NaN }
291+
$sorted = $clean | Sort-Object
289292
$n = $sorted.Count
290293
if ($n % 2 -eq 1) { return [double]$sorted[[int]([math]::Floor($n / 2))] }
291294
return ([double]$sorted[$n/2 - 1] + [double]$sorted[$n/2]) / 2.0
@@ -416,8 +419,17 @@ foreach ($v in $variants) {
416419
InAppFps = Parse-FloatField $report 'Avg FPS'
417420
InAppTotalRenders = Parse-IntField $report 'Total Renders'
418421
InAppRendersPerSec = $rendersPerSec
422+
# `Avg Update` is the synchronous UI-thread span (Direct/Bound/Wpf/
423+
# DirectX/Reactor — meaningful). `Avg Mount` is RN's rAF-after-commit
424+
# mount-time proxy. Different brackets, separate columns; see
425+
# METHODOLOGY.md.
419426
InAppAvgUpdateMs = Parse-FloatField $report 'Avg Update'
427+
InAppAvgMountMs = Parse-FloatField $report 'Avg Mount'
420428
InAppAvgReconcileMs = Parse-FloatField $report 'Avg Reconcile'
429+
# `Avg Memory` / `Peak Memory` come from C# variants only — the RN
430+
# variant excludes them because performance.memory.usedJSHeapSize
431+
# excludes Hermes/Fabric/Yoga/text caches and would mislead. PeakRssMB
432+
# below is the cross-framework number.
421433
InAppAvgMemoryMB = Parse-FloatField $report 'Avg Memory'
422434
InAppPeakMemoryMB = Parse-FloatField $report 'Peak Memory'
423435
PeakRssMB = [math]::Round($peakRss / 1MB, 1)
@@ -439,6 +451,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
439451
$etw = [double[]]@($rows | ForEach-Object { [double]$_.EtwPresentPerSec })
440452
$rps = [double[]]@($rows | ForEach-Object { [double]$_.InAppRendersPerSec })
441453
$upd = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgUpdateMs })
454+
$mnt = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgMountMs })
442455
$recon = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgReconcileMs })
443456
$rss = [double[]]@($rows | ForEach-Object { [double]$_.PeakRssMB })
444457
$vsync = [double[]]@($rows | ForEach-Object { [double]$_.GlobalVsyncPerSec })
@@ -459,6 +472,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
459472
EtwPresent_Max = [math]::Round(($etw | Measure-Object -Maximum).Maximum, 2)
460473
InAppRendersPerSec_Med = [math]::Round((Median $rps), 2)
461474
InAppAvgUpdateMs_Med = [math]::Round((Median $upd), 2)
475+
InAppAvgMountMs_Med = [math]::Round((Median $mnt), 2)
462476
InAppAvgReconcileMs_Med = [math]::Round((Median $recon), 2)
463477
PeakRssMB_Med = [math]::Round((Median $rss), 1)
464478
}
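The NaN-filtering `Median` change in the harness scripts can be illustrated outside PowerShell. The following is a TypeScript port written for illustration only (the name `medianIgnoringNaN` is hypothetical). It shows why an all-NaN column, such as `InAppAvgMountMs` on C# rows, now yields NaN instead of a misleading 0.

```typescript
// Illustrative port of the harness Median; not part of the commit.
function medianIgnoringNaN(values: number[]): number {
  // Drop NaN entries (fields missing for this variant).
  const clean = values.filter((v) => !Number.isNaN(v));
  // No usable values: report NaN rather than 0.
  if (clean.length === 0) return NaN;
  const sorted = [...clean].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Synthetic inputs: mixed rows, and a column missing on every row.
const mixed = medianIgnoringNaN([NaN, 3, 1, 2]);
const allMissing = medianIgnoringNaN([NaN, NaN]);
```

Returning NaN for an empty column keeps a missing metric distinguishable from a genuine zero in the summary CSV.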

tests/stress_perf/run_stocks_grid_baseline.ps1

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,11 @@ foreach ($v in $variants) {
209209
} catch {}
210210
$powerState = if ($onBattery) { 'battery' } else { 'ac' }
211211

212+
# `Avg Update` is the synchronous UI-thread span (Direct/Bound/Wpf/
213+
# DirectX/Reactor — meaningful). `Avg Mount` is RN's rAF-after-commit
214+
# mount-time proxy (only emitted by the RN variants; see
215+
# METHODOLOGY.md). We report them in separate columns so nobody
216+
# mistakes one for the other — they bracket different work.
212217
$row = [pscustomobject]@{
213218
Run = $run
214219
Variant = $v.Name
@@ -224,6 +229,7 @@ foreach ($v in $variants) {
224229
InAppTotalRenders = Parse-IntField $report 'Total Renders'
225230
InAppRendersPerSec = $rendersPerSec
226231
InAppAvgUpdateMs = Parse-FloatField $report 'Avg Update'
232+
InAppAvgMountMs = Parse-FloatField $report 'Avg Mount'
227233
PeakRssMB = [math]::Round($peakRss / 1MB, 1)
228234
}
229235
$results += $row
@@ -234,11 +240,14 @@ foreach ($v in $variants) {
234240
# Per-run rows.
235241
$results | Export-Csv -Path $CsvPath -NoTypeInformation
236242

237-
# Per-(variant,percent) summary stats across all repeats.
243+
# Per-(variant,percent) summary stats across all repeats. Filters NaN —
244+
# Avg Update is missing on RN rows, Avg Mount is missing on C# rows, both
245+
# come back as NaN from Parse-FloatField.
238246
function Median {
239247
param([double[]]$Values)
240-
if ($Values.Count -eq 0) { return 0 }
241-
$sorted = $Values | Sort-Object
248+
$clean = @($Values | Where-Object { -not [double]::IsNaN($_) })
249+
if ($clean.Count -eq 0) { return [double]::NaN }
250+
$sorted = $clean | Sort-Object
242251
$n = $sorted.Count
243252
if ($n % 2 -eq 1) { return [double]$sorted[[int]([math]::Floor($n / 2))] }
244253
return ([double]$sorted[$n/2 - 1] + [double]$sorted[$n/2]) / 2.0
@@ -252,6 +261,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
252261
$etw = [double[]]@($rows | ForEach-Object { [double]$_.EtwPresentPerSec })
253262
$rps = [double[]]@($rows | ForEach-Object { [double]$_.InAppRendersPerSec })
254263
$upd = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgUpdateMs })
264+
$mnt = [double[]]@($rows | ForEach-Object { [double]$_.InAppAvgMountMs })
255265
$rss = [double[]]@($rows | ForEach-Object { [double]$_.PeakRssMB })
256266
$vsync = [double[]]@($rows | ForEach-Object { [double]$_.GlobalVsyncPerSec })
257267
# Mode of bottleneck across runs.
@@ -272,6 +282,7 @@ $summary = $results | Group-Object -Property Variant, Percent | ForEach-Object {
272282
EtwPresent_Max = [math]::Round(($etw | Measure-Object -Maximum).Maximum, 2)
273283
InAppRendersPerSec_Med = [math]::Round((Median $rps), 2)
274284
InAppAvgUpdateMs_Med = [math]::Round((Median $upd), 2)
285+
InAppAvgMountMs_Med = [math]::Round((Median $mnt), 2)
275286
PeakRssMB_Med = [math]::Round((Median $rss), 1)
276287
}
277288
}

tests/stress_perf_rn/StocksGrid/App.tsx

Lines changed: 22 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -4,15 +4,17 @@
44
//
55
// Layout: 70 cols × 70 rows = 4,900 cells, 64×18 each, FontSize 8.
66
// Update loop: setInterval @ 33 ms; each tick mutates N% of cells (slider).
7-
// Stats: FPS via requestAnimationFrame, update time via stopwatch, memory
8-
// via performance.memory (Hermes exposes usedJSHeapSize on RN-Windows).
7+
// Stats: FPS via requestAnimationFrame; mount time via beginMount-stamp
8+
// before setSnapshot + recordMountCommit-on-useLayoutEffect (rAF after
9+
// commit); JS heap via performance.memory (diagnostic only — RSS is
10+
// captured by the harness externally).
911
//
1012
// Match policy: behaviorally identical to the Reactor app — same data
1113
// generation algorithm (StockDataSource.ts), same tick rate, same sample
1214
// schedule. PerfTracker writes the same flavor of report file.
1315

1416
import * as React from 'react';
15-
import { useEffect, useMemo, useRef, useState } from 'react';
17+
import { useEffect, useLayoutEffect, useMemo, useRef, useState } from 'react';
1618
import {
1719
Button,
1820
ScrollView,
@@ -109,8 +111,8 @@ export default function App(props: AppProps) {
109111
const [percent, setPercent] = useState<number>(initialPercent);
110112
const [running, setRunning] = useState<boolean>(false);
111113
const [fpsLabel, setFpsLabel] = useState('FPS: --');
112-
const [updateLabel, setUpdateLabel] = useState('Update: -- ms');
113-
const [memLabel, setMemLabel] = useState('Mem: -- MB');
114+
const [mountLabel, setMountLabel] = useState('Mount: -- ms');
115+
const [memLabel, setMemLabel] = useState('JS Heap: -- MB');
114116
// Surfaces the final headless report in a single TextBlock so we can read
115117
// it back via UI Automation (WinUI .exe apps have no stdout by default).
116118
const [report, setReport] = useState<string>('');
@@ -129,16 +131,16 @@ export default function App(props: AppProps) {
129131
return stop;
130132
}, []);
131133

132-
// Update loop. Mirrors Reactor variant's UseEffect on (running, percent).
134+
// Update loop. Stamps T0 immediately before setSnapshot; the
135+
// useLayoutEffect below records a mount-time sample once React commits.
136+
// Bracketing setSnapshot with a synchronous begin/end span (the C# pattern)
137+
// would only measure JS dispatch — Fabric's commit pipeline is async.
133138
useEffect(() => {
134139
if (!running) return;
135140
const perf = perfRef.current;
136141
const handle = setInterval(() => {
137-
perf.beginUpdate();
138142
const changed = source.update(percent);
139-
// Build a new snapshot — but only patch the changed indices, like the
140-
// Direct variant. Allocates a new outer array (so React detects it)
141-
// but reuses unchanged Cell objects (so React.memo skips them).
143+
perf.beginMount();
142144
setSnapshot(prev => {
143145
const next = prev.slice();
144146
for (const idx of changed) {
@@ -151,15 +153,21 @@ export default function App(props: AppProps) {
151153
}
152154
return next;
153155
});
154-
perf.endUpdate();
155156

156157
setFpsLabel(`FPS: ${perf.fps.toFixed(0)}`);
157-
setUpdateLabel(`Update: ${perf.updateMs.toFixed(1)} ms`);
158-
setMemLabel(`Mem: ${perf.memoryMB} MB`);
158+
setMountLabel(`Mount: ${perf.mountMs.toFixed(1)} ms`);
159+
setMemLabel(`JS Heap: ${perf.jsHeapMB} MB`);
159160
}, TICK_MS);
160161
return () => clearInterval(handle);
161162
}, [running, percent, source]);
162163

164+
// Records a mount-time sample after each commit. useLayoutEffect runs
165+
// post-commit on the JS thread; the tracker schedules a single rAF inside
166+
// so the sample brackets through to the next display frame.
167+
useLayoutEffect(() => {
168+
perfRef.current.recordMountCommit();
169+
}, [snapshot]);
170+
163171
// Headless auto-start mirrors the Reactor variant's CliOpts.Headless path.
164172
useEffect(() => {
165173
if (!headless) return;
@@ -220,7 +228,7 @@ export default function App(props: AppProps) {
220228
<Button title="50%" onPress={() => setPercent(50)} disabled={percent === 50} />
221229
<Button title="100%" onPress={() => setPercent(100)} disabled={percent === 100} />
222230
<Text style={[styles.toolbarText, styles.fixedW90]}>{fpsLabel}</Text>
223-
<Text style={[styles.toolbarText, styles.fixedW120]}>{updateLabel}</Text>
231+
<Text style={[styles.toolbarText, styles.fixedW120]}>{mountLabel}</Text>
224232
<Text style={[styles.toolbarText, styles.fixedW120]}>{memLabel}</Text>
225233
</View>
226234
{!!report && (
