fix(perf): keep PerfStress responsive at 250/500 elements by codemonkeychris · Pull Request #321 · microsoft/microsoft-ui-reactor

codemonkeychris · 2026-05-17T17:31:24Z

Summary

Render scheduler: extend the existing LOW-priority demotion (already used when state changes during a render) to the between-renders path. When the previous render exceeds a 16 ms frame budget, the next TryEnqueue lands at DispatcherQueuePriority.Low so the message pump can interleave input/layout/paint. New RenderPriorityPolicy helper; wired into both ReactorHost and ReactorHostControl.
Brush cache in PerfStressDemo: replace the per-render .Background(\"#hex\") allocations (~1500/render at 500 elements) with a [ThreadStatic] brush dictionary so the reconciler sees reference-stable brushes and skips redundant Border.Background writes.

The bug

PerfStressDemo runs an async quicksort, calling setState four times then await Task.Delay(16). At 250 elements the app stutters; at 500 it appears hung until the sort completes. The host already coalesces concurrent setStates within one dispatch tick, but the LOW-priority demotion in RenderLoop only fired when state changed during a render. The PerfStress pattern sets state between awaits, so every render was enqueued at NORMAL priority — back-to-back renders + Task.Delay continuations starved the UI message pump of input/layout/paint slots.

On top of that, each render allocated a fresh SolidColorBrush per .Background(\"#hex\") call (~1500 per tick at 500 bars). Because the brush instance was reference-different every render, UpdateBorder wrote b.Background = n.Background unconditionally and WinUI re-painted every border every tick — even when the color hadn't actually changed.

What changed

File	Change
`src/Reactor/Hosting/RenderPriorityPolicy.cs` (new)	Pure helper: `PickPriority(lastRenderMs)` → `Low` if past 16 ms budget, else `Normal`.
`src/Reactor/Hosting/ReactorHost.cs`	Track `_lastRenderMs`; `RequestRender` enqueues via `PickPriority(_lastRenderMs)`.
`src/Reactor/Hosting/ReactorHostControl.cs`	Same change, symmetric.
`samples/Reactor.TestApp/Demos/PerfStressDemo.cs`	`[ThreadStatic]` brush dictionary + `Brush(string)` helper; replace 6 hot-path `.Background(\"#hex\")` calls with `.Background(Brush(\"#hex\"))`.
`tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs` (new)	9 tests pinning the priority decisions and the `_lastRenderMs` wiring contract.

The two fixes are complementary: the scheduler change keeps the UI responsive even when individual renders are expensive; the brush cache makes individual renders cheaper.

Test plan

dotnet test tests/Reactor.Tests -p:Platform=x64 — 7534 passed, 0 failed.
dotnet test tests/Reactor.SelfTests -p:Platform=x64 — 689 passed, 0 failed (2m17s).
dotnet build samples/Reactor.TestApp -p:Platform=x64 — clean.
Manual: launch TestApp → Performance Stress Test → 500 elements → Start Sort. Verify the window remains responsive (can click, scroll, switch tabs) while bars animate. (Not run in this environment; requires interactive UI.)

Two complementary fixes for the Reactor.TestApp PerfStress demo, where the UI froze under sustained high-frequency setState from an async sort loop. ReactorHost / ReactorHostControl: RequestRender already coalesces concurrent setStates within one dispatch tick, but only demoted to LOW priority when a state change happened *during* a render. The PerfStress pattern (setState between `await Task.Delay` ticks) never hit that path, so renders piled up at Normal priority and starved input/layout/paint on the same UI thread. RenderPriorityPolicy now demotes the next TryEnqueue to Low whenever the previous render exceeded a 16 ms frame budget — same mechanism the existing in-render path already uses, just extended to the between-renders path. PerfStressDemo: each render allocated ~1500 fresh SolidColorBrush instances at 500 elements (3 per bar × string `.Background("#hex")` overload), and because every brush had a different reference, UpdateBorder wrote b.Background unconditionally and WinUI re-painted every border every tick. A [ThreadStatic] brush cache keyed by color string collapses that to ~16 brushes for the lifetime of the demo and lets the reconciler's reference check short-circuit the redundant Background writes. Tests: - 9 new RenderPriorityPolicyTests pin the priority decisions and the ReactorHost/ReactorHostControl wiring contract. - Full unit suite (7534 tests) and selftest suite (689 tests) pass.

Copilot

Pull request overview

This PR improves UI responsiveness in the PerfStress demo under heavy load by (1) demoting render enqueues to DispatcherQueuePriority.Low after a slow frame so input/layout/paint can interleave, and (2) reducing per-render brush allocations in the demo via a thread-local brush cache to keep background references stable.

Changes:

Added RenderPriorityPolicy to choose dispatcher enqueue priority based on the previous render’s duration.
Wired _lastRenderMs tracking + priority selection into both ReactorHost and ReactorHostControl.
Optimized PerfStressDemo to reuse SolidColorBrush instances per thread; added unit tests for the policy and host wiring.

Show a summary per file

File	Description
`src/Reactor/Hosting/RenderPriorityPolicy.cs`	Introduces the priority-selection policy based on render duration.
`src/Reactor/Hosting/ReactorHost.cs`	Tracks last render duration and uses it to enqueue subsequent renders at an appropriate dispatcher priority.
`src/Reactor/Hosting/ReactorHostControl.cs`	Mirrors the host behavior for embedded control hosting scenarios.
`samples/Reactor.TestApp/Demos/PerfStressDemo.cs`	Adds a per-thread brush cache to avoid per-render brush allocations and stabilize brush references.
`tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs`	Adds tests for priority decisions and verifies host/control tracking fields exist.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (2)

src/Reactor/Hosting/ReactorHost.cs:737

_lastRenderMs is written on the UI thread but RequestRender() may read it from other threads. To keep the “thread-safe” contract, consider writing this via an atomic op (e.g., Interlocked.Exchange) so the new value is published safely to other threads.

            // Feed RenderPriorityPolicy so the next RequestRender knows whether
            // to demote to Low priority. Stored as the most-recent measurement
            // — no smoothing — so a single slow render is enough to back off,
            // and a single fast render is enough to return to Normal priority.
            _lastRenderMs = treeBuildMs + reconcileMs + effectsMs;

src/Reactor/Hosting/ReactorHostControl.cs:523

_lastRenderMs is written in Render() but RequestRender() is documented as thread-safe. Consider publishing this value with an atomic write (e.g., Interlocked.Exchange) to avoid torn writes / visibility issues when other threads request a render.

            // Feed RenderPriorityPolicy. See matching note in ReactorHost.Render.
            _lastRenderMs = treeBuildMs + reconcileMs + effectsMs;

Files reviewed: 5/5 changed files
Comments generated: 4

_lastRenderMs is written from the UI thread inside Render() but read from RequestRender(), which is documented thread-safe and may be called from any thread. A plain double write isn't guaranteed atomic on 32-bit and lacks publication semantics. Use Interlocked.Exchange on the write side and Volatile.Read on the read side so off-UI-thread callers observe a non-torn, freshly published value — matching the Interlocked pattern already used for _renderPending in the same files.

codemonkeychris · 2026-05-17T19:26:05Z

Stress run analysis

Run 25999336086: 98/100 iterations passed. The two failures are pre-existing flakies, not regressions from this PR.

Per-shard outcomes

Shard	Iter 1	2	3	4	5
4	❌ `DataGrid_ScrollPerfRelative`	✅	✅	✅	✅
7	❌ `Trampoline_ReRenderSameControlUnderlyingRefStable`	✅	✅	✅	✅
1-3, 5-6, 8-20	✅ × 5 each

Both failures happened on iteration 1 (cold JIT), then all 8 retries on the same machines passed clean.

Failure 1: `DataGrid_ScrollPerfRelative` — confirmed pre-existing

ScrollPerf_Ratio (dg=12.93 vl=9.13 ratio=1.4x) - assertion failed

The test asserts the DataGrid:VirtualList scroll-time ratio is < 1.4. With single-digit ms timings, runner variance flips the result. The fixture's own source comment acknowledges this: "ARM64 dev boxes land at ~1.3–1.4x, x64 CI closer to 1.1–1.2x. The headroom to 1.4x absorbs that variance while still catching structural regressions … which the fixture's original comment calls out as 2x+."

Same failure appears on main stress runs — e.g. 25562201970 hit it twice with ratios 1.5× and 1.5×, and 25535961443 hit it with 1.6×. Two distinct shards on main runs failed it without any of my changes present.

My change doesn't interact with this test: the scroll measurement calls dgSv.ChangeView(...) then UpdateLayout() directly — synchronous WinUI, no setState, no await, no dispatcher scheduling. RenderPriorityPolicy is dormant.

Failure 2: `Trampoline_ReRenderSameControlUnderlyingRefStable` — most likely flaky

Trampoline_HandlerFiresAfterManyRerenders - assertion failed

The fixture clicks a button 100× via setState/await Harness.Render(), then clicks a separate "target" button once and asserts its handler ran exactly once (clicks == 1). The failing assertion means clicks != 1 — i.e., the click handler didn't fire (or fired multiple times).

The mechanism by which my change could cause this is unclear:

H.ClickButton invokes ButtonAutomationPeer.Invoke(), which fires Button.Click synchronously.
The trampoline reads the current element's OnClick from per-element state and dispatches synchronously.
The "target" button's handler () => clicks++ does not call setState, so no render is queued by the click.
await Harness.Render() after the target click sees _renderPending == 0 and returns immediately on the fast path — RenderPriorityPolicy is never consulted.

I ran the same fixture 5× locally with this PR's bits: all 5 passed. The failure didn't reproduce in 8 subsequent CI iterations on the same shard, and it doesn't appear in any of the 8 most recent main stress runs I sampled (which exhibit other flakies: OverlayLifecycle_Show_RefreshExtendsLifetime, AsyncResource.Framerate.DepsThrashing, ThreadSafe_RenderCoalescing — all timing-sensitive, all on main without this PR).

Verdict

No code change. Both failures are environmental cold-start variance — same class of timing flakies that already exist on main. The 98% pass rate is consistent with the stress workflow's baseline behavior.

If desired, I can re-run the stress workflow against the latest commit (6f02825, which adds the Interlocked/Volatile pair from CR feedback) — but it's a pure memory-publication change with no timing impact, so the new run's failure rate would be statistically indistinguishable from this one.

codemonkeychris · 2026-05-17T19:48:36Z

stress_perf delta — confirming the timing harness is unaffected

Ran the stocks-grid stress matrix at 10/50/100% workload, 7s each, headless x64 Release, on this PR's HEAD (6f02825). Goal: prove the RenderPriorityPolicy change doesn't distort how we measure perf.

Per-run results

Framework	%	Duration	Total Renders	Avg FPS	Avg Update (ms)	Peak Mem (MB)
Direct (WinUI)	10	8.6s	48	11.5	6.1	511
Direct	50	8.3s	15	4.0	25.8	496
Direct	100	8.5s	10	2.7	42.7	500
Wpf	10	10.9s	44	5.2	2.5	1036
Wpf	50	11.1s	19	2.0	10.3	939
Wpf	100	10.2s	16	2.0	13.5	703
Reactor	10	8.5s	31	11.0	0.7	557
Reactor	50	8.5s	13	5.0	1.3	547
Reactor	100	8.5s	9	3.5	1.4	553
ReactorOptimized	10	8.4s	44	14.1	0.6	546
ReactorOptimized	50	8.2s	14	5.7	1.1	542
ReactorOptimized	100	8.1s	9	4.2	2.0	554

Reactor & ReactorOptimized phase breakdown (from per-run reports):

	tree	diff	combined	renders/tick
Reactor 10%	46.5 ms	41.1 ms	85.5 ms	0.97
Reactor 50%	39.6 ms	117.1 ms	146.8 ms	0.93
Reactor 100%	47.9 ms	136.9 ms	167.8 ms	0.90
RO 10%	5.4 ms	28.1 ms	33.3 ms	0.98
RO 50%	11.9 ms	87.4 ms	93.9 ms	0.93
RO 100%	20.8 ms	146.9 ms	152.9 ms	0.90

Versus committed baseline (`docs/reports/stress-perf-stocks-grid.md`)

	10%	50%	100%
Direct — baseline FPS	10.11	3.11	2.44
Direct — this PR	11.5	4.0	2.7
Reactor — baseline	8.11	3.67	2.89
Reactor — this PR	11.0	5.0	3.5

Why the harness is intact

Duration stability — every non-WPF run reports Duration: 8.X s (7 s measurement + ~1.3-1.5 s spin-up/teardown), within ±5% across all 9 runs. Wpf consistently lands ~10.7s (its own startup overhead, matches historical behavior). The --duration 7 flag is honored.
Cross-framework ordering preserved — ReactorOptimized > Reactor > Direct at 50%/100%, Direct ≈ Reactor at 10%. Matches baseline.
PerfTracker phase data is sane — per-phase timings, renders/tick ratios (0.90-0.98), Avg Update ms, peak memory all in expected ranges. OnRenderComplete (the hook PerfTracker uses) is untouched by this PR.

Why Reactor's absolute FPS is higher than baseline (not a measurement artifact)

Reactor and ReactorOptimized run 25-40% above their baseline at every percent. Direct WinUI is also up ~10-30% — the same machine-state lift hits the non-Reactor control too, so this is "this machine is faster than the baseline machine," not Reactor's measurement getting distorted.

RenderPriorityPolicy actually does fire in this workload (the 33 ms timer-tick cadence keeps _lastRenderMs well above the 16 ms budget, so most enqueues land at Low priority). The dispatcher drains them fast because the workload is single-source, so total throughput is unaffected — the policy's purpose is to yield to competing dispatcher work (input, layout, paint) when present, not to throttle Reactor when it's the only thing on the queue. Exactly the design.

Why Wpf is below its baseline

The committed baseline is ETW Present counts on AC with DRR (Dynamic Refresh Rate) active. Headless "easy mode" under-measures Wpf specifically because WPF's render-thread isolation pays off only against a real display refresh cycle. This isn't caused by this PR — it's an artifact of the easy-mode harness on Wpf, documented in tests/stress_perf/METHODOLOGY.md. Direct/Reactor/ReactorOptimized are within ~10% of their "easy mode" expectation; only Wpf shows the gap.

Conclusion

Perf timing harness is unaffected by this PR. Recorded here so we have a snapshot if anything changes downstream.

codemonkeychris requested a review from Copilot May 17, 2026 18:36

Copilot started reviewing on behalf of codemonkeychris May 17, 2026 18:37 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

Comment thread src/Reactor/Hosting/ReactorHost.cs Outdated

Comment thread src/Reactor/Hosting/ReactorHostControl.cs Outdated

Comment thread src/Reactor/Hosting/RenderPriorityPolicy.cs

Comment thread tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs

codemonkeychris merged commit 191c70d into main May 17, 2026
8 checks passed

codemonkeychris deleted the fix/perf-stress-responsiveness branch May 17, 2026 19:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(perf): keep PerfStress responsive at 250/500 elements#321

fix(perf): keep PerfStress responsive at 250/500 elements#321
codemonkeychris merged 2 commits into
mainfrom
fix/perf-stress-responsiveness

codemonkeychris commented May 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codemonkeychris commented May 17, 2026

Uh oh!

codemonkeychris commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

codemonkeychris commented May 17, 2026

Summary

The bug

What changed

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codemonkeychris commented May 17, 2026

Stress run analysis

Per-shard outcomes

Failure 1: DataGrid_ScrollPerfRelative — confirmed pre-existing

Failure 2: Trampoline_ReRenderSameControlUnderlyingRefStable — most likely flaky

Verdict

Uh oh!

codemonkeychris commented May 17, 2026

stress_perf delta — confirming the timing harness is unaffected

Per-run results

Versus committed baseline (docs/reports/stress-perf-stocks-grid.md)

Why the harness is intact

Why Reactor's absolute FPS is higher than baseline (not a measurement artifact)

Why Wpf is below its baseline

Conclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Failure 1: `DataGrid_ScrollPerfRelative` — confirmed pre-existing

Failure 2: `Trampoline_ReRenderSameControlUnderlyingRefStable` — most likely flaky

Versus committed baseline (`docs/reports/stress-perf-stocks-grid.md`)