fix(perf): keep PerfStress responsive at 250/500 elements#321
Conversation
Two complementary fixes for the Reactor.TestApp PerfStress demo, where the
UI froze under sustained high-frequency setState from an async sort loop.
ReactorHost / ReactorHostControl: RequestRender already coalesces concurrent
setStates within one dispatch tick, but only demoted to LOW priority when a
state change happened *during* a render. The PerfStress pattern (setState
between `await Task.Delay` ticks) never hit that path, so renders piled up at
Normal priority and starved input/layout/paint on the same UI thread.
RenderPriorityPolicy now demotes the next TryEnqueue to Low whenever the
previous render exceeded a 16 ms frame budget — same mechanism the
existing in-render path already uses, just extended to the
between-renders path.
PerfStressDemo: each render allocated ~1500 fresh SolidColorBrush instances
at 500 elements (3 per bar × string `.Background("#hex")` overload), and
because every brush had a different reference, UpdateBorder wrote
b.Background unconditionally and WinUI re-painted every border every tick.
A [ThreadStatic] brush cache keyed by color string collapses that to ~16
brushes for the lifetime of the demo and lets the reconciler's reference
check short-circuit the redundant Background writes.
Tests:
- 9 new RenderPriorityPolicyTests pin the priority decisions and the
ReactorHost/ReactorHostControl wiring contract.
- Full unit suite (7534 tests) and selftest suite (689 tests) pass.
There was a problem hiding this comment.
Pull request overview
This PR improves UI responsiveness in the PerfStress demo under heavy load by (1) demoting render enqueues to DispatcherQueuePriority.Low after a slow frame so input/layout/paint can interleave, and (2) reducing per-render brush allocations in the demo via a thread-local brush cache to keep background references stable.
Changes:
- Added
RenderPriorityPolicyto choose dispatcher enqueue priority based on the previous render’s duration. - Wired
_lastRenderMstracking + priority selection into bothReactorHostandReactorHostControl. - Optimized
PerfStressDemoto reuseSolidColorBrushinstances per thread; added unit tests for the policy and host wiring.
Show a summary per file
| File | Description |
|---|---|
src/Reactor/Hosting/RenderPriorityPolicy.cs |
Introduces the priority-selection policy based on render duration. |
src/Reactor/Hosting/ReactorHost.cs |
Tracks last render duration and uses it to enqueue subsequent renders at an appropriate dispatcher priority. |
src/Reactor/Hosting/ReactorHostControl.cs |
Mirrors the host behavior for embedded control hosting scenarios. |
samples/Reactor.TestApp/Demos/PerfStressDemo.cs |
Adds a per-thread brush cache to avoid per-render brush allocations and stabilize brush references. |
tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs |
Adds tests for priority decisions and verifies host/control tracking fields exist. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (2)
src/Reactor/Hosting/ReactorHost.cs:737
_lastRenderMsis written on the UI thread butRequestRender()may read it from other threads. To keep the “thread-safe” contract, consider writing this via an atomic op (e.g.,Interlocked.Exchange) so the new value is published safely to other threads.
// Feed RenderPriorityPolicy so the next RequestRender knows whether
// to demote to Low priority. Stored as the most-recent measurement
// — no smoothing — so a single slow render is enough to back off,
// and a single fast render is enough to return to Normal priority.
_lastRenderMs = treeBuildMs + reconcileMs + effectsMs;
src/Reactor/Hosting/ReactorHostControl.cs:523
_lastRenderMsis written inRender()butRequestRender()is documented as thread-safe. Consider publishing this value with an atomic write (e.g.,Interlocked.Exchange) to avoid torn writes / visibility issues when other threads request a render.
// Feed RenderPriorityPolicy. See matching note in ReactorHost.Render.
_lastRenderMs = treeBuildMs + reconcileMs + effectsMs;
- Files reviewed: 5/5 changed files
- Comments generated: 4
_lastRenderMs is written from the UI thread inside Render() but read from RequestRender(), which is documented thread-safe and may be called from any thread. A plain double write isn't guaranteed atomic on 32-bit and lacks publication semantics. Use Interlocked.Exchange on the write side and Volatile.Read on the read side so off-UI-thread callers observe a non-torn, freshly published value — matching the Interlocked pattern already used for _renderPending in the same files.
Stress run analysisRun 25999336086: 98/100 iterations passed. The two failures are pre-existing flakies, not regressions from this PR. Per-shard outcomes
Both failures happened on iteration 1 (cold JIT), then all 8 retries on the same machines passed clean. Failure 1:
|
stress_perf delta — confirming the timing harness is unaffectedRan the stocks-grid stress matrix at 10/50/100% workload, 7s each, headless x64 Release, on this PR's HEAD ( Per-run results
Reactor & ReactorOptimized phase breakdown (from per-run reports):
Versus committed baseline (
|
| 10% | 50% | 100% | |
|---|---|---|---|
| Direct — baseline FPS | 10.11 | 3.11 | 2.44 |
| Direct — this PR | 11.5 | 4.0 | 2.7 |
| Reactor — baseline | 8.11 | 3.67 | 2.89 |
| Reactor — this PR | 11.0 | 5.0 | 3.5 |
Why the harness is intact
- Duration stability — every non-WPF run reports
Duration: 8.X s(7 s measurement + ~1.3-1.5 s spin-up/teardown), within ±5% across all 9 runs. Wpf consistently lands ~10.7s (its own startup overhead, matches historical behavior). The--duration 7flag is honored. - Cross-framework ordering preserved — ReactorOptimized > Reactor > Direct at 50%/100%, Direct ≈ Reactor at 10%. Matches baseline.
- PerfTracker phase data is sane — per-phase timings,
renders/tickratios (0.90-0.98),Avg Updatems, peak memory all in expected ranges.OnRenderComplete(the hook PerfTracker uses) is untouched by this PR.
Why Reactor's absolute FPS is higher than baseline (not a measurement artifact)
Reactor and ReactorOptimized run 25-40% above their baseline at every percent. Direct WinUI is also up ~10-30% — the same machine-state lift hits the non-Reactor control too, so this is "this machine is faster than the baseline machine," not Reactor's measurement getting distorted.
RenderPriorityPolicy actually does fire in this workload (the 33 ms timer-tick cadence keeps _lastRenderMs well above the 16 ms budget, so most enqueues land at Low priority). The dispatcher drains them fast because the workload is single-source, so total throughput is unaffected — the policy's purpose is to yield to competing dispatcher work (input, layout, paint) when present, not to throttle Reactor when it's the only thing on the queue. Exactly the design.
Why Wpf is below its baseline
The committed baseline is ETW Present counts on AC with DRR (Dynamic Refresh Rate) active. Headless "easy mode" under-measures Wpf specifically because WPF's render-thread isolation pays off only against a real display refresh cycle. This isn't caused by this PR — it's an artifact of the easy-mode harness on Wpf, documented in tests/stress_perf/METHODOLOGY.md. Direct/Reactor/ReactorOptimized are within ~10% of their "easy mode" expectation; only Wpf shows the gap.
Conclusion
Perf timing harness is unaffected by this PR. Recorded here so we have a snapshot if anything changes downstream.
Summary
TryEnqueuelands atDispatcherQueuePriority.Lowso the message pump can interleave input/layout/paint. NewRenderPriorityPolicyhelper; wired into bothReactorHostandReactorHostControl..Background(\"#hex\")allocations (~1500/render at 500 elements) with a[ThreadStatic]brush dictionary so the reconciler sees reference-stable brushes and skips redundantBorder.Backgroundwrites.The bug
PerfStressDemoruns an async quicksort, callingsetStatefour times thenawait Task.Delay(16). At 250 elements the app stutters; at 500 it appears hung until the sort completes. The host already coalesces concurrent setStates within one dispatch tick, but the LOW-priority demotion inRenderLooponly fired when state changed during a render. The PerfStress pattern sets state between awaits, so every render was enqueued at NORMAL priority — back-to-back renders + Task.Delay continuations starved the UI message pump of input/layout/paint slots.On top of that, each render allocated a fresh
SolidColorBrushper.Background(\"#hex\")call (~1500 per tick at 500 bars). Because the brush instance was reference-different every render,UpdateBorderwroteb.Background = n.Backgroundunconditionally and WinUI re-painted every border every tick — even when the color hadn't actually changed.What changed
src/Reactor/Hosting/RenderPriorityPolicy.cs(new)PickPriority(lastRenderMs)→Lowif past 16 ms budget, elseNormal.src/Reactor/Hosting/ReactorHost.cs_lastRenderMs;RequestRenderenqueues viaPickPriority(_lastRenderMs).src/Reactor/Hosting/ReactorHostControl.cssamples/Reactor.TestApp/Demos/PerfStressDemo.cs[ThreadStatic]brush dictionary +Brush(string)helper; replace 6 hot-path.Background(\"#hex\")calls with.Background(Brush(\"#hex\")).tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs(new)_lastRenderMswiring contract.The two fixes are complementary: the scheduler change keeps the UI responsive even when individual renders are expensive; the brush cache makes individual renders cheaper.
Test plan
dotnet test tests/Reactor.Tests -p:Platform=x64— 7534 passed, 0 failed.dotnet test tests/Reactor.SelfTests -p:Platform=x64— 689 passed, 0 failed (2m17s).dotnet build samples/Reactor.TestApp -p:Platform=x64— clean.