Skip to content

fix(perf): keep PerfStress responsive at 250/500 elements#321

Merged
codemonkeychris merged 2 commits into
mainfrom
fix/perf-stress-responsiveness
May 17, 2026
Merged

fix(perf): keep PerfStress responsive at 250/500 elements#321
codemonkeychris merged 2 commits into
mainfrom
fix/perf-stress-responsiveness

Conversation

@codemonkeychris
Copy link
Copy Markdown
Collaborator

Summary

  • Render scheduler: extend the existing LOW-priority demotion (already used when state changes during a render) to the between-renders path. When the previous render exceeds a 16 ms frame budget, the next TryEnqueue lands at DispatcherQueuePriority.Low so the message pump can interleave input/layout/paint. New RenderPriorityPolicy helper; wired into both ReactorHost and ReactorHostControl.
  • Brush cache in PerfStressDemo: replace the per-render .Background(\"#hex\") allocations (~1500/render at 500 elements) with a [ThreadStatic] brush dictionary so the reconciler sees reference-stable brushes and skips redundant Border.Background writes.

The bug

PerfStressDemo runs an async quicksort, calling setState four times then await Task.Delay(16). At 250 elements the app stutters; at 500 it appears hung until the sort completes. The host already coalesces concurrent setStates within one dispatch tick, but the LOW-priority demotion in RenderLoop only fired when state changed during a render. The PerfStress pattern sets state between awaits, so every render was enqueued at NORMAL priority — back-to-back renders + Task.Delay continuations starved the UI message pump of input/layout/paint slots.

On top of that, each render allocated a fresh SolidColorBrush per .Background(\"#hex\") call (~1500 per tick at 500 bars). Because the brush instance was reference-different every render, UpdateBorder wrote b.Background = n.Background unconditionally and WinUI re-painted every border every tick — even when the color hadn't actually changed.

What changed

File Change
src/Reactor/Hosting/RenderPriorityPolicy.cs (new) Pure helper: PickPriority(lastRenderMs)Low if past 16 ms budget, else Normal.
src/Reactor/Hosting/ReactorHost.cs Track _lastRenderMs; RequestRender enqueues via PickPriority(_lastRenderMs).
src/Reactor/Hosting/ReactorHostControl.cs Same change, symmetric.
samples/Reactor.TestApp/Demos/PerfStressDemo.cs [ThreadStatic] brush dictionary + Brush(string) helper; replace 6 hot-path .Background(\"#hex\") calls with .Background(Brush(\"#hex\")).
tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs (new) 9 tests pinning the priority decisions and the _lastRenderMs wiring contract.

The two fixes are complementary: the scheduler change keeps the UI responsive even when individual renders are expensive; the brush cache makes individual renders cheaper.

Test plan

  • dotnet test tests/Reactor.Tests -p:Platform=x64 — 7534 passed, 0 failed.
  • dotnet test tests/Reactor.SelfTests -p:Platform=x64 — 689 passed, 0 failed (2m17s).
  • dotnet build samples/Reactor.TestApp -p:Platform=x64 — clean.
  • Manual: launch TestApp → Performance Stress Test → 500 elements → Start Sort. Verify the window remains responsive (can click, scroll, switch tabs) while bars animate. (Not run in this environment; requires interactive UI.)

Two complementary fixes for the Reactor.TestApp PerfStress demo, where the
UI froze under sustained high-frequency setState from an async sort loop.

ReactorHost / ReactorHostControl: RequestRender already coalesces concurrent
setStates within one dispatch tick, but only demoted to LOW priority when a
state change happened *during* a render. The PerfStress pattern (setState
between `await Task.Delay` ticks) never hit that path, so renders piled up at
Normal priority and starved input/layout/paint on the same UI thread.
RenderPriorityPolicy now demotes the next TryEnqueue to Low whenever the
previous render exceeded a 16 ms frame budget — same mechanism the
existing in-render path already uses, just extended to the
between-renders path.

PerfStressDemo: each render allocated ~1500 fresh SolidColorBrush instances
at 500 elements (3 per bar × string `.Background("#hex")` overload), and
because every brush had a different reference, UpdateBorder wrote
b.Background unconditionally and WinUI re-painted every border every tick.
A [ThreadStatic] brush cache keyed by color string collapses that to ~16
brushes for the lifetime of the demo and lets the reconciler's reference
check short-circuit the redundant Background writes.

Tests:
- 9 new RenderPriorityPolicyTests pin the priority decisions and the
  ReactorHost/ReactorHostControl wiring contract.
- Full unit suite (7534 tests) and selftest suite (689 tests) pass.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves UI responsiveness in the PerfStress demo under heavy load by (1) demoting render enqueues to DispatcherQueuePriority.Low after a slow frame so input/layout/paint can interleave, and (2) reducing per-render brush allocations in the demo via a thread-local brush cache to keep background references stable.

Changes:

  • Added RenderPriorityPolicy to choose dispatcher enqueue priority based on the previous render’s duration.
  • Wired _lastRenderMs tracking + priority selection into both ReactorHost and ReactorHostControl.
  • Optimized PerfStressDemo to reuse SolidColorBrush instances per thread; added unit tests for the policy and host wiring.
Show a summary per file
File Description
src/Reactor/Hosting/RenderPriorityPolicy.cs Introduces the priority-selection policy based on render duration.
src/Reactor/Hosting/ReactorHost.cs Tracks last render duration and uses it to enqueue subsequent renders at an appropriate dispatcher priority.
src/Reactor/Hosting/ReactorHostControl.cs Mirrors the host behavior for embedded control hosting scenarios.
samples/Reactor.TestApp/Demos/PerfStressDemo.cs Adds a per-thread brush cache to avoid per-render brush allocations and stabilize brush references.
tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs Adds tests for priority decisions and verifies host/control tracking fields exist.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (2)

src/Reactor/Hosting/ReactorHost.cs:737

  • _lastRenderMs is written on the UI thread but RequestRender() may read it from other threads. To keep the “thread-safe” contract, consider writing this via an atomic op (e.g., Interlocked.Exchange) so the new value is published safely to other threads.
            // Feed RenderPriorityPolicy so the next RequestRender knows whether
            // to demote to Low priority. Stored as the most-recent measurement
            // — no smoothing — so a single slow render is enough to back off,
            // and a single fast render is enough to return to Normal priority.
            _lastRenderMs = treeBuildMs + reconcileMs + effectsMs;

src/Reactor/Hosting/ReactorHostControl.cs:523

  • _lastRenderMs is written in Render() but RequestRender() is documented as thread-safe. Consider publishing this value with an atomic write (e.g., Interlocked.Exchange) to avoid torn writes / visibility issues when other threads request a render.
            // Feed RenderPriorityPolicy. See matching note in ReactorHost.Render.
            _lastRenderMs = treeBuildMs + reconcileMs + effectsMs;
  • Files reviewed: 5/5 changed files
  • Comments generated: 4

Comment thread src/Reactor/Hosting/ReactorHost.cs Outdated
Comment thread src/Reactor/Hosting/ReactorHostControl.cs Outdated
Comment thread src/Reactor/Hosting/RenderPriorityPolicy.cs
Comment thread tests/Reactor.Tests/Hosting/RenderPriorityPolicyTests.cs
_lastRenderMs is written from the UI thread inside Render() but read from
RequestRender(), which is documented thread-safe and may be called from any
thread. A plain double write isn't guaranteed atomic on 32-bit and lacks
publication semantics. Use Interlocked.Exchange on the write side and
Volatile.Read on the read side so off-UI-thread callers observe a
non-torn, freshly published value — matching the Interlocked pattern
already used for _renderPending in the same files.
@codemonkeychris
Copy link
Copy Markdown
Collaborator Author

Stress run analysis

Run 25999336086: 98/100 iterations passed. The two failures are pre-existing flakies, not regressions from this PR.

Per-shard outcomes

Shard Iter 1 2 3 4 5
4 DataGrid_ScrollPerfRelative
7 Trampoline_ReRenderSameControlUnderlyingRefStable
1-3, 5-6, 8-20 ✅ × 5 each

Both failures happened on iteration 1 (cold JIT), then all 8 retries on the same machines passed clean.

Failure 1: DataGrid_ScrollPerfRelative — confirmed pre-existing

ScrollPerf_Ratio (dg=12.93 vl=9.13 ratio=1.4x) - assertion failed

The test asserts the DataGrid:VirtualList scroll-time ratio is < 1.4. With single-digit ms timings, runner variance flips the result. The fixture's own source comment acknowledges this: "ARM64 dev boxes land at ~1.3–1.4x, x64 CI closer to 1.1–1.2x. The headroom to 1.4x absorbs that variance while still catching structural regressions … which the fixture's original comment calls out as 2x+."

Same failure appears on main stress runs — e.g. 25562201970 hit it twice with ratios 1.5× and 1.5×, and 25535961443 hit it with 1.6×. Two distinct shards on main runs failed it without any of my changes present.

My change doesn't interact with this test: the scroll measurement calls dgSv.ChangeView(...) then UpdateLayout() directly — synchronous WinUI, no setState, no await, no dispatcher scheduling. RenderPriorityPolicy is dormant.

Failure 2: Trampoline_ReRenderSameControlUnderlyingRefStable — most likely flaky

Trampoline_HandlerFiresAfterManyRerenders - assertion failed

The fixture clicks a button 100× via setState/await Harness.Render(), then clicks a separate "target" button once and asserts its handler ran exactly once (clicks == 1). The failing assertion means clicks != 1 — i.e., the click handler didn't fire (or fired multiple times).

The mechanism by which my change could cause this is unclear:

  • H.ClickButton invokes ButtonAutomationPeer.Invoke(), which fires Button.Click synchronously.
  • The trampoline reads the current element's OnClick from per-element state and dispatches synchronously.
  • The "target" button's handler () => clicks++ does not call setState, so no render is queued by the click.
  • await Harness.Render() after the target click sees _renderPending == 0 and returns immediately on the fast path — RenderPriorityPolicy is never consulted.

I ran the same fixture 5× locally with this PR's bits: all 5 passed. The failure didn't reproduce in 8 subsequent CI iterations on the same shard, and it doesn't appear in any of the 8 most recent main stress runs I sampled (which exhibit other flakies: OverlayLifecycle_Show_RefreshExtendsLifetime, AsyncResource.Framerate.DepsThrashing, ThreadSafe_RenderCoalescing — all timing-sensitive, all on main without this PR).

Verdict

No code change. Both failures are environmental cold-start variance — same class of timing flakies that already exist on main. The 98% pass rate is consistent with the stress workflow's baseline behavior.

If desired, I can re-run the stress workflow against the latest commit (6f02825, which adds the Interlocked/Volatile pair from CR feedback) — but it's a pure memory-publication change with no timing impact, so the new run's failure rate would be statistically indistinguishable from this one.

@codemonkeychris
Copy link
Copy Markdown
Collaborator Author

stress_perf delta — confirming the timing harness is unaffected

Ran the stocks-grid stress matrix at 10/50/100% workload, 7s each, headless x64 Release, on this PR's HEAD (6f02825). Goal: prove the RenderPriorityPolicy change doesn't distort how we measure perf.

Per-run results

Framework % Duration Total Renders Avg FPS Avg Update (ms) Peak Mem (MB)
Direct (WinUI) 10 8.6s 48 11.5 6.1 511
Direct 50 8.3s 15 4.0 25.8 496
Direct 100 8.5s 10 2.7 42.7 500
Wpf 10 10.9s 44 5.2 2.5 1036
Wpf 50 11.1s 19 2.0 10.3 939
Wpf 100 10.2s 16 2.0 13.5 703
Reactor 10 8.5s 31 11.0 0.7 557
Reactor 50 8.5s 13 5.0 1.3 547
Reactor 100 8.5s 9 3.5 1.4 553
ReactorOptimized 10 8.4s 44 14.1 0.6 546
ReactorOptimized 50 8.2s 14 5.7 1.1 542
ReactorOptimized 100 8.1s 9 4.2 2.0 554

Reactor & ReactorOptimized phase breakdown (from per-run reports):

tree diff effects combined renders/tick
Reactor 10% 46.5 ms 41.1 ms 0.0 ms 85.5 ms 0.97
Reactor 50% 39.6 ms 117.1 ms 0.0 ms 146.8 ms 0.93
Reactor 100% 47.9 ms 136.9 ms 0.0 ms 167.8 ms 0.90
RO 10% 5.4 ms 28.1 ms 0.0 ms 33.3 ms 0.98
RO 50% 11.9 ms 87.4 ms 0.0 ms 93.9 ms 0.93
RO 100% 20.8 ms 146.9 ms 0.0 ms 152.9 ms 0.90

Versus committed baseline (docs/reports/stress-perf-stocks-grid.md)

10% 50% 100%
Direct — baseline FPS 10.11 3.11 2.44
Direct — this PR 11.5 4.0 2.7
Reactor — baseline 8.11 3.67 2.89
Reactor — this PR 11.0 5.0 3.5

Why the harness is intact

  1. Duration stability — every non-WPF run reports Duration: 8.X s (7 s measurement + ~1.3-1.5 s spin-up/teardown), within ±5% across all 9 runs. Wpf consistently lands ~10.7s (its own startup overhead, matches historical behavior). The --duration 7 flag is honored.
  2. Cross-framework ordering preserved — ReactorOptimized > Reactor > Direct at 50%/100%, Direct ≈ Reactor at 10%. Matches baseline.
  3. PerfTracker phase data is sane — per-phase timings, renders/tick ratios (0.90-0.98), Avg Update ms, peak memory all in expected ranges. OnRenderComplete (the hook PerfTracker uses) is untouched by this PR.

Why Reactor's absolute FPS is higher than baseline (not a measurement artifact)

Reactor and ReactorOptimized run 25-40% above their baseline at every percent. Direct WinUI is also up ~10-30% — the same machine-state lift hits the non-Reactor control too, so this is "this machine is faster than the baseline machine," not Reactor's measurement getting distorted.

RenderPriorityPolicy actually does fire in this workload (the 33 ms timer-tick cadence keeps _lastRenderMs well above the 16 ms budget, so most enqueues land at Low priority). The dispatcher drains them fast because the workload is single-source, so total throughput is unaffected — the policy's purpose is to yield to competing dispatcher work (input, layout, paint) when present, not to throttle Reactor when it's the only thing on the queue. Exactly the design.

Why Wpf is below its baseline

The committed baseline is ETW Present counts on AC with DRR (Dynamic Refresh Rate) active. Headless "easy mode" under-measures Wpf specifically because WPF's render-thread isolation pays off only against a real display refresh cycle. This isn't caused by this PR — it's an artifact of the easy-mode harness on Wpf, documented in tests/stress_perf/METHODOLOGY.md. Direct/Reactor/ReactorOptimized are within ~10% of their "easy mode" expectation; only Wpf shows the gap.

Conclusion

Perf timing harness is unaffected by this PR. Recorded here so we have a snapshot if anything changes downstream.

@codemonkeychris codemonkeychris merged commit 191c70d into main May 17, 2026
8 checks passed
@codemonkeychris codemonkeychris deleted the fix/perf-stress-responsiveness branch May 17, 2026 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants