Stress fixes: framerate/PerGroup timeouts, overlay tick race, DataGrid scroll poll #397
Merged
…d scroll poll

Three independent stress-flake fixes (Clusters T, O, C) per INVESTIGATION.md follow-up to PR #396:

Cluster T: bump FixtureTimeout to 30s for three render-pump-heavy fixtures whose budgets are tight on loaded CI runners but not pathological:
- DataGridParityFixtures.HookPagingFramerateScroll
- AsyncResourceFramerateFixtures.DataGridEditMutation
- NativeDockingSmokeFixture.PerGroupDropTargetVisualDemo

Cluster O: fix race in ReconcileHighlightOverlay.RefreshOrAdd. DispatcherQueueTimer.Stop() can't dequeue a Tick that the dispatcher queue has already dispatched, so a stale tick can tear down a sprite the refresh still wants alive. Now each refresh swaps in a fresh timer with its own Tick lambda; the stale tick checks ReferenceEquals on ah.Timer and bails when its identity no longer matches.

Cluster C: DataGrid_ScrollPopulatesData was relying on a fixed 800ms Render(800) for the scroll-settle → fetch → realization chain (not tracked by _renderPending). Now waits 800ms baseline (preserves the fetch-trigger window so ScrollPop_MultipleFetches still triggers) then polls cells for up to 3s.

Validated locally: all 27 OverlayLifecycle_* fixtures pass, all 5 ScrollPop_* sub-checks pass, 15x local stress on both fixtures together is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR bundles three stress-flake fixes across Reactor selftests and a dev overlay: (1) increases fixture timeouts for a few render-pump-heavy selftests that can exceed the default watchdog under CI load, (2) fixes a ReconcileHighlightOverlay expiry-timer race by swapping timers on refresh and guarding against stale ticks, and (3) hardens DataGrid_ScrollPopulatesData by polling for realized cells after a baseline settle delay.
Changes:
- Increase `FixtureTimeout` to 30s for three known long-running stress fixtures.
- Make `ReconcileHighlightOverlay` timer expiry resilient to stale dispatcher ticks by creating a fresh timer on refresh and adding an identity guard.
- Replace a fixed post-scroll wait with baseline wait + polling for realized DataGrid cells to avoid `cells=0` under CI load.
Show a summary per file
| File | Description |
|---|---|
| tests/Reactor.AppTests.Host/SelfTest/Fixtures/NativeDockingSmokeFixture.cs | Extends timeout for PerGroupDropTargetVisualDemo to reduce CI watchdog flakes. |
| tests/Reactor.AppTests.Host/SelfTest/Fixtures/DataGridScrollFixtures.cs | Adds polling after scroll to wait for realized “Emp-” cells (keeps 800ms baseline). |
| tests/Reactor.AppTests.Host/SelfTest/Fixtures/DataGridParityFixtures.cs | Extends timeout for HookPagingFramerateScroll fixture. |
| tests/Reactor.AppTests.Host/SelfTest/Fixtures/AsyncResourceFramerateFixtures.cs | Extends timeout for DataGridEditMutation fixture. |
| src/Reactor/Hosting/ReconcileHighlightOverlay.cs | Swaps expiry timer on refresh and guards against stale timer ticks tearing down active sprites. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 2
Comment on lines 151 to +154

    try { existing.Timer.Stop(); } catch { }
    try { existing.Timer.Start(); } catch { }
    var refreshedTimer = CreateExpiryTimer(target, existing);
    existing.Timer = refreshedTimer;
    refreshedTimer.Start();
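
The timer swap and identity guard that the PR describes can be illustrated without any UI framework. The following is a minimal sketch under stated assumptions, not the code from `ReconcileHighlightOverlay.cs`: `FakeTimer`, `ActiveHighlight`, `OverlaySketch`, and the one-argument `CreateExpiryTimer` are hypothetical stand-ins, and `FireAlreadyQueuedTick` simulates a tick that the dispatcher queue had already dispatched before `Stop()` was called.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a dispatcher timer: Stop() only prevents future
// ticks; it cannot recall a tick that was already handed to the queue.
sealed class FakeTimer
{
    public event Action? Tick;
    public bool Running { get; private set; }
    public void Start() => Running = true;
    public void Stop() => Running = false;

    // Simulates the queue running a tick that was dispatched before Stop().
    public void FireAlreadyQueuedTick() => Tick?.Invoke();
}

// Hypothetical per-target state: the current expiry timer plus a flag that
// stands in for the highlight sprite.
sealed class ActiveHighlight
{
    public FakeTimer? Timer;
    public bool SpriteAlive = true;
}

static class OverlaySketch
{
    // Each timer gets its own Tick lambda that captures the timer instance.
    public static FakeTimer CreateExpiryTimer(ActiveHighlight ah)
    {
        var timer = new FakeTimer();
        timer.Tick += () =>
        {
            // Identity guard: if a refresh has swapped in a newer timer,
            // this tick is stale and must not tear down the sprite.
            if (!object.ReferenceEquals(ah.Timer, timer)) return;
            timer.Stop();
            ah.SpriteAlive = false;
        };
        return timer;
    }

    // Refresh stops the old timer and swaps in a fresh one instead of
    // restarting the same instance.
    public static void Refresh(ActiveHighlight ah)
    {
        ah.Timer?.Stop();
        var refreshed = CreateExpiryTimer(ah);
        ah.Timer = refreshed;
        refreshed.Start();
    }
}
```

In this sketch, calling `Refresh` twice and then firing the first timer's tick leaves `SpriteAlive` set, because the first timer no longer matches `ah.Timer`; firing the second timer's tick clears it. Restarting the same timer instance would give the stale tick no way to tell that it had been superseded.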
Comment on lines 132 to 133

    H.Check($"ScrollPop_DataVisible (cells={visibleEmpCells.Count})",
        visibleEmpCells.Count >= 4);
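
The baseline-then-poll pattern behind this check can be sketched as a small helper. This is an illustration under assumptions rather than the fixture's code: `PollSketch.WaitThenPoll` and its delegate parameters are hypothetical, the 50 ms poll step is an assumed value (the PR states only the 800 ms baseline and the 3 s poll budget), and the budget is tracked by summing the requested pump durations, which assumes each pump blocks for roughly the time it is asked to.

```csharp
using System;

static class PollSketch
{
    // Pumps once for the baseline window, then keeps pumping in small steps
    // until the probe reports at least minCount items or the poll budget is
    // used up. Returns the last count observed so the caller can log it.
    public static int WaitThenPoll(
        Action<int> pump,        // stand-in for a Render(ms)-style pump
        Func<int> countCells,    // stand-in for the realized-cell probe
        int minCount,
        int baselineMs = 800,
        int pollBudgetMs = 3000,
        int pollStepMs = 50)     // assumed step; not stated in the PR
    {
        // The baseline wait is unconditional so that timing-dependent side
        // effects (such as fetches triggered by scroll settle) still happen.
        pump(baselineMs);
        int count = countCells();

        int waitedMs = 0;
        while (count < minCount && waitedMs < pollBudgetMs)
        {
            pump(pollStepMs);
            waitedMs += pollStepMs;
            count = countCells();
        }
        return count;
    }
}
```

A caller would pass the harness pump and a probe that counts the visible cells, then assert on the returned count, for example against the `>= 4` threshold in the check above. Returning the count instead of a boolean keeps the observed value available for the check name, which is how the fixture reports `cells=…` on failure.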
4 tasks
codemonkeychris added a commit that referenced this pull request on May 24, 2026
Two independent stress-flake fixes per INVESTIGATION.md session 4 follow-up to PR #397:

Cluster F (4/1000 hits in run 26351376710): FloatingTitleBar_PaneBodyVisible asserted body TextBlock visibility after a single Harness.Render() following DockFloatingWindow.Open. Chrome (TabView, TitleBar absence, TabStripFooter) materializes on that first pump, but the inner pane content lags one or two pumps behind on a loaded CI runner. Same realization-race family as Cluster C; apply the same poll pattern (2s budget, 50ms between pumps) and annotate the check name with the observed body count for future-hit diagnostics.

Cluster T-new (2/1000 hits): EventSubscriptionLeakBaseline timing out at its 30s override. Local timing measurement across 3 fresh-process runs: 14.5 / 15.6 / 15.4 s (avg 15.2s) on a dev box. The fixture is 100 mount/unmount cycles x 2 Harness.Render() = 200 renders + 200 reconcile passes; the work itself is substantial. CI VMs under load have been measured at 2-4x slowdown elsewhere in the doc, easily overshooting the 30s budget. Not a hang, just budget vs CI variance. Bump FixtureTimeout to 60s (~4x local baseline) with a comment block explaining the math.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
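
The budget arithmetic for Cluster T-new can be checked with a few lines. This is a sketch only: `BudgetSketch` and `HeadroomFactor` are hypothetical names, and the inputs are the three local timings and the two budgets quoted in the commit message above.

```csharp
using System;
using System.Linq;

static class BudgetSketch
{
    // How many times slower than the local average a run can be before it
    // exceeds the given timeout budget.
    public static double HeadroomFactor(double[] localRunsSeconds, double budgetSeconds)
        => budgetSeconds / localRunsSeconds.Average();
}
```

With the quoted runs of 14.5, 15.6 and 15.4 s, `HeadroomFactor` is just under 2 for the 30 s budget and just under 4 for the 60 s budget. The old budget therefore tolerates less than the low end of the 2-4x CI slowdown range, while the new one covers almost all of it.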
3 tasks
codemonkeychris added a commit that referenced this pull request on May 25, 2026
Eliminates the dominant stress-failure mode (CI VM variance overshooting tight per-fixture budgets). 5 of 8 failures in CI Stress run 26364109146 were 15 s timeout trips across 5 different fixtures - the pattern that per-fixture overrides in PRs #395, #397, #399 chased without converging. Local timing measurements (documented in INVESTIGATION.md) show the Framerate.* family completing in ~7 s on a dev box; CI VMs under contention run 2-4x slower, putting fixture work at ~28 s on the worst tick. 30 s matches the explicit override budget already used by four fixtures and leaves a clean signal gap below the 60 s HangWatchdogLoop dump-on-hang threshold. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Three independent stress-flake fixes, bundled per the cadence set by #396.
Each is the implementation of a Net-next-step from `INVESTIGATION.md` post-#396.

- `FixtureTimeout => 30s` on three render-pump-heavy fixtures whose mandatory wall-clock floors are tight against the default 15 s budget on loaded CI runners. Not deadlocks; budget violations.
  - `DataGridParityFixtures.HookPagingFramerateScroll`
  - `AsyncResourceFramerateFixtures.DataGridEditMutation`
  - `NativeDockingSmokeFixture.PerGroupDropTargetVisualDemo`
- `ReconcileHighlightOverlay` tick race. `DispatcherQueueTimer.Stop()` cannot rescind a `Tick` the dispatcher queue has already dispatched, so a stale tick can tear down a sprite the new refresh still owns. Fix swaps in a fresh timer on refresh; the stale tick's identity check fails and it bails.
- `DataGrid_ScrollPopulatesData` poll. Fixed 800 ms `Render` occasionally undershoots realization on CI → `cells=0`. Now keeps the 800 ms baseline (so the scroll-settle window still fires fetches, and the sibling `ScrollPop_MultipleFetches` sanity check still passes) and polls cells for up to 3 s.
Test plan
- `dotnet build` clean across Reactor + AppTests.Host.
- `OverlayLifecycle_*` self-test family (27 fixtures): all green locally.
- `DataGrid_ScrollPopulatesData` single-shot: all 5 sub-assertions green (`cells=41 / 41`, `calls=7`).
- Cluster T/O/C all stay at zero hits.
🤖 Generated with Claude Code