Stress test fixes: visual demo budgets, GC marshaling, InfiniteBasic pump #396
Merged
…pump
Three independent fixes for the post-#395 stress-failure landscape (CI run 26348014478):

1. Tighten visual-demo pacing so SplitterProgrammaticVisualDemo (~9.7s) and PerGroupDropTargetVisualDemo (~9s under load) fit comfortably under a 5s target instead of brushing the 15s fixture timeout.

2. Marshal GC.Collect + WaitForPendingFinalizers + GC.Collect onto a background thread via Task.Run at all 11 selftest call sites. Running this pattern on the UI dispatcher thread is a known deadlock anti-pattern: a finalizer that needs to release a UI-thread-affine RCW marshals back to the dispatcher, but the dispatcher is blocked inside WaitForPendingFinalizers. Speculatively addresses the Framerate.DataGridScroll, Framerate.DataGridEditMutation, and PropertyGrid_Target_Switching hangs.

3. Add a third Harness.Render pump to InfiniteBasic's page-walk loop. The fetcher's 10ms Task.Delay plus the Apply continuation occasionally outlives two pumps on a loaded CI VM; the fixture already pumps three times in the initial-page path for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
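Fix 3 keeps a fixed pump count. As a hedged alternative for comparison, a fixture could pump until the awaited state appears, bounded by a deadline. In this sketch, `PumpSketch.PumpUntilAsync` is hypothetical, `pump` stands in for one `Harness.Render` call (its real signature does not appear in this PR), and `isSettled` stands in for whatever check the fixture uses, such as the fetched page count.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PumpSketch
{
    // Hypothetical helper. `pump` stands in for one Harness.Render call and
    // `isSettled` for the fixture's own check (e.g. "page N has been applied").
    // Returns how many pumps were needed; throws if `deadline` elapses first.
    public static async Task<int> PumpUntilAsync(Func<Task> pump, Func<bool> isSettled, TimeSpan deadline)
    {
        var clock = Stopwatch.StartNew();
        var pumps = 0;
        while (!isSettled())
        {
            if (clock.Elapsed > deadline)
            {
                throw new TimeoutException(
                    $"Not settled after {pumps} pumps ({clock.ElapsedMilliseconds} ms).");
            }
            await pump();
            pumps++;
        }
        return pumps;
    }
}
```

Bounding the loop by elapsed time rather than by pump count tolerates a slow CI VM while still failing loudly on a genuine hang. The follow-up #397 takes a similar wait-then-poll approach for DataGrid_ScrollPopulatesData.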
Change applied at each of the 11 call sites:

```csharp
// Before: run directly on the UI dispatcher thread.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

// After: marshaled onto a background thread; the dispatcher keeps pumping.
await Task.Run(() =>
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
});
```
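The deadlock described in fix 2 can be modeled without WinUI or the real finalizer thread. In this hypothetical sketch, `ToyDispatcher` is a single thread draining a work queue, and the "finalizer" is a thread-pool task that must run one step on that dispatcher before it can finish, the way releasing a UI-thread-affine RCW marshals back to the UI thread. Timeouts stand in for a real hang; `ToyDispatcher` and `FinalizerDeadlockModel` are illustrative names, not part of the codebase.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded dispatcher: one dedicated thread drains a work queue in order.
public sealed class ToyDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public ToyDispatcher()
    {
        var thread = new Thread(() =>
        {
            foreach (var work in _queue.GetConsumingEnumerable()) work();
        }) { IsBackground = true };
        thread.Start();
    }

    public void Post(Action work) => _queue.Add(work);

    public void Dispose() => _queue.CompleteAdding();
}

public static class FinalizerDeadlockModel
{
    // Returns true if the simulated "wait for pending finalizers" completed within `timeout`.
    public static bool Run(bool offloadWait, TimeSpan timeout)
    {
        using var dispatcher = new ToyDispatcher();
        // Deliberately not disposed: background work may still signal them after Run returns.
        var finalizerRan = new ManualResetEventSlim(false);
        var waitCompleted = new ManualResetEventSlim(false);

        // Runs on the dispatcher, like a fixture body on the UI thread.
        dispatcher.Post(() =>
        {
            // The "finalizer": a background task whose cleanup must run on the dispatcher.
            Task.Run(() => dispatcher.Post(() => finalizerRan.Set()));

            if (offloadWait)
            {
                // Analogue of `await Task.Run(() => { ...WaitForPendingFinalizers... })`:
                // the wait happens on a pool thread and this work item returns at once,
                // so the dispatcher is free to run the finalizer's posted cleanup.
                Task.Run(() => { if (finalizerRan.Wait(timeout)) waitCompleted.Set(); });
            }
            else
            {
                // Analogue of WaitForPendingFinalizers on the UI thread: the dispatcher is
                // stuck here, so the posted cleanup cannot run until this wait times out.
                if (finalizerRan.Wait(timeout)) waitCompleted.Set();
            }
        });

        // Allow twice the timeout so the offloaded path has room to finish.
        return waitCompleted.Wait(timeout + timeout);
    }
}
```

Only the `offloadWait: false` path times out, mirroring the hang attributed to running WaitForPendingFinalizers on the dispatcher. Because the model uses neither the CLR finalizer thread nor real RCWs, it illustrates the mechanism but does not show that it caused the CI hangs, which the PR itself calls speculative.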
codemonkeychris added a commit that referenced this pull request on May 24, 2026
…d scroll poll (#397)
Three independent stress-flake fixes (Clusters T, O, C) per INVESTIGATION.md follow-up to PR #396:

Cluster T: bump FixtureTimeout to 30s for three render-pump-heavy fixtures whose budgets are tight on loaded CI runners but not pathological:
- DataGridParityFixtures.HookPagingFramerateScroll
- AsyncResourceFramerateFixtures.DataGridEditMutation
- NativeDockingSmokeFixture.PerGroupDropTargetVisualDemo

Cluster O: fix race in ReconcileHighlightOverlay.RefreshOrAdd. DispatcherQueueTimer.Stop() can't dequeue a Tick that the dispatcher queue has already dispatched, so a stale tick can tear down a sprite the refresh still wants alive. Now each refresh swaps in a fresh timer with its own Tick lambda; the stale tick checks ReferenceEquals on ah.Timer and bails when its identity no longer matches.

Cluster C: DataGrid_ScrollPopulatesData was relying on a fixed 800ms Render(800) for the scroll-settle → fetch → realization chain (not tracked by _renderPending). Now waits 800ms baseline (preserves the fetch-trigger window so ScrollPop_MultipleFetches still triggers) then polls cells for up to 3s.

Validated locally: all 27 OverlayLifecycle_* fixtures pass, all 5 ScrollPop_* sub-checks pass, 15x local stress on both fixtures together is clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
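The Cluster O guard can be illustrated without WinUI. In this hypothetical sketch, `FakeTimer` stands in for DispatcherQueueTimer (its `Stop()` cannot recall a Tick that was already dispatched), `ActiveHighlight` and `SpriteAlive` stand in for the highlight state and its sprite, and `HighlightOverlaySketch.RefreshOrAdd` is a simplified stand-in for the real method. Only `ah.Timer` and the `ReferenceEquals` check come from the description above.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for DispatcherQueueTimer. Stop() only flips a flag; it cannot
// recall a Tick the dispatcher has already dequeued, which RaiseAlreadyDispatchedTick models.
public sealed class FakeTimer
{
    public bool IsRunning { get; private set; }
    public event Action? Tick;
    public void Start() => IsRunning = true;
    public void Stop() => IsRunning = false;
    public void RaiseAlreadyDispatchedTick() => Tick?.Invoke();
}

// Hypothetical per-highlight state; `Timer` mirrors `ah.Timer`.
public sealed class ActiveHighlight
{
    public FakeTimer? Timer;
    public bool SpriteAlive;
}

public static class HighlightOverlaySketch
{
    // Each refresh swaps in a fresh timer whose Tick lambda captures that timer.
    public static FakeTimer RefreshOrAdd(ActiveHighlight ah)
    {
        ah.Timer?.Stop();      // a Tick from the old timer may still arrive
        ah.SpriteAlive = true; // this refresh wants the sprite kept alive

        var timer = new FakeTimer();
        timer.Tick += () =>
        {
            // Stale tick: its timer is no longer the current one, so bail.
            if (!ReferenceEquals(ah.Timer, timer)) return;
            timer.Stop();
            ah.Timer = null;
            ah.SpriteAlive = false; // tear down only on the current timer's tick
        };
        ah.Timer = timer;
        timer.Start();
        return timer;
    }
}
```

If the first refresh's tick arrives after a second refresh has swapped in a new timer, the identity check fails and the tick returns without clearing `SpriteAlive`; only the current timer's tick tears the sprite down.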
Summary
Three independent fixes for the post-#395 stress-failure landscape (CI run 26348014478, 6/20 failed shards, 7 total failures). After landing #395 (Cluster A), the remaining failures break into:
- AsyncResource.InfiniteBasic_MultiplePagesFetched (got 4)
- FloatingTitleBar_PaneBodyVisible (does not reproduce in isolation; not addressed here)

This PR addresses Cluster T and Cluster I.
1. Tighten visual-demo timing budgets (Cluster T1)
NativeDocking_SplitterProgrammaticVisualDemo was designed with an explicit ~9.7s timing budget (per the comment block at the top of the fixture) against a 15s timeout, assuming ~80ms per render. Under CI load, renders can exceed 200ms, pushing total runtime over the timeout. NativeDocking_PerGroupDropTargetVisualDemo has the same shape with smaller margins.
Reduced pacing delays:
Smoke-tested locally: splitter at 4.8s wall-clock (~1.8s fixture runtime), PerGroup at 7.2s wall-clock (~4.2s fixture runtime), both well under the 15s timeout. The pacing is still slow enough to observe when viewing the demos manually for debugging.
2. Marshal GC.WaitForPendingFinalizers off the UI dispatcher (Cluster T2)
AsyncResource.Framerate.DataGridScroll, Framerate.DataGridEditMutation, and PropertyGrid_Target_Switching were timing out at 15s but should normally complete in 1-3s. Every Framerate fixture (and several others) starts and/or ends with GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect();. Running that on the UI dispatcher thread is a known deadlock anti-pattern: a finalizer that needs to release a UI-thread-affine RCW (e.g. a WinUI control) marshals back to the dispatcher, but the dispatcher is blocked inside WaitForPendingFinalizers waiting for that very finalizer.
Migrated all 11 call sites onto a background thread via await Task.Run(() => { GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect(); }). The UI dispatcher keeps pumping while the finalizers drain, which removes the conditions for this deadlock.
Files affected: AsyncResourceFramerateFixtures.cs (6), AsyncResourceFixtures.cs, AsyncInfiniteResourceFramerateFixtures.cs, DataGridParityFixtures.cs, NativeDockingReliabilityFixture.cs (2).
The fix is speculative: the local repro (40 iter × 13 Framerate fixtures, 0 hits) is inconclusive against the ~0.075% CI rate. Suggesting a follow-up to add DOTNET_DbgEnableMiniDump=1 to the stress workflow env so the next CI hit (if any) yields a definitive dump.
3. Extra render pump in InfiniteBasic page-walk (Cluster I)
InfiniteBasic_MultiplePagesFetched (got 4) failed at shard 9 iter 10. The fixture pumps 2 renders after each ItemAt() call. The fetcher has a 10ms Task.Delay plus an Apply continuation; under CI load, that chain occasionally outlives the 2-pump budget. The fixture's existing comment already acknowledges the analogous race for page 1, and the initial-page path pumps three times; this PR extends the same pattern to the loop.
Test plan
dotnet build: 0 errors, only pre-existing warnings
🤖 Generated with Claude Code