fix(stress): per-fixture hang watchdog + WaitForIdle cap raise #431

Merged
codemonkeychris merged 2 commits into main from fix/stress-flakes-waitidle-watchdog
May 27, 2026

@codemonkeychris
Collaborator

Summary

Two race-driven flakes surfaced in the recent 500x selftest stress run on main (CI Stress run 26513480330): 4 of 10000 iterations failed (~99.96% pass), with 2 of 4 in NativeDocking_* fixtures. Investigation showed both root causes were in test infrastructure, not the new docking window manager work (#418/#420/#421/#415).

Root causes

  1. ReactorHost.WaitForIdleAsync(maxYields: 10) silently returned after 10 low-priority dispatcher yields whether or not the render loop had actually settled. Under CI VM contention, callers such as Harness.Render() moved on against a half-rendered tree, producing flakes like NativeDocking_RoleAware_SplitDocArea_CloseNonLast failing to FindText("body:d1") on iteration 3/500.

  2. SelfTestRunner.HangWatchdogLoop used a global 60 s constant equal to EventSubscriptionLeakBaseline's own FixtureTimeout. On worst-case CI ticks (~60 s for 200 renders + 200 reconciles, as the fixture's own comment acknowledged), the two deadlines raced: the watchdog won and FailFasted the whole shard instead of letting the fixture's graceful timeout fire.
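A minimal Python asyncio sketch of root cause 1, assuming one `asyncio.sleep(0)` turn is a fair stand-in for one low-priority dispatcher yield. `wait_for_idle`, `is_idle`, and `demo` are hypothetical names, not Reactor's C# API. It shows how a fixed yield cap can return before a slow render loop settles, and how returning a flag plus logging on a cap hit (mirroring the Debug.WriteLine change) makes that case visible instead of silent:

```python
import asyncio
import logging

log = logging.getLogger("Reactor.WaitForIdle")


async def wait_for_idle(is_idle, max_yields: int = 50) -> bool:
    """Yield to the event loop until is_idle() is true or max_yields is spent.

    Returns True if the loop settled and False if the cap was hit. The
    pre-fix C# helper returned silently in both cases.
    """
    for _ in range(max_yields):
        if is_idle():
            return True
        # One asyncio.sleep(0) lets every other ready callback run once,
        # standing in for one low-priority dispatcher yield.
        await asyncio.sleep(0)
    if is_idle():
        return True
    # Leave a greppable trail instead of returning silently.
    log.debug("[Reactor.WaitForIdle] yield cap hit (%d yields)", max_yields)
    return False


async def demo(render_turns: int, max_yields: int) -> bool:
    """Simulate a render loop that needs `render_turns` turns to settle."""
    pending = render_turns

    async def render_loop() -> None:
        nonlocal pending
        while pending > 0:
            pending -= 1  # one unit of render/reconcile work per turn
            await asyncio.sleep(0)

    task = asyncio.create_task(render_loop())
    settled = await wait_for_idle(lambda: pending == 0, max_yields)
    task.cancel()  # stop the simulated render loop either way
    return settled
```

Under these assumptions, `asyncio.run(demo(30, 10))` returns False (the caller would proceed against a half-rendered tree) while `asyncio.run(demo(30, 50))` returns True. A higher cap only widens the margin; the log line is what makes any remaining cap hits diagnosable.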

Changes

| File | Change |
|---|---|
| src/Reactor/Hosting/ReactorHost.cs | WaitForIdleAsync default cap 10 → 50; Debug.WriteLine when the cap actually hits so the next flake leaves a greppable trail |
| tests/.../SelfTest/SelfTestRunner.cs | FixtureProgress carries a per-fixture HangThreshold = max(60 s, FixtureTimeout + 30 s); watchdog uses that, not the global constant |
| tests/.../SelfTest/SelfTestFixtureBase.cs | doc comment updated to reflect new watchdog rule |
| tests/.../Fixtures/NativeDockingReliabilityFixture.cs | EventSubscriptionLeakBaseline FixtureTimeout 60 → 120 s; heartbeat H.Check after warmup and every 25 cycles |

The heartbeat H.Check calls don't reset the watchdog (it still measures elapsed time since fixture start), but they make TAP ok lines visible in the log, so a future hang reveals which 25-cycle window it occurred in rather than just "no progress".
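To make that distinction concrete, here is a hypothetical Python model (not the SelfTestRunner code) in which heartbeats only append TAP-style `ok` breadcrumbs while the watchdog keeps measuring from fixture start. The `FixtureProgress` fields, the `ok - ...` line format, and `run_cycles` are assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FixtureProgress:
    """Hypothetical per-fixture progress record shared with the watchdog."""
    name: str
    started_at: float        # seconds on some monotonic clock
    hang_threshold_s: float  # e.g. max(60, FixtureTimeout + 30)
    log: List[str] = field(default_factory=list)

    def heartbeat(self, label: str) -> None:
        # Log breadcrumb only: started_at is deliberately left untouched.
        self.log.append(f"ok - {label}")


def watchdog_tripped(progress: FixtureProgress, now: float) -> bool:
    # Elapsed time is measured from fixture start, so heartbeats never
    # extend the budget; they only narrow down where a hang happened.
    return now - progress.started_at > progress.hang_threshold_s


def run_cycles(progress: FixtureProgress, cycles: int, every: int = 25) -> None:
    progress.heartbeat("warmup")
    for i in range(1, cycles + 1):
        # ... one render + reconcile cycle would run here ...
        if i % every == 0:
            progress.heartbeat(f"cycle {i}")
```

With 200 cycles this model emits nine breadcrumbs (warmup plus cycles 25 through 200). If a log stopped after `ok - cycle 150`, the hang would lie somewhere in cycles 151 to 175.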

Test plan

  • src/Reactor and tests/Reactor.AppTests.Host both compile clean (0 errors, only pre-existing XML-comment / nullability warnings)
  • Kick off a 1000x CI Stress run targeting selftests on this branch
  • Confirm no flakes in the NativeDocking_* fixtures or DataGrid_KeyboardAndPrivateRenderPaths
  • Confirm EventSubscriptionLeakBaseline per-iteration time stays well under the 120 s budget (target: still ~15 s local, ~30-60 s under CI contention)
  • Spot-check stress logs for [Reactor.WaitForIdle] yield cap hit lines: any hit means the 50-yield cap was actually reached, and we'll want to investigate the underlying render loop
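The last check can be scripted. A small sketch, assuming the stress logs are downloaded as `*.log` files under one directory and that each cap-hit message contains the `[Reactor.WaitForIdle] yield cap hit` text quoted above (both are assumptions; the real artifact layout and message may differ):

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Marker text from the test plan; the full Debug.WriteLine message may carry more.
CAP_HIT_MARKER = "[Reactor.WaitForIdle] yield cap hit"


def count_in_lines(lines: Iterable[str]) -> int:
    """Count log lines that report the WaitForIdle yield cap being hit."""
    return sum(1 for line in lines if CAP_HIT_MARKER in line)


def count_cap_hits(log_dir: Path) -> Counter:
    """Map each *.log file under log_dir to its number of cap-hit lines."""
    hits: Counter = Counter()
    for path in sorted(log_dir.rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as f:
            n = count_in_lines(f)
        if n:
            hits[str(path.relative_to(log_dir))] = n
    return hits
```

An empty result is the hoped-for outcome; any entry points at a shard where the render loop was still busy after 50 yields.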

🤖 Generated with Claude Code

Two race-driven flakes surfaced in the 500x selftest stress run on main
(GitHub Actions run 26513480330): 4/10000 iterations failed, with 2 of
4 in NativeDocking_* fixtures.

WaitForIdleAsync silently returned after 10 low-priority dispatcher
yields whether or not the render loop had settled. Under CI VM
contention this dropped tests onto a half-rendered tree, producing
e.g. SplitDocArea_CloseNonLast finding no `body:d1` on iter 3/500.
Raise the default cap to 50 and log a Debug.WriteLine when it hits,
so the next flake is greppable instead of silent.

The host-level HangWatchdogLoop used a global 60 s constant, equal
to EventSubscriptionLeakBaseline's own FixtureTimeout. The two raced
on the worst-case CI tick (~60 s for 200 renders + 200 reconciles)
and the watchdog won, FailFasting the shard instead of letting the
fixture's graceful timeout fire first. Make the watchdog threshold
per-fixture: max(60 s, FixtureTimeout + 30 s). The fixture's own
budget always gets first crack; dispatcher-starvation FailFast only
fires after that.

Also: bump EventSubscriptionLeakBaseline FixtureTimeout 60 → 120 s
(auto-bumps its watchdog to 150 s under the new rule), and add
heartbeat H.Check calls every 25 cycles so a future hang reveals
which loop window it lived in rather than just "no progress".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
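The per-fixture rule from the commit message above can be checked in a few lines. This Python sketch is illustrative only (`hang_threshold` and `first_to_fire` are hypothetical names, not the SelfTestRunner API); it compares which deadline expires first for a fixture that never finishes, under the old global constant and under the new rule:

```python
from datetime import timedelta

OLD_GLOBAL_THRESHOLD = timedelta(seconds=60)  # pre-fix watchdog constant
FLOOR = timedelta(seconds=60)                 # new rule never drops below the old value
GRACE = timedelta(seconds=30)                 # head start for the fixture's own timeout


def hang_threshold(fixture_timeout: timedelta) -> timedelta:
    """Per-fixture watchdog threshold: max(60 s, FixtureTimeout + 30 s)."""
    return max(FLOOR, fixture_timeout + GRACE)


def first_to_fire(fixture_timeout: timedelta, watchdog_threshold: timedelta) -> str:
    """Which deadline expires first for a fixture that never finishes."""
    if fixture_timeout < watchdog_threshold:
        return "fixture timeout"  # graceful failure; the shard keeps running
    if fixture_timeout > watchdog_threshold:
        return "watchdog"         # FailFast takes down the whole shard
    return "race"                 # equal deadlines: outcome depends on scheduling
```

Because the grace is added to FixtureTimeout rather than compared against a fixed ceiling, the fixture's own timeout always expires at least 30 s before the watchdog, whatever budget a fixture declares; the 60 s floor only matters for short-budget fixtures. For EventSubscriptionLeakBaseline, `hang_threshold(timedelta(seconds=120))` gives the 150 s mentioned above.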

Copilot AI left a comment


Pull request overview

This PR hardens the selftest stress infrastructure to reduce race-driven flakes under CI contention by improving “wait until UI settles” behavior and making the hang watchdog threshold fixture-aware (so long-budget fixtures don’t get fail-fasted prematurely).

Changes:

  • Increase ReactorHost.WaitForIdleAsync yield cap (10 → 50) and add a diagnostic trace when the cap is hit.
  • Make the hang watchdog threshold per-fixture (max(60s, FixtureTimeout + 30s)) instead of a single global ceiling.
  • Extend EventSubscriptionLeakBaseline fixture timeout (60s → 120s) and add periodic TAP “heartbeat” checks for better log locality during stress.
| File | Description |
|---|---|
| src/Reactor/Hosting/ReactorHost.cs | Raises WaitForIdleAsync yield cap and logs when the cap is hit. |
| tests/Reactor.AppTests.Host/SelfTest/SelfTestRunner.cs | Introduces per-fixture hang thresholds for the watchdog and adjusts progress publication. |
| tests/Reactor.AppTests.Host/SelfTest/SelfTestFixtureBase.cs | Updates documentation to reflect the new watchdog threshold rule. |
| tests/Reactor.AppTests.Host/SelfTest/Fixtures/NativeDockingReliabilityFixture.cs | Increases leak baseline timeout and adds TAP heartbeat checks. |

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 4

Comment thread tests/Reactor.AppTests.Host/SelfTest/SelfTestRunner.cs Outdated
Comment thread tests/Reactor.AppTests.Host/SelfTest/Fixtures/NativeDockingReliabilityFixture.cs Outdated
Comment thread src/Reactor/Hosting/ReactorHost.cs Outdated
Comment thread src/Reactor/Hosting/ReactorHost.cs
- WaitForIdleAsync: use RunContinuationsAsynchronously so await
  continuations don't run inline on the dispatcher at Low priority
  (defeating the yield loop's purpose); honour TryEnqueue failures
  (queue shutdown) by completing the TCS rather than hanging.
- SelfTestRunner: publish a baseline FixtureProgress *before* calling
  SelfTestFixtureRegistry.Create, so the watchdog can still attribute
  a hang if construction itself blocks. Upgrade the per-fixture
  threshold once FixtureTimeout is known.
- Fix misleading comments in EventSubscriptionLeakBaseline that claimed
  the heartbeat H.Check calls reset the watchdog. They don't: the
  watchdog measures elapsed time since fixture start. Heartbeats are log
  breadcrumbs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
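The TryEnqueue part of the second commit can be illustrated with a Python asyncio analogue. `FakeDispatcher` and `yield_to_dispatcher` are hypothetical stand-ins for the dispatcher queue and the C# yield helper; the point is the shape: when the queue refuses work, complete the awaitable immediately instead of waiting for a callback that will never run. asyncio futures always resume their awaiters through `call_soon`, which is roughly the guarantee `TaskCreationOptions.RunContinuationsAsynchronously` gives a .NET `TaskCompletionSource`:

```python
import asyncio
from collections import deque
from typing import Callable, Deque


class FakeDispatcher:
    """Hypothetical dispatcher queue with a TryEnqueue-style API."""

    def __init__(self) -> None:
        self._queue: Deque[Callable[[], None]] = deque()
        self.shut_down = False

    def try_enqueue(self, callback: Callable[[], None]) -> bool:
        if self.shut_down:
            return False  # mirrors TryEnqueue failing during queue shutdown
        self._queue.append(callback)
        return True

    def pump(self) -> None:
        """Run every queued callback, as the UI thread would."""
        while self._queue:
            self._queue.popleft()()


async def yield_to_dispatcher(dispatcher: FakeDispatcher) -> bool:
    """Wait for one trip through the dispatcher; False if it refused work."""
    fut = asyncio.get_running_loop().create_future()

    def on_dispatched() -> None:
        if not fut.done():
            # set_result schedules the awaiting coroutine via call_soon, so
            # it never resumes inline inside this dispatcher callback.
            fut.set_result(True)

    if not dispatcher.try_enqueue(on_dispatched):
        fut.set_result(False)  # complete now rather than hang forever
    return await fut


async def demo() -> tuple:
    d = FakeDispatcher()
    pending = asyncio.ensure_future(yield_to_dispatcher(d))
    await asyncio.sleep(0)  # let the task enqueue its callback
    d.pump()
    normal = await pending
    d.shut_down = True
    # If the helper ignored try_enqueue's result, this await would only end via the timeout.
    refused = await asyncio.wait_for(yield_to_dispatcher(d), timeout=1)
    return normal, refused
```

`asyncio.run(demo())` returns `(True, False)`: one normal round trip, then an immediate completion once the queue is shut down.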
@codemonkeychris
Collaborator Author

Stress run summary for https://github.com/microsoft/microsoft-ui-reactor/actions/runs/26526776797:

  • Overall selftest reliability: 997/1000 iterations passed = 99.7%.
  • Scope: 1000 selftest iterations across 40 shards; unit/integration shards were skipped.
  • Docking-window-specific signal: 1 docking-related failure in 1000 iterations, or roughly 99.9% docking iteration reliability for this run.

Failed tests observed during the run:

| Shard / Iteration | Test | Failure |
|---|---|---|
| Shard 5 / iteration 1 | DataGrid_ScrollPopulatesData | ScrollPop_InitialRender - assertion failed |
| Shard 12 / iteration 17 | AsyncResource.Framerate.DataGridScroll | exceeded 30s |
| Shard 40 / iteration 25 | NativeDocking_A11y_HostLandmarkAndPaneAutomationIds | A11y_PaneAutomationId_ActiveTabFound - assertion failed |
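The percentages in the summary follow directly from the failure table. A quick check, assuming each row is a distinct failed iteration and that "docking-related" means the NativeDocking_* prefix (the tuples simply transcribe the table):

```python
from typing import List, Tuple

TOTAL_ITERATIONS = 1000  # 1000 selftest iterations across 40 shards

# (shard, iteration, test) transcribed from the failure table.
FAILURES: List[Tuple[int, int, str]] = [
    (5, 1, "DataGrid_ScrollPopulatesData"),
    (12, 17, "AsyncResource.Framerate.DataGridScroll"),
    (40, 25, "NativeDocking_A11y_HostLandmarkAndPaneAutomationIds"),
]


def reliability(failed_iterations: int, total: int = TOTAL_ITERATIONS) -> float:
    """Percentage of iterations with no failure."""
    return 100.0 * (total - failed_iterations) / total


# Count distinct (shard, iteration) pairs so two failures in one iteration count once.
failed = len({(shard, it) for shard, it, _ in FAILURES})
docking_failed = len({(s, it) for s, it, t in FAILURES if t.startswith("NativeDocking_")})

print(f"overall: {reliability(failed):.1f}%")          # overall: 99.7%
print(f"docking: {reliability(docking_failed):.1f}%")  # docking: 99.9%
```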

@codemonkeychris codemonkeychris merged commit 8acc1af into main May 27, 2026
55 of 59 checks passed
@codemonkeychris codemonkeychris deleted the fix/stress-flakes-waitidle-watchdog branch May 27, 2026 19:18