Commit 7558665
Add off-dispatcher AOT hang watchdog and clean up stale selftest skips (#358)
* Add off-dispatcher AOT hang watchdog and clean up stale selftest skips
Provides a debugging story for AOT selftests that hang the host app, and
restores test coverage for fixtures that were skipped speculatively but
actually pass under AOT today.
Watchdog (tests/Reactor.AppTests.Host/SelfTest/SelfTestRunner.cs):
* Background Thread tracks the currently running fixture as an immutable
FixtureProgress record published via Volatile.Read/Write -- no
multi-field consistency race when the publisher and watcher overlap.
* Polls every 1s and FailFasts after HangTimeout (default 60s,
configurable via REACTOR_SELFTEST_HANG_TIMEOUT_SECONDS, 0 disables;
auto-disabled when Debugger.IsAttached).
* On hang, writes "Bail out! HANG_DETECTED: <name> ran <s>s ..." to
stdout and stderr before Environment.FailFast so a Watson minidump is
produced when DOTNET_DbgEnableMiniDump=1.
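
A minimal sketch of the watchdog pattern described above, not the code in SelfTestRunner.cs. The type and member names (`HangWatchdogSketch`, `Begin`, `End`, `Check`, `ReadTimeout`, `Start`) are hypothetical, the tail of the bail-out message is omitted, and treating unparsable or negative timeout values as "use the default" or "disabled" is an assumption. Only `Volatile`, `Thread`, `Debugger.IsAttached`, `Environment.TickCount64`, and `Environment.FailFast` are real .NET APIs here.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;

// Immutable snapshot: name and start time are always published together,
// so the watcher never sees a new name paired with an old start time.
public sealed record FixtureProgress(string Name, long StartedAtMs);

public static class HangWatchdogSketch
{
    private static FixtureProgress? s_current;

    // Called by the runner when a fixture starts and finishes.
    public static void Begin(string name, long nowMs) =>
        Volatile.Write(ref s_current, new FixtureProgress(name, nowMs));

    public static void End() => Volatile.Write(ref s_current, null);

    // Returns the bail-out message when the current fixture has exceeded
    // the timeout, otherwise null. A zero or negative timeout disables the check.
    public static string? Check(long nowMs, TimeSpan hangTimeout)
    {
        FixtureProgress? snapshot = Volatile.Read(ref s_current);
        if (snapshot is null || hangTimeout <= TimeSpan.Zero) return null;

        long elapsedMs = nowMs - snapshot.StartedAtMs;
        if (elapsedMs < hangTimeout.TotalMilliseconds) return null;

        return $"Bail out! HANG_DETECTED: {snapshot.Name} ran {elapsedMs / 1000}s";
    }

    // Parses the raw environment value; 60s default, 0 disables.
    public static TimeSpan ReadTimeout(string? raw) =>
        int.TryParse(raw, out int seconds) && seconds >= 0
            ? TimeSpan.FromSeconds(seconds)
            : TimeSpan.FromSeconds(60);

    // Starts the polling thread, or returns null when the watchdog is disabled.
    public static Thread? Start(TimeSpan hangTimeout)
    {
        if (Debugger.IsAttached || hangTimeout <= TimeSpan.Zero) return null;

        var thread = new Thread(() =>
        {
            while (true)
            {
                Thread.Sleep(1000); // poll once per second
                string? message = Check(Environment.TickCount64, hangTimeout);
                if (message is null) continue;

                // Write to both streams before terminating the process.
                Console.Out.WriteLine(message);
                Console.Error.WriteLine(message);
                Environment.FailFast(message);
            }
        })
        { IsBackground = true, Name = "HangWatchdogSketch" };

        thread.Start();
        return thread;
    }
}
```

Keeping the decision in `Check` with an explicit clock value makes the timeout logic testable without sleeping or calling FailFast.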
CLI / environment:
* --no-aot-skip on the Host (Program.cs) flips
SelfTestRunner.SkipAotPatterns so a developer can run a single
DefaultAotSkipPatterns entry against the AOT-published binary.
* REACTOR_SELFTEST_HOST_EXE on the MSTest harness
(Reactor.SelfTests/SelfTestBatch.cs) overrides the auto-discovered
Host binary so the same harness validates the AOT-published .exe
from a publish/ directory.
Parent attribution (SelfTestBatch.cs):
* Parses HANG_DETECTED: <name> from both stdout and stderr; on process
timeout or hang signal, attributes the failure to the named fixture
with a repro command + dump-env instructions, instead of cascading
through _initError (which would fail every unrelated fixture).
* Falls back to the last "# Running:" line when the watchdog did not
fire in time.
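
The attribution rule above could look roughly like the following. This is an illustrative sketch, not the SelfTestBatch.cs implementation: `HangAttributionSketch` and `FindHungFixture` are hypothetical names, and the regular expressions assume fixture names contain no whitespace.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HangAttributionSketch
{
    // Assumption: the fixture name is the first whitespace-free token
    // after each marker.
    private static readonly Regex HangLine = new(@"HANG_DETECTED:\s*(\S+)");
    private static readonly Regex RunningLine = new(@"^# Running:\s*(\S+)");

    // Scans the combined stdout/stderr lines of the Host process.
    // Prefers the explicit watchdog signal; otherwise falls back to the
    // last fixture announced by a "# Running:" line. Returns null when
    // neither marker is present.
    public static string? FindHungFixture(IEnumerable<string> outputLines)
    {
        string? lastRunning = null;
        foreach (string line in outputLines)
        {
            Match hang = HangLine.Match(line);
            if (hang.Success) return hang.Groups[1].Value;

            Match running = RunningLine.Match(line);
            if (running.Success) lastRunning = running.Groups[1].Value;
        }
        return lastRunning;
    }
}
```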
Skip-list cleanup (SelfTestRunner.cs):
* Removes 84 stale entries that pass under AOT today.
* Replaces the previous mix of explicit names and Prefix* wildcards
with 108 explicit fixture names grouped by failure mode
(NATIVE_CRASH vs ASSERT_FAIL) so the skip list documents what is
actually broken.
Empirical probe (tests/Reactor.AppTests.Host/probe-aot-skips.ps1):
* New script that expands DefaultAotSkipPatterns via --list-fixtures
and runs each entry in isolation under --no-aot-skip, categorising
by exit code + watchdog signal, writing a CSV.
* Categories observed on this baseline: 84 PASS (removed), 61
NATIVE_CRASH (0xC0000409 / STATUS_STACK_BUFFER_OVERRUN), 47
ASSERT_FAIL, 0 HANG. The hang watchdog stays as a CI safety net
(synthetic Thread.Sleep injection validated end-to-end) even though
no current entry actually hangs.
Docs (docs/aot-support.md):
* New "Debugging an AOT selftest hang" section covering the
three-bucket failure-mode taxonomy, minidump env-var setup,
isolated-repro workflow, watchdog tuning, and the probe script.
Validation:
* JIT selftests: 735/735 passing, no false positives.
* Synthetic hang (Thread.Sleep(int.MaxValue) injected into one fixture
with REACTOR_SELFTEST_HANG_TIMEOUT_SECONDS=3): watchdog fired in 3s,
process exited 0xE0434F4D (COR_E_FAILFAST), stack pointed at
HangWatchdogLoop. Synthetic hang reverted.
* Cleaned AOT run: 627 fixtures, 2358 ok checks, 108 skipped, 1 known
pre-existing flaky assertion, 0 hangs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address PR #358 review feedback
* SelfTestBatch.cs: Set _abortedReason on hang/timeout so the
Fixture test method reports never-executed fixtures as
Assert.Inconclusive instead of cascading "was not reported by the
Host" failures across every fixture downstream of the hang.
* SelfTestRunner.cs: Fix SkipAotPatterns XML doc -- true means
patterns are applied (default), false means ignored. Doc had it
inverted.
* SelfTestRunner.cs + docs/aot-support.md: Fix stale references to
`.aot_runs/probe_skips.ps1`; the script lives at
tests/Reactor.AppTests.Host/probe-aot-skips.ps1.
* probe-aot-skips.ps1: STATUS_STACK_BUFFER_OVERRUN exit code is
-1073740791 (0xC0000409 widens to Int64 in PowerShell, so
`-eq 0xC0000409` never matched a negative Process.ExitCode).
Verified locally that a known crasher now categorises as
NATIVE_CRASH via the explicit constant.
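
For reference, the two spellings of the exit code are the same 32 bits read as unsigned versus signed. The sketch below shows the arithmetic in C# rather than PowerShell; `ExitCodeSketch` and its members are hypothetical names used only for illustration.

```csharp
using System;

public static class ExitCodeSketch
{
    // NTSTATUS value quoted in the commit message (unsigned form).
    public const uint StatusStackBufferOverrun = 0xC0000409;

    // Process.ExitCode is a signed Int32, so reinterpret the same bits:
    // 3221226505 - 4294967296 = -1073740791.
    public static int AsProcessExitCode(uint ntStatus) => unchecked((int)ntStatus);

    public static bool IsNativeCrash(int exitCode) =>
        exitCode == AsProcessExitCode(StatusStackBufferOverrun);
}
```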
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 3c3bed9 commit 7558665
5 files changed
Lines changed: 516 additions & 186 deletions
File tree
- docs
- tests
- Reactor.AppTests.Host
- SelfTest
- Reactor.SelfTests