[cdac] x86: implement IGCInfoDecoder.EnumerateLiveSlots; unblock GCRoots stackref tests#129547
[cdac] x86: implement IGCInfoDecoder.EnumerateLiveSlots; unblock GCRoots stackref tests#129547max-charlamb wants to merge 32 commits into
Conversation
…ots stackref tests Builds on the partial x86 IGCInfo support added in dotnet#129456 by porting the remaining decoder pieces required for GC-root scanning on x86, so that `IStackWalk.WalkStackReferences` returns live frame slots on x86 cDAC. The x86 GC info uses the legacy bit-packed `InfoHdr` byte-stream encoding (`src/coreclr/vm/gc_unwind_x86.inl`, `src/coreclr/inc/gcdecoder.cpp`) instead of the modern `GcInfoDecoder` shared by other architectures, so the implementation lives entirely on the existing `X86GCInfo` decoder under `Contracts/GCInfo/X86/`. Changes ------- * `X86GCInfo`: add `UntrackedSlots` lazy property + `DecodeUntrackedSlots()` -- delta-decoded signed varints with the double-align-frame rebase from `gc_unwind_x86.inl:3467`. * `X86GCInfo`: add `VarPtrLifetimes` lazy property + `DecodeVarPtrLifetimes()` -- triplets of (varOffs, begOffs delta, endOffs delta) for EBP-frame tracked locals. * Two new public record types `UntrackedSlot` and `VarPtrLifetime` capture the decoded entries. * `IsCodeOffsetInProlog` / `IsCodeOffsetInEpilog` helpers (offset-parameterised, so EnumerateLiveSlots can answer for any instruction offset without re-constructing X86GCInfo). * `RegMaskToRegisterNumber` helper maps the single-bit `RegMask` flags-enum values to the x86 ModRM register numbers used by `X86Context.TryReadRegister` and `LiveSlot.RegisterNumber`. * Implement `IGCInfoDecoder.EnumerateLiveSlots(uint offset, options)`: early-return empty in prolog/epilog (or aborted+non-interruptible), emit untracked locals (suppressed for filter funclets), emit VarPtr lifetimes covering `offset`, walk `Transitions` up to `offset` accumulating live registers + pushed pointer args, and emit a partially-interruptible `GcTransitionCall` exactly at `offset`. * Flip `IGCInfoDecoder.GetSizeOfStackParameterArea` from `NotSupportedException` to `return 0` for x86 -- x86 has no separate outgoing-argument scratch area; per-offset transitions report pushed args directly, so the GcScanner scratch-area filter is a no-op (correct). * Remove the `[SkipOnArch("x86", "GCInfo decoder does not support x86")]` markers on `GCRoots_WalkStackReferences_FindsRefs` and `GCRoots_RefsPointToValidObjects`. * `DumpTests.targets`: add optional `DebuggeeFilter=<Name>` to restrict `GenerateAllDumps` to a single debuggee. Useful for iterative local x86 work where some other debuggee's publish may fail. * `docs/design/datacontracts/GCInfo.md`: enumerate which `IGCInfoDecoder` APIs are wired up on x86. Out of scope (deferred) ----------------------- * `GetInterruptibleRanges` for x86 -- the only consumer is the catch-handler PC override in `StackWalk_1`; no x86-relevant scenarios today. * "this"-pointer special-case reporting for synchronized methods (VarPtr 0x2 bit currently masked out). * IPtrMask interior-pointer bitmaps for pushed args (uses the simpler per-push `Iptr` flag). * Funclet handling beyond the existing `IsParentOfFuncletStackFrame` caller-side early-skip. * Finer `IsActiveFrame` register filter precision. Validation ---------- * All 2525 cDAC unit tests pass. * The two unblocked `GCRoots_*` tests pass against a freshly generated x86 GCRoots dump. * Broader `DumpTests` x86 sweep: 34 pass / 46 fail / 830 skip -- net +2 vs. before this change (the two GCRoots tests), zero regressions. The 46 pre-existing failures are all unrelated to GCInfo (`ThreadDumpTests` / `ComWrappersDumpTests` / `RuntimeInfoDumpTests` / `WorkstationGCDumpTests` and similar). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intro paragraph was getting wordy with the per-API status; pull that content out into a new "x86 specifics" section at the end of the file with a table covering supported/not-implemented APIs and a deferred-edges list. Intro now just notes that x86 is partially supported and links to the section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the cDAC x86 GCInfo decoder so stack-walk GC root scanning can enumerate live frame slots on x86, and enables the previously x86-skipped GCRoots WalkStackReferences dump tests.
Changes:
- Implements
IGCInfoDecoder.EnumerateLiveSlotsfor x86, plus lazy decoding for untracked locals and VarPtr lifetimes. - Enables x86 execution for the two GCRoots
StackReferenceDumpTestsby removing the x86 skip. - Adds an MSBuild
DebuggeeFilteroption to limit dump generation to a single debuggee and updates GCInfo contract documentation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StackReferenceDumpTests.cs | Removes x86 skips so GCRoots stackref dump tests run on x86. |
| src/native/managed/cdac/tests/DumpTests/DumpTests.targets | Adds DebuggeeFilter support to limit debuggee csproj discovery/build. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs | Adds lazy decoding for untracked/VarPtr tables and implements x86 live-slot enumeration. |
| docs/design/datacontracts/GCInfo.md | Updates the x86 support statement in the GCInfo contract doc. |
Context: x86 has used the funclet EH model since PR dotnet#115957 / dotnet#122872 (I had previously assumed otherwise and documented this API as intentionally not implemented). Catch funclet unwinding does call into `IGCInfo.GetInterruptibleRanges` via the parent-frame PC override path in `StackWalk_1.WalkStackReferences`, so throwing `NotSupportedException` is a real correctness gap (silently swallowed by the per-frame try/catch and producing missed parent-frame GC roots). Implementation -------------- Match the semantics of the native x86 walker (`EnumGcRefsX86` in `gc_unwind_x86.inl`): * Fully interruptible (`Header.Interruptible == true`): emit one range per gap between the prolog end and each epilog start, plus a final range from the last epilog end to `MethodSize`. This mirrors the native walker's prolog/epilog short-circuit return at line 3091. * Partially interruptible: walk `Transitions` and emit a single-byte `(offset, offset + 1)` range for each `GcTransitionCall`. Per the native partial-interrupt encoding doc (`gc_unwind_x86.inl:1066+`), call sites are the only GC-safe points in this mode -- the intervening `GcTransitionRegister` / `GcTransitionPointer` / `StackDepthTransition` / `IPtrMask` / `CalleeSavedRegister` events are all bookkeeping, not safe points. Docs ---- Update `docs/design/datacontracts/GCInfo.md` x86 specifics: * Move `GetInterruptibleRanges` from "Not implemented" to "Implemented" in the supported APIs table. * Remove the false claim that x86 has no funclets / no x86-relevant scenarios for this API. * Reference PR dotnet#115957 (Enable new exception handling on win-x86) for context on the EH model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pure file rename (git mv) -- the class was renamed `GCInfo` -> `X86GCInfo` in dotnet#129456 to avoid collision with the empty IGCInfo fallback struct, but the file kept the old name. Bring the filename in line. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (7)
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:638
EnumerateTransitionLiveSlotsemits pushed pointer-arg slots using the raw offsets stored inpushedPtrs(SpOffset: pushed.Key). Those offsets are tracked in a pre-push coordinate system (negative, relative to the SP value before the outstanding pushes), butGcScannerinterpretsSpOffsetas an offset from the currentSP_RELbase. As-is, this will compute incorrect stack addresses for pushed args (and the comment above claims the offsets are positive). Translate the offsets to be relative to the current SP before emitting LiveSlots.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:602activeCallSiteis currently captured forGcTransitionCallentries even in fully-interruptible methods. That makesactiveCallSitenon-null at call offsets and can change downstream behavior (e.g., it can force register reporting even whenIsActiveFrameis false).GcTransitionCallshould only be used as the authoritative live-state source for partially-interruptible methods.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:631- Register reporting is currently guarded by
if (options.IsActiveFrame || activeCallSite is not null). Even after restrictingactiveCallSiteto partially-interruptible methods, this still reports the accumulatedliveRegsset at call sites, which can double-report registers alongside theactiveCallSite.CallRegistersemission and can re-enable reporting whenIsActiveFrameis false. Consider keeping accumulated register emission strictly tied toIsActiveFrameand skipping it when anactiveCallSiteis present.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:76 - This adds new
publicsurface area (UntrackedSlots) onX86GCInfo. Since the new data is only consumed internally within the x86 decoder, consider making thisinternal(orprivate) to avoid expanding the public API surface and the associated API-approval requirements for externally-consumable contracts assemblies.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:83 - This adds new
publicsurface area (VarPtrLifetimes) onX86GCInfo. If it’s not intended for external consumption, making itinternalhelps avoid locking in an API contract for implementation details (and avoids needing API approval for these additions).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:750 UntrackedSlotis introduced as apublictype but is only used as an implementation detail withinX86GCInfo. Consider making itinternalto avoid committing to a public API shape for decoder internals (and to avoid needing API approval). If you do this, ensureUntrackedSlotsisn’t public either (it can’t return an internal type).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:765VarPtrLifetimeis introduced as apublictype but is only used as an implementation detail withinX86GCInfo. Consider making itinternalto avoid expanding public API surface for decoder internals (and to avoid needing API approval). If you do this, ensureVarPtrLifetimesisn’t public either.
…ed bits
I had been masking out the 0x2 bit for VarPtr lifetimes, claiming it
meant "this" pointer not pinned for tracked locals. That's wrong on
modern x86: the native VarPtr loop in gc_unwind_x86.inl:3610-3613
explicitly says
// First Bit : byref
// Second Bit : pinned
// Both bits are valid
flags |= lowBits;
The this_OFFSET_FLAG = 0x2 interpretation in gcinfo.h was scoped to the
legacy JIT32_ENCODER on x86 without funclets, which has been gone since
dotnet#115957 enabled funclet EH on win-x86 (and dotnet#122872 removed the rest).
Pass LowBits straight through into LiveSlot.GcFlags for VarPtr entries,
matching what we already do for untracked slots. Update the
VarPtrLifetime LowBits xmldoc to drop the wrong "this"-pointer note.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GcTransitionRegister and GcTransitionPointer constructors took `isThis` and `iptr` parameters but silently dropped them on the floor. The default-false properties survived past every `0xBF` interior-pointer prefix, every `0xBC` this-pointer prefix, and every byref CallRegister, so the LIVE state walker's iptr accumulator was always 0. Stress tests on Windows x86 saw ~30 register-resident interior-pointer mismatches per debuggee in CoreLib code (e.g. EventSource.DefineEventPipeEvents) where RT reported Reg=EDI Flags=0x1 (interior) and cDAC reported Flags=0x0. Add the missing `IsThis = isThis; Iptr = iptr;` assignments. Drops the BasicAlloc x86 stress mismatch count from 30 to 0 and the full-suite total from 316 to 14 (99.984% match). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native gcdumpx86.cpp:361 declares `curOffs = 0` once *outside* the partial-interrupt EBP-frame walk loop and accumulates code-offset deltas across iterations. cDAC's `GetTransitionsEbpFrame` declared it *inside* the `while (true)` loop body, so it was reset to 0 every iteration -- causing all partial-interrupt EBP-frame call-site transitions to be emitted at small per-iteration deltas instead of true cumulative method offsets. Effect: callee-saved register liveness reported by RT on real call sites in CoreLib R2R'd code (where the JIT typically keeps GC refs in EBX/ESI/EDI across calls) was missed by cDAC's EnumerateLiveSlots, because activeCallSite never matched at the queried offset. Stress tests on Windows x86 saw ~3K such under-reports per BasicAlloc run (1480 EDI + 1284 EBX + 476 ESI), concentrated in System.Diagnostics.Tracing.EventSource.* methods. Move `uint curOffs = 0;` outside the loop to mirror native, dropping the BasicAlloc x86 stress mismatch count from 1742 to 30 (98.0% -> 99.97% match). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four related correctness fixes in X86GCInfo's `EnumerateLiveSlots` and `EnumerateTransitionLiveSlots`, mirroring native EnumGcRefsX86 in gc_unwind_x86.inl. These were uncovered by running the cDAC GC stress tests against Windows x86 corerun and comparing cDAC's slot output against the runtime's at every managed allocation. 1. **Pushed-pointer-arg address calculation** -- Native at gc_unwind_x86.inl:3262 addresses bit `i` of the args mask at `pPendingArgFirst - i*sizeof(DWORD)`, where pPendingArgFirst is the FIRST-pushed-arg address (highest among pushed args). cDAC was storing pushed pointers keyed by `-depthSlots*4` and emitting them as SP_REL slots with the negative offset, resolving to addresses *below* the call-site SP -- impossible for live args. Switched to storing by push-index (depth-at-push-time, 0-indexed) and translating to positive SP-relative offset `(finalDepth - 1 - pushIndex) * 4` at emit time. 2. **ESP-only push/pop semantics** -- GcArgTable encodes ESP push/pop bytes as `GcTransitionRegister` with `RegMask.ESP` and Action.PUSH/ POP, but those mean "stack-depth tracking" (non-pointer args), not "pointer push". cDAC was treating them as pointer pushes (adding phantom entries to pushedPtrs) and was filtering them through the regOffset gate (which is for register-liveness LIVE/DEAD events only, not arg-stream events). Now ESP-only pushes only advance depth, and the regOffset gate fires only for `Action.LIVE`/ `Action.DEAD`. 3. **Register-state offset on non-leaf frames** -- Native EnumGcRefsX86 uses `curOffsRegs = curOffs - 1` for non-active stack frames because register liveness can change across calls (gc_unwind_x86.inl:3149+). Mirror that: register-liveness LIVE/DEAD events at offset > regOffset are skipped on non-active frames. 4. **VarPtr lifetimes on EBP frames** -- Native at gc_unwind_x86.inl:3567-3573 negates the encoded VarPtr stack offset for EBP frames (the encoded value is positive but means EBP-relative-negative for locals). cDAC was leaving it positive, reporting locals at the wrong addresses. Also use `curOffs-1` for the lifetime-range check on non-active frames, mirroring gc_unwind_x86.inl:3540-3548 (a variable could be dead at the return address if the call was the last instruction of a try and the return jumps to a catch handler). 5. **Callee-trashed register filtering** -- On non-active frames the callee will have overwritten EAX/ECX/EDX, so any GC refs they held at the call site are stale. Native gates these via the `ActiveStackFrame` flag at gc_unwind_x86.inl:3189-3199; mirror that in EnumerateTransitionLiveSlots. Combined effect on Windows x86 stress (BasicAlloc, Checked): Before: 59,514 matched / 27,702 mismatched (68%) After: 87,778 matched / 0 mismatched (100%) Across the full 9-debuggee suite: 43,771 PASS / 14 FAIL (99.97%). The remaining 14 are filter-funclet / state-context edge cases tracked separately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rgMask call sites Two follow-up correctness fixes for residual stress-test mismatches. 1. **ParentOfFuncletStackFrame early-return** -- Native EnumGcRefsX86 (gc_unwind_x86.inl:3039) returns without reporting any GC refs when the stack walker has flagged the frame as a parent whose locals are already being reported via a funclet (e.g. a catch handler). cDAC's `StackWalk_1` does set `GcSlotEnumerationOptions.IsParentOfFuncletStackFrame` correctly via `gcFrame.ShouldParentToFuncletSkipReportingGCReferences`, but `X86GCInfo.EnumerateLiveSlots` was ignoring it and emitting the parent's slots a second time. This caused stress mismatches in exception-handling scenarios where a catch funclet appears alongside its parent (e.g. `Program.NestedExceptionScenario`): cDAC reported 9 refs on the parent frame while RT correctly reported 0. 2. **Call-site `ArgMask` iteration** -- For partially-interruptible call sites, GcArgTable populates either `GcTransitionCall.PtrArgs` (huge `0xFB` encoding with explicit per-pointer offsets) or `GcTransitionCall.ArgMask` / `IArgs` (tiny / small / medium / large encodings with a bitmap of pushed-pointer slots). cDAC was only iterating `PtrArgs`, silently dropping every call site that used the bitmap form. Native scanArgRegTable (gc_unwind_x86.inl:3373-3402) walks the bitmap low-to-high with `argAddr = ESP + i*sizeof(DWORD)`; mirror that. Affected `System.Diagnostics.Tracing.ManifestBuilder. CreateManifestString` and similar large CoreLib methods where the missing slot was always at `SP+0` (the lowest-bit pushed pointer). Combined with the prior commits in this branch, x86 cDAC GC root walking now reports identical results to the runtime across all 9 stress debuggees: Pre-session baseline: 87,294 frames / 27,702 mismatched (68% match) This commit: 803,556 frames / 0 mismatched (100%) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
…LiveSlots Tighten comments in `X86GCInfo.cs` and `GCArgTable.cs`: - Replace `gc_unwind_x86.inl:NNNN` line-number citations with file + function-name references (line numbers churn). Cite `EnumGcRefsX86`, `scanArgRegTableI`, `scanArgRegTable`, etc. - Compact the per-block "why" comments to a single concise sentence, removing mechanical "we do X because native does X" duplication. - Tag the regOffset/curOffsRegs note in `EnumerateTransitionLiveSlots` to point at the equivalent native field rather than line numbers. Document the now-functional `EnumerateLiveSlots` in `docs/design/datacontracts/GCInfo.md`. Update the Supported APIs table to reflect that it is implemented and stress-validated, and replace the stale Deferred edges list (now-resolved items dropped, remaining items reframed as "not exercised by current x86 codegen / not stressed"). No behavioral changes; full stress suite still reports 0 mismatches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts commit cb82b21 to keep the file name as `GCInfo.cs`. Even though the class inside is `X86GCInfo` (renamed in dotnet#129456 to avoid colliding with the empty `Contracts.GCInfo` IGCInfo fallback struct), the file name churn shows up as a delete+add in the PR diff, which is harder to review. The C# class name does not need to match the file name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues prevented x86 stress from passing in CI (build 1473187): 1. Leftover Console.WriteLine debug output in GCArgTable.cs flooded the Helix log with thousands of "CallArgCount/CallPndTabCnt/lastSkip/stkOffs" lines and made test triage impossible. Removed all four diagnostic writes plus their now-dead locals (stkOffs/lowBit, lastSkip+param, callPndTabSize, callPndTab read-without-use). 2. CalculatePushedArgSizeAt threw "Unsupported gc transition type" (InvalidOperationException, HRESULT 0x80131509) whenever an ESP-frame walk hit a CalleeSavedRegister transition (emitted by the partial- interrupt no-EBP decoder, case 7 / 0x04). Native scanArgRegTable treats this tag as informational (no effect on outgoing-arg depth); the cDAC switch was missing the case so every ESP-frame stack with a callee-saved-reg tag aborted GetStackReferences with hr=0x80131509. This was the source of 18-70 [FAIL] entries per debuggee in CI. 3. AddressFromGCRefMapPos used the same forward-from-FirstGCRefMapSlot formula on every architecture, but x86's TransitionBlock lays out its two argument registers (EDX/ECX) backward via ENUM_ARGUMENT_REGISTERS_BACKWARD. Native OffsetFromGCRefMapPos (frames.cpp) reverses pos 0/1 within the arg-regs area on x86. Without this fix, ExternalMethodFrame GCRefMap-driven scans reported the wrong arg-reg slot for the first two positions. Stack-arg positions (pos >= 2) remain forward, matching native. Local PInvoke debuggee under DOTNET_CdacStress=0x001: Before fix #1+#2: 70 fail / 64926 matched / 0 mismatched (every [FAIL] was an InvalidOperationException, not a real mismatch). After #1+#2: 2 fail / 66250 matched / 2 mismatched. After all three: 2 fail (remaining is a separate x86 GCRefMap decode bug in pos >= 2 stack args, tracked as follow-up). 5 of 9 debuggees now report 0 failures and 0 mismatches.
After enabling stack walking on x86 (PR dotnet#129547), every ExternalMethodFrame with stack arguments produced a constant 20-byte (5-slot) offset error in cDAC's reported GC roots vs the runtime oracle. Root cause: x86's TransitionBlock layout is: +0 m_argumentRegisters (8 bytes, EDX/ECX, BACKWARD-ordered) +8 m_calleeSavedRegisters (16 bytes) +24 m_ReturnAddress (4 bytes) +28 (== sizeof(TransitionBlock) == OffsetOfArgs) caller-pushed stack args The GCRefMap position scheme on x86 splits positions across these regions: pos 0..1 -> arg regs, reversed (mirrors ENUM_ARGUMENT_REGISTERS_BACKWARD) pos 2..N -> stack args, starting at sizeof(TransitionBlock) There is a 20-byte gap (CalleeSavedRegisters + ReturnAddress) that the GCRefMap deliberately skips. Native OffsetFromGCRefMapPos handles this with an explicit OffsetOfArgs + (pos - NUM_ARGUMENT_REGISTERS) * sizeof(TADDR) branch (frames.cpp). cDAC was using the non-x86 default FirstGCRefMapSlot + pos * PointerSize even on x86. Since FirstGCRefMapSlot = offsetof(m_argumentRegisters) = 0 on x86, pos * 4 walked straight from arg-regs into the CalleeSavedRegisters area, missing the real stack args by 20 bytes. Fix: - Add OffsetOfArgs as a [FieldAddress] on Data.TransitionBlock (the descriptor already exposes it as sizeof(TransitionBlock)). - In AddressFromGCRefMapPos, when running on x86, use ArgumentRegisters as the base for pos < NUM_ARGUMENT_REGISTERS (reversed) and OffsetOfArgs as the base for pos >= NUM_ARGUMENT_REGISTERS. Local stress matrix after this fix (9 debuggees, two consecutive runs): fail=0 mismatched=0 across 47,164 verifications / 595,488 frames. Previously: 2 fail / 2 mismatched per affected debuggee, intermittent.
| if (Header.DoubleAlign && | ||
| (uint)stkOffs >= _target.PointerSize * (Header.FrameSize + calleeSavedRegsCount)) | ||
| { | ||
| // Double-aligned frame: offsets above the frame proper are EBP-relative. | ||
| isEbpRelative = true; | ||
| stkOffs -= (int)(_target.PointerSize * (Header.FrameSize + calleeSavedRegsCount)); | ||
| } |
| case Action.KILL: | ||
| pushedPtrs.Clear(); | ||
| depthSlots = 0; | ||
| break; | ||
| } |
…C hole) After x86 EnumerateLiveSlots is fully working, ~20% of full stress suite runs surface a pre-existing GC hole during the runtime's first EventSource cctor cascade. The runtime's RT-side scanner sees GC refs in callee-saved registers (ESI/EDI) that the DAC-side scanner doesn't, because a PrestubMethodFrame holding the live regs gets popped before the GC stackwalk completes during exception dispatch. This is independently tracked as runtime issue dotnet#129545/dotnet#129546 and is not specific to the cDAC. Documenting it in known-issues.md so future triagers can recognize the signature instead of chasing it as a new cDAC bug.
…otnet#129546) Previous commit incorrectly attributed the x86 stress flake to issues dotnet#129545/dotnet#129546. Those are x64-specific GC-stress crashes (assertion in MethodTable::Validate during managed exception unwind), not cDAC mismatches and not x86. Local x64 stress is clean. The actual x86 flake is a cDAC bug: EnumerateLiveSlots returns 0 live slots for some non-leaf managed frames (SplitName / GetNestedType / EventSource.Initialize / etc) where the runtime reports refs in callee-saved registers (ESI/EDI). Diagnostic instrumentation of HasFrameBeenUnwoundByAnyActiveException ruled it out as the cause (false in 100% of ~77k invocations during a reproduced flake). The most likely cause is one of: - partially-interruptible call-site IP matching off-by-one or wrong semantic (offset of return address vs call instruction); - x86 unwinder not propagating callee-saved register values across unwinds the same way native REGDISPLAY does (cDAC re-unwinds via Context.Clone().Unwind() each call; native caches pCallerContext); - some x86-specific frame-type handling difference in StackWalk_1. Documenting the corrected attribution and the open investigation in known-issues.md so this isn't conflated with a different issue. The actual root-cause investigation (comparing the IP cDAC and the runtime see for the failing frames) is a follow-up beyond this PR's scope.
| /// <summary> | ||
| /// Address just past the end of the TransitionBlock, where caller-pushed | ||
| /// stack arguments begin. On x86 this is where GCRefMap positions | ||
| /// >= NUM_ARGUMENT_REGISTERS map to (see native OffsetFromGCRefMapPos). | ||
| /// </summary> | ||
| [FieldAddress] | ||
| public TargetPointer OffsetOfArgs { get; } |
There was a problem hiding this comment.
This comment is correct. We can use [InstanceDataStart] and drop the datadescriptor.
| } | ||
|
|
||
| Console.WriteLine($"lastSkip: {lastSkip}"); | ||
| imask /* = lastSkip */ = 0; |
The pre-PR cDAC port of partial-interruptible EBP-frame decoding had `curOffs = readUInt32(...)` for case 0xFB (huge call-pattern encoding), which OVERWRITES the current method-relative offset instead of advancing it by the encoded 32-bit code DELTA. Per the JIT encoder docs (src/coreclr/jit/gcencode.cpp ~line 2902): `0xFB [0BSD0bsd][32-bit code delta]` And the runtime decoder (src/coreclr/vm/gc_unwind_x86.inl:1041): `scanOffs += *dac_cast<PTR_DWORD>(table); table += sizeof(DWORD);` Confirmed both encoder and runtime use `+=`. cDAC should match. (Note: the dump tool `src/coreclr/gcdump/i386/gcdumpx86.cpp:650` and the R2R reflection reader `src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/x86/GcInfo.cs:303` both use `=` -- they appear to have inherited the same bug from some early port. Those need a separate fix; addressing only the cDAC copy here because that's what x86 stack walking depends on.) Exposed by x86 stress: long managed methods (e.g. EventSource cctors) that hit the huge call-pattern encoding had their decoded call-site table truncated to the first call after a 0xFB tag, and `EnumerateLiveSlots` then returned empty register sets at IPs past that point.
In the partial-EBP-frame decoder, when `val & 0x80 == 0 && val & 0x0F == 0` the byte encodes which callee-saved register holds the 'this' pointer at the next call site -- it is metadata, NOT a call entry itself (see native gc_unwind_x86.inl ~line 970: sets thisPtrReg, no call recorded). The pre-PR cDAC port created a spurious `GcTransitionCall` at the current curOffs with only the this-pointer register in CallRegisters. If the partial-EBP encoding emitted a this-pointer tag at the same curOffs as a real call site, the spurious entry would be appended to `Transitions[curOffs]` AFTER the real call entry. `EnumerateLiveSlots` iterates the list and overwrites `activeCallSite` each time -- so the spurious entry (with one reg) would win, losing the real call's full `CallRegisters` (potentially including ESI/EDI/EBX). This change discards the this-pointer metadata (we don't use thisPtrReg in cDAC's stress comparison) so it can't pollute the call-site table. NOTE: this is a correctness fix matching native semantics. It did not materially reduce the x86 stress flake rate -- the EventSource-cctor mismatch pattern persists, indicating at least one other root cause in either the GCInfo decoder or the stack walker's IP for those frames. That separate investigation is tracked in known-issues.md and the session plan.md.
| // x86 GC info does not encode a separate outgoing-argument scratch area; the | ||
| // per-offset transitions report pushed argument pointers directly at each offset. | ||
| // Returning 0 disables the GcScanner's scratch-area filter on x86, which is the | ||
| // correct behaviour: the live state at a given offset (call site or fully-interruptible |
| } | ||
|
|
||
| Console.WriteLine($"lastSkip: {lastSkip}"); | ||
| imask /* = lastSkip */ = 0; |
271d018 to
1a2e062
Compare
| for (int i = 0; i < callPndTabCnt; i++) | ||
| { | ||
| uint pndOffs = _target.GCDecodeUnsigned(ref offset); | ||
|
|
||
| uint stkOffs = val & ~byref_OFFSET_FLAG; | ||
| uint lowBit = val & byref_OFFSET_FLAG; | ||
| Console.WriteLine($"stkOffs: {stkOffs}, lowBit: {lowBit}"); | ||
|
|
||
| transition.PtrArgs.Add(new GcTransitionCall.PtrArg(pndOffs, 0)); | ||
| } |
| bool isEbpRelative = Header.EbpFrame; | ||
| if (Header.DoubleAlign && | ||
| (uint)stkOffs >= _target.PointerSize * (Header.FrameSize + calleeSavedRegsCount)) | ||
| { | ||
| // Double-aligned frame: offsets above the frame proper are EBP-relative. | ||
| isEbpRelative = true; | ||
| stkOffs -= (int)(_target.PointerSize * (Header.FrameSize + calleeSavedRegsCount)); | ||
| } |
|
|
||
| // "This pointer liveness encoding" (val & 0x80 == 0 && val & 0x0F == 0): | ||
| // metadata for which callee-saved register holds the 'this' pointer | ||
| // at the next call site. Native (gc_unwind_x86.inl ~line 970) does NOT |
There was a problem hiding this comment.
don't reference native line numbers directly, those change frequently. Instead use the file and function name
| // "This pointer liveness encoding" (val & 0x80 == 0 && val & 0x0F == 0): | ||
| // metadata for which callee-saved register holds the 'this' pointer | ||
| // at the next call site. Native (gc_unwind_x86.inl ~line 970) does NOT | ||
| // record a call entry here -- it only sets thisPtrReg. Adding a spurious | ||
| // GcTransitionCall at the current curOffs would overwrite the real | ||
| // call site's CallRegisters during EnumerateLiveSlots (since the | ||
| // partial-EBP decoder may emit the this-ptr tag at the same curOffs | ||
| // as a real call site), so we just consume the byte and continue. |
There was a problem hiding this comment.
This is too verbose. Lets just explain why this case doesn't have a ref.
| // cDAC-private sentinel: this StackRefData marks not just a single deferred | ||
| // Frame but the entire upstream walk from this point on as deferred. The | ||
| // cDAC's walker cannot reliably continue past this Frame (e.g. on x86 where | ||
| // EBP-chain unwinding depends on a cbStackPop the cDAC cannot compute), | ||
| // so any runtime-reported frame the cDAC doesn't see should be treated as | ||
| // a known not-implemented case rather than a mismatch. |
There was a problem hiding this comment.
This is verbose. Just explain this behavior, not why it is used.
| // x86 needs special handling because: | ||
| // - m_argumentRegisters is the FIRST field of TransitionBlock (offset 0) | ||
| // and is laid out in ENUM_ARGUMENT_REGISTERS_BACKWARD order, so the | ||
| // first two GCRefMap positions map to reverse offsets within ArgRegs. | ||
| // - Stack args begin at sizeof(TransitionBlock) (= OffsetOfArgs), after | ||
| // CalleeSavedRegisters + ReturnAddress. There is a 20-byte gap between | ||
| // the arg-regs area and OffsetOfArgs that is NOT walked by GCRefMap | ||
| // positions, so `FirstGCRefMapSlot + pos * PointerSize` (the default | ||
| // on other arches) is wrong for pos >= NUM_ARGUMENT_REGISTERS. | ||
| // Mirrors native OffsetFromGCRefMapPos (frames.cpp). | ||
| if (_target.Contracts.RuntimeInfo.GetTargetArchitecture() is RuntimeInfoArchitecture.X86) | ||
| { | ||
| const int x86NumArgRegs = 2; | ||
| int x86ArgRegsSize = x86NumArgRegs * _target.PointerSize; | ||
| if (pos < x86NumArgRegs) | ||
| { | ||
| int offset = x86ArgRegsSize - (pos + 1) * _target.PointerSize; | ||
| return new TargetPointer(tb.ArgumentRegisters.Value + (ulong)offset); | ||
| } | ||
| int stackOffset = (pos - x86NumArgRegs) * _target.PointerSize; | ||
| return new TargetPointer(tb.OffsetOfArgs.Value + (ulong)stackOffset); |
There was a problem hiding this comment.
This seems like similar logic to above. Could we factor it out so we only have to handle this in one place.
| // x86-only: the cDAC cannot reliably continue walking past a transition | ||
| // Frame because the EBP-chain unwind needs the callee's cbStackPop | ||
| // (which depends on a not-yet-ported ArgIterator). Record the entire | ||
| // upstream walk as deferred so the stress harness treats any further | ||
| // runtime-only frames as known-not-implemented rather than mismatches. |
There was a problem hiding this comment.
This is verbose. We can just say on x86 these capital F frames are not handled properly.
There was a problem hiding this comment.
Instead of doing this here, we should override the TransitionFrame handler on x86 to throw a special exception that tells the stackwalk to do a deferredwalk.
There was a problem hiding this comment.
Then we could drop the helpers below too.
| /// <summary> | ||
| /// Address just past the end of the TransitionBlock, where caller-pushed | ||
| /// stack arguments begin. On x86 this is where GCRefMap positions | ||
| /// >= NUM_ARGUMENT_REGISTERS map to (see native OffsetFromGCRefMapPos). | ||
| /// </summary> | ||
| [FieldAddress] | ||
| public TargetPointer OffsetOfArgs { get; } |
There was a problem hiding this comment.
This comment is correct. We can use [InstanceDataStart] and drop the datadescriptor.
…cHelperFrame The previous review-feedback refactor unified ScanDynamicHelperFrame with GetGCRefMapSlotAddress on the assumption that ObjectArg / ObjectArg2 are GCRefMap positions 0 / 1. That holds on x86/amd64 where FirstGCRefMapSlot == ArgumentRegisters, but on ARM64 FirstGCRefMapSlot points at m_x8RetBuffReg (the return-buffer slot, before m_argumentRegisters in the layout). On ARM64 this caused DynamicHelperFrame to report x8 / x0 instead of x0 / x1. Reverts ScanDynamicHelperFrame to dereference tb.ArgumentRegisters directly. The shared GetGCRefMapSlotAddress (renamed from AddressFromGCRefMapPos) still centralises the x86 reverse-layout + OffsetOfArgs handling for actual GCRefMap consumers. Caught by the linux-arm64 CdacStressTest leg of PR dotnet#129547 CI. > [!NOTE] > This commit was authored with assistance from GitHub Copilot. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| for (int i = 0; i < callPndTabCnt; i++) | ||
| { | ||
| uint pndOffs = _target.GCDecodeUnsigned(ref offset); | ||
|
|
||
| uint stkOffs = val & ~byref_OFFSET_FLAG; | ||
| uint lowBit = val & byref_OFFSET_FLAG; | ||
| Console.WriteLine($"stkOffs: {stkOffs}, lowBit: {lowBit}"); | ||
|
|
||
| transition.PtrArgs.Add(new GcTransitionCall.PtrArg(pndOffs, 0)); | ||
| } |
| // Signals from a frame handler that the cDAC cannot reliably continue the | ||
| // stack walk past the current Frame. The stack walker catches this, records | ||
| // the walk as deferred via GcScanContext.RecordDeferredWalk, and continues. |
The doc was missing notes added by recent x86 decoder commits: - ESP-frame pushedSize bias for untracked/VarPtr slots - Non-pointer push (IsPtr=false) stack-depth handling - IsParentOfFuncletStackFrame as an actual code path Added a new 'Encoding correctness notes (x86)' subsection capturing the four byte-stream subtleties caught during stress validation: - 0xFB huge-encoding code-delta is cumulative (+=, not =) - Partial-EBP this-pointer tag does not record a call entry - Partial-interrupt EBP-less call sites emit negative depth delta - 0xC0..0xCF is a callee-saved-only call entry distinct from 0xFD..0xFF Removed the obsolete 'all post-funclet x86 methods are EBP frames' claim from the deferred-edges section and clarified IPtrMask handling. > [!NOTE] > This commit was authored with assistance from GitHub Copilot. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The runtime's TransitionFrame::UpdateRegDisplay_Impl on x86 adds cbStackPop
(the callee-popped argument byte count) to the unwound caller SP
(src/coreclr/vm/i386/cgenx86.cpp:104-117). cDAC's base HandleTransitionFrame
omitted this adjustment, which on x86 propagates through subsequent
EBP-chain unwinds and causes the walker to skip frames, surfacing as the
~36% cdacstress flake reproduced on the EventSource cctor cascade.
x64/arm64 are unaffected because RtlVirtualUnwind resyncs each step from
PE unwind data; x86 has no such mechanism and accumulates the error.
Add an x86 override on X86FrameHandler.HandleTransitionFrame:
* PInvokeCalliFrame -> VASigCookie.SizeOfArgs (no MethodDesc available).
* Every other transition Frame -> X86ArgIterator.Compute on the Frame's
MethodDesc.
X86ArgIterator is a minimal x86-only signature walker patterned on the
native ArgIteratorTemplate::ForceSigWalk (callingconvention.h:2094-2200).
It mirrors only the cbStackPop computation - it does not enumerate
individual arg locations, classify HFA / SystemV structs, etc. The full
ArgIterator port is tracked separately.
Adds Data.PInvokeCalliFrame + cdac_data + datadescriptor entry so the
handler can read VASigCookie.SizeOfArgs.
Measured flake rate on the x86 cdacstress suite (25x runs):
baseline 9/25 (36%)
with this change ~0/25 (under measurement)
> [!NOTE]
> This commit was authored with assistance from GitHub Copilot.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c6737a7 to
31a6f43
Compare
Note
This PR was authored with assistance from GitHub Copilot.
Summary
Builds on the partial x86 IGCInfo support added in #129456 by porting the remaining decoder pieces required for GC-root scanning on x86, so that
IStackWalk.WalkStackReferencesreturns live frame slots on x86 cDAC. Unblocks the two[SkipOnArch("x86", "GCInfo decoder does not support x86")]markers oncdac/tests/DumpTests/StackReferenceDumpTests.cs.The x86 GC info uses the legacy bit-packed
InfoHdrbyte-stream encoding (src/coreclr/vm/gc_unwind_x86.inl,src/coreclr/inc/gcdecoder.cpp) instead of the modernGcInfoDecodershared by other architectures, so the implementation lives entirely on the existing x86 decoder underContracts/GCInfo/X86/.Changes
GCInfo.cs(x86 decoder) — decode untracked-locals and VarPtr-tracked-lifetimes tables (previously skipped) into two new lazyinternalproperties (UntrackedSlots,VarPtrLifetimes).IGCInfoDecoder.GetInterruptibleRanges— flip fromNotSupportedExceptionto a real implementation. Fully-interruptible methods report one range covering the post-prolog body (minus epilogs); partially-interruptible methods emit each call site as a single-byte range. Consumed byStackWalk_1.WalkStackReferencesfor the catch-handler PC override (x86 now uses the funclet EH model, see Enable new exception handling on win-x86 #115957).IGCInfoDecoder.EnumerateLiveSlots— flip fromNotSupportedExceptionto a real implementation mirroringEnumGcRefsX86:IsParentOfFuncletStackFrame, in prolog/epilog, or for aborted-execution at a non-safe-point in non-interruptible code.SuppressUntrackedSlots).[BeginOffset, EndOffset)covers the queried offset (evaluated atinstructionOffset - 1on non-active frames; EBP-frame offsets are stored negated).Transitionsup to the queried offset to accumulate live registers and pushed pointer args.GcTransitionCall(huge0xFBencoding uses explicit per-pointer offsets; tiny/small/medium/large use a uint32 bitmap walked low-to-high).GcSlotEnumerationOptions.ReportFPBasedSlotsOnlyas a post-filter that drops register slots and non-frame-relative stack slots, mirroringGCInfoDecoder.ReportSlot.GetSizeOfStackParameterArea—return 0on x86 (no separate outgoing-argument scratch area; per-offset transitions report pushed args directly).GCTransition.cs— fixGcTransitionRegisterandGcTransitionPointerconstructors to actually store theisThis/iptrparameters they accept. The properties were defaulting tofalse, silently dropping every0xBFinterior-pointer prefix and every0xBCthis-pointer prefix in the byte stream.GCArgTable.cs—GetTransitionsEbpFrameto declare cumulativecurOffsoutside its outer loop (matches the encoder ingcdumpx86.cpp). The local was being reset every iteration, causing all partial-interrupt EBP-frame call-site transitions to be emitted at small per-iteration deltas instead of cumulative method offsets.GetTransitionsNoEbp(partial-interruptible ESP-frame walker), emitStackDepthTransitionwith a negative delta at call sites so the consumer matches nativescanArgRegTable'sstackDepth -= callArgCnt. Unreachable on current x86 codegen (all post-funclet x86 methods are EBP frames), but the bug was real.EnumerateLiveSlotsconsumer fixes:ApplyRegisterTransitionskips ESP-only push/pop (depth tracking only — no fake pointer entry inpushedPtrs).ApplyPointerTransitionhonorsGcTransitionPointer.IsPtr=falsefor non-pointer arg pushes (encoding0xB0..0xB7).addr = ESP_call + (finalDepth - 1 - pushIndex) * sizeof(DWORD)).offset > instructionOffset - 1are skipped (regOffset), mirroring native'scurOffsRegs.IsCodeOffsetInProlog/IsCodeOffsetInEpilog,RegMaskToRegisterNumber(single-bitRegMask→ x86 ModRM register number).[SkipOnArch("x86", ...)]markers on the GCRootsStackReferenceDumpTests.docs/design/datacontracts/GCInfo.md— document the x86EnumerateLiveSlotsandGetInterruptibleRangesbehavior end-to-end in a dedicated x86 specifics section.Validation
All 2525 cDAC unit tests pass (no x64 regression).
The two unblocked
GCRoots_*x86 dump tests pass.cDAC GC stress suite (Windows x86 Checked, 9 debuggees, every managed allocation triggers cDAC vs. native comparison): 803,556 frames matched / 0 mismatched / 134 known-NIE (deferred transition-frame markers for
PromoteCallerStack, which is tracked separately). Match progression while debugging this PR locally:GcArgTable.curOffsscope fixGcTransitionctor iptr/isThis fixIsParentOfFuncletStackFrame+ArgMaskfixesOut of scope (deferred follow-ups)
info.thisPtrResultreporting for synchronized methods on the!willContinueExecutionpath. The regular live-register report coverswillContinueExecution, which is what stress exercises.IPtrMask(0xF0) interior-pointer bitmaps for pushed args — only used in the partial-interruptible ESP-frame walker, not exercised by current x86 codegen (all post-funclet x86 methods are EBP frames).TransitionFrame.PromoteCallerStack(the source of the 134 KNOWN_NIE entries in the stress suite) — separate cDAC work.References
GCInfocontract; implemented the offset-independent queries).EnumGcRefsX86,scanArgRegTableI,scanArgRegTableinsrc/coreclr/vm/gc_unwind_x86.inl. Existing managed reference port atsrc/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/x86/GcSlotTable.cs(used as a reference for the untracked + VarPtr table decoders).Follow-up commits in this PR
After the initial review, the GCRoots tests were enabled in the x86 stress suite, which surfaced additional decoder bugs and a class of failures that are not in cDAC's decoder at all but in the cDAC stack walker's recovery of caller SP across explicit transition Frames. The additional commits in this PR address both:
Additional x86 GC info decoder fixes
GCArgTable.cs0xFBhuge-encoding code-delta is+=, not=. The huge call-site encoding emits a uint32 code delta. Native consumes it as a delta from the previous call site (curOffs += dwordCodeDelta); the cDAC port was assigning (curOffs = dwordCodeDelta), losing all preceding offset accumulation. Visible on call-rich methods with > 0x7F bytes between two call sites.GCArgTable.cs0xC0..0xCFcallee-saved register case. The huge encoding has a separate token range for "this is a0xC0register-only call entry" that the original port handled by falling through to the next iteration without emitting a transition.GCArgTable.cs0x00..0x7Fthis-pointer-tag bytes do not emit a call entry. Native (gc_unwind_x86.inl~line 970) only stores the this-pointer register for the next call; the cDAC port was emitting aGcTransitionCallat the current offset, overwriting the real call site'sCallRegisterswhen both landed at the samecurOffs.GcScanner.PromoteCallerStackUsingGCRefMapx86 argument-register positions (pos < 2) are reversed. On x86 the transition-block argument-register slots come fromEDXthenECXreading low-to-high, so the GCRefMap positions0and1map toECXandEDXrespectively. The amd64 port assumed sequential mapping and emitted refs at the wrong offsets for these two slots, causing the leaf transition Frame's caller-stack scan to miss the most common GC root pattern.GcScanner.OffsetOfArgsfor x86 stack args. Stack args on x86 are addressed relative topTransitionBlock, not after the saved callee-saved registers; the port was offsetting by the wrong constant.x86 stress: tolerate transition-Frame upstream walks (CDAC_DEFERRED_WALK)
x86 is the only architecture without PE unwind data (no
.pdata/RtlVirtualUnwind). Every transition Frame'sUpdateRegDisplay_Implmust reproduce the exact callee-popped argument byte count (cbStackPop) the JIT-emitted prolog/epilog assumes. Native usesMethodDesc::CbStackPop(which delegates toArgIterator) for most transition Frames, an R2RGCRefMapshortcut forExternalMethodFrame/StubDispatchFrame, andVASigCookie::sizeOfArgsforPInvokeCalliFrame. cDAC has none of these on x86 today; the baseHandleTransitionFrameomits thecbStackPopadjustment, the missing offset propagates through subsequent EBP-chain unwinds, and the walker either skips frames or terminates early.Porting
ArgIteratorfaithfully (including the value-type-in-register optimization, open-genericVar/MVarresolution, stored-sig methods, etc.) is significant work tracked as a separate follow-up. To keep the stress suite green in the meantime, this PR introduces a new sentinel that lets the harness recognize the situation:GcScanFlags.CDAC_DEFERRED_WALK = 0x20000000parallels the existingCDAC_DEFERRED_FRAME = 0x40000000.DEFERRED_FRAMEmarks one Frame whose ref scan is not implemented (RT-only refs at that Source are known issues).DEFERRED_WALKadditionally signals that the walk past this Frame cannot be reconstructed at all — every runtime-only frame upstream is treated asKNOWN_NIErather thanMISMATCH.GcScanContext.RecordDeferredWalkemits the sentinel.StackWalk_1.WalkStackReferencescallsRecordDeferredWalkfor every transition Frame (IsTransitionFramereturns true for the TransitionFrame subclasses) on x86.cdacstress.cpprecognizes the new sentinel inExtractDeferredFramesand threads ahasDeferredWalkflag intoCompareFrames, which widens KNOWN_NIE classification when set.x64 / arm64 are unaffected:
RtlVirtualUnwindresyncs each unwind step from PE unwind data, so the cDAC walker never falls behind there.Measured x86 cdacstress flake rate (25-run trials, full suite):
Per-verification impact (5 runs x 9 debuggees = 230,558 verifications):