Fix PID-recycling race in orphan-lock detection #1989
Merged
tyrielv merged 1 commit on May 27, 2026
GVFS's orphan-lock check (GVFSLock.LockHolder.GetExternalHolder) only
asked the OS "is there a process at this PID?" via IsProcessActive. If
the original lock-holder exited and Windows recycled its PID to an
unrelated process before the next git command arrived, GVFS saw a live
process at that PID and concluded the lock was still held. The git
command then either waited the full 300s timeout or printed a spurious
"Waiting for '<holder>' to release the lock" message.
This is a real product bug, not just a test issue. It surfaced as
flakiness in the OrphanedGVFSLockIsCleanedUp functional test on busy CI
agents (two failure modes: 300s timeout, and assertion failure with
non-empty stderr), but the same race can affect users when the OS
quickly reuses a freed PID.
The fix is process-identity tracking. When the lock is granted to an
external requestor, capture that process's creation timestamp
(GetProcessTimes, an opaque 100-ns FILETIME value) alongside its PID.
On every orphan check, look up the current creation timestamp at that
PID and compare. If the process is gone the lookup fails; if a
different process is now at the PID the timestamps differ. Either way
the lock is correctly recognized as orphaned.
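The comparison described above can be sketched in a few lines. This is a minimal illustration, not the code in GVFSLock.cs: the HolderIdentity type, the OrphanCheck class, and the lookupStartTime delegate (standing in for the platform call) are all hypothetical names.

```csharp
using System;

// Hypothetical holder record: the PID plus the opaque creation time captured
// when the lock was granted. Names are illustrative, not taken from GVFSLock.cs.
public readonly struct HolderIdentity
{
    public HolderIdentity(int pid, ulong startTime)
    {
        this.Pid = pid;
        this.StartTime = startTime;
    }

    public int Pid { get; }
    public ulong StartTime { get; }
}

public static class OrphanCheck
{
    // lookupStartTime stands in for the platform call (an assumption): it
    // returns the creation time of the live process at the given PID, or null
    // when no live process exists there.
    public static bool IsOrphaned(HolderIdentity holder, Func<int, ulong?> lookupStartTime)
    {
        ulong? current = lookupStartTime(holder.Pid);

        // No live process at the PID: the holder has exited.
        if (current == null)
        {
            return true;
        }

        // A live process with a different creation time is a different process
        // that happens to reuse the PID. The value is only compared for
        // equality, never decoded as a date.
        return current.Value != holder.StartTime;
    }
}
```

The start time is treated purely as an identity token here, which matches the "identity-only" intent stated in the implementation notes.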
Implementation notes:
* New abstract GVFSPlatform.TryGetActiveProcessStartTime returns the
raw 64-bit creation time. It is documented as identity-only; the
value is never decoded as a date.
* The Windows implementation combines GetExitCodeProcess (STILL_ACTIVE
check) and GetProcessTimes. GetProcessTimes alone is not sufficient
because it still returns a creation time for terminated processes
whose kernel object is kept alive by an outstanding handle
elsewhere.
* Acquisition does not fail if start-time capture fails. The existing
IsProcessActiveImplementation has a Process.GetProcessById fallback
for cross-integrity callers that OpenProcess(QueryLimitedInformation)
cannot open, and we preserve the same compatibility surface here. A
HasStartTime flag is stored alongside the PID; when it is false the
orphan check falls back to the legacy IsProcessActive call. The
fallback is recorded in telemetry (StartTimeUnavailable=true on the
TryAcquireLockExternal event) so we can see if it ever becomes
common in the field.
* The ExternalLockHolderExited telemetry event now carries an
ExternalHolderTerminationReason field (ProcessNotActive | PidRecycled
| Unknown) to make future post-mortems straightforward.
* The OrphanedGVFSLockIsCleanedUp functional test is unchanged. It
polls Process.GetProcessById to detect when the lock holder has
exited, then runs git status. With this product fix, git status
succeeds cleanly even if the PID has already been recycled to an
unrelated process by the time mount processes the request, so the
test passes naturally without any test-side workaround.
* Two new unit tests cover the new identity-check path:
TryAcquireLockForExternalRequestor_WhenHolderPidRecycled (verifies
that an orphan is detected when start times differ at the same PID)
and TryAcquireLockForExternalRequestor_WhenHolderStartTimeMatches
(the inverse: a still-alive holder is correctly seen as active and
a new acquisition is denied).
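For readers who want to see the shape of the combined Windows check, here is a rough sketch under stated assumptions. The class name ProcessStartTimeSketch and the P/Invoke declarations below are written for illustration and are not copied from NativeMethods.Shared.cs or WindowsPlatform.Shared.cs; only the Win32 functions themselves (OpenProcess, GetExitCodeProcess, GetProcessTimes) and the QueryLimitedInformation access right come from the description above.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Illustrative only: hypothetical class, Windows-only.
public static class ProcessStartTimeSketch
{
    private const uint ProcessQueryLimitedInformation = 0x1000;
    private const uint StillActive = 259; // STILL_ACTIVE

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern SafeProcessHandle OpenProcess(uint desiredAccess, bool inheritHandle, int processId);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetExitCodeProcess(SafeProcessHandle process, out uint exitCode);

    // Each FILETIME is marshalled as a raw 64-bit value.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetProcessTimes(
        SafeProcessHandle process,
        out long creationTime,
        out long exitTime,
        out long kernelTime,
        out long userTime);

    public static bool TryGetActiveProcessStartTime(int pid, out ulong startTime)
    {
        startTime = 0;
        using (SafeProcessHandle process = OpenProcess(ProcessQueryLimitedInformation, false, pid))
        {
            // No such process, or one we are not allowed to open.
            if (process.IsInvalid)
            {
                return false;
            }

            // A terminated process whose kernel object is still alive would
            // pass GetProcessTimes, so reject it here first.
            if (!GetExitCodeProcess(process, out uint exitCode) || exitCode != StillActive)
            {
                return false;
            }

            if (!GetProcessTimes(process, out long creation, out _, out _, out _))
            {
                return false;
            }

            // Opaque identity value; compared for equality only.
            startTime = unchecked((ulong)creation);
            return true;
        }
    }
}
```

One known limitation of any STILL_ACTIVE test: a process that exits with code 259 is indistinguishable from a running one. A false return for a process that cannot be opened is the case the HasStartTime fallback is meant to absorb.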
Assisted-by: Claude Opus 4.7
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
Summary
Fix a PID-recycling race in GVFSLock's orphan-lock detection. When the lock holder exits and Windows quickly recycles its PID to an unrelated process, IsProcessActive(pid) returns true and GVFS concludes the lock is still held. The next git command then either waits the full 300 s timeout or prints a spurious "Waiting for '<holder>' to release the lock" message, which is what trips the functional test's empty-stderr assertion.

This surfaced as flakiness in the OrphanedGVFSLockIsCleanedUp functional test on busy CI agents (two failure modes: 300 s timeout, and the "Waiting for…" assertion). The same race can affect users when the OS quickly reuses a freed PID.

The fix
Process-identity tracking via creation time:
* When the lock is granted to an external requestor, capture that process's creation timestamp (GetProcessTimes, an opaque 100-ns FILETIME) alongside its PID.
* On every orphan check, look up the current creation timestamp at that PID and compare. If the process is gone the lookup fails; if a different process is now at the PID the timestamps differ. Either way the lock is recognized as orphaned.

Design notes
* GetProcessTimes alone is not enough: it still returns a creation time for terminated processes whose kernel object lingers due to an outstanding handle elsewhere. The Windows implementation combines GetExitCodeProcess (STILL_ACTIVE check) with GetProcessTimes.
* Acquisition does not fail if start-time capture fails. IsProcessActiveImplementation has a Process.GetProcessById fallback for cross-integrity callers that OpenProcess(QueryLimitedInformation) cannot open; we preserve that compatibility surface. A HasStartTime flag is stored alongside the PID; when false, the orphan check falls back to the legacy IsProcessActive call (the pre-fix behavior). The fallback is recorded in telemetry (StartTimeUnavailable=true on TryAcquireLockExternal).
* ExternalLockHolderExited now carries an ExternalHolderTerminationReason field (ProcessNotActive | PidRecycled | Unknown) for easy post-mortems.

Files changed (7, +231/-10)
* GVFS.Common/NativeMethods.Shared.cs: GetProcessTimes P/Invoke
* GVFS.Common/GVFSPlatform.cs: TryGetActiveProcessStartTime
* GVFS.Platform.Windows/WindowsPlatform.Shared.cs: TryGetActiveProcessStartTimeImplementation (combined active-check + start-time)
* GVFS.Platform.Windows/WindowsPlatform.cs
* GVFS.Common/GVFSLock.cs
* GVFS.UnitTests/Mock/Common/MockPlatform.cs: ProcessStartTimes dictionary for tests
* GVFS.UnitTests/Common/GVFSLockTests.cs: WhenHolderPidRecycled (orphan detected via start-time mismatch), WhenHolderStartTimeMatches (live holder correctly seen as active)

What is intentionally NOT changed
* The LockData wire format is unchanged (full backward compatibility with older clients/servers).
* GVFS.FunctionalTests.LockHolder.OrphanedGVFSLockIsCleanedUp functional test. The test was correctly exposing the product bug; with the product fix, it passes naturally without any test-side workaround.
* GVFS.Hooks startup PID check (Program.cs:251). That's a one-shot check on the hook's git.exe parent at startup, not a polling loop, so there is no PID-reuse race.

Validation
* dotnet build GVFS.UnitTests: clean.
* scripts\Build.bat Debug: clean.
* GVFS.UnitTests.exe: 815/815 pass, including the two new tests for the identity-check path and all existing tests (which now exercise the start-time-unavailable fallback path).
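To make the start-time-unavailable fallback and the termination reason concrete, here is one way the decision could be structured. It is a sketch under assumptions: ExternalHolder, OrphanClassifier, and the two delegates are hypothetical stand-ins for the real types and platform calls, and the enum reuses only the three value names listed for ExternalHolderTerminationReason. How the real code uses Unknown is not stated in this PR, so the sketch only uses it as the initial value.

```csharp
using System;

// Value names mirror the ExternalHolderTerminationReason field in the PR.
public enum TerminationReason
{
    Unknown,
    ProcessNotActive,
    PidRecycled,
}

// Hypothetical holder record: PID plus an optional captured start time.
public sealed class ExternalHolder
{
    public ExternalHolder(int pid, ulong? startTime)
    {
        this.Pid = pid;
        this.HasStartTime = startTime.HasValue;
        this.StartTime = startTime.GetValueOrDefault();
    }

    public int Pid { get; }
    public bool HasStartTime { get; }
    public ulong StartTime { get; }
}

public static class OrphanClassifier
{
    // isProcessActive models the legacy PID-only check; lookupStartTime models
    // the new call and returns null when no live process is at the PID.
    // Both delegates are assumptions made for this sketch.
    public static bool IsOrphaned(
        ExternalHolder holder,
        Func<int, bool> isProcessActive,
        Func<int, ulong?> lookupStartTime,
        out TerminationReason reason)
    {
        reason = TerminationReason.Unknown;

        if (!holder.HasStartTime)
        {
            // Start time was never captured: pre-fix behavior, PID-only check.
            if (isProcessActive(holder.Pid))
            {
                return false;
            }

            reason = TerminationReason.ProcessNotActive;
            return true;
        }

        ulong? current = lookupStartTime(holder.Pid);
        if (current == null)
        {
            reason = TerminationReason.ProcessNotActive;
            return true;
        }

        if (current.Value != holder.StartTime)
        {
            reason = TerminationReason.PidRecycled;
            return true;
        }

        // Same PID and same creation time: the original holder is still alive.
        return false;
    }
}
```

Note that on the fallback path a recycled PID still looks like a live holder, which is why the PR records StartTimeUnavailable in telemetry.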