Call Launcher afterDisconnect() only once when disconnecting an agent#26402
Call Launcher afterDisconnect() only once when disconnecting an agent#26402Anvarjon7 wants to merge 1 commit intojenkinsci:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes JENKINS-35272 by ensuring ComputerLauncher.afterDisconnect(...) is executed only once per agent connection lifecycle, while also making concurrent disconnect paths block until teardown completes (to avoid Windows file-lock races).
Changes:
- Add a
disconnectLock+afterDisconnectCalledguard to serialize and deduplicateafterDisconnect()execution. - Apply the guard in both the Remoting
onClosed()listener and the explicitdisconnect()code path. - Reset the guard when a new channel is established.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Reset the disconnect guard for the new connection lifecycle | ||
| synchronized (disconnectLock) { | ||
| this.afterDisconnectCalled = false; | ||
| } |
There was a problem hiding this comment.
afterDisconnectCalled is reset fairly late in setChannel() (after the channel listener is already registered and multiple remote calls have run). If the new channel closes quickly (e.g., setup failure) before this reset executes, onClosed() can run while afterDisconnectCalled is still true from the previous lifecycle and will skip launcher.afterDisconnect, potentially leaving teardown undone for that failed connection attempt.
Consider resetting the guard under disconnectLock earlier (right after the initial this.channel != null check, before channel.addListener(...) / remote calls). This also avoids holding channelLock while waiting for disconnectLock, which can unnecessarily block other channel state transitions.
Fixes #17204
Takes over the work from #26188 to fix JENKINS-35272, addressing the Windows test failure that caused the previous attempt to be reverted.
Root Cause
When an agent disconnects,
SlaveComputer.javaexecuteslauncher.afterDisconnect()from both the explicitdisconnect()thread and theonClosed()channel listener. This caused duplicate teardown actions, such as shutting down VMs twice.Why the Previous Fix Failed on Windows (
UnsupportedRemotingAgentTest)The previous attempt used a non-blocking
AtomicBooleanto prevent the second call. However, if the test framework triggered a disconnect while the background Remoting thread was already killing the process, the test framework's thread would instantly returnfalse(thinking the job was done) and proceed to the JUnit@TempDircleanup phase. On Windows, this caused anAccessDeniedExceptionbecause the background thread was still holding the file lock onunsupported-agent.jarwhile killing the process.The Solution
Instead of an
AtomicBoolean, this PR introduces asynchronized(disconnectLock)block around the execution flag.afterDisconnect()is only ever executed once.Testing done
Verified locally on Windows native CLI. Executed
mvn test -Dtest=UnsupportedRemotingAgentTestwhich now consistently passes, confirming the file-lock race condition is fully resolved and the asynchronous disconnect sequence safely completes before the test teardown.Screenshots (UI changes only)
N/A
Proposed changelog entries
Proposed changelog category
/label bug
Proposed upgrade guidelines
N/A
Submitter checklist
@Restrictedor have@since TODOJavadocs, as appropriate.@Deprecated(since = "TODO")or@Deprecated(forRemoval = true, since = "TODO"), if applicable.evalto ease future introduction of Content Security Policy (CSP) directives.Desired reviewers
@jenkinsci/core-pr-reviewers
Before the changes are marked as
ready-for-merge:Maintainer checklist
upgrade-guide-neededlabel is set and there is a Proposed upgrade guidelines section in the pull request title (see example).lts-candidateto be considered.