Call Launcher afterDisconnect() only once when disconnecting an agent#26402

Open
Anvarjon7 wants to merge 1 commit into jenkinsci:master from Anvarjon7:fix-jenkins-35272

Conversation

@Anvarjon7 Anvarjon7 commented Mar 4, 2026

Fixes #17204

Takes over the work from #26188 to fix JENKINS-35272, addressing the Windows test failure that caused the previous attempt to be reverted.

Root Cause
When an agent disconnects, SlaveComputer.java executes launcher.afterDisconnect() from two places: the thread running the explicit disconnect() and the onClosed() channel listener. This causes duplicate teardown actions, such as shutting down a VM twice.

Why the Previous Fix Failed on Windows (UnsupportedRemotingAgentTest)
The previous attempt used a non-blocking AtomicBoolean to prevent the second call. However, if the test framework triggered a disconnect while the background Remoting thread was already killing the process, the guard check on the test framework's thread failed immediately because the flag was already set. That thread then skipped the teardown without waiting for it to finish and proceeded to the JUnit @TempDir cleanup phase. On Windows, this caused an AccessDeniedException because the background thread was still killing the process and the file lock on unsupported-agent.jar had not yet been released.
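
The race described above can be reproduced in isolation. The following is a minimal sketch, not the reverted SlaveComputer code: the class name, the latches, and the teardownFinished field are hypothetical stand-ins, and the latch simulates a slow process kill. It shows that a compareAndSet guard lets the second caller return while the first caller's teardown is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a non-blocking guard; not the actual Jenkins code.
class NonBlockingGuardSketch {
    private final AtomicBoolean afterDisconnectCalled = new AtomicBoolean(false);
    private final CountDownLatch teardownStarted = new CountDownLatch(1);
    private final CountDownLatch releaseTeardown = new CountDownLatch(1);
    private volatile boolean teardownFinished = false;

    /** Returns true if this caller ran the teardown, false if it was skipped. */
    boolean disconnect() {
        if (!afterDisconnectCalled.compareAndSet(false, true)) {
            return false; // skipped: does NOT wait for the thread doing the teardown
        }
        teardownStarted.countDown();
        awaitQuietly(releaseTeardown); // simulates a slow process kill
        teardownFinished = true;
        return true;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Reports whether teardown had finished when the second caller returned. */
    static boolean teardownFinishedWhenSecondCallerReturned() {
        NonBlockingGuardSketch agent = new NonBlockingGuardSketch();
        Thread background = new Thread(agent::disconnect);
        background.start();
        awaitQuietly(agent.teardownStarted); // first caller is now mid-teardown
        agent.disconnect();                  // second caller returns immediately
        boolean finished = agent.teardownFinished;
        agent.releaseTeardown.countDown();   // let the first caller complete
        try {
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("teardown finished when second caller returned: "
                + teardownFinishedWhenSecondCallerReturned());
    }
}
```

In this sketch the second caller observes teardownFinished as false, which corresponds to the window in which the test's cleanup could run while the agent process was still being killed.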

The Solution
Instead of an AtomicBoolean, this PR introduces a synchronized(disconnectLock) block around the execution flag.

  • It ensures afterDisconnect() is only ever executed once.
  • It forces any subsequent calling threads to block and wait until the first thread has fully completed the teardown logic.
  • This restores the synchronous contract expected by the test framework, ensuring that by the time a test finishes, the agent process is completely dead and file locks are gracefully released.
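
A minimal sketch of the blocking guard described in the list above, assuming a plain monitor lock. Only disconnectLock and afterDisconnectCalled are names taken from this PR; the class, the counters, the latch, and the 200 ms sleep that stands in for teardown work are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of a blocking once-only guard; not the actual Jenkins code.
class BlockingGuardSketch {
    private final Object disconnectLock = new Object();
    private boolean afterDisconnectCalled = false; // guarded by disconnectLock
    private int teardownRuns = 0;                  // guarded by disconnectLock
    private volatile boolean teardownFinished = false;
    private final CountDownLatch teardownStarted = new CountDownLatch(1);

    void disconnect() {
        synchronized (disconnectLock) {
            if (afterDisconnectCalled) {
                return; // teardown already ran to completion on another thread
            }
            afterDisconnectCalled = true;
            teardownRuns++;
            teardownStarted.countDown();
            try {
                Thread.sleep(200); // stand-in for the real teardown work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            teardownFinished = true;
        }
    }

    /** True if the second caller returned only after exactly one completed teardown. */
    static boolean secondCallerWaitedForSingleTeardown() {
        BlockingGuardSketch agent = new BlockingGuardSketch();
        Thread background = new Thread(agent::disconnect);
        background.start();
        try {
            agent.teardownStarted.await(); // first caller holds the lock, mid-teardown
            agent.disconnect();            // blocks on disconnectLock until teardown ends
            boolean finished = agent.teardownFinished;
            background.join();
            synchronized (agent.disconnectLock) {
                return finished && agent.teardownRuns == 1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller waited for a single teardown: "
                + secondCallerWaitedForSingleTeardown());
    }
}
```

Because the flag check and the teardown both happen inside the same synchronized block, a late caller cannot observe the flag as set until the first caller has left the block, which is what makes the second call wait rather than return early.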

Testing done

Verified locally on Windows from the native command line. Ran mvn test -Dtest=UnsupportedRemotingAgentTest, which now passes consistently. This indicates that the file-lock race condition is resolved and that the asynchronous disconnect sequence completes before the test teardown.

Screenshots (UI changes only)

N/A

Proposed changelog entries

  • Prevent afterDisconnect() from being called twice when an agent disconnects

Proposed changelog category

/label bug

Proposed upgrade guidelines

N/A

Submitter checklist

  • The issue, if it exists, is well-described.
  • The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood. Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
  • There is automated testing or an explanation as to why this change has no tests.
  • New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
  • New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
  • UI changes do not introduce regressions when enforcing the current default rules of Content Security Policy Plugin. In particular, new or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives.
  • For dependency updates, there are links to external changelogs and, if possible, full differentials.
  • For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@jenkinsci/core-pr-reviewers

Before the changes are marked as ready-for-merge:

Maintainer checklist

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, the change must be a Bug or Improvement, and either the issue or the pull request must be labeled as lts-candidate to be considered.

comment-ops-bot added the bug label (For changelog: Minor bug. Will be listed after features) on Mar 4, 2026
Anvarjon7 changed the title from "[JENKINS-35272] Launcher's afterDisconnect() method is called twice #17204" to "[JENKINS-35272] Launcher's afterDisconnect() method is called twice" on Mar 4, 2026
MarkEWaite requested a review from Copilot on March 7, 2026 15:29
MarkEWaite changed the title from "[JENKINS-35272] Launcher's afterDisconnect() method is called twice" to "Call Launcher afterDisconnect() only once when disconnecting an agent" on Mar 7, 2026
Contributor

Copilot AI left a comment

Pull request overview

Fixes JENKINS-35272 by ensuring ComputerLauncher.afterDisconnect(...) is executed only once per agent connection lifecycle, while also making concurrent disconnect paths block until teardown completes (to avoid Windows file-lock races).

Changes:

  • Add a disconnectLock + afterDisconnectCalled guard to serialize and deduplicate afterDisconnect() execution.
  • Apply the guard in both the Remoting onClosed() listener and the explicit disconnect() code path.
  • Reset the guard when a new channel is established.

Comment on lines +765 to +768

    // Reset the disconnect guard for the new connection lifecycle
    synchronized (disconnectLock) {
        this.afterDisconnectCalled = false;
    }

Copilot AI Mar 7, 2026

afterDisconnectCalled is reset fairly late in setChannel() (after the channel listener is already registered and multiple remote calls have run). If the new channel closes quickly (e.g., setup failure) before this reset executes, onClosed() can run while afterDisconnectCalled is still true from the previous lifecycle and will skip launcher.afterDisconnect, potentially leaving teardown undone for that failed connection attempt.

Consider resetting the guard under disconnectLock earlier (right after the initial this.channel != null check, before channel.addListener(...) / remote calls). This also avoids holding channelLock while waiting for disconnectLock, which can unnecessarily block other channel state transitions.
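
The ordering concern can be illustrated with a single-threaded sketch. It is a simplified model, not the setChannel() implementation: the class, the resetEarly and failsDuringSetup parameters, and the string channel names are hypothetical, and a setup failure is modeled as onClosed() firing before setChannel() returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of guard-reset ordering; not the actual SlaveComputer code.
class GuardResetOrderSketch {
    private final Object disconnectLock = new Object();
    private boolean afterDisconnectCalled = false; // guarded by disconnectLock
    private final List<String> teardowns = new ArrayList<>();

    void onClosed(String channelName) {
        synchronized (disconnectLock) {
            if (afterDisconnectCalled) {
                return; // guard still set: teardown is skipped
            }
            afterDisconnectCalled = true;
            teardowns.add(channelName);
        }
    }

    private void resetGuard() {
        synchronized (disconnectLock) {
            afterDisconnectCalled = false;
        }
    }

    void setChannel(String channelName, boolean resetEarly, boolean failsDuringSetup) {
        if (resetEarly) {
            resetGuard(); // reset before the listener can fire
        }
        // The listener is registered at this point; a setup failure closes the channel.
        if (failsDuringSetup) {
            onClosed(channelName);
        }
        if (!resetEarly) {
            resetGuard(); // late reset: too late for a channel that already closed
        }
    }

    /** Returns the channels whose teardown actually ran, in order. */
    static List<String> run(boolean resetEarly) {
        GuardResetOrderSketch agent = new GuardResetOrderSketch();
        agent.setChannel("first", resetEarly, false);
        agent.onClosed("first");                      // normal disconnect of the first lifecycle
        agent.setChannel("second", resetEarly, true); // second lifecycle fails during setup
        return agent.teardowns;
    }

    public static void main(String[] args) {
        System.out.println("late reset:  " + run(false));
        System.out.println("early reset: " + run(true));
    }
}
```

In this model the late reset records a teardown only for the first channel, while the early reset records one for both, matching the skipped-teardown scenario raised in the comment.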


Labels

bug For changelog: Minor bug. Will be listed after features

Development

Successfully merging this pull request may close these issues.

[JENKINS-35272] Launcher's afterDisconnect() method is called twice

2 participants