Skip to content

Commit d834f8a

Browse files
authored
Fix deadlock when afterDisconnect is called during launch() (#304)
Deadlock happens between: (a) https://github.com/jenkinsci/ssh-slaves-plugin/blob/main/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L953 (b) https://github.com/jenkinsci/ssh-slaves-plugin/blob/main/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L413 Launch() is stuck here: https://github.com/jenkinsci/ssh-slaves-plugin/blob/main/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L486 , and `callable` is stuck waiting here: https://github.com/jenkinsci/trilead-ssh2/blob/8e7118e74deacd7ef044df9076fcb9b95280c163/src/com/trilead/ssh2/channel/FifoBuffer.java#L212 After some time `afterDisconnect` is called which initiates tear down process. Teardown process calls launcherExecutorService.shutdown() (here: https://github.com/jenkinsci/ssh-slaves-plugin/blob/main/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L941) and continues to (a). But because there is no writer writing to FifoBuffer, launch() never finishes and is stuck in (b), and because `shutdown()` only prevents new tasks being submitted (https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/ExecutorService.html) there is no one to interrupt lock.wait() in FifoBuffer, thus deadlock. The solutions is to use recommended way of shutting down ExecutorService from docs.
1 parent e0badc9 commit d834f8a

File tree

1 file changed

+26
-11
lines changed

1 file changed

+26
-11
lines changed

src/main/java/hudson/plugins/sshslaves/SSHLauncher.java

Lines changed: 26 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -505,11 +505,7 @@ public void launch(final SlaveComputer computer, final TaskListener listener) th
505505
System.out.println(Messages.SSHLauncher_LaunchFailed(getTimestamp(),
506506
nodeName, host));
507507
} finally {
508-
ExecutorService srv = launcherExecutorService;
509-
if (srv != null) {
510-
srv.shutdownNow();
511-
launcherExecutorService = null;
512-
}
508+
shutdownAndAwaitTerminationOfLauncher();
513509
}
514510
}
515511
if (node != null && getTrackCredentials()) {
@@ -934,12 +930,7 @@ public void afterDisconnect(SlaveComputer slaveComputer, final TaskListener list
934930
// Nothing to do here, the connection is not established
935931
return;
936932
}
937-
938-
ExecutorService srv = launcherExecutorService;
939-
if (srv != null) {
940-
// If the service is still running, shut it down and interrupt the operations if any
941-
srv.shutdown();
942-
}
933+
shutdownAndAwaitTerminationOfLauncher();
943934

944935
if (tearingDownConnection) {
945936
// tear down operation is in progress, do not even try to synchronize the call.
@@ -982,6 +973,30 @@ private void tearDownConnectionImpl(@NonNull SlaveComputer slaveComputer, final
982973
}
983974
}
984975

976+
private void shutdownAndAwaitTerminationOfLauncher() {
977+
ExecutorService srv = launcherExecutorService;
978+
if (srv == null) {
979+
return;
980+
}
981+
srv.shutdown(); // Disable new tasks from being submitted
982+
launcherExecutorService = null;
983+
try {
984+
// Wait a while for existing tasks to terminate
985+
if (!srv.awaitTermination(10, TimeUnit.SECONDS)) {
986+
srv.shutdownNow(); // Cancel currently executing tasks
987+
// Wait a while for tasks to respond to being cancelled
988+
if (!srv.awaitTermination(10, TimeUnit.SECONDS)) {
989+
LOGGER.log(WARNING, "launcherExecutorService thread pool did not terminate cleanly");
990+
}
991+
}
992+
} catch (InterruptedException ie) {
993+
// (Re-)Cancel if current thread also interrupted
994+
srv.shutdownNow();
995+
// Preserve interrupt status
996+
Thread.currentThread().interrupt();
997+
}
998+
}
999+
9851000
/**
9861001
* If the SSH connection as a whole is lost, report that information.
9871002
*/

0 commit comments

Comments
 (0)