Jobs Hung When Jenkins Temporarily In Quietdown Mode #113

@David-Villeneuve

Version report

Jenkins and plugins versions report:

This has been happening for a long time, across several versions of Jenkins and combinations of plugins. In this case: Jenkins 2.303.1 and version 1.11 of the swarm plugin.

  • What Operating System are you using (both controller, and any agents involved in the problem)?

The controller runs Ubuntu 16; the agents run a variety of Ubuntu versions (16/18/20).

Reproduction steps

  • Set up a job that uses a swarm template.
  • Put Jenkins in shutdown (quiet-down) mode.
  • Trigger the job. The build is held in the queue, and the agent starts.
  • After 2-3 minutes, Jenkins takes the node down because it is idle.
  • Cancel shutdown mode.
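
The steps above could also be scripted against the controller's HTTP endpoints. This is only a rough sketch, not something I have run against this setup: it assumes the standard `/quietDown`, `/cancelQuietDown`, and `/job/<name>/build` endpoints, an admin user with an API token, and a job name without spaces or special characters. The class name, the argument order, and the 4-minute wait (chosen to outlast the 2-3 minute idle timeout) are all made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

// Hypothetical helper that drives the reproduction steps over HTTP.
final class QuietDownRepro {

    // Builds a body-less POST with HTTP Basic auth (user + API token).
    static HttpRequest post(String baseUrl, String path, String user, String apiToken) {
        String credentials = user + ":" + apiToken;
        String basic = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Authorization", "Basic " + basic)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: QuietDownRepro <baseUrl> <user> <apiToken> <jobName>");
            return;
        }
        String base = args[0], user = args[1], token = args[2], job = args[3];
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse.BodyHandler<Void> ignore = HttpResponse.BodyHandlers.discarding();

        // Step 2: put Jenkins in quiet-down ("prepare for shutdown") mode.
        client.send(post(base, "/quietDown", user, token), ignore);
        // Step 3: trigger the job; the build should stay queued.
        client.send(post(base, "/job/" + job + "/build", user, token), ignore);
        // Step 4: wait past the idle timeout (assumed value) so the agent is removed.
        Thread.sleep(Duration.ofMinutes(4).toMillis());
        // Step 5: cancel quiet-down; the queued build is expected to start but hangs.
        client.send(post(base, "/cancelQuietDown", user, token), ignore);
    }
}
```

The same sequence can of course be done by hand from Manage Jenkins; the script just makes the timing repeatable.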

Results

Expected result:

Build runs on agent.

Actual result:

The build never runs, and new builds can't run until the hung build is manually cancelled.

Possible Solution

I made the following change to prevent calling done(c) while Jenkins is quieting down. It results in the agent being taken down and a new one being created. This will repeat until Jenkins shutdown is cancelled:

```diff
index 31e4878..41cc23d 100644
--- a/src/main/java/org/jenkinsci/plugins/docker/swarm/DockerSwarmAgentRetentionStrategy.java
+++ b/src/main/java/org/jenkinsci/plugins/docker/swarm/DockerSwarmAgentRetentionStrategy.java
@@ -19,6 +19,7 @@ import hudson.model.Executor;
 import hudson.model.ExecutorListener;
 import hudson.model.Queue;
 import hudson.slaves.RetentionStrategy;
+import jenkins.model.Jenkins;
 
 public class DockerSwarmAgentRetentionStrategy extends RetentionStrategy<DockerSwarmComputer>
         implements ExecutorListener {
@@ -45,7 +46,7 @@ public class DockerSwarmAgentRetentionStrategy extends RetentionStrategy<DockerS
             final long connectTime = System.currentTimeMillis() - c.getConnectTime();
             final long idleTime = System.currentTimeMillis() - c.getIdleStartMilliseconds();
             final boolean isTimeout = connectTime > timeout && idleTime > timeout;
-            if (isTimeout && (!isTaskAccepted || isTaskCompleted)) {
+            if (isTimeout && (!isTaskAccepted || isTaskCompleted) && !Jenkins.getInstance().isQuietingDown()) {
                 LOGGER.log(Level.INFO, "Disconnecting due to idle {0}", c.getName());
                 done(c);
             }
```

I don't know enough about the interactions with the caller, so this may not be the best solution.
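
To make the effect of the extra condition easier to see, here is the same check pulled out into a standalone function. This is a simplified model, not the plugin's actual code: the class name, method name, and the 60-second timeout are made up, and the `quietingDown` flag stands in for the `isQuietingDown()` call in the diff.

```java
// Standalone model of the idle check from the diff above (hypothetical names).
final class IdleDisconnectSketch {

    // Returns true when the retention strategy would call done(c).
    static boolean shouldDisconnect(long connectTimeMs, long idleTimeMs, long timeoutMs,
                                    boolean taskAccepted, boolean taskCompleted,
                                    boolean quietingDown) {
        // Same timeout rule as the original: connected AND idle longer than the timeout.
        boolean isTimeout = connectTimeMs > timeoutMs && idleTimeMs > timeoutMs;
        // The added guard: never disconnect an idle agent while Jenkins is quieting down.
        return isTimeout && (!taskAccepted || taskCompleted) && !quietingDown;
    }

    public static void main(String[] args) {
        long timeout = 60_000; // assumed value, for illustration only

        // Idle agent that never accepted a task, Jenkins running normally.
        System.out.println(shouldDisconnect(180_000, 180_000, timeout, false, false, false));
        // Same agent while Jenkins is quieting down: the guard blocks the disconnect.
        System.out.println(shouldDisconnect(180_000, 180_000, timeout, false, false, true));
        // Agent still running an accepted task: never disconnected by this check.
        System.out.println(shouldDisconnect(180_000, 180_000, timeout, true, false, false));
    }
}
```

If a patch along these lines is picked up, `Jenkins.get()` is probably preferable to `Jenkins.getInstance()`, which I believe is deprecated in recent Jenkins versions.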

Labels: bug (Something isn't working)
