idleMinutes config leads to limit counter leak with ephemeral templates (and restart before idle timeout) #2783

@apuig

Description

Jenkins and plugins versions report

Environment
Jenkins: 2.528.3.35200
OS: Linux - 6.14.0-37-generic
Java: 21.0.9 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)
---
kubernetes:4392.v19cea_fdb_5913
kubernetes-client-api:7.3.1-256.v788a_0b_787114
kubernetes-credentials:206.vde31a_b_0f71a_c

What Operating System are you using (both controller, and any agents involved in the problem)?

OS: Linux - 6.14.0-37-generic

Reproduction steps

  1. Kubernetes cloud with containerCap: 5

  2. Create pipeline using podTemplate with idleMinutes: 5

podTemplate(
    cloud: 'k8',    
    idleMinutes: 5,
    containers: [
        containerTemplate(
            name: 'jnlp', image: 'jenkins/inbound-agent:latest-jdk17'
        )
    ]
) {
    node(POD_LABEL) {
        stage('Test Job') {            
            sleep 5
        }
    }
}
  3. Run the pipeline 5 times, waiting for each run to complete

  4. Restart Jenkins within the idleMinutes window (before the idle agents are deleted)

  5. Run the pipeline again after the restart

Expected Results

  • A new agent can be provisioned

  • Agents are always removed after an idle timeout, reducing the current total and making room for new agents to be created.

Actual Results

  • The new agent cannot be provisioned because the current limit is reached

  • After Jenkins restarts:

  1. 5 nodes (jobname-uuid) exist in manage/computer/
  2. The limit counter is 0
  3. On the first pipeline run, the limit counter is updated to 5
  4. After the idle timeout, the nodes are removed, but the limit counter is not decremented
  5. New pipeline builds cannot run because the counter stays at 5, even though no computer nodes remain
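
The sequence above can be replayed with a small toy model. This is only a sketch of the observed behaviour, not the plugin's code: the class `LimitCounterTimeline` and all of its methods are hypothetical names, and the cap of 5 mirrors the containerCap from the reproduction steps.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the kubernetes plugin's implementation.
class LimitCounterTimeline {
    static final int CONTAINER_CAP = 5;   // mirrors containerCap: 5 from the report

    int counter = 0;                      // in-memory limit counter
    final List<String> nodes = new ArrayList<>();

    // Provision one agent; returns false when the cap blocks it.
    boolean provision(String name) {
        if (counter >= CONTAINER_CAP) {
            return false;
        }
        nodes.add(name);
        counter++;
        return true;
    }

    // Restart: the nodes persist, the in-memory counter starts again at 0.
    void restart() {
        counter = 0;
    }

    // First pipeline run after the restart: the persisted nodes are counted again.
    void recountExistingNodes() {
        counter = nodes.size();
    }

    // Idle timeout: nodes are removed; the flag models whether cleanup releases their slots.
    void reapIdleNodes(boolean unregister) {
        if (unregister) {
            counter -= nodes.size();
        }
        nodes.clear();
    }

    // Replays the reported steps and says whether a new agent can be provisioned at the end.
    static boolean canProvisionAfterRestart(boolean reaperUnregisters) {
        LimitCounterTimeline t = new LimitCounterTimeline();
        for (int i = 0; i < CONTAINER_CAP; i++) {
            t.provision("job-" + i);
        }
        t.restart();
        t.recountExistingNodes();
        t.reapIdleNodes(reaperUnregisters);
        return t.provision("job-new");
    }

    public static void main(String[] args) {
        System.out.println("reaper skips unregistration: " + canProvisionAfterRestart(false));
        System.out.println("reaper unregisters nodes:    " + canProvisionAfterRestart(true));
    }
}
```

With `reaperUnregisters` set to `false` the model ends with no nodes and a counter stuck at the cap, which matches the actual results; with `true` it matches the expected results.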

Anything else?

Possible root cause: agents lose their transient template references during reload, which causes the reaper cleanup process to skip node unregistration.

There may be other ways to reproduce this issue besides a manual restart. Other plugins or processes could also serialize and restore agents, resulting in the same problem.
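
To illustrate the suspected mechanism, the sketch below shows a `transient` field coming back as `null` after a save/restore round trip, and a cleanup step that skips the decrement when the field is missing. It uses plain Java serialization as a stand-in; Jenkins persists nodes with XStream, which as far as I know also omits transient fields by default. `Agent`, `roundTrip` and `cleanup` are hypothetical names for illustration, not the plugin's classes or methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration of the suspected root cause; not the plugin's code.
class TransientTemplateSketch {

    // Stand-in for an agent whose template reference is not persisted.
    static class Agent implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        transient String template;   // skipped when the object is serialized

        Agent(String name, String template) {
            this.name = name;
            this.template = template;
        }
    }

    // Serializes and deserializes the agent, as a save followed by a reload would.
    static Agent roundTrip(Agent agent) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(agent);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Agent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed cleanup shape: without a template the agent cannot be unregistered,
    // so the counter is returned unchanged.
    static int cleanup(Agent agent, int counter) {
        if (agent.template == null) {
            return counter;
        }
        return counter - 1;
    }

    public static void main(String[] args) {
        Agent before = new Agent("jobname-uuid", "ephemeral-template");
        Agent after = roundTrip(before);
        System.out.println("template after reload: " + after.template);
        System.out.println("counter after cleanup (fresh agent):    " + cleanup(before, 5));
        System.out.println("counter after cleanup (reloaded agent): " + cleanup(after, 5));
    }
}
```

If the real cleanup path behaves like `cleanup` here, any process that serializes and restores agents would produce the same leak, not only a manual restart.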

Are you interested in contributing a fix?

No response
