Executor stuck due to checkForMinimumInstances() when AWS Spot instance capacity is running low #1993

@NSpenlihauer

Jenkins and plugins versions report

Environment
Jenkins: 2.528.3
OS: Linux - 4.14.193-113.317.amzn1.x86_64
Java: 21.0.9 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
ace-editor:1.1
amazon-ecs:1.49
analysis-model-api:13.18.0-935.v784ca_107400a_
ansicolor:536.v13fa_b_860c267
antisamy-markup-formatter:173.v680e3a_b_69ff3
apache-httpcomponents-client-4-api:4.5.14-269.vfa_2321039a_83
apache-httpcomponents-client-5-api:5.6-183.ve5a_8a_b_e71e59
asm-api:9.9.1-189.vb_5ef2964da_91
audit-trail:436.vc0d1e79fc5a_3
authentication-tokens:1.144.v5ff4a_5ec5c33
aws-credentials:254.v978a_5e206a_d7
aws-java-sdk:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-api-gateway:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-autoscaling:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-cloudformation:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-cloudfront:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-cloudwatch:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-codebuild:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-codedeploy:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-ec2:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-ecr:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-ecs:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-efs:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-elasticbeanstalk:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-elasticloadbalancingv2:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-iam:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-kinesis:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-lambda:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-logs:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-minimal:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-organizations:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-secretsmanager:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-sns:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-sqs:1.12.780-480.v4a_0819121a_9e
aws-java-sdk-ssm:1.12.780-480.v4a_0819121a_9e
aws-java-sdk2-core:2.33.4-62.vc1a_8df64b_4c9
aws-java-sdk2-ec2:2.33.4-62.vc1a_8df64b_4c9
aws-java-sdk2-secretsmanager:2.33.4-62.vc1a_8df64b_4c9
aws-parameter-store:1.2.2
aws-secrets-manager-credentials-provider:2.222.v376939a_9ffb_b
blueocean:1.27.25
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.27.25
blueocean-commons:1.27.25
blueocean-config:1.27.25
blueocean-core-js:1.27.25
blueocean-dashboard:1.27.25
blueocean-display-url:2.4.4
blueocean-events:1.27.25
blueocean-git-pipeline:1.27.25
blueocean-github-pipeline:1.27.25
blueocean-i18n:1.27.25
blueocean-jira:1.27.25
blueocean-jwt:1.27.25
blueocean-personalization:1.27.25
blueocean-pipeline-api-impl:1.27.25
blueocean-pipeline-editor:1.27.25
blueocean-pipeline-scm-api:1.27.25
blueocean-rest:1.27.25
blueocean-rest-impl:1.27.25
blueocean-web:1.27.25
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.8-895.v4d0d8e47fea_d
bouncycastle-api:2.30.1.82-277.v70ca_0b_877184
branch-api:2.1268.v044a_87612da_8
build-discarder:158.vce570d01ce4c
build-failure-analyzer:2.6.1
build-timeout:1.39
build-user-vars-plugin:212.vd6b_e9f6d0cdb_
build-with-parameters:81.ve4a_9c2499d9a
caffeine-api:3.2.3-194.v31a_b_f7a_b_5a_81
checks-api:402.vca_263b_f200e3
cloudbees-bitbucket-branch-source:937.2.3
cloudbees-disk-usage-simple:256.v20ec4eb_884f1
cloudbees-folder:6.1073.va_7888eb_dd514
cloudbees-jenkins-advisor:392.v6ca_b_ff4e12fa_
command-launcher:123.v37cfdc92ef67
commons-collections4-api:4.5.0-8.va_d5448ef9011
commons-compress-api:1.28.0-2
commons-lang3-api:3.20.0-109.ve43756e2d2b_4
commons-text-api:1.15.0-210.v7480a_da_70b_9e
config-file-provider:1006.vc7366c201f57
coverage:2.3060.v035a_5557cdb_c
credentials:1480.v2246fd131e83
credentials-binding:702.vfe613e537e88
dark-theme:574.va_19f05d54df5
dashboard-view:2.543.vca_9da_3cb_9c60
data-tables-api:2.3.5-1497.v38449eb_7d5a_1
display-url-api:2.217.va_6b_de84cc74b_
docker-commons:457.v0f62a_94f11a_3
docker-workflow:634.vedc7242b_eda_7
durable-task:651.v1f5e074fc83f
ec2:2045.v06da_da_a_46422
echarts-api:6.0.0-1165.vd1283a_3e37d4
eddsa-api:0.3.0.1-27.v6ea_07b_e90d1a_
extended-read-permission:68.vd270568a_7520
external-monitor-job:223.vb_fddcf42c9b_3
favorite:2.253.v9b_413168133b_
flatpickr-api:4.6.13-18.vcf5f6a_5b_8468
font-awesome-api:7.1.0-882.v1dfb_771e3278
forensics-api:3.1832.va_1179842528b_
generic-webhook-trigger:2.4.1
git:5.8.1
git-client:6.5.0
git-server:137.ve0060b_432302
github:1.45.0
github-api:1.330-492.v3941a_032db_2a_
github-branch-source:1917.v9ee8a_39b_3d0d
global-build-stats:347.v32a_eb_0493c4f
greenballs:1.15.1
groovy:497.v7b_061a_a_de65d
gson-api:2.13.2-173.va_a_092315913c
h2-api:11.1.4.199-36.vb_ee07e965744
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-38.vcea_5d521d5f3
htmlpublisher:427
http_request:1.24
instance-identity:203.v15e81a_1b_7a_38
ionicons-api:94.vcc3065403257
jackson2-api:2.20.1-423.v13951f6b_6532
jacoco:3.3.7
jakarta-activation-api:2.1.4-1
jakarta-mail-api:2.1.5-1
jakarta-xml-bind-api:4.0.6-12.vb_1833c1231d3
javadoc:354.vee1a_660b_4990
javax-activation-api:1.2.0-8
javax-mail-api:1.6.2-11
jaxb:2.3.9-143.v5979df3304e6
jdk-tool:83.v417146707a_3d
jenkins-design-language:1.27.25
jersey2-api:2.47-165.ve7809a_3e87e0
jira:3.21
jjwt-api:0.11.5-120.v0268cf544b_89
job-dsl:1.93
joda-time-api:2.14.0-177.vd7e9347b_e7d5
jquery-detached:1.2.1
jquery3-api:3.7.1-619.vdb_10e002501a_
jsch:0.2.16-95.v3eecb_55fa_b_78
json-api:20251224-185.v0cc18490c62c
json-path-api:2.10.0-202.va_9cc16c1e476
jsoup:1.22.1-76.v9cdb_2456c0e3
junit:1369.v15da_00283f06
junit-attachments:330.v25180b_263160
ldap:793.v754d6b_41b_ea_4
lockable-resources:1438.v3c0f8c9e2060
mailer:525.v2458b_d8a_1a_71
matrix-auth:3.2.9
matrix-project:870.v9db_fcfc2f45b_
maven-plugin:3.27
mercurial:1323.ve69d2a_db_8a_b_d
metrics:4.2.37-494.v06f9a_939d33a_
mina-sshd-api-common:2.16.0-167.va_269f38cc024
mina-sshd-api-core:2.16.0-167.va_269f38cc024
mina-sshd-api-scp:2.16.0-167.va_269f38cc024
momentjs:1.1.1
node-iterator-api:72.vc90e81737df1
nodejs:1.6.6
okhttp-api:4.12.0-195.vc02552c04ffd
opsgenie:1.11
oss-symbols-api:424.ved751e062911
pam-auth:1.12
parameter-separator:322.vc4a_ff1cde55a_
parameterized-scheduler:374.v531b_4f4d99b_3
pipeline-build-step:571.v08a_fffd4b_0ce
pipeline-github:2.8-162.382498405fdc
pipeline-graph-analysis:245.v88f03631a_b_21
pipeline-graph-view:661.v6003f4542123
pipeline-groovy-lib:787.ve2fef0efdca_6
pipeline-input-step:540.v14b_100d754dd
pipeline-maven:1611.v6a_00c04177b_b_
pipeline-maven-api:1611.v6a_00c04177b_b_
pipeline-milestone-step:138.v78ca_76831a_43
pipeline-model-api:2.2277.v00573e73ddf1
pipeline-model-definition:2.2277.v00573e73ddf1
pipeline-model-extensions:2.2277.v00573e73ddf1
pipeline-rest-api:2.39
pipeline-stage-step:322.vecffa_99f371c
pipeline-stage-tags-metadata:2.2277.v00573e73ddf1
pipeline-stage-view:2.39
pipeline-utility-steps:2.20.0
plain-credentials:199.v9f8e1f741799
plugin-usage-plugin:4.10
plugin-util-api:6.1192.v30fe6e2837ff
popper-api:1.16.1-3
popper2-api:2.11.6-5
postbuildscript:3.4.1-695.vf6b_0b_8053979
prism-api:1.30.0-630.va_e19d17f83b_0
prometheus:847.v8440e5c21e7c
pubsub-light:1.19
rebuild:338.va_0a_b_50e29397
resource-disposer:0.25
role-strategy:848.va_a_ea_673cf0b_c
saml:4.595.vec7523b_5d543
scm-api:724.v7d839074eb_5c
script-security:1385.v7d2d9ec4d909
slack:795.v4b_9705b_e6d47
snakeyaml-api:2.5-143.v93b_c004f89de
sse-gateway:1.28
ssh-agent:386.v36cc0c7582f0
ssh-credentials:361.vb_f6760818e8c
ssh-slaves:3.1096.v0b_cc466e4323
ssh-steps:2.0.91.v76620e7082d0
sshd:3.374.v19b_d59ce6610
structs:362.va_b_695ef4fdf9
support-core:1795.v8935198c272d
theme-manager:327.v780d7096ec29
throttle-concurrents:624.vc427fa_e0e503
timestamper:1.30
token-macro:477.vd4f0dc3cb_cf1
trilead-api:2.284.v1974ea_324382
uno-choice:2.8.8
variant:70.va_d9f17f859e0
versioncolumn:400.v3c5c3004f31d
view-job-filters:406.va_0ec67147ee2
warnings-ng:12.9989.v3eb_01467fe59
workflow-aggregator:608.v67378e9d3db_1
workflow-api:1384.vdc05a_48f535f
workflow-basic-steps:1098.v808b_fd7f8cf4
workflow-cps:4250.v2eecc0881a_e6
workflow-durable-task-step:1464.v2d3f5c68f84c
workflow-job:1559.va_a_533730b_ea_d
workflow-multibranch:821.vc3b_4ea_780798
workflow-scm-step:466.va_d69e602552b_
workflow-step-api:710.v3e456cc85233
workflow-support:1010.vb_b_39488a_9841
ws-cleanup:0.49

What Operating System are you using (both controller, and any agents involved in the problem)?

Jenkins Controller :

  • Running in docker environment through AWS ECS task (m5a.4xlarge)
  • JVM version: 21.0.9+10-LTS
  • Remoting Version: 3327.v868139a_d00e0
  • OS: amzn-ami-2018.03.20201013-amazon-ecs-optimized

Jenkins Agents:

  • Running as EC2 instances (multiple instance types such as t3.medium, t3a.medium, c6a.large, etc.)
  • JVM version : 21.0.9+11-LTS
  • Remoting Version : 3327.v868139a_d00e0
  • OS: al2023-default-x8664-20260122-124942

Reproduction steps

Jenkins with EC2 Plugin v2045.v06da_da_a_46422

EC2 cloud configured with an AMI template:

  • T2 unlimited enabled
  • Instance type : t3.medium
  • Spot instance: Enabled
  • Idle termination time
  • Number of executors: 2
  • Connect by SSH Process : enabled
  • Maximum Total Uses: 15
  • Avoid Using Orphaned nodes : disabled
  • Minimum number of instances: 10
  • Minimum number of spare instances: 10

When AWS has limited Spot instance capacity, multiple executors can become stuck, particularly on a controller that manages hundreds of agents.

Expected Results

The executor should not be blocked while requesting the deployment of a new Spot instance.

Actual Results

Symptoms

Agents affected by this issue remain in the following state:
Image

Status:

  • The executor appears busy in the UI.
  • The Maximum Total Uses is 1.

Yet:

  • The executor is not doing anything
  • The agent is not removed
  • The build queue grows quickly as more and more agents end up in this state.
Image

Root Cause analysis

By going through the thread dump of the Jenkins controller, we discovered that a single thread can block dozens of agents/builds.
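The same grouping can be done programmatically instead of by reading the dump by hand. The following is a minimal sketch, assuming code that runs inside the JVM being inspected. The class name `BlockedThreadSummary` and the `demo` helper are hypothetical and are not part of Jenkins or the EC2 plugin. It uses the standard `java.lang.management` API to count how many BLOCKED threads are waiting on a lock held by each owner thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

public class BlockedThreadSummary {

    // Counts BLOCKED threads, grouped by the name of the thread that owns the lock they wait for.
    static Map<String, Integer> blockedCountByOwner() {
        Map<String, Integer> counts = new TreeMap<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockOwnerName() != null) {
                counts.merge(info.getLockOwnerName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical self-contained demo: one thread holds a monitor, another blocks on it.
    // Returns the number of threads reported as blocked by "demo-owner".
    static int demo() {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread owner = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the demo is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "demo-owner");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; this thread only needs to contend for the monitor
            }
        }, "demo-waiter");
        try {
            owner.start();
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(5); // wait until the waiter is actually blocked on the monitor
            }
            return blockedCountByOwner().getOrDefault("demo-owner", 0);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Threads blocked by demo-owner: " + demo());
    }
}
```

On a live controller, a single owner name with a large count would correspond to the pattern described here, where one executor thread blocks many others.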

Thread dump analysis
Example 1
Image

Note:

The executors of the following agents (from different agent templates) are blocked:

  • i-0e3b7196b736d5052
  • i-0e8343b988c77ab2c
  • i-0e8343b988c77ab2c
  • 89 other agents are also impacted but not shown in the screenshot

These threads are blocked by an executor that belongs to another agent: i-02fe9f95d9948916d

Here is the stack trace of the blocking executor:

"Executor #1 for  (i-02fe9f95d9948916d)" Id=19201173 Group=main RUNNABLE
   at java.base@21.0.9/sun.nio.ch.Net.poll(Native Method)
   at java.base@21.0.9/sun.nio.ch.NioSocketImpl.park(Unknown Source)
   at java.base@21.0.9/sun.nio.ch.NioSocketImpl.timedRead(Unknown Source)
   at java.base@21.0.9/sun.nio.ch.NioSocketImpl.implRead(Unknown Source)
   at java.base@21.0.9/sun.nio.ch.NioSocketImpl.read(Unknown Source)
   at java.base@21.0.9/sun.nio.ch.NioSocketImpl$1.read(Unknown Source)
   at java.base@21.0.9/java.net.Socket$SocketInputStream.read(Unknown Source)
   at java.base@21.0.9/sun.security.ssl.SSLSocketInputRecord.read(Unknown Source)
   at java.base@21.0.9/sun.security.ssl.SSLSocketInputRecord.readHeader(Unknown Source)
   at java.base@21.0.9/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source)
   at java.base@21.0.9/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
   at java.base@21.0.9/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
   at PluginClassLoader for apache-httpcomponents-client-4-api//org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:261)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.http.apache.ApacheHttpClient.access$600(ApacheHttpClient.java:106)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:238)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:235)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:103)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:88)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:64)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:46)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.executeRequest(RetryableStage.java:93)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:56)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler$$Lambda/0x0000000080ed78e0.get(Unknown Source)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
   at PluginClassLoader for aws-java-sdk2-ec2//software.amazon.awssdk.services.ec2.DefaultEc2Client.describeSecurityGroups(DefaultEc2Client.java:22326)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.getSecurityGroupsBy(SlaveTemplate.java:3056)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.getEc2SecurityGroups(SlaveTemplate.java:3010)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.makeRunInstancesRequestAndFilters(SlaveTemplate.java:2237)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.makeRunInstancesRequestAndFilters(SlaveTemplate.java:2142)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:2359)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionSpot(SlaveTemplate.java:2651)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:2098)
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:1017)
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:1116)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$5(MinimumInstanceChecker.java:150)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda/0x0000000080d4b0f0.accept(Unknown Source)
   at java.base@21.0.9/java.util.ArrayList.forEach(Unknown Source)
   at java.base@21.0.9/java.util.Collections$UnmodifiableCollection.forEach(Unknown Source)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$6(MinimumInstanceChecker.java:104)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda/0x0000000080d4aec0.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
   at java.base@21.0.9/java.util.Iterator.forEachRemaining(Unknown Source)
   at java.base@21.0.9/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline.forEach(Unknown Source)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:104)
   -  locked java.lang.Class@56449513
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2RetentionStrategy.taskAccepted(EC2RetentionStrategy.java:358)
   at hudson.slaves.SlaveComputer.taskAccepted(SlaveComputer.java:341)
   at hudson.model.queue.WorkUnitContext$1.onCriteriaMet(WorkUnitContext.java:92)
   at hudson.model.queue.Latch.synchronize(Latch.java:77)
   -  locked hudson.model.queue.WorkUnitContext$1@50644081
   at hudson.model.queue.WorkUnitContext.synchronizeStart(WorkUnitContext.java:132)
   at hudson.model.Executor.run(Executor.java:419)

   Number of locked synchronizers = 3
   - java.util.concurrent.locks.ReentrantLock$NonfairSync@4fdbf55
   - java.util.concurrent.locks.ReentrantLock$NonfairSync@12f0e14d
   - java.util.concurrent.locks.ReentrantLock$NonfairSync@65005683
Example 2
Image

Note:

The executors of the following agents (from different agent templates) are blocked:

  • i-0d000044b0cfeb845
  • i-0e7f9fa8a2ff691fe
  • i-04030ae5261925783
  • 10 other agents are also impacted but not shown in the screenshot

These threads are blocked by an executor that belongs to another agent: i-09aa1253678fa3c6b

Here is the stack trace of the blocking executor:

"Executor #1 for EC2 (i-09aa1253678fa3c6b)" Id=19250639 Group=main TIMED_WAITING
   at java.base@21.0.9/java.lang.Thread.sleep0(Native Method)
   at java.base@21.0.9/java.lang.Thread.sleep(Unknown Source)
   at java.base@21.0.9/java.util.concurrent.TimeUnit.sleep(Unknown Source)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:71)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler$$Lambda/0x0000000080ed78e0.get(Unknown Source)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
   at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
   at PluginClassLoader for aws-java-sdk2-ec2//software.amazon.awssdk.services.ec2.DefaultEc2Client.runInstances(DefaultEc2Client.java:39697)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:2405)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionSpot(SlaveTemplate.java:2651)
   at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:2098)
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:1017)
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:1116)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$5(MinimumInstanceChecker.java:150)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda/0x0000000080d4b0f0.accept(Unknown Source)
   at java.base@21.0.9/java.util.ArrayList.forEach(Unknown Source)
   at java.base@21.0.9/java.util.Collections$UnmodifiableCollection.forEach(Unknown Source)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$6(MinimumInstanceChecker.java:104)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda/0x0000000080d4aec0.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
   at java.base@21.0.9/java.util.Iterator.forEachRemaining(Unknown Source)
   at java.base@21.0.9/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
   at java.base@21.0.9/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
   at java.base@21.0.9/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
   at java.base@21.0.9/java.util.stream.ReferencePipeline.forEach(Unknown Source)
   at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:104)
   -  locked java.lang.Class@56449513
   at PluginClassLoader for ec2//hudson.plugins.ec2.EC2RetentionStrategy.taskAccepted(EC2RetentionStrategy.java:358)
   at hudson.slaves.SlaveComputer.taskAccepted(SlaveComputer.java:341)
   at hudson.model.queue.WorkUnitContext$1.onCriteriaMet(WorkUnitContext.java:92)
   at hudson.model.queue.Latch.synchronize(Latch.java:77)
   -  locked hudson.model.queue.WorkUnitContext$1@276767eb
   at hudson.model.queue.WorkUnitContext.synchronizeStart(WorkUnitContext.java:132)
   at hudson.model.Executor.run(Executor.java:419)

   Number of locked synchronizers = 1
   - java.util.concurrent.locks.ReentrantLock$NonfairSync@4fdbf55

We observed this thread dump pattern during every incident period that showed the associated symptoms.

Every example led to the same conclusion: the common point between these thread dumps is the use of the MinimumInstanceChecker.checkForMinimumInstances method.

This synchronized method may request the deployment of Spot instances in AWS when at least one template defines a value above 0 for either MinimumNumberOfInstances or MinimumNumberOfSpareInstances.

When AWS is struggling to provide such Spot instances, it replies with the following error message:
Request attempt 1 failure: We currently do not have sufficient c7i.large capacity in the Availability Zone you requested (eu-west-1c). Our system will be working on provisioning additional capacity. You can currently get c7i.large capacity by not specifying an Availability Zone in your request or choosing eu-west-1a, eu-west-1b.

Note: It looks like the plugin arbitrarily chooses an AZ even if multiple AZs/subnets are defined within the template.

These Spot instance requests continue until 16 retries are reached (hard-coded value defined here). With the exponential back-off, the retries may take up to 230 seconds (almost 4 minutes) to finish.
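
To give a feel for where a figure of this order comes from, here is a minimal sketch that sums the delays of a capped exponential back-off without jitter. The base delay (100 ms) and the cap (20 s) are assumptions chosen for illustration, not values read from the plugin or from the SDK configuration in use; only the retry count of 16 comes from the analysis above. The class and method names are hypothetical.

```java
// Sketch only: the base delay and cap passed in main are assumed values, not taken from the plugin.
final class BackoffEstimate {
    /** Sum of capped exponential delays over the given number of retries, without jitter. */
    static long worstCaseDelayMillis(int retries, long baseDelayMillis, long maxDelayMillis) {
        long total = 0;
        long delay = baseDelayMillis;
        for (int i = 0; i < retries; i++) {
            total += Math.min(delay, maxDelayMillis);
            // Double the delay for the next retry; stop growing once past the cap to avoid overflow.
            if (delay < maxDelayMillis) {
                delay *= 2;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long millis = worstCaseDelayMillis(16, 100, 20_000);
        System.out.println("Worst-case wait: " + millis / 1000.0 + " s"); // prints "Worst-case wait: 185.5 s"
    }
}
```

Under these assumed parameters the pure waiting time is about 185 seconds. The time spent in each of the 17 requests comes on top of that, so a total in the range of the 230 seconds mentioned above is plausible.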

Example 2
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 0802d736-ff9f-48d5-bab8-cef6f4dd1683)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 7f386542-68e0-4033-9739-1ebeed56e1e0)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 943041cc-a05a-42a5-b049-7d511750e70b)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 4 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: b625754c-ff42-4eec-b0c9-02adfd8392dc)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 5 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 33d25981-a506-4ffb-bfcf-9675de6f83c1)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 6 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 6400997c-6011-45f6-b990-8991d5527eac)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 7 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 31f8db93-5e4a-44aa-8b84-471b68402fb2)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 8 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 3e25fb44-9e06-45cb-abc6-a6ee6d452632)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 9 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 76f7be80-c3c3-40ab-853f-89bf264e81d5)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 10 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 21ffbbf5-3d7c-46cf-a93c-aabb00eaca35)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 11 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 052fd93b-5110-424e-9219-16da7f973d91)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 12 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 34c40ec5-95b3-4c9c-97a1-9eb879bfc290)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 13 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 787be975-c8ba-46b7-bedd-0285d2ac948e)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 14 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 1776874f-c74c-4af7-a273-ce34b6e569a3)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 15 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: b3f85ab5-af19-4e9a-a7b4-7d5be89dfb28)
Also:   software.amazon.awssdk.core.exception.SdkClientException: Request attempt 16 failure: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: 85188cee-8f31-4552-81c8-37ed0283b68c)
software.amazon.awssdk.services.ec2.model.Ec2Exception: There is no Spot capacity available that matches your request. (Service: Ec2, Status Code: 500, Request ID: ddded224-d1da-4a2e-9c94-83678451a2d8) (SDK Attempt Count: 17)
	at PluginClassLoader for aws-java-sdk2-ec2//software.amazon.awssdk.services.ec2.model.Ec2Exception$BuilderImpl.build(Ec2Exception.java:113)
	at PluginClassLoader for aws-java-sdk2-ec2//software.amazon.awssdk.services.ec2.model.Ec2Exception$BuilderImpl.build(Ec2Exception.java:61)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at PluginClassLoader for aws-java-sdk2-core//software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
	at PluginClassLoader for aws-java-sdk2-ec2//software.amazon.awssdk.services.ec2.DefaultEc2Client.runInstances(DefaultEc2Client.java:39697)
	at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:2414)
	at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provisionSpot(SlaveTemplate.java:2651)
	at PluginClassLoader for ec2//hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:2098)
	at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:1017)
	at PluginClassLoader for ec2//hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:1116)
	at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$5(MinimumInstanceChecker.java:150)
	at java.base/java.util.ArrayList.forEach(Unknown Source)
	at java.base/java.util.Collections$UnmodifiableCollection.forEach(Unknown Source)
	at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$6(MinimumInstanceChecker.java:104)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
	at java.base/java.util.Iterator.forEachRemaining(Unknown Source)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
	at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
	at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
	at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
	at PluginClassLoader for ec2//hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:104)
	at PluginClassLoader for ec2//hudson.plugins.ec2.EC2RetentionStrategy.taskAccepted(EC2RetentionStrategy.java:358)
	at hudson.slaves.SlaveComputer.taskAccepted(SlaveComputer.java:341)
	at hudson.model.queue.WorkUnitContext$1.onCriteriaMet(WorkUnitContext.java:92)
	at hudson.model.queue.Latch.synchronize(Latch.java:77)
	at hudson.model.queue.WorkUnitContext.synchronizeStart(WorkUnitContext.java:132)
	at hudson.model.Executor.run(Executor.java:419)

During this period, because the checkForMinimumInstances method is synchronized, every affected controller thread is blocked and won't run its associated build.
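
The blocking effect can be reproduced in isolation. The sketch below is not the plugin's code: it uses a hypothetical static synchronized method that sleeps to stand in for the slow retries, and shows that callers queue up behind the class-level monitor (the thread dump above shows a lock on a java.lang.Class instance, which matches this kind of lock).

```java
import java.util.concurrent.TimeUnit;

final class SynchronizedStall {
    // Hypothetical stand-in for a class-level synchronized method that performs a slow remote call.
    static synchronized void slowCheck(long millis) throws InterruptedException {
        Thread.sleep(millis); // simulates the time spent retrying against AWS
    }

    /** Runs the given number of threads through slowCheck and returns the total elapsed milliseconds. */
    static long runCallers(int callers, long millisPerCall) {
        Thread[] threads = new Thread[callers];
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    slowCheck(millisPerCall);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Three callers of 200 ms each finish after roughly 600 ms, not 200 ms.
        System.out.println("3 callers took ~" + runCallers(3, 200) + " ms");
    }
}
```

Because the calls are serialized, the last caller waits for the sum of all earlier calls. With a call that can last minutes, every executor thread entering the method during that window stalls for the same reason.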

Summary
  • One or more agents reach the Maximum Total Uses of 1
  • This state, combined with an agent template whose minimum instances (or spare instances) setting is above 0, triggers the call to checkForMinimumInstances, which blocks other agents' executor threads
  • Under heavy AWS EC2 usage, AWS won't answer positively to the Spot request
  • The current configuration retries 16 times (almost 4 minutes with the exponential backoff) before the thread is unblocked

Anything else?

The full analysis is provided in the previous section.

Are you interested in contributing a fix?

No response
