
Commit f810159

[JENKINS-76200] Exclude STOPPED instances from capacity calculation
The NoDelayProvisionerStrategy was counting offline STOPPED EC2 instances as "available capacity", preventing provisioning from being triggered when jobs were queued. This caused STOPPED instances to remain stopped forever, with jobs waiting indefinitely.

Root cause:
- countProvisionedButNotExecutingNodes() counted ALL offline nodes
- STOPPED instances were included in available capacity
- When capacity >= demand, provisioning was skipped
- provisionOndemand() was never called to start the stopped instances

Fix:
- Check AWS instance state for offline nodes
- Exclude STOPPED/STOPPING instances from capacity count
- Only count instances that will come online (PENDING/RUNNING)
- Fail-safe: if state check fails, count the instance to avoid over-provisioning

This preserves the fixes from:
- JENKINS-76151: EC2RetentionStrategy still only reconnects RUNNING instances
- JENKINS-76171: Offline PENDING/RUNNING instances still counted to prevent over-provisioning

Testing:
1. Stop an EC2 instance (via AWS or Jenkins stopOnTerminate)
2. Queue a job requiring that label
3. Verify provisioning is triggered and instance starts in AWS
4. Check logs for "Excluding STOPPED instance {id} from available capacity"
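
The root cause and fix described in this message can be sketched in isolation. The following is a minimal, self-contained illustration, not the plugin's code: `CapacitySketch`, `State`, `StateLookup`, `OfflineNode`, `countAvailableExecutors`, and `shouldProvision` are all hypothetical names, and the "provision only when capacity < demand" rule is a simplification of what the message states ("When capacity >= demand, provisioning was skipped").

```java
import java.util.List;

/** Illustration only: every type here is a hypothetical stand-in, not the plugin's API. */
final class CapacitySketch {

    /** The AWS instance states named in the commit message. */
    enum State { PENDING, RUNNING, STOPPING, STOPPED }

    /** Looks up an instance's AWS state; may throw if the lookup fails. */
    interface StateLookup {
        State get() throws Exception;
    }

    /** An offline, not-yet-connecting node and the executors it would provide. */
    static final class OfflineNode {
        final String id;
        final int executors;
        final StateLookup state;

        OfflineNode(String id, int executors, StateLookup state) {
            this.id = id;
            this.executors = executors;
            this.state = state;
        }
    }

    /** Counts executors of offline nodes, skipping STOPPED/STOPPING instances. */
    static int countAvailableExecutors(List<OfflineNode> offlineNodes) {
        int count = 0;
        for (OfflineNode node : offlineNodes) {
            try {
                State s = node.state.get();
                if (s == State.STOPPED || s == State.STOPPING) {
                    continue; // will not come online without an explicit start
                }
            } catch (Exception e) {
                // Fail-safe: state unknown, so fall through and count the node.
            }
            count += node.executors;
        }
        return count;
    }

    /** Simplified decision rule: provisioning is skipped when capacity >= demand. */
    static boolean shouldProvision(int demand, int availableCapacity) {
        return availableCapacity < demand;
    }

    public static void main(String[] args) {
        List<OfflineNode> nodes = List.of(
                new OfflineNode("i-stopped", 1, () -> State.STOPPED),
                new OfflineNode("i-pending", 1, () -> State.PENDING));
        int capacity = countAvailableExecutors(nodes);
        System.out.println("capacity=" + capacity
                + ", provision for demand 2: " + shouldProvision(2, capacity));
    }
}
```

In this sketch, counting the stopped node would bring capacity up to the demand of 2 and suppress provisioning, which is the behavior the commit message describes as the bug. Counting a node whose state lookup throws keeps the bias toward under-provisioning rather than over-provisioning.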
1 parent e2ec8ee commit f810159

1 file changed: +28 -3 lines changed


src/main/java/hudson/plugins/ec2/NoDelayProvisionerStrategy.java

Lines changed: 28 additions & 3 deletions
@@ -90,15 +90,19 @@ public NodeProvisioner.StrategyDecision apply(NodeProvisioner.StrategyState stra
      * Counts executors in EC2 nodes that have been provisioned (exist in Jenkins) but are NOT yet counted in the
      * LoadStatistics snapshot. This specifically targets the gap where nodes exist but are:
      * - Offline (just added to Jenkins, before connecting starts)
+     * - Instance is PENDING or RUNNING in AWS (will come online soon)
      *
      * We explicitly DO NOT count:
      * - Connecting nodes (already in snapshot.getConnectingExecutors())
      * - Online nodes (already in snapshot.getAvailableExecutors() or busy executors)
+     * - STOPPED instances (won't come online without explicit start action)
      *
      * This prevents over-provisioning by accounting for nodes in the critical gap between:
      * 1) Node added to Jenkins (after PlannedNode future completes)
      * 2) Node starts connecting (shows up in snapshot.getConnectingExecutor())
      *
+     * JENKINS-76200: Exclude STOPPED instances - they won't come online on their own.
+     *
      * @param label the label to match, or null for unlabeled nodes
      * @return the number of executors from provisioned EC2 nodes in the offline->connecting gap
      */
@@ -113,6 +117,7 @@ int countProvisionedButNotExecutingNodes(Label label) {
         int offlineNodes = 0;
         int connectingNodes = 0;
         int onlineNodes = 0;
+        int stoppedNodes = 0;
 
         for (Node node : nodes) {
             // Only count EC2 nodes
@@ -136,16 +141,36 @@ int countProvisionedButNotExecutingNodes(Label label) {
             }
 
             // Only count nodes that are OFFLINE (not connecting, not online)
-            // These are in the gap between being added to Jenkins and starting to connect
+            // and not STOPPED in AWS (won't come online without explicit start)
             if (computer.isOffline() && !computer.isConnecting()) {
+                // JENKINS-76200: Check if instance is STOPPED in AWS
+                if (computer instanceof EC2Computer ec2Computer) {
+                    try {
+                        InstanceState state = ec2Computer.getState();
+                        if (state == InstanceState.STOPPED || state == InstanceState.STOPPING) {
+                            stoppedNodes++;
+                            LOGGER.log(
+                                    Level.FINE,
+                                    "Excluding STOPPED instance {0} from available capacity",
+                                    ec2Computer.getInstanceId());
+                            continue; // Don't count stopped instances
+                        }
+                    } catch (Exception e) {
+                        LOGGER.log(
+                                Level.FINE,
+                                "Could not get state for " + ec2Computer.getName() + ", counting as available",
+                                e);
+                        // If we can't determine state, count it to avoid over-provisioning
+                    }
+                }
                 count += node.getNumExecutors();
             }
         }
 
         LOGGER.log(
                 Level.FINER,
-                "EC2 nodes for label {0}: total={1}, offline={2}, connecting={3}, online={4}",
-                new Object[] {label, totalEC2Nodes, offlineNodes, connectingNodes, onlineNodes});
+                "EC2 nodes for label {0}: total={1}, offline={2}, connecting={3}, online={4}, stopped={5}",
+                new Object[] {label, totalEC2Nodes, offlineNodes, connectingNodes, onlineNodes, stoppedNodes});
 
         return count;
     }
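
The "Excluding STOPPED instance ..." message in this change is logged at `Level.FINE`, so it only appears when the logger is configured at FINE or finer. The sketch below shows this with plain `java.util.logging`. It makes two assumptions: that `LOGGER` is named after the class (`hudson.plugins.ec2.NoDelayProvisionerStrategy`), which the diff does not show, and that `i-0abc` is a made-up instance id. `FineLogSketch`, `CapturingHandler`, and `capture` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Illustration only: shows when a FINE record with a {0} parameter becomes visible. */
final class FineLogSketch {

    // Assumption: the strategy's logger is named after its class.
    static final String LOGGER_NAME = "hudson.plugins.ec2.NoDelayProvisionerStrategy";

    /** Collects formatted messages instead of writing them anywhere. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();
        private final SimpleFormatter formatter = new SimpleFormatter();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                // formatMessage substitutes {0} with the record's parameter
                messages.add(formatter.formatMessage(record));
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Emits the commit's FINE message with the logger set to the given level. */
    static List<String> capture(Level loggerLevel, String instanceId) {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        CapturingHandler handler = new CapturingHandler();
        Level previous = logger.getLevel();
        logger.setLevel(loggerLevel);
        logger.addHandler(handler);
        try {
            logger.log(
                    Level.FINE,
                    "Excluding STOPPED instance {0} from available capacity",
                    instanceId);
        } finally {
            logger.removeHandler(handler);
            logger.setLevel(previous);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println("at INFO: " + capture(Level.INFO, "i-0abc"));
        System.out.println("at FINE: " + capture(Level.FINE, "i-0abc"));
    }
}
```

On a running Jenkins controller the level is normally raised through a log recorder for the relevant logger rather than in code.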
