Conversation

@mikecirioli
Contributor

The previous commit started stopped instances whenever check() ran, even when
no jobs were waiting, causing unnecessary instance starts.

Added a check so that a stopped instance is only started when
itemsInQueueForThisSlave() returns true, meaning there are actually jobs
waiting for this specific node.

Changes:
- Check for queued jobs before calling startInstances()
- Log whether jobs are queued: "jobs in queue: true/false"
- Skip starting if no jobs: "No jobs waiting - leaving it stopped"
- Only start if jobs waiting: "Jobs are waiting - attempting to start"

This ensures stopped instances remain stopped until actually needed for work.
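The guard described above can be sketched roughly as follows. This is a minimal, self-contained model, not the plugin's actual code: itemsInQueueForThisSlave() is simplified to a label lookup, and startInstance() stands in for the real EC2 StartInstances call.

```java
import java.util.List;
import java.util.logging.Logger;

// Sketch: check() only starts a stopped instance when jobs are queued for it.
class RetentionCheckSketch {
    private static final Logger LOGGER = Logger.getLogger(RetentionCheckSketch.class.getName());

    boolean started = false;                  // observable effect for this sketch
    private final List<String> queuedLabels;  // labels of jobs sitting in the queue
    private final String nodeLabel;           // this node's label

    RetentionCheckSketch(List<String> queuedLabels, String nodeLabel) {
        this.queuedLabels = queuedLabels;
        this.nodeLabel = nodeLabel;
    }

    // Simplified stand-in for the plugin's queue inspection
    boolean itemsInQueueForThisSlave() {
        return queuedLabels.contains(nodeLabel);
    }

    void check() {
        boolean jobsQueued = itemsInQueueForThisSlave();
        LOGGER.fine("jobs in queue: " + jobsQueued);
        if (jobsQueued) {
            LOGGER.fine("Jobs are waiting - attempting to start");
            startInstance();
        } else {
            LOGGER.fine("No jobs waiting - leaving it stopped");
        }
    }

    private void startInstance() {
        started = true; // the real code would call the EC2 StartInstances API here
    }
}
```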

Changed from checking only explicit node assignment (selfLabel) to using
Label.contains(), which properly checks whether a node can execute jobs
based on label matching. This fixes the issue where stopped instances
would only start for jobs explicitly tied to the node name, not for jobs
that match the node's labels.

Changes:
- Use assignedLabel.contains(selfNode) instead of assignedLabel == selfLabel
- Handle null assignedLabel (jobs that can run on any node)
- Added comment explaining the label matching logic

Now stopped instances will start for:
- Jobs with no label requirement (assignedLabel == null)
- Jobs whose labels match this node's capabilities (assignedLabel.contains(selfNode))

Before this fix, stopped instances only started for jobs explicitly tied to
the specific node name.
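The two conditions above can be modeled with a small sketch. Note this is a simplification: the real fix uses Jenkins' Label.contains(Node), which evaluates the job's label expression against the node; here label sets and a containsAll check stand in for that.

```java
import java.util.Set;

// Sketch: when may a queued job start this node?
class LabelMatchSketch {
    /**
     * @param assignedLabels labels required by the job, or null if the job
     *                       has no label requirement (can run on any node)
     * @param nodeLabels     labels this node advertises (including its name)
     */
    static boolean canRunOn(Set<String> assignedLabels, Set<String> nodeLabels) {
        if (assignedLabels == null) {
            return true; // no label requirement: any node qualifies
        }
        // Stand-in for Label.contains(node): every requested label
        // must be satisfied by the node
        return nodeLabels.containsAll(assignedLabels);
    }
}
```

With the old `assignedLabel == selfLabel` comparison, only the exact node name matched; set-based matching also covers jobs requesting any label the node carries.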

The NoDelayProvisionerStrategy was counting offline STOPPED EC2 instances
as "available capacity", preventing provisioning from being triggered when
jobs were queued. As a result, STOPPED instances remained stopped forever
while queued jobs waited indefinitely.

Root cause:
- countProvisionedButNotExecutingNodes() counted ALL offline nodes
- STOPPED instances were included in available capacity
- When capacity >= demand, provisioning was skipped
- provisionOndemand() was never called to start the stopped instances

Fix:
- Check AWS instance state for offline nodes
- Exclude STOPPED/STOPPING instances from capacity count
- Only count instances that will come online (PENDING/RUNNING)
- Fail-safe: if state check fails, count the instance to avoid over-provisioning
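The counting logic in the fix can be sketched like this. It is a self-contained model, not the plugin's code: NodeInfo and the InstanceState enum stand in for the AWS DescribeInstances lookup, with UNKNOWN representing a failed state check.

```java
import java.util.List;

// Sketch: count offline nodes as capacity only if their instance
// will actually come online (PENDING/RUNNING).
class CapacityCountSketch {
    enum InstanceState { PENDING, RUNNING, STOPPING, STOPPED, UNKNOWN }

    record NodeInfo(String id, InstanceState state) {}

    static int countAvailable(List<NodeInfo> offlineNodes) {
        int capacity = 0;
        for (NodeInfo node : offlineNodes) {
            switch (node.state()) {
                case PENDING, RUNNING -> capacity++; // will come online: counts
                case STOPPED, STOPPING -> System.out.println(
                        "Excluding STOPPED instance " + node.id()
                        + " from available capacity");
                // Fail-safe: if the state lookup failed, count the instance
                // so we never over-provision on bad data
                default -> capacity++;
            }
        }
        return capacity;
    }
}
```

Once STOPPED instances no longer inflate capacity, queued demand exceeds capacity and provisionOndemand() is invoked, which starts them.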

This preserves the fixes from:
- JENKINS-76151: EC2RetentionStrategy still only reconnects RUNNING instances
- JENKINS-76171: Offline PENDING/RUNNING instances still counted to prevent over-provisioning

Testing:
1. Stop an EC2 instance (via AWS or Jenkins stopOnTerminate)
2. Queue a job requiring that label
3. Verify provisioning is triggered and instance starts in AWS
4. Check logs for "Excluding STOPPED instance {id} from available capacity"
@mikecirioli mikecirioli deleted the JENKINS-76200 branch October 15, 2025 14:58