Conversation

@mikecirioli
Contributor

The previous commit started stopped instances whenever check() ran, even when
no jobs were waiting, causing unnecessary instance starts.

Added a check so that a stopped instance is only started when
itemsInQueueForThisSlave() returns true, meaning there are actually jobs
waiting for this specific node.

Changes:
- Check for queued jobs before calling startInstances()
- Log whether jobs are queued: "jobs in queue: true/false"
- Skip starting if no jobs: "No jobs waiting - leaving it stopped"
- Only start if jobs waiting: "Jobs are waiting - attempting to start"

This ensures stopped instances remain stopped until actually needed for work.
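The guard described above can be sketched roughly as follows. This is a minimal, self-contained model, not the plugin's actual code: itemsInQueueForThisSlave() is simplified to a label lookup, and startInstance() stands in for the real EC2 StartInstances call.

```java
import java.util.List;
import java.util.logging.Logger;

// Sketch: check() only starts a stopped instance when jobs are queued for it.
class RetentionCheckSketch {
    private static final Logger LOGGER = Logger.getLogger(RetentionCheckSketch.class.getName());

    boolean started = false;                  // observable effect for this sketch
    private final List<String> queuedLabels;  // labels of jobs sitting in the queue
    private final String nodeLabel;           // this node's label

    RetentionCheckSketch(List<String> queuedLabels, String nodeLabel) {
        this.queuedLabels = queuedLabels;
        this.nodeLabel = nodeLabel;
    }

    // Simplified stand-in for the plugin's queue inspection
    boolean itemsInQueueForThisSlave() {
        return queuedLabels.contains(nodeLabel);
    }

    void check() {
        boolean jobsQueued = itemsInQueueForThisSlave();
        LOGGER.fine("jobs in queue: " + jobsQueued);
        if (jobsQueued) {
            LOGGER.fine("Jobs are waiting - attempting to start");
            startInstance();
        } else {
            LOGGER.fine("No jobs waiting - leaving it stopped");
        }
    }

    private void startInstance() {
        started = true; // the real code would call the EC2 StartInstances API here
    }
}
```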

Changed from checking only explicit node assignment (selfLabel) to using
Label.contains(), which properly checks whether a node can execute jobs
based on label matching. This fixes the issue where stopped instances
would only start for jobs explicitly tied to the node name, not for jobs
that match the node's labels.

Changes:
- Use assignedLabel.contains(selfNode) instead of assignedLabel == selfLabel
- Handle null assignedLabel (jobs that can run on any node)
- Added comment explaining the label matching logic

Now stopped instances will start for:
- Jobs with no label requirement (assignedLabel == null)
- Jobs whose labels match this node's capabilities (assignedLabel.contains(selfNode))

Before this fix, stopped instances only started for jobs explicitly tied to
the specific node name.
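The two conditions above can be modeled with a small sketch. Note this is a simplification: the real fix uses Jenkins' Label.contains(Node), which evaluates the job's label expression against the node; here label sets and a containsAll check stand in for that.

```java
import java.util.Set;

// Sketch: when may a queued job start this node?
class LabelMatchSketch {
    /**
     * @param assignedLabels labels required by the job, or null if the job
     *                       has no label requirement (can run on any node)
     * @param nodeLabels     labels this node advertises (including its name)
     */
    static boolean canRunOn(Set<String> assignedLabels, Set<String> nodeLabels) {
        if (assignedLabels == null) {
            return true; // no label requirement: any node qualifies
        }
        // Stand-in for Label.contains(node): every requested label
        // must be satisfied by the node
        return nodeLabels.containsAll(assignedLabels);
    }
}
```

With the old `assignedLabel == selfLabel` comparison, only the exact node name matched; set-based matching also covers jobs requesting any label the node carries.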

The NoDelayProvisionerStrategy was counting offline STOPPED EC2 instances
as "available capacity", preventing provisioning from being triggered when
jobs were queued. As a result, STOPPED instances remained stopped forever
while queued jobs waited indefinitely.

Root cause:
- countProvisionedButNotExecutingNodes() counted ALL offline nodes
- STOPPED instances were included in available capacity
- When capacity >= demand, provisioning was skipped
- provisionOndemand() was never called to start the stopped instances

Fix:
- Check AWS instance state for offline nodes
- Exclude STOPPED/STOPPING instances from capacity count
- Only count instances that will come online (PENDING/RUNNING)
- Fail-safe: if state check fails, count the instance to avoid over-provisioning
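The counting logic in the fix can be sketched like this. It is a self-contained model, not the plugin's code: NodeInfo and the InstanceState enum stand in for the AWS DescribeInstances lookup, with UNKNOWN representing a failed state check.

```java
import java.util.List;

// Sketch: count offline nodes as capacity only if their instance
// will actually come online (PENDING/RUNNING).
class CapacityCountSketch {
    enum InstanceState { PENDING, RUNNING, STOPPING, STOPPED, UNKNOWN }

    record NodeInfo(String id, InstanceState state) {}

    static int countAvailable(List<NodeInfo> offlineNodes) {
        int capacity = 0;
        for (NodeInfo node : offlineNodes) {
            switch (node.state()) {
                case PENDING, RUNNING -> capacity++; // will come online: counts
                case STOPPED, STOPPING -> System.out.println(
                        "Excluding STOPPED instance " + node.id()
                        + " from available capacity");
                // Fail-safe: if the state lookup failed, count the instance
                // so we never over-provision on bad data
                default -> capacity++;
            }
        }
        return capacity;
    }
}
```

Once STOPPED instances no longer inflate capacity, queued demand exceeds capacity and provisionOndemand() is invoked, which starts them.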

This preserves the fixes from:
- JENKINS-76151: EC2RetentionStrategy still only reconnects RUNNING instances
- JENKINS-76171: Offline PENDING/RUNNING instances still counted to prevent over-provisioning

Testing:
1. Stop an EC2 instance (via AWS or Jenkins stopOnTerminate)
2. Queue a job requiring that label
3. Verify provisioning is triggered and instance starts in AWS
4. Check logs for "Excluding STOPPED instance {id} from available capacity"
@mikecirioli mikecirioli deleted the JENKINS-76200 branch October 15, 2025 14:58