Skip to content

Conversation

@jgarciacloudbees
Copy link
Contributor

@jgarciacloudbees jgarciacloudbees commented Oct 15, 2025

JENKINS-76200 - Agents never start after upgrade to 2039.v829e6272107c

It seems some code in #1151 and #1149 where not working well together, and it was leading to stopped instances not restarted, as the system considers it should be already started (done in #1142).

Changed the implementation to not consider stopped instances as available, as they need to be provisioned (Started).

Testing done

Manually tested #1151 and #1149.

  • Stopped nodes with the cloud definig a cap limit (so working as a pool) are properly provisioned, started, stopped on idle timeout and restarted properly on demand.
  • Tested the overprovisioning based on the steps in JENKINS-76171 (with an automated script launching 100 jobs), and this is not provisioning more EC2 instances than the required ones

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@jgarciacloudbees jgarciacloudbees marked this pull request as ready for review October 15, 2025 14:30
Copy link
Contributor

@mikecirioli mikecirioli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this locally as well and it seems to handle all the corner cases i can think of.

Copy link

@tuky tuky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this PR mostly removes code I think it is worth mentioning that @mikecirioli was the author of that code and approves its removal. THX for the review and I'd say LGTM. @jglick you had been very much involved in that (#1149) as well. you might want to add in here?

@jglick
Copy link
Member

jglick commented Oct 23, 2025

No opinion.

@jgarciacloudbees
Copy link
Contributor Author

Hi @res0nance, this PR has been waiting for 2 weeks and I hope it will fix the issues we are facing when restarting stopped nodes, just in case you can review it.
Sorry for pinging you directly, but as you are the main plugin maintainer, probably you will have merge permission in this repo.

Thanks in advance.

@res0nance
Copy link
Contributor

Hi @res0nance, this PR has been waiting for 2 weeks and I hope it will fix the issues we are facing when restarting stopped nodes, just in case you can review it. Sorry for pinging you directly, but as you are the main plugin maintainer, probably you will have merge permission in this repo.

Thanks in advance.

Apologies just saw this, is it possible to add tests here? I see you've just removed every test.

@jgarciacloudbees
Copy link
Contributor Author

Hi @res0nance.

Yes, I removed the tests as the main change on hudson.plugins.ec2.NoDelayProvisionerStrategy was to delete the method countProvisionedButNotExecutingNodes, and all the tests on the file src/test/java/hudson/plugins/ec2/NoDelayProvisionerStrategyTest.java were specific to this deleted countProvisionedButNotExecutingNodes.

So without that method, the tests have no sense.

@res0nance res0nance merged commit 89d6f19 into jenkinsci:master Oct 30, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants