Failed EphemeralRunners block launching new pods #3685

Open
4 tasks done
igaskin opened this issue Jul 26, 2024 · 3 comments
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Comments

@igaskin

igaskin commented Jul 26, 2024

Checks

Controller Version

0.8.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Trigger a `FailedScheduling` event.
2. Wait for 5 failures in pod scheduling.
3. Recover the cluster.
4. New ephemeral runner pods will not be scheduled to meet capacity.

Describe the bug

When EphemeralRunners enter the Failed state, they stay stuck in that state, which prevents new pods from being launched. This has been noted previously in the discussions listed below.

status:
  currentRunners: 17
  failedEphemeralRunners: 16
  pendingEphemeralRunners: 0
  runningEphemeralRunners: 1 

#3300
#3610
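To spell out what the status above implies, here is a minimal sketch. It assumes that `currentRunners` includes the failed runners (which matches the numbers reported: 16 + 0 + 1 = 17); the helper name `usable_runners` is hypothetical and not part of the controller.

```python
# Status values copied from the status block in this issue.
status = {
    "currentRunners": 17,
    "failedEphemeralRunners": 16,
    "pendingEphemeralRunners": 0,
    "runningEphemeralRunners": 1,
}


def usable_runners(status):
    """Runners that can still pick up jobs.

    Assumption: failed runners are counted in currentRunners, so they
    occupy slots without doing any work.
    """
    return status["currentRunners"] - status["failedEphemeralRunners"]


usable = usable_runners(status)
print(f"{usable} of {status['currentRunners']} runner slots are usable")
```

Under that assumption, a scale set that reports 17 runners here has only one that can take a job.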

Describe the expected behavior

Failed EphemeralRunners are cleared so that scheduling can be retried.

Additional Context

https://github.com/actions/actions-runner-controller/discussions/3610
https://github.com/actions/actions-runner-controller/discussions/3300

Controller Logs

2024-06-20T19:18:03Z	INFO	listener-app.worker.kubernetesworker	Ephemeral runner set scaled.	{"namespace": "my-scaleset-ns", "name": "my-runner-6pzbd", "replicas": 3}
2024-06-20T19:18:03Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 11}
2024-06-20T19:18:11Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 14}
2024-06-20T19:18:53Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 11}
2024-06-20T19:19:01Z	INFO	listener-app.listener	Getting next message	{"lastMessageID": 14}

Runner Pod Logs

2024-06-21T16:22:44Z	INFO	listener-app.worker.kubernetesworker	Ephemeral runner set scaled.	{"namespace": "my-scaleset", "name": "my-runner-rpvp2", "replicas": 10}
@igaskin igaskin added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Jul 26, 2024

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@singlewind

singlewind commented Jul 30, 2024

This happened to me recently as well, when I upgraded to 0.9.3 with GitHub App authentication. In my case, all the ephemeral runners were stuck in the Terminating state.

@shanesavoie

I've also noticed that each failed EphemeralRunner consumes a min-runner slot. We have to schedule a cleanup of these resources; otherwise we see queue time degradation due to insufficient min-runners.
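A scheduled cleanup like the one described above could start from something like the following. This is only a sketch: it assumes the EphemeralRunner resources expose `status.phase` with the value `Failed`, and that the input has the shape of `kubectl get ephemeralrunners -n <namespace> -o json` output. The runner names in the sample are hypothetical, and the function names are made up for illustration. The script only builds the delete commands and does not run them.

```python
import shlex


def failed_runner_names(runner_list):
    """Return the names of runners whose status.phase is "Failed".

    Assumption: runner_list is the parsed JSON of a kubectl list call.
    """
    return [
        item["metadata"]["name"]
        for item in runner_list.get("items", [])
        if item.get("status", {}).get("phase") == "Failed"
    ]


def delete_commands(names, namespace):
    """Build (but do not execute) one kubectl delete command per runner."""
    return [
        "kubectl delete ephemeralrunner "
        + shlex.quote(name)
        + " -n "
        + shlex.quote(namespace)
        for name in names
    ]


# Hypothetical sample; real input would come from kubectl.
sample = {
    "items": [
        {"metadata": {"name": "my-runner-aaaaa"}, "status": {"phase": "Failed"}},
        {"metadata": {"name": "my-runner-bbbbb"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "my-runner-ccccc"}, "status": {}},
    ]
}

commands = delete_commands(failed_runner_names(sample), "my-scaleset-ns")
for command in commands:
    print(command)
```

Printing the commands first makes it easy to review what would be deleted before wiring this into a CronJob or similar.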

3 participants