Min Runners Scaling comparison delay cause additional runner to be created and terminated unnecessarily #3866
Open
Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.9.3 , 0.10.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Set minRunners to 1 (Now we have one runner registered - Runner A)
2. Trigger two jobs simultaneously
3. Now we have two more runners registered ( Runners B & C online - Total 3)
4. The Runners A and B received the jobs and C is waiting for the job maintaining min runner as 1
5. A complete the job and terminates, now we have new runner D spinning up.
6. Now B and C also gets terminated, and D remains..
Here C is unnecessarily terminated because of the latency in the scaling calculation. it was not executing any job, it was on stand by as per minRunners setup.
Describe the bug
when we set up minRunners, the additional runner which is in standby also gets terminated without even executing any job due to delay in scaling comparison of desired runners
Describe the expected behavior
In this case, the Runner C should remain active and Runner D should not have get created.
Additional Context
minRunners:1
maxRunners:150
Controller Logs
START OF THE 2 JOBS:
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 3, "desired": 3} │
AFTER ONE JOB COMPLETED, RUNNEr A TERMINATED (note , desired still shows 3)
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 3} │
NOW RUNNER D GET CREATED(UNNECESSARY RUNNER),
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 3, "desired": 2} │
NOW ANOTHER JOB COMPLETED, ONLY RUNNER C and D ARE ACTIVE, STILL DELAY IN UPDATING DESIRED
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 2} │
NOW WE HAVE RUNNER C and D BOTH ARE WAITING FOR NEW JOB
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 2, "desired": 1} │
NOW RUNNER C GOT DELETED
EphemeralRunnerSet Scaling comparison {"version": "0.10.1", "ephemeralrunnerset": {"name":"sh-ubuntu-latest-np-l","namespace":"arc-runners"}, "current": 1, "desired": 1} │
Runner Pod Logs
LOGS OF RUNNER C waiting for job and gets terminated due to scaling comparison
[RUNNER 2025-01-02 16:42:01Z INFO Terminal] WRITE LINE:
√ Connected to GitHub
[RUNNER 2025-01-02 16:42:01Z INFO Terminal] WRITE LINE:
[RUNNER 2025-01-02 16:42:01Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:01Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:01Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-01-02 16:42:02Z INFO MessageListener] Session created.
[RUNNER 2025-01-02 16:42:02Z INFO Terminal] WRITE LINE: Current runner version: '2.321.0'
Current runner version: '2.321.0'
[RUNNER 2025-01-02 16:42:02Z INFO Terminal] WRITE LINE: 2025-01-02 16:42:02Z: Listening for Jobs
2025-01-02 16:42:02Z: Listening for Jobs
[RUNNER 2025-01-02 16:42:02Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[RUNNER 2025-01-02 16:42:02Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-01-02 16:42:02Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
time="2025-01-02T16:53:24.543067339Z" level=info msg="Processing signal 'terminated'"
time="2025-01-02T16:53:24.544010095Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2025-01-02T16:53:24.544396972Z" level=info msg="Daemon shutdown complete"
time="2025-01-02T16:53:24.544453030Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2025-01-02T16:53:24.544463570Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Exiting runner...
[RUNNER 2025-01-02 16:53:24Z INFO Terminal] WRITE LINE: Exiting...
Exiting...
[RUNNER 2025-01-02 16:53:24Z INFO ConfigurationStore] IsServiceConfigured()
[RUNNER 2025-01-02 16:53:24Z INFO ConfigurationStore] IsServiceConfigured: False
[RUNNER 2025-01-02 16:53:24Z INFO HostContext] Runner will be shutdown for UserCancelled
[RUNNER 2025-01-02 16:53:24Z WARN GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=ee4b09ac-0196-4a07-b068-f631314baf4b&status=Online&runnerVersion=2.321.0&os=Linux&architecture=X64&disableUpdate=true has been cancelled.
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] Catch exception during request
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] ---> System.Net.Sockets.SocketException (125): Operation canceled
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] --- End of inner exception stack trace ---
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken, Int32 estimatedSize)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
[RUNNER 2025-01-02 16:53:24Z ERR BrokerServer] at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)