All runners offline - failed to acquire jobs #3689

Open
4 tasks done
WTPOptAxe opened this issue Jul 31, 2024 · 1 comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Checks

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Allow the controller to run as normal
2. Observe the listener pod constantly crashing
3. Observe that the runner set is online in the GitHub UI, but all runners are offline
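Step 2 can be checked from pod status instead of watching the pod list by eye. The following is a minimal sketch, assuming the output of `kubectl get pods -n <namespace> -o json` has been loaded into a dict (the namespace is whichever one the listener runs in, which is not stated in this report). The helper name `unhealthy_pods` and the sample data are hypothetical; `phase`, `containerStatuses`, `restartCount`, `state.terminated.exitCode`, and `state.waiting.reason` are standard Kubernetes pod status fields. How a crash shows up depends on the pod's restart policy, so the sketch checks several signals.

```python
def unhealthy_pods(pod_list, name_suffix="-listener"):
    """Return (name, reason) pairs for matching pods that crashed or restarted."""
    found = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.endswith(name_suffix):
            continue
        status = pod.get("status", {})
        if status.get("phase") == "Failed":
            found.append((name, "phase=Failed"))
            continue
        for cs in status.get("containerStatuses", []):
            state = cs.get("state", {})
            terminated = state.get("terminated") or {}
            waiting = state.get("waiting") or {}
            if terminated.get("exitCode", 0) != 0:
                found.append((name, f"exitCode={terminated['exitCode']}"))
            elif waiting.get("reason") == "CrashLoopBackOff":
                found.append((name, "CrashLoopBackOff"))
            elif cs.get("restartCount", 0) > 0:
                found.append((name, f"restarts={cs['restartCount']}"))
    return found


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "arc-runner-set-xxxx-listener"},
            "status": {
                "phase": "Running",
                "containerStatuses": [
                    {"restartCount": 0, "state": {"terminated": {"exitCode": 1}}}
                ],
            },
        },
        {
            # Hypothetical runner pod name; ignored because of the suffix filter.
            "metadata": {"name": "arc-runner-set-xxxx-runner-abcde"},
            "status": {"phase": "Running", "containerStatuses": []},
        },
    ]
}

print(unhealthy_pods(sample))
```

In practice the dict would come from `json.load` on the saved kubectl output rather than from a hand-written sample.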

The timing of this is around the update of ghcr.io/actions/actions-runner from 2.317.0 to 2.318.0, but rolling back to the image built from 2.317.0 has not resolved it.

Describe the bug

Up until yesterday this was working correctly. As of this morning, arc-runner-set-xxxx-listener fails with the following exception:

2024/07/31 08:15:48 Application returned an error: failed to handle message: failed to acquire jobs: failed to acquire jobs: Post "https://pipelinesghubeus7.actions.githubusercontent.com/WugTYvPOjBYXoVTZqLtkHQNo8dP79zLHH79vzLjE9k8ir38pq6//_apis/runtime/runnerscalesets/10/acquirejobs?api-version=6.0-preview": POST https://pipelinesghubeus7.actions.githubusercontent.com/WugTYvPOjBYXoVTZqLtkHQNo8dP79zLHH79vzLjE9k8ir38pq6//_apis/runtime/runnerscalesets/10/acquirejobs?api-version=6.0-preview giving up after 5 attempt(s)
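For triage it can help to pull the structured parts out of that line: the host, the scale set ID, the operation, and the retry count. The sketch below is a purely textual parse of the single log line quoted above; the regular expressions are an assumption about the log format based on that one line, and the function name `summarize` is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The URL and log line are copied from the report above.
URL = (
    "https://pipelinesghubeus7.actions.githubusercontent.com/"
    "WugTYvPOjBYXoVTZqLtkHQNo8dP79zLHH79vzLjE9k8ir38pq6//_apis/runtime/"
    "runnerscalesets/10/acquirejobs?api-version=6.0-preview"
)
LOG_LINE = (
    "2024/07/31 08:15:48 Application returned an error: "
    "failed to handle message: failed to acquire jobs: failed to acquire jobs: "
    f'Post "{URL}": POST {URL} giving up after 5 attempt(s)'
)


def summarize(line):
    """Extract request details from a 'giving up after N attempt(s)' log line."""
    m = re.search(
        r"(?P<method>[A-Z]+) (?P<url>https://\S+) giving up after (?P<n>\d+) attempt",
        line,
    )
    if m is None:
        return None
    parts = urlsplit(m.group("url"))
    scale_set = re.search(r"/runnerscalesets/(\d+)/(\w+)", parts.path)
    return {
        "timestamp": line[:19],  # "YYYY/MM/DD HH:MM:SS" prefix
        "method": m.group("method"),
        "host": parts.netloc,
        "scale_set_id": int(scale_set.group(1)) if scale_set else None,
        "operation": scale_set.group(2) if scale_set else None,
        "api_version": parse_qs(parts.query).get("api-version", [None])[0],
        "attempts": int(m.group("n")),
    }


print(summarize(LOG_LINE))
```

The summary shows that the failing call is the `acquirejobs` POST for scale set 10 and that the listener exits only after the HTTP client has exhausted its retries.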

The runner set remains online in GitHub, but all runners are offline.

[Screenshot 2024-07-31 at 09 19 15]

Describe the expected behavior

arc-runner-set-XXXX-listener should run without crashing, and runners should be online in GitHub at https://github.com/organizations/OptAxe/settings/actions/runners

Additional Context

No values were changed in controller-deployment.

runner-deployment has the following values and template:

values:
    githubConfigUrl: https://github.com/OurOrg
    githubConfigSecret: github-pat
    runnerGroup: arc-self-hosted-runners
    minRunners: 1
    maxRunners: 2
    # Template needs to be set to use latest docker:dind with iptables legacy
    # See https://github.com/actions/actions-runner-controller/issues/3159#issuecomment-1906905610
    template:
      spec:
        nodeSelector:
          node_pool: github-runners
        initContainers:
        - name: init-dind-externals
          image: image-from-ghcr.io/actions/actions-runner:2.317.0
          command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
          volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
        containers:
        - name: runner
          image: image-from-ghcr.io/actions/actions-runner:2.317.0
          command: ["/home/runner/run.sh"]
          env:
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
          resources:
            requests:
              memory: 5Gi
          volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
        - name: dind
          image: docker:dind
          args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
          env:
          - name: DOCKER_GROUP_GID
            value: "123"
          - name: DOCKER_IPTABLES_LEGACY
            value: '1'
          securityContext:
            privileged: true
          volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - name: dind-externals
            mountPath: /home/runner/externals
        volumes:
        - name: work
          emptyDir: {}
        - name: dind-sock
          emptyDir: {}
        - name: dind-externals
          emptyDir: {}
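Because this template replaces the default containers, one quick sanity check is that every volume mount refers to a declared volume and that no declared volume goes unused. The sketch below works on a simplified, hand-transcribed view of the template (names only, not the Helm schema); the helper names `undeclared_mounts` and `unused_volumes` are hypothetical.

```python
# Container and volume names transcribed by hand from the template above.
pod_spec = {
    "initContainers": [
        {"name": "init-dind-externals", "volumeMounts": ["dind-externals"]},
    ],
    "containers": [
        {"name": "runner", "volumeMounts": ["work", "dind-sock"]},
        {"name": "dind", "volumeMounts": ["work", "dind-sock", "dind-externals"]},
    ],
    "volumes": ["work", "dind-sock", "dind-externals"],
}


def all_containers(spec):
    """Yield init containers first, then regular containers."""
    for section in ("initContainers", "containers"):
        yield from spec.get(section, [])


def undeclared_mounts(spec):
    """Return (container, volume) pairs that mount a volume not in `volumes`."""
    declared = set(spec["volumes"])
    return [
        (c["name"], mount)
        for c in all_containers(spec)
        for mount in c["volumeMounts"]
        if mount not in declared
    ]


def unused_volumes(spec):
    """Return declared volumes that no container mounts."""
    used = {m for c in all_containers(spec) for m in c["volumeMounts"]}
    return sorted(set(spec["volumes"]) - used)


print(undeclared_mounts(pod_spec), unused_volumes(pod_spec))
```

Both checks come back empty for the template as posted, which is consistent with the report that the configuration itself did not change before the failure.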


Controller Logs

https://gist.github.com/WTPOptAxe/f57e05eeb0989a968f3b30ab584baada

Runner Pod Logs

https://gist.github.com/WTPOptAxe/11be8a39ca690877e878cf539327561f

WTPOptAxe added the bug, gha-runner-scale-set, and needs triage labels Jul 31, 2024

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.
