Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.9.3
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
The issue occurs randomly, so a specific reproduction method has not been identified.
Describe the bug
A GitHub Actions job intermittently fails with the following error:
KeyboardInterrupt
make: *** [Makefile:24: activate-venv-nexus] Interrupt
Error: Process completed with exit code 130.
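For context on the exit code: by shell convention, a status above 128 means the process was terminated by signal number `status - 128`, so 130 corresponds to signal 2 (SIGINT), which Python surfaces as `KeyboardInterrupt`. This suggests the step was interrupted from outside rather than failing on its own, although that reading is an interpretation and is not confirmed by the logs. A minimal sketch of the decoding, using only the POSIX `kill -l` lookup:

```shell
# Exit code reported by the failed step in this issue.
status=130

# POSIX: `kill -l <exit_status>` prints the name of the signal
# that corresponds to an exit status greater than 128.
signal_name=$(kill -l "$status")

echo "exit code $status = 128 + $((status - 128)) (SIG$signal_name)"
```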
The corresponding Pod's CPU and memory usage stays below its Kubernetes limits, and the OOM killer was not invoked.
The Persistent Volume is not full either.
The runner Pod's logs contain the following error:
[RUNNER 2024-12-18 12:39:52Z ERR GitHubActionsService] POST request to https://pipelinesghubeus6.actions.githubusercontent.com/***/_apis/oauth2/token failed. HTTP Status: BadRequest
The controller logs in the github-controller namespace show:
EphemeralRunner Runner does not exist in GitHub service {"version": "0.9.3", "ephemeralrunner": {"name":"kraken-eks-runners-hjn6j-runner-cv6qd","namespace":"ops-github-runners-ns"}, "runnerId": 4103}
2024-12-18 14:39:53.532
EphemeralRunner Checking if runner exists in GitHub service {"version": "0.9.3", "ephemeralrunner": {"name":"kraken-eks-runners-hjn6j-runner-cv6qd","namespace":"ops-github-runners-ns"}, "runnerId": 4103}
Re-running the job usually helps, but sometimes it has to be re-run several times.
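To check whether the failed jobs line up with the controller's "Runner does not exist in GitHub service" messages, the runner names can be pulled out of the controller log. The helper below is a hypothetical sketch (the function name `missing_runners` is not part of ARC). It assumes the log lines have the same shape as the ones quoted above and reads them from standard input:

```shell
# Hypothetical helper: print the unique EphemeralRunner names that the
# controller reported as missing from the GitHub service.
missing_runners() {
  grep -F 'Runner does not exist in GitHub service' \
    | grep -o '"name":"[^"]*"' \
    | cut -d'"' -f4 \
    | sort -u
}

# Example with the log line quoted in this issue.
printf '%s\n' 'EphemeralRunner Runner does not exist in GitHub service {"version": "0.9.3", "ephemeralrunner": {"name":"kraken-eks-runners-hjn6j-runner-cv6qd","namespace":"ops-github-runners-ns"}, "runnerId": 4103}' \
  | missing_runners
```

In practice the input would come from the controller Pod, for example by piping the output of `kubectl logs` for that Pod into `missing_runners`, and the resulting names can then be compared with the runners that executed the failed jobs.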
Describe the expected behavior
The job completes without errors.
Additional Context
containerMode:
  type: "dind"
template:
  spec:
    initContainers:
      - name: kube-init
        image: ghcr.io/actions/actions-runner:latest
        command: ["sudo", "chown", "-R", "1001:123", "/home/runner/_work"]
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
    containers:
      - name: dind
        image: 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/docker-dind:latest
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          - name: DOCKER_GROUP_GID
            value: "123"
      - name: runner
        image: 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/kraken:0.14
        command: ["/home/runner/run.sh"]
        env:
          - name: RUNNER_EKS
            value: "true"
        securityContext:
          capabilities:
            add: ["SYS_PTRACE"]
          allowPrivilegeEscalation: true
        resources:
          requests:
            cpu: 2
            memory: 4Gi
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: kraken-eks-runners
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "gp3-iops"
              resources:
                requests:
                  storage: 40Gi
The 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/kraken:0.14 Docker image is built from the latest runner image:
FROM ghcr.io/actions/actions-runner:2.321.0
### Controller Logs
```shell
https://gist.github.com/arseny-zinchenko/5aacf7174840ba3d4e63287f749fcb4e
```
### Runner Pod Logs
https://gist.github.com/arseny-zinchenko/06e6e99f3b0884d60370e3d67d78af85