Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.9.2
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
Install the ARC controller and runner scale set, version 0.9.2.
Set containerMode.type to "kubernetes" and point ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE at the pod template.
Define a pod template like this:
apiVersion: v1
data:
  default.yml: |
    "apiVersion": "v1"
    "kind": "PodTemplate"
    "metadata":
      "name": "runner-pod-template"
    "spec":
      "containers":
      - "name": "$job"
        "resources":
          "limits":
            "cpu": "3000m"
          "requests":
            "cpu": "3000m"
Describe the bug
GHA jobs fail immediately when the workflow pod is unschedulable because it is waiting for a node to become available (for example, when the CPU/memory request is high and the node autoscaler has to add a node first).
Describe the expected behavior
There should be a timeout field, either in the runner scale set or in the container hooks pod template, that lets the workflow pod wait up to x minutes to be scheduled once another node becomes available.
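To make the request concrete, here is a minimal sketch of the behavior being asked for: keep polling while the pod is Pending, and only fail once a configurable deadline has passed. This is not the container hooks' actual code. The function name, the `timeout_seconds` parameter, and the injected `get_phase` callback are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable


class PodSchedulingTimeout(Exception):
    """Raised when the pod is still Pending after the allowed wait."""


def wait_for_pod_scheduled(
    get_phase: Callable[[], str],        # hypothetical: returns the pod's current phase
    timeout_seconds: float,              # hypothetical: the requested timeout setting
    poll_interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the pod leaves the Pending phase or the deadline passes.

    Returns the first non-Pending phase observed (for example "Running").
    """
    deadline = clock() + timeout_seconds
    while True:
        phase = get_phase()
        if phase != "Pending":
            return phase
        if clock() >= deadline:
            raise PodSchedulingTimeout(
                f"pod still Pending after {timeout_seconds} seconds"
            )
        # Give the node autoscaler time to bring up a node before checking again.
        sleep(poll_interval_seconds)
```

With something like this, a job would tolerate a Pending pod for up to `timeout_seconds` instead of failing on the first check. `sleep` and `clock` are parameters only so the loop can be exercised without real waiting.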
Additional Context
template:
  spec:
    initContainers:
    - name: kube-init
      image: ghcr.io/actions/actions-runner:latest
      command: ["/bin/sh", "-c"]
      args:
      - |
        sudo chown -R 1001:123 /home/runner/_work
      volumeMounts:
      - name: work
        mountPath: /home/runner/_work
    securityContext:
      fsGroup: 123 ## needed to resolve permission issues with mounted volume. https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors#error-access-to-the-path-homerunner_work_tool-is-denied
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
      - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
        value: /home/runner/pod-templates/default.yml
      - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
        value: "false" ## To allow jobs without a job container to run, set ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER to false on your runner container. This instructs the runner to disable this check.
      volumeMounts:
      - name: pod-templates
        mountPath: /home/runner/pod-templates
        readOnly: true
    volumes:
    - name: work
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "managed-csi"
            resources:
              requests:
                storage: ${local.volume_claim_size}
    - name: pod-templates
      configMap:
        name: "runner-pod-template"
containerMode:
  type: "kubernetes" ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "managed-csi"
    resources:
      requests:
        storage: 50Gi
Pod Template YAML:
apiVersion: v1
data:
  default.yml: |
    "apiVersion": "v1"
    "kind": "PodTemplate"
    "metadata":
      "name": "runner-pod-template"
    "spec":
      "containers":
      - "name": "$job"
        "resources":
          "limits":
            "cpu": "3000m"
          "requests":
            "cpu": "3000m"
Controller Logs
https://gist.github.com/jonathan-fileread/602f6d5fd948bf505a2fa7f5dbd78069
Runner Pod Logs
https://gist.githubusercontent.com/jonathan-fileread/96db9941abc5faba985aae78ef6b3760/raw/196644c97c7698e51bf6ae9b50dbf769dd4f1825/gistfile1.txt