Actions job on a self-hosted runner fails with "Interrupt" and "POST request - HTTP Status: BadRequest" #3856

Open
@arseny-zinchenko

Description

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

The issue occurs randomly, so a specific reproduction method has not been identified.

Describe the bug

A GitHub Actions job is failing with an error:

KeyboardInterrupt
make: *** [Makefile:24: activate-venv-nexus] Interrupt
Error: Process completed with exit code 130.
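
For reference, exit code 130 follows the common shell convention of 128 plus the signal number, which here points at SIGINT (signal 2) and matches the `KeyboardInterrupt` and make's `Interrupt` message. The sketch below decodes such a code. The helper name `describe_exit_code` is hypothetical, and the result says nothing about who sent the signal.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit code (hypothetical helper).

    Shells conventionally report 128 + N when a process is killed by signal N.
    """
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            # No signal with that number on this platform.
            return f"exit code {code} (no matching signal)"
    return f"exited normally with status {code}"


print(describe_exit_code(130))
```

So the step was most likely interrupted from outside rather than failing on its own.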

The corresponding Pod's CPU and memory usage is below its Kubernetes limits, and the OOM killer was not triggered.
The Persistent Volume is not full either.

The Kubernetes Pod's logs show the following error:

[RUNNER 2024-12-18 12:39:52Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus6.actions.githubusercontent.com/***/_apis/oauth2/token failed. HTTP Status: BadRequest

And the controller logs in the github-controller namespace show:

EphemeralRunner	Runner does not exist in GitHub service	{"version": "0.9.3", "ephemeralrunner": {"name":"kraken-eks-runners-hjn6j-runner-cv6qd","namespace":"ops-github-runners-ns"}, "runnerId": 4103}
2024-12-18 14:39:53.532	EphemeralRunner	Checking if runner exists in GitHub service	{"version": "0.9.3", "ephemeralrunner": {"name":"kraken-eks-runners-hjn6j-runner-cv6qd","namespace":"ops-github-runners-ns"}, "runnerId": 4103}
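
To match these controller entries against the runner Pod's logs, the runner name and ID can be pulled out of each line. This is a minimal sketch that assumes the fields are tab-separated and end with a JSON payload, as in the excerpt above. The function name `parse_controller_line` is hypothetical.

```python
import json


def parse_controller_line(line: str) -> dict:
    """Split a 'component<TAB>message<TAB>json' log line (assumed layout)."""
    component, message, payload = line.split("\t", 2)
    fields = json.loads(payload)
    return {
        "component": component,
        "message": message,
        "runner": fields["ephemeralrunner"]["name"],
        "runner_id": fields["runnerId"],
    }


# Sample line copied from the controller log excerpt above.
sample = (
    'EphemeralRunner\tRunner does not exist in GitHub service\t'
    '{"version": "0.9.3", "ephemeralrunner": '
    '{"name":"kraken-eks-runners-hjn6j-runner-cv6qd",'
    '"namespace":"ops-github-runners-ns"}, "runnerId": 4103}'
)

parsed = parse_controller_line(sample)
print(parsed["runner"], parsed["runner_id"])
```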

Re-running the job usually helps, but sometimes it has to be restarted a few times.

Describe the expected behavior

The job completes without errors.

Additional Context

containerMode:
  type: "dind"

template:
  spec:
    initContainers:
    - name: kube-init
      image: ghcr.io/actions/actions-runner:latest
      command: ["sudo", "chown", "-R", "1001:123", "/home/runner/_work"]
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work  
    containers:
      - name: dind
        image: 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/docker-dind:latest
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          - name: DOCKER_GROUP_GID
            value: "123"        
      - name: runner
        image: 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/kraken:0.14
        command: ["/home/runner/run.sh"]
        env:
        - name: RUNNER_EKS
          value: "true"
        securityContext:
          capabilities:
            add: ["SYS_PTRACE"]
        allowPrivilegeEscalation: true
        resources:
          requests:
            cpu: 2
            memory: 4Gi
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: kraken-eks-runners
    volumes:
      - name: work
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "gp3-iops"
              resources:
                requests:
                  storage: 40Gi

The 492***148.dkr.ecr.us-east-1.amazonaws.com/github-runners/kraken:0.14 Docker image is built from the latest runner image:

FROM ghcr.io/actions/actions-runner:2.321.0


Controller Logs

https://gist.github.com/arseny-zinchenko/5aacf7174840ba3d4e63287f749fcb4e

Runner Pod Logs

https://gist.github.com/arseny-zinchenko/06e6e99f3b0884d60370e3d67d78af85

Labels: bug, gha-runner-scale-set, needs triage