Some runners pods are never terminated #3903

Closed
@julien-michaud

Description

Checks

Controller Version

0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Start jobs.

Describe the bug

Sometimes, the runner pods continue running in zombie mode after completing their jobs.

Describe the expected behavior

Runner pods should be terminated after job completion.
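
For anyone trying to spot affected pods, here is a minimal sketch that flags pods still in phase `Running` past an age threshold. It assumes the pod list was exported with `kubectl get pods -n <namespace> -o json`. The sample data, the `long_running_pods` helper, and the 30-minute threshold are all hypothetical and not taken from this cluster. Age alone does not prove a job has finished, so treat the result as a list of candidates to inspect.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    # Kubernetes timestamps end in "Z"; normalize so fromisoformat() accepts
    # them on Python versions before 3.11 as well.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def long_running_pods(pod_list, now, max_age):
    """Return (name, age) for pods in phase Running that are older than max_age."""
    stale = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") != "Running":
            continue
        age = now - parse_ts(pod["metadata"]["creationTimestamp"])
        if age > max_age:
            stale.append((pod["metadata"]["name"], age))
    # Oldest pods first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {"metadata": {"name": "runner-old", "creationTimestamp": "2025-01-29T15:15:00Z"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "runner-new", "creationTimestamp": "2025-01-29T15:50:00Z"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "runner-done", "creationTimestamp": "2025-01-29T14:00:00Z"},
         "status": {"phase": "Succeeded"}},
    ]
}

if __name__ == "__main__":
    fixed_now = datetime(2025, 1, 29, 16, 0, 0, tzinfo=timezone.utc)
    for name, age in long_running_pods(SAMPLE, fixed_now, timedelta(minutes=30)):
        print(f"{name}: running for {age}")
```

To run it against a real cluster, load the exported JSON with `json.load()` and pass `datetime.now(timezone.utc)` as `now`.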

Additional Context

gha-runner-scale-set-controller:
  enabled: true
  flags:
    logLevel: "warn"
  podLabels:
    finops.company.net/cloud_provider: gcp
    finops.company.net/cost_center: compute
    finops.company.net/product: tools
    finops.company.net/service: actions-runner-controller
    finops.company.net/region: europe-west1
  replicaCount: 3
  podAnnotations:
    ad.datadoghq.com/manager.checks: |
      {
        "openmetrics": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics",
              "histogram_buckets_as_distributions": true,
              "namespace": "actions-runner-system",
              "metrics": [".*"]
            }
          ]
        }
      }
  metrics:
    controllerManagerAddr: ":8080"
    listenerAddr: ":8080"
    listenerEndpoint: "/metrics"

gha-runner-scale-set:
  enabled: true
  githubConfigUrl: https://github.com/company
  githubConfigSecret:
    github_token: <path:secret/github_token/actions_runner_controller#token>

  maxRunners: 100
  minRunners: 1

  containerMode:
    type: "dind"  ## type can be set to dind or kubernetes

  listenerTemplate:
    metadata:
      labels:
        finops.company.net/cloud_provider: gcp
        finops.company.net/cost_center: compute
        finops.company.net/product: tools
        finops.company.net/service: actions-runner-controller
        finops.company.net/region: europe-west1
      annotations:
        ad.datadoghq.com/listener.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "histogram_buckets_as_distributions": true,
                  "namespace": "actions-runner-system",
                  "max_returned_metrics": 6000,
                  "metrics": [".*"],
                  "exclude_metrics": [
                    "gha_job_startup_duration_seconds",
                    "gha_job_execution_duration_seconds"
                  ],
                  "exclude_labels": [
                    "enterprise",
                    "event_name",
                    "job_name",
                    "job_result",
                    "job_workflow_ref",
                    "organization",
                    "repository",
                    "runner_name"
                  ]
                }
              ]
            }
          }
    spec:
      containers:
      - name: listener
        securityContext:
          runAsUser: 1000
  template:
    metadata:
      labels:
        finops.company.net/cloud_provider: gcp
        finops.company.net/cost_center: compute
        finops.company.net/product: tools
        finops.company.net/service: actions-runner-controller
        finops.company.net/region: europe-west1
    spec:
      restartPolicy: OnFailure
      imagePullSecrets:
        - name: company-prod-registry
      containers:
        - name: runner
          image: eu.gcr.io/company-production/devex/gha-runners:v1.0.0-snapshot5
          command: ["/home/runner/run.sh"]

  controllerServiceAccount:
    namespace: actions-runner-system
    name: actions-runner-controller-gha-rs-controller

Controller Logs

Date,Host,Service,Message
"2025-01-29T15:16:06.017Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:52.677Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:52.671Z","""node_name""","""manager""","Updated ephemeral runner status with pod phase"
"2025-01-29T15:15:52.657Z","""node_name""","""manager""","Updating ephemeral runner status with pod phase"
"2025-01-29T15:15:52.657Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:51.652Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:49.690Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:48.461Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:48.456Z","""node_name""","""manager""","Updated ephemeral runner status with pod phase"
"2025-01-29T15:15:48.440Z","""node_name""","""manager""","Updating ephemeral runner status with pod phase"
"2025-01-29T15:15:48.440Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:48.424Z","""node_name""","""manager""","Waiting for runner container status to be available"
"2025-01-29T15:15:48.399Z","""node_name""","""manager""","Created ephemeral runner pod"
"2025-01-29T15:15:48.367Z","""node_name""","""manager""","Created new pod spec for ephemeral runner"
"2025-01-29T15:15:48.366Z","""node_name""","""manager""","Creating new pod for ephemeral runner"
"2025-01-29T15:15:48.366Z","""node_name""","""manager""","Creating new EphemeralRunner pod."
"2025-01-29T15:15:48.361Z","""node_name""","""manager""","Created ephemeral runner secret"
"2025-01-29T15:15:48.313Z","""node_name""","""manager""","Created new secret spec for ephemeral runner"
"2025-01-29T15:15:48.313Z","""node_name""","""manager""","Creating new secret for ephemeral runner"
"2025-01-29T15:15:48.313Z","""node_name""","""manager""","Creating new ephemeral runner secret for jitconfig."
"2025-01-29T15:15:48.308Z","""node_name""","""manager""","Updated ephemeral runner status with runnerId and runnerJITConfig"
"2025-01-29T15:15:48.294Z","""node_name""","""manager""","Updating ephemeral runner status with runnerId and runnerJITConfig"
"2025-01-29T15:15:48.294Z","""node_name""","""manager""","Created ephemeral runner JIT config"
"2025-01-29T15:15:48.093Z","""node_name""","""manager""","Creating ephemeral runner JIT config"
"2025-01-29T15:15:48.093Z","""node_name""","""manager""","Creating new ephemeral runner registration and updating status with runner config"
"2025-01-29T15:15:48.093Z","""node_name""","""manager""","Successfully added runner registration finalizer"
"2025-01-29T15:15:48.076Z","""node_name""","""manager""","Adding runner registration finalizer"
"2025-01-29T15:15:48.076Z","""node_name""","""manager""","Successfully added finalizer"
"2025-01-29T15:15:48.059Z","""node_name""","""manager""","Adding finalizer"
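
The export above is listed newest first, which makes the sequence hard to follow. The sketch below reads an excerpt of it in chronological order, counts each message, and measures how long after pod creation the controller was still reporting the runner container as running. The `summarize` helper is hypothetical, and only four of the rows above are embedded here; the full export has the same columns.

```python
import csv
import io
from collections import Counter
from datetime import datetime

STILL_RUNNING = "Ephemeral runner container is still running"
POD_CREATED = "Created ephemeral runner pod"

# Excerpt of the controller log export above (newest first).
LOG_CSV = '''Date,Host,Service,Message
"2025-01-29T15:16:06.017Z","""node_name""","""manager""","Ephemeral runner container is still running"
"2025-01-29T15:15:52.671Z","""node_name""","""manager""","Updated ephemeral runner status with pod phase"
"2025-01-29T15:15:48.399Z","""node_name""","""manager""","Created ephemeral runner pod"
"2025-01-29T15:15:48.059Z","""node_name""","""manager""","Adding finalizer"
'''


def parse_ts(value):
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(csv_text):
    """Return (rows oldest first, message counts, seconds from pod creation
    to the last 'still running' message, or None if either is missing)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: parse_ts(row["Date"]))
    counts = Counter(row["Message"] for row in rows)
    created = [parse_ts(r["Date"]) for r in rows if r["Message"] == POD_CREATED]
    still = [parse_ts(r["Date"]) for r in rows if r["Message"] == STILL_RUNNING]
    span = (still[-1] - created[0]).total_seconds() if created and still else None
    return rows, counts, span


if __name__ == "__main__":
    rows, counts, span = summarize(LOG_CSV)
    for row in rows:
        print(row["Date"], row["Message"])
    print(counts.most_common())
    print("seconds between pod creation and last 'still running':", span)
```

The excerpt only covers the first seconds after the pod was created, so it shows the controller polling a running container and says nothing yet about what happens after the job completes.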

Runner Pod Logs

https://gist.github.com/julien-michaud/ce2a1e5c5d494d89e09453f0b270a26f

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
    gha-runner-scale-set: Related to the gha-runner-scale-set mode
    needs triage: Requires review from the maintainers
