Some runners are never terminated correctly #3901

Open
@julien-michaud

Description

Controller Version

0.10.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Install the 0.10.1 controller and start some jobs.

Describe the bug

A few runner pods are stuck terminating, with these events:

Events:
  Type     Reason         Age                  From     Message
  ----     ------         ----                 ----     -------
  Normal   Killing        36m                  kubelet  Stopping container dind
  Warning  FailedKillPod  18m (x2 over 22m)    kubelet  error killing pod: [failed to "KillContainer" for "runner" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\" to be killed: wait container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": context deadline exceeded", failed to "KillPodSandbox" for "48d6b275-ff6b-41bb-afd2-dc7a8c092bb0" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = failed to stop container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": an error occurs during waiting for container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\" to be killed: wait container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": context deadline exceeded"]
  Warning  FailedKillPod  13m                  kubelet  error killing pod: [failed to "KillContainer" for "runner" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\" to be killed: wait container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": context deadline exceeded", failed to "KillPodSandbox" for "48d6b275-ff6b-41bb-afd2-dc7a8c092bb0" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
  Warning  FailedKillPod  4m42s (x4 over 31m)  kubelet  error killing pod: [failed to "KillContainer" for "runner" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "48d6b275-ff6b-41bb-afd2-dc7a8c092bb0" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
  Warning  FailedKillPod  11s                  kubelet  error killing pod: [failed to "KillContainer" for "runner" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "48d6b275-ff6b-41bb-afd2-dc7a8c092bb0" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = failed to stop container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": an error occurs during waiting for container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\" to be killed: wait container \"cfdc4b7bdc85e7ac5233ffa784edb90ff49407a6fa73a7279bf629acf0d6319c\": context deadline exceeded"]
  Normal   Killing        10s (x9 over 36m)    kubelet  Stopping container runner

Is this something you guys encounter as well?
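
For anyone checking whether they are affected, here is a minimal sketch (not part of the controller) that lists pods still present well past their deletion deadline. It assumes the input is the parsed output of `kubectl get pods -n <namespace> -o json`, and that `metadata.deletionTimestamp` marks the time by which the pod should be gone. The function name `stuck_terminating` and the 300-second slack are illustrative choices, not anything defined by ARC or Kubernetes.

```python
from datetime import datetime, timezone


def stuck_terminating(pod_list, now, slack_seconds=300):
    """Return (pod name, seconds overdue) for pods still present past their deletion deadline.

    pod_list: dict parsed from `kubectl get pods -o json` (assumed shape).
    now: timezone-aware datetime to compare against.
    slack_seconds: hypothetical tolerance before a pod is reported as stuck.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if ts is None:
            continue  # pod is not being deleted
        # Timestamps are serialized as RFC 3339 in UTC, e.g. 2025-01-30T10:00:00Z
        deadline = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        overdue = (now - deadline).total_seconds()
        if overdue > slack_seconds:
            stuck.append((meta.get("name"), int(overdue)))
    return stuck
```

To try it, save the `kubectl` JSON output to a file, load it with `json.load`, and call `stuck_terminating(data, datetime.now(timezone.utc))`. Any pod it reports should show `FailedKillPod` events like the ones above if it is hitting the same problem.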

Describe the expected behavior

Runners are terminated normally.

Additional Context

GKE 1.30.8

Controller Logs

/

Runner Pod Logs

/

Metadata

    Labels

    bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
