Add specific exit code or recovery mechanism when agent dies with active worker pods in Kubernetes #5771
Unanswered
Deddinho23
asked this question in
Request a Feature
Currently, when using Woodpecker in Kubernetes mode, if an agent (master) pod dies unexpectedly, all its associated worker pods also terminate with exit code 0.
This behavior creates a problem for our external automated systems that monitor pipeline status: because the exit code is 0, they interpret the run as successful, so we cannot detect or handle this scenario.
Why this matters
In many cases, the failure is transient and could be resolved by simply restarting the pipeline. However, without a distinct error code or recovery mechanism, we cannot reliably identify this condition.
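To illustrate what a distinct code would enable on our side, here is a minimal sketch of the check an external monitor could run. The names `AGENT_LOST_EXIT_CODE`, `Action`, and `classify_exit` are hypothetical, and the value 75 is only an assumption (borrowed from `EX_TEMPFAIL` in `sysexits.h`); Woodpecker does not define such a code today.

```python
from enum import Enum

# Hypothetical sentinel: Woodpecker does not emit this today.
# 75 is borrowed from sysexits.h (EX_TEMPFAIL, "temporary failure") purely for illustration.
AGENT_LOST_EXIT_CODE = 75


class Action(Enum):
    ACCEPT = "accept"  # pipeline really succeeded
    RETRY = "retry"    # transient loss of the agent, safe to restart
    FAIL = "fail"      # genuine pipeline failure


def classify_exit(exit_code: int) -> Action:
    """Map a worker pod exit code to the action an external monitor would take."""
    if exit_code == 0:
        return Action.ACCEPT
    if exit_code == AGENT_LOST_EXIT_CODE:
        return Action.RETRY
    return Action.FAIL


if __name__ == "__main__":
    for code in (0, AGENT_LOST_EXIT_CODE, 1):
        print(code, classify_exit(code).value)
```

With the current behavior, an agent death falls into the first branch and is indistinguishable from success, which is exactly the case we cannot handle.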
Proposed Solutions
Current Workaround
We are considering running agents on more reliable nodes (e.g., on-demand instances) to minimize the risk, but this does not fully solve the problem.
Benefits
Would this feature be feasible?
It would greatly improve resilience and observability for Kubernetes users of Woodpecker!