forked from kubernetes/kubernetes
pr for collaborating on disruption probe issue #89
Draft
ryanmcnamara wants to merge 11 commits into master from rm/disruption-probe-issue
Changes from all commits (11):

- b1baed0 pr for collaborating on disruption probe issue (ryanmcnamara)
- 4287eda issue filled out with TODOs (ryanmcnamara)
- 0b0a4d3 most todos addressed prep for meeting 4 (ryanmcnamara)
- ce27e5e address todos (ryanmcnamara)
- 24d37af add examples (dhenkel92)
- 3da43b2 add examples (dhenkel92)
- 7289248 add examples (dhenkel92)
- a5802a0 add examples (dhenkel92)
- 36322c9 wording (dhenkel92)
- bab8595 wording (dhenkel92)
- 7f3773d add wg mention (dhenkel92)
# Feature request / problem statement
Our users rely heavily on eviction and pod disruption budgets (PDBs) for stability and for fleet management; see [this previous talk on rotating nodes](https://www.youtube.com/watch?v=KQ1obaC-ht0) for more context.
An issue our users often run into is that they temporarily do not want their pods to be evicted. Failing the readiness probe is not an option, because the pods still critically need to serve traffic.
There are several examples where application owners had to build workarounds for the current behavior to distinguish between these two states:
### Example 1)
We are running a custom-built distributed database. On pod startup, each pod is assigned a shard and synchronizes data with its siblings in the background. It can be configured to serve traffic once it has replicated enough data while continuing to sync in the background. This means the cluster can be in a state where we need the pod to serve traffic for stability reasons, but cannot afford to lose another pod of the same shard during that time.
Readiness probes could be made to work by increasing the shard replica count, but the replica count is tightly coupled to the total cost of the application, which increases significantly when running many small clusters.
### Example 2)
The Elasticsearch operator has a similar problem. Elasticsearch clusters can be in different [health states (green / yellow / red)](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html). If the cluster health is not green, the cluster may still be ready to serve traffic, but the system should not disrupt any of its pods.
Unfortunately, the operator can't rely on readiness probes alone. If a [cluster is in a yellow state](https://www.elastic.co/guide/en/elasticsearch/reference/current/red-yellow-cluster-status.html), it means, for example, that one of the shards is missing replicas. In that case there should not be any disruption, so as not to cause cluster stability issues, but the cluster nodes should still be ready so they can serve traffic.
To mitigate the problem, the operator maintains logic to [update the cluster's PDB](https://github.com/elastic/cloud-on-k8s/blob/v2.16.1/pkg/controller/elasticsearch/pdb/reconcile.go#L193-L197) and change the `minAvailable` count depending on the health.
### Impact
There are more cases like this, especially for stateful workloads. We have built multiple workarounds, such as custom eviction API endpoints that behave similarly to the Kubernetes API but give our users more control.
In all these cases, the failure domain is moved from the pod level to an external controller. Ensuring application cluster stability is therefore coupled with the health of this controller, which is often less well maintained and monitored. Furthermore, users have to actively work against Kubernetes primitives to reflect their business needs.
The request is for a Kubernetes-provided mechanism that distinguishes between whether a pod should be routable (readiness) and whether a pod may be disrupted. If the mechanism is not used, the behavior remains as it is today, with readiness controlling disruptability. The solution outlined here is a disruption probe and a corresponding pod condition, similar to readiness.
# Proposal
The proposal is to add an additional probe that the kubelet would perform, similar to [liveness and readiness](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/). See the [probe definition](https://pkg.go.dev/k8s.io/api/core/v1#Probe); here is an example portion of the pod spec:
```
"readinessProbe": { # existing
  ...
},
"disruptionProbe": { # new
  "failureThreshold": 3,
  "httpGet": {
    "path": "/disruptable",
    "port": 8484,
    "scheme": "HTTP"
  },
  "initialDelaySeconds": 60,
  "periodSeconds": 30,
  "successThreshold": 1,
  "timeoutSeconds": 1
},
```
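For illustration, here is a minimal sketch of what an application could serve behind the `/disruptable` path and port `8484` used in the example spec above. The handler and the `safeToDisrupt` flag are hypothetical, application-specific stand-ins for whatever internal check (e.g. shard replication state) decides whether an eviction is currently safe:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// safeToDisrupt is a hypothetical flag flipped by application logic, e.g. once
// the local shard has enough in-sync replicas elsewhere in the cluster.
var safeToDisrupt atomic.Bool

func main() {
	// Answers the proposed disruption probe: 200 when the pod may be evicted,
	// 503 when evicting it now would risk cluster stability.
	http.HandleFunc("/disruptable", func(w http.ResponseWriter, r *http.Request) {
		if safeToDisrupt.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	// Port matches the example spec above.
	http.ListenAndServe(":8484", nil)
}
```

This mirrors Examples 1 and 2 above: the pod keeps serving traffic (readiness stays true) while the disruption endpoint returns a failure until the shard or cluster is healthy again.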
There will also be a corresponding condition in the pod status, similar to `Ready`; an example portion of the pod status:
```
"status": {
  "conditions": [
    ...
    { # existing
      ...
      "status": "True",
      "type": "Ready"
    },
    { # new
      "lastTransitionTime": "2025-02-05T20:26:15Z",
      "status": "True",
      "type": "Disruptable"
    },
    ...
  ]
}
```
## Behavior, Implementation, and Details
If the disruption probe is defined, only pods whose `Disruptable` condition status is `True` (see the status example above) will count towards the disruption quota. The disruption controller will need to take this condition into account _instead_ of the `Ready` condition it looks at today. To maintain current behaviour, if no disruption probe is defined, the kubelet will sync the `Ready` status to the `Disruptable` condition.
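As a rough sketch of the behaviour described above (not actual disruption controller code; `isDisruptable` and `countHealthy` are illustrative names), the healthy-pod count could be derived like this, falling back to `Ready` when no `Disruptable` condition is present:

```go
package disruption

import (
	corev1 "k8s.io/api/core/v1"
)

// isDisruptable reports whether a pod should count towards the PDB's
// disruption quota under this proposal.
func isDisruptable(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == "Disruptable" {
			return cond.Status == corev1.ConditionTrue
		}
	}
	// No Disruptable condition present: fall back to today's behaviour and
	// use readiness, so existing workloads are unaffected.
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

// countHealthy is where the disruption controller would use the new condition
// instead of readiness when computing how many disruptions are allowed.
func countHealthy(pods []*corev1.Pod) int {
	healthy := 0
	for _, pod := range pods {
		if isDisruptable(pod) {
			healthy++
		}
	}
	return healthy
}
```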
It's important that the system we build to support this use case fails closed, meaning that if it does not work correctly, we do not accidentally allow evictions that should be blocked. Probes give us this property: having the disruption probe initiated by the kubelet keeps the failure domain consistent with the kubelet, so an eviction won't happen unless the kubelet is a healthy member of the cluster _and_ the kubelet gets an OK response from the relevant pods. An alternative would be a centralized gate controller that effectively manages the `Disruptable` condition. That disruption-gates alternative is not selected here because, among other problems, it would be difficult to make its behaviour fail closed.
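To make the fail-closed property concrete, here is a simplified sketch of how the node-side condition could be derived, assuming the probing machinery reports a per-pod result; the function and its parameters are hypothetical and not actual kubelet code. The key point is that with a probe configured, anything other than an explicit success (including probe errors or timeouts) leaves the pod non-disruptable:

```go
package disruption

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DisruptableCondition is the proposed pod condition type.
const DisruptableCondition corev1.PodConditionType = "Disruptable"

// disruptableCondition derives the proposed condition on the node.
// hasProbe and probeSucceeded stand in for the prober's result for the pod's
// disruption probe; ready is the pod's current Ready condition status.
func disruptableCondition(hasProbe, probeSucceeded bool, ready corev1.ConditionStatus) corev1.PodCondition {
	// No probe defined: mirror readiness, i.e. today's behaviour.
	status := ready
	if hasProbe {
		// Probe defined: fail closed. Only an explicit probe success marks the
		// pod as disruptable; probe errors or timeouts leave it non-disruptable.
		status = corev1.ConditionFalse
		if probeSucceeded {
			status = corev1.ConditionTrue
		}
	}
	return corev1.PodCondition{
		Type:               DisruptableCondition,
		Status:             status,
		LastTransitionTime: metav1.Now(),
	}
}
```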
## Next steps
We are hoping to gauge interest with a lighter-weight issue first; if there is interest, we will convert it to a KEP.
Besides that, the issue might be related to the newly created [node-lifecycle WG](https://github.com/kubernetes/community/pull/8396). We would be happy to present the topic in that context too.