Commit 7772213

add EvictionRequest Cancellation Examples section + fixes
1 parent 921e1df commit 7772213

File tree

1 file changed

+149
-20
lines changed
  • keps/sig-apps/4563-eviction-request-api

keps/sig-apps/4563-eviction-request-api/README.md

@@ -39,6 +39,10 @@
3939
- [Pod Admission](#pod-admission)
4040
- [Immutability of EvictionRequest Spec Fields](#immutability-of-evictionrequest-spec-fields)
4141
- [EvictionRequest Process](#evictionrequest-process)
42+
- [EvictionRequest Cancellation Examples](#evictionrequest-cancellation-examples)
43+
- [Multiple Dynamic Requesters and No EvictionRequest Cancellation](#multiple-dynamic-requesters-and-no-evictionrequest-cancellation)
44+
- [Single Dynamic Requester and EvictionRequest Cancellation](#single-dynamic-requester-and-evictionrequest-cancellation)
45+
- [Single Dynamic Requester and Forbidden EvictionRequest Cancellation](#single-dynamic-requester-and-forbidden-evictionrequest-cancellation)
4246
- [Follow-up Design Details for Kubernetes Workloads](#follow-up-design-details-for-kubernetes-workloads)
4347
- [ReplicaSet Controller](#replicaset-controller)
4448
- [Deployment Controller](#deployment-controller)
@@ -268,7 +272,8 @@ or the last interceptor (lowest priority) has finished without terminating the p
268272
request controller will attempt to evict the pod using the existing API-initiated eviction.
269273

270274
Multiple requesters can request the eviction of the same pod, and optionally withdraw their request
271-
in certain scenarios.
275+
in certain scenarios
276+
([EvictionRequest Cancellation Examples](#evictionrequest-cancellation-examples)).
272277

273278
We can think of EvictionRequest as a managed and safer alternative to eviction.
274279

@@ -330,7 +335,8 @@ coincide with the deletion of the pod (evict or delete call). In some scenarios,
330335
terminated (e.g. by a remote call) if the pod `restartPolicy` allows it, to preserve the pod data
331336
for further processing or debugging.
332337

333-
The interceptor can also choose whether it handles EvictionRequest cancellation.
338+
The interceptor can also choose whether it handles EvictionRequest cancellation. See
339+
[EvictionRequest Cancellation Examples](#evictionrequest-cancellation-examples) for details.
334340

335341
We should discourage the creation of preventive EvictionRequests, so that they do not end up as
336342
another PDB. So we should design the API appropriately and also not allow behaviors that do not
@@ -497,7 +503,8 @@ already exists for this pod, the requester should still add itself to the finali
497503
are used for:
498504
- Tracking the requesters of this eviction request intent. This is used for observability and to
499505
handle concurrency for multiple requesters asking for the cancellation. The eviction request can
500-
be cancelled/deleted once all requesters have asked for the cancellation.
506+
be cancelled/deleted once all requesters have asked for the cancellation (see
507+
[EvictionRequest Cancellation Examples](#evictionrequest-cancellation-examples) for details).
501508
- Processing the eviction request result by the requester once the eviction process is complete.
502509

503510
If the eviction is no longer needed, the requester should remove itself from the finalizers of the
@@ -549,14 +556,14 @@ annotation. This annotation is parsed into the `Interceptor` type in the [Evicti
549556
characters (`63 - len("priority_")`)
550557
- `PRIORITY` and `ROLE`
551558
- `controller` should always set a `PRIORITY=10000` and `ROLE=controller`.
552-
- Other interceptors should set `PRIORITY` according to their own needs (minimum value is 0,
553-
maximum value is 100000). Higher priorities are selected first by the eviction request
554-
controller. They can use the `controller` interceptor as a reference point, if they want to be
555-
run before or after the `controller` interceptor. They can also observe pod annotations and
556-
detect what other interceptors have been registered for the eviction process. `ROLE` is optional
557-
and can be used as a signal to other interceptors. The `controller` value is reserved for pod
558-
controllers, but otherwise there is no guidance on how the third party interceptors should name
559-
their role.
559+
- Other interceptors should set `PRIORITY` according to their own needs (minimum value (lowest
560+
priority) is 0, maximum value (highest priority) is 100000). Higher priorities are selected
561+
first by the eviction request controller. They can use the `controller` interceptor as a
562+
reference point, if they want to be run before or after the `controller` interceptor. They can
563+
also observe pod annotations and detect what other interceptors have been registered for the
564+
eviction process. `ROLE` is optional and can be used as a signal to other interceptors. The
565+
`controller` value is reserved for pod controllers, but otherwise there is no guidance on how
566+
the third party interceptors should name their role.
560567
- Priorities `9900-10100` are reserved for interceptors with a class that has the same parent
561568
domain as the controller interceptor. Duplicate priorities are not allowed in this interval.
562569
- The number of interceptor annotations is limited to 30 in the 9900-10100 interval and to 70
@@ -609,7 +616,8 @@ it may update the status every 3 minutes. The status updates should look as foll
609616
request process of the pod cannot be stopped/cancelled. This will block any DELETE requests on the
610617
EvictionRequest object. If the interceptor supports eviction request cancellation, it should make
611618
sure that this field is set to `Allow`, and it should be aware that the EvictionRequest object can
612-
be deleted at any time.
619+
be deleted at any time. See
620+
[EvictionRequest Cancellation Examples](#evictionrequest-cancellation-examples) for details.
613621
- Update `.status.expectedInterceptorFinishTime` if a reasonable estimation can be made of how long
614622
the eviction process will take for the current interceptor. This can be modified later to change
615623
the estimate.
@@ -676,7 +684,7 @@ No attempt will be made to evict pods that are currently terminating.
676684
If the pod eviction fails, e.g. due to a blocking PodDisruptionBudget, the
677685
`.status.failedAPIEvictionCounter` is incremented and the pod is added back to the queue with
678686
exponential backoff (maximum approx. 15 minutes). If there is a positive progress update in the
679-
`.status.progressTimestamp` of the EvictionRequest, it will cancel the eviction.
687+
`.status.progressTimestamp` of the EvictionRequest, it will cancel the API-initiated eviction.
680688

681689
#### Garbage Collection
682690

@@ -695,6 +703,9 @@ For convenience, we will also remove requester finalizers with
695703
`evictionrequest.coordination.k8s.io/` prefix when the eviction request task is complete (points 2
696704
and 3). Other finalizers will still block deletion.
697705

706+
For convenience, we will set `.status.evictionRequestCancellationPolicy` back to `Allow` if the
707+
value is `Forbid` and the pod has been fully terminated.
708+
698709
### EvictionRequest API
699710

700711
```golang
@@ -908,7 +919,11 @@ The pod labels are merged with the EvictionRequest labels (pod labels have a pre
908919
for custom label selectors when observing the eviction requests.
909920

910921
`.status.activeInterceptorClass` should be empty on creation as its selection should be left to the
911-
eviction request controller.
922+
eviction request controller. To strengthen the validation, we should check that only the highest
923+
priority interceptor can be set at the beginning. After that, only the next interceptor can be
924+
set, and so on. We can also condition this transition on the other fields:
925+
`.status.activeInterceptorCompleted` should be true, or `.status.progressTimestamp` should have
926+
exceeded the deadline.
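
The transition rule described above could be sketched roughly as follows. This is only an illustration, not the KEP's validation code: the function name `validateTransition`, its parameters, and the idea of passing a precomputed `deadlineExceeded` flag are all assumptions made for the example.

```go
package main

import "fmt"

// validateTransition is a hypothetical check for an update of
// .status.activeInterceptorClass from current to next.
// ordered lists the interceptor classes from highest to lowest priority.
// completed mirrors .status.activeInterceptorCompleted; deadlineExceeded is
// assumed to be derived by the caller from .status.progressTimestamp.
func validateTransition(ordered []string, current string, completed, deadlineExceeded bool, next string) error {
	if next == current {
		return nil // the field is unchanged
	}
	if len(ordered) == 0 {
		return fmt.Errorf("no interceptors registered")
	}
	if current == "" {
		// In the beginning only the highest priority interceptor may be set.
		if next != ordered[0] {
			return fmt.Errorf("first interceptor must be %q, got %q", ordered[0], next)
		}
		return nil
	}
	if !completed && !deadlineExceeded {
		return fmt.Errorf("interceptor %q has neither completed nor exceeded the progress deadline", current)
	}
	// Afterwards only the directly following interceptor may be set.
	for i, class := range ordered {
		if class == current {
			if i+1 < len(ordered) && ordered[i+1] == next {
				return nil
			}
			break
		}
	}
	return fmt.Errorf("%q is not the interceptor that follows %q", next, current)
}

func main() {
	ordered := []string{"actor-b.k8s.io", "actor-a.k8s.io"}
	fmt.Println(validateTransition(ordered, "", false, false, "actor-a.k8s.io"))
	fmt.Println(validateTransition(ordered, "actor-b.k8s.io", true, false, "actor-a.k8s.io"))
}
```

The sketch deliberately leaves out clearing the field after the last interceptor has finished.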
912927

913928
`.status.evictionRequestCancellationPolicy` should be `Allow` on creation, as its resolution should be
914929
left to the eviction request controller.
@@ -988,6 +1003,113 @@ The following diagrams describe what the EvictionRequest process will look like
9881003
![eviction-request-process](eviction-request-process.svg)
9891004

9901005

1006+
### EvictionRequest Cancellation Examples
1007+
1008+
Let's assume there is a single pod p-1 of application P with interceptors A and B:
1009+
1010+
```yaml
1011+
apiVersion: v1
1012+
kind: Pod
1013+
metadata:
1014+
  annotations:
1015+
    interceptor.evictionrequest.coordination.k8s.io/priority_actor-a.k8s.io: "10000/controller"
1016+
    interceptor.evictionrequest.coordination.k8s.io/priority_actor-b.k8s.io: "11000/notifier-with-delay"
1017+
  name: p-1
1018+
```
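
To make the ordering of Actor A and Actor B concrete, the following is a minimal sketch of how such annotations might be parsed and sorted. The `interceptor` struct and the `parseInterceptors` helper are hypothetical simplifications, and the `PRIORITY/ROLE` value format is inferred from the example above rather than taken from the API definition.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Annotation key prefix taken from the pod example above.
const interceptorPrefix = "interceptor.evictionrequest.coordination.k8s.io/priority_"

// interceptor is a hypothetical, simplified stand-in for the KEP's Interceptor type.
type interceptor struct {
	Class    string
	Priority int
	Role     string // optional
}

// parseInterceptors reads interceptor annotations (assumed value format:
// "PRIORITY" or "PRIORITY/ROLE") and returns them from highest to lowest priority.
func parseInterceptors(annotations map[string]string) ([]interceptor, error) {
	var out []interceptor
	for key, value := range annotations {
		if !strings.HasPrefix(key, interceptorPrefix) {
			continue // not an interceptor annotation
		}
		class := strings.TrimPrefix(key, interceptorPrefix)
		prio, role, _ := strings.Cut(value, "/")
		p, err := strconv.Atoi(prio)
		if err != nil || p < 0 || p > 100000 {
			return nil, fmt.Errorf("interceptor %q: invalid priority %q", class, prio)
		}
		out = append(out, interceptor{Class: class, Priority: p, Role: role})
	}
	// Higher priorities are selected first; ties are broken by class name
	// only to keep this sketch deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority > out[j].Priority
		}
		return out[i].Class < out[j].Class
	})
	return out, nil
}

func main() {
	order, err := parseInterceptors(map[string]string{
		interceptorPrefix + "actor-a.k8s.io": "10000/controller",
		interceptorPrefix + "actor-b.k8s.io": "11000/notifier-with-delay",
	})
	if err != nil {
		panic(err)
	}
	for _, i := range order {
		fmt.Printf("%s priority=%d role=%s\n", i.Class, i.Priority, i.Role)
	}
}
```

With the annotations of pod p-1, Actor B (priority 11000) sorts ahead of Actor A (priority 10000), which matches the order in which the examples below designate them.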
1019+
1020+
#### Multiple Dynamic Requesters and No EvictionRequest Cancellation
1021+
1022+
1. A node drain controller starts draining a node Z and makes it unschedulable.
1023+
2. The node drain controller creates an EvictionRequest for the only pod p-1 of application P to
1024+
evict it from a node. It sets the
1025+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer on the
1026+
EvictionRequest.
1027+
3. The descheduling controller notices that the pod p-1 is running in the wrong zone. It wants to
1028+
create an EvictionRequest (named after the pod's UID) for this pod, but the EvictionRequest
1029+
already exists. It sets the
1030+
`requester.evictionrequest.coordination.k8s.io/name_descheduling.avalanche.io` finalizer on the
1031+
EvictionRequest.
1032+
4. The eviction request controller designates Actor B as the next interceptor by updating
1033+
`.status.activeInterceptorClass`.
1034+
5. Actor B updates the EvictionRequest status and also sets
1035+
`.status.evictionRequestCancellationPolicy=Allow`.
1036+
6. Actor B begins notifying users of application P that the application will experience
1037+
a disruption and delays the disruption so that the users can finish their work.
1038+
7. The admin changes his/her mind and cancels the node drain of node Z and makes it schedulable
1039+
again.
1040+
8. The node drain controller removes the
1041+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer from the
1042+
EvictionRequest.
1043+
9. The eviction request controller notices the change in finalizers, but there is still a
1044+
descheduling finalizer, so no action is required.
1045+
10. Actor B sets `ActiveInterceptorCompleted=true` on the eviction request of pod p-1, which is
1046+
ready to be deleted.
1047+
11. The eviction request controller designates Actor A as the next interceptor by updating
1048+
`.status.activeInterceptorClass`.
1049+
12. Actor A updates the EvictionRequest status and ensures that
1050+
`.status.evictionRequestCancellationPolicy=Allow`.
1051+
13. Actor A deletes the p-1 pod.
1052+
14. The EvictionRequest is garbage collected once the pod terminates, even with the descheduling
1053+
finalizer present.
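
The requester bookkeeping in steps 2, 3, 8 and 9 can be illustrated with a small sketch. The helper names `addRequester`, `removeRequester` and `requesters` are hypothetical; only the finalizer prefix is taken from the steps above.

```go
package main

import (
	"fmt"
	"strings"
)

// Requester finalizer prefix as used in the steps above.
const requesterPrefix = "requester.evictionrequest.coordination.k8s.io/name_"

// addRequester returns the finalizers with the requester added (idempotent).
func addRequester(finalizers []string, name string) []string {
	f := requesterPrefix + name
	for _, existing := range finalizers {
		if existing == f {
			return finalizers
		}
	}
	return append(append([]string{}, finalizers...), f)
}

// removeRequester returns the finalizers without the given requester.
func removeRequester(finalizers []string, name string) []string {
	f := requesterPrefix + name
	var out []string
	for _, existing := range finalizers {
		if existing != f {
			out = append(out, existing)
		}
	}
	return out
}

// requesters lists the requester names that still hold a finalizer;
// finalizers without the requester prefix are ignored.
func requesters(finalizers []string) []string {
	var out []string
	for _, f := range finalizers {
		if strings.HasPrefix(f, requesterPrefix) {
			out = append(out, strings.TrimPrefix(f, requesterPrefix))
		}
	}
	return out
}

func main() {
	var finalizers []string
	finalizers = addRequester(finalizers, "nodemaintenance.k8s.io")     // step 2
	finalizers = addRequester(finalizers, "descheduling.avalanche.io")  // step 3
	finalizers = removeRequester(finalizers, "nodemaintenance.k8s.io")  // step 8
	// Step 9: a requester remains, so the eviction request is not cancelled.
	fmt.Println("remaining requesters:", requesters(finalizers))
}
```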
1054+
1055+
#### Single Dynamic Requester and EvictionRequest Cancellation
1056+
1057+
1. A node drain controller starts draining a node Z and makes it unschedulable.
1058+
2. The node drain controller creates an EvictionRequest for the only pod p-1 of application P to
1059+
evict it from a node. It sets the
1060+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer on the
1061+
EvictionRequest.
1062+
3. The eviction request controller designates Actor B as the next interceptor by updating
1063+
`.status.activeInterceptorClass`.
1064+
4. Actor B updates the EvictionRequest status and also sets
1065+
`.status.evictionRequestCancellationPolicy=Allow`.
1066+
5. Actor B begins notifying users of application P that the application will experience
1067+
a disruption and delays the disruption so that the users can finish their work.
1068+
6. The admin changes his/her mind and cancels the node drain of node Z and makes it schedulable
1069+
again.
1070+
7. The node drain controller removes the
1071+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer from the
1072+
EvictionRequest.
1073+
8. The eviction request controller notices the change in finalizers, and deletes (GC) the
1074+
EvictionRequest as there is no requester present.
1075+
9. Actor B can detect the removal of the EvictionRequest object and notify users of application P
1076+
that the disruption has been cancelled. If it misses the deletion event, then no notification
1077+
will be delivered. To avoid this, Actor B has the option of also setting a finalizer on the
1078+
EvictionRequest.
1079+
1080+
#### Single Dynamic Requester and Forbidden EvictionRequest Cancellation
1081+
1082+
1. A node drain controller starts draining a node Z and makes it unschedulable.
1083+
2. The node drain controller creates an EvictionRequest for the only pod p-1 of application P to
1084+
evict it from a node. It sets the
1085+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer on the
1086+
EvictionRequest.
1087+
3. The eviction request controller designates Actor B as the next interceptor by updating
1088+
`.status.activeInterceptorClass`.
1089+
4. Actor B updates the EvictionRequest status and also sets
1090+
`.status.evictionRequestCancellationPolicy=Forbid` to prevent the EvictionRequest from deletion
1091+
(enforced by API Admission).
1092+
5. Actor B begins notifying users of application P that the application will experience
1093+
a disruption and delays the disruption so that the users can finish their work.
1094+
6. The admin changes his/her mind and cancels the node drain of node Z and makes it schedulable
1095+
again.
1096+
7. The node drain controller removes the
1097+
`requester.evictionrequest.coordination.k8s.io/name_nodemaintenance.k8s.io` finalizer from the
1098+
EvictionRequest.
1099+
8. The eviction request controller notices the change in finalizers. Normally it should delete (GC)
1100+
the EvictionRequest as there is no requester present, but
1101+
`.status.evictionRequestCancellationPolicy=Forbid` prevents this.
1102+
9. Actor B sets `ActiveInterceptorCompleted=true` on the eviction request of pod p-1, which is
1103+
ready to be deleted.
1104+
10. The eviction request controller designates Actor A as the next interceptor by updating
1105+
`.status.activeInterceptorClass`.
1106+
11. Actor A updates the EvictionRequest status and ensures that
1107+
`.status.evictionRequestCancellationPolicy=Forbid`. Alternatively, it could change it to
1108+
`Allow` at this point, if `Forbid` was only there to ensure that Actor B's logic is atomic.
1109+
12. Actor A deletes the p-1 pod.
1110+
13. The EvictionRequest is garbage collected once the pod terminates. The eviction request
1111+
controller first has to set `.status.evictionRequestCancellationPolicy=Allow` so that the object can be deleted.
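
The three examples differ only in the number of remaining requesters, the cancellation policy, and whether the pod has terminated. A minimal sketch of that decision, under the assumption that it can be reduced to these three inputs, might look like this. The `decide` function and the `action` values are hypothetical and are not part of the proposed API.

```go
package main

import "fmt"

type cancellationPolicy string

const (
	Allow  cancellationPolicy = "Allow"
	Forbid cancellationPolicy = "Forbid"
)

// action is a hypothetical label for what the eviction request controller does.
type action string

const (
	actionNone           action = "none"
	actionCancel         action = "delete (cancelled)"
	actionGarbageCollect action = "garbage collect"
)

// decide returns the controller action and the resulting cancellation policy.
// requesterCount is the number of requester finalizers still present.
func decide(requesterCount int, policy cancellationPolicy, podTerminated bool) (action, cancellationPolicy) {
	if podTerminated {
		// The request is complete: the policy is reset to Allow and the object
		// is garbage collected even if requester finalizers are still present.
		return actionGarbageCollect, Allow
	}
	if requesterCount == 0 && policy == Allow {
		// All requesters withdrew and cancellation is permitted.
		return actionCancel, policy
	}
	// Either a requester remains or Forbid blocks the deletion.
	return actionNone, policy
}

func main() {
	a, p := decide(1, Allow, false) // multiple requesters, one withdrew
	fmt.Println(a, p)
	a, p = decide(0, Allow, false) // single requester withdrew
	fmt.Println(a, p)
	a, p = decide(0, Forbid, false) // single requester withdrew, cancellation forbidden
	fmt.Println(a, p)
	a, p = decide(0, Forbid, true) // pod terminated afterwards
	fmt.Println(a, p)
}
```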
1112+
9911113
### Follow-up Design Details for Kubernetes Workloads
9921114

9931115
Kubernetes Workloads should be made aware of the EvictionRequest API to properly support the
@@ -1095,7 +1217,8 @@ disruption for the underlying application. By scaling up first before terminatin
10951217
3. The node drain controller creates EvictionRequests for a subset B of pods A to evict them from
10961218
a node.
10971219
4. The eviction request controller designates the deployment controller as the interceptor based on
1098-
the highest priority. No action (termination) is taken on the pods yet.
1220+
the highest priority by updating `.status.activeInterceptorClass`. No action (termination) is
1221+
taken on the pods yet.
10991222
5. The deployment controller creates a set of surge pods C to compensate for the future loss of
11001223
availability of pods B. The new pods are created by temporarily surging the `.spec.replicas`
11011224
count of the underlying replica sets up to the value of the deployment's `maxSurge`.
@@ -1104,7 +1227,8 @@ disruption for the underlying application. By scaling up first before terminatin
11041227
8. The deployment controller scales down the surging replica sets back to their original value.
11051228
9. The deployment controller sets `ActiveInterceptorCompleted=true` on the eviction requests of
11061229
pods B that are ready to be deleted.
1107-
10. The eviction request controller designates the replica set controller as the next interceptor.
1230+
10. The eviction request controller designates the replica set controller as the next interceptor by
1231+
updating `.status.activeInterceptorClass`.
11081232
11. The replica set controller deletes the pods to which an EvictionRequest object has been
11091233
assigned, preserving the availability of the application.
11101234

@@ -1194,15 +1318,17 @@ first before terminating the pods.
11941318
4. The node drain controller creates an EvictionRequest for the only pod of application W to evict
11951319
it from a node.
11961320
5. The eviction request controller designates the HPA as the interceptor based on the highest
1197-
priority. No action (termination) is taken on the single pod yet.
1321+
priority by updating `.status.activeInterceptorClass`. No action (termination) is taken on the
1322+
single pod yet.
11981323
6. The HPA controller creates a single surge pod B to compensate for the future loss of
11991324
availability of pod A. The new pod is created by temporarily scaling up the deployment.
12001325
7. Pod B is scheduled on a new schedulable node that is not under the node drain.
12011326
8. Pod B becomes available.
12021327
9. The HPA scales the surging deployment back down to 1 replica.
12031328
10. The HPA sets `ActiveInterceptorCompleted=true` on the eviction requests of pod A, which is ready
12041329
to be deleted.
1205-
11. The eviction request controller designates the replica set controller as the next interceptor.
1330+
11. The eviction request controller designates the replica set controller as the next interceptor by
1331+
updating `.status.activeInterceptorClass`.
12061332
12. The replica set controller deletes the pods to which an EvictionRequest object has been
12071333
assigned, preserving the availability of the webserver.
12081334

@@ -1230,11 +1356,13 @@ HPA Downscaling example:
12301356
priority. No action (termination) is taken on the pods yet.
12311357
6. The HPA downscales the Deployment workload.
12321358
7. The HPA sets `ActiveInterceptorCompleted=true` on its own eviction requests.
1233-
8. The eviction request controller designates the deployment controller as the next interceptor.
1359+
8. The eviction request controller designates the deployment controller as the next interceptor by
1360+
updating `.status.activeInterceptorClass`.
12341361
9. The deployment controller subsequently scales down the underlying ReplicaSet(s).
12351362
10. The deployment controller sets `ActiveInterceptorCompleted=true` on the eviction requests of
12361363
pods that are ready to be deleted.
1237-
11. The eviction request controller designates the replica set controller as the next interceptor.
1364+
11. The eviction request controller designates the replica set controller as the next interceptor by
1365+
updating `.status.activeInterceptorClass`.
12381366
12. The replica set controller deletes the pods to which an EvictionRequest object has been
12391367
assigned, preserving the scheduling constraints.
12401368

@@ -1772,6 +1900,7 @@ Pros:
17721900
- Versatility; users can use any name they see fit.
17731901
- `.metadata.generateName` is supported.
17741902
- Actors in the system have a greater incentive to use `.spec.podRef`.
1903+
17751904
Cons:
17761905
- Name conflict resolution is left up to the users, but as a workaround they can simply generate the
17771906
name.
