
Commit a9cb79d

jmdeal authored and edibble21 committed
docs: update pod level controls for TGP (aws#7710)
1 parent 0e14e94 commit a9cb79d

5 files changed (+340 −173 lines)

website/content/en/docs/concepts/disruption.md

+72 −39
@@ -70,18 +70,14 @@ Automated graceful methods can be rate limited through [NodePool Disruption Bud
 * Nodes can be removed as their workloads will run on other nodes in the cluster.
 * Nodes can be replaced with lower priced variants due to a change in the workloads.
 * [**Drift**]({{<ref "#drift" >}}): Karpenter will mark nodes as drifted and disrupt nodes that have drifted from their desired specification. See [Drift]({{<ref "#drift" >}}) to see which fields are considered.
-* [**Interruption**]({{<ref "#interruption" >}}): Karpenter will watch for upcoming interruption events that could affect your nodes (health events, spot interruption, etc.) and will taint, drain, and terminate the node(s) ahead of the event to reduce workload disruption.
 
 {{% alert title="Defaults" color="secondary" %}}
-Disruption is configured through the NodePool's disruption block by the `consolidationPolicy`, and `consolidateAfter` fields. `expireAfter` can also be used to control disruption. Karpenter will configure these fields with the following values by default if they are not set:
+Disruption is configured through the NodePool's disruption block by the `consolidationPolicy` and `consolidateAfter` fields. Karpenter will configure these fields with the following values by default if they are not set:
 
 ```yaml
 spec:
   disruption:
     consolidationPolicy: WhenEmptyOrUnderutilized
-  template:
-    spec:
-      expireAfter: 720h
 ```
 {{% /alert %}}
 
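For reference, a minimal sketch of a NodePool disruption block that sets both fields explicitly (the `consolidateAfter` value here is illustrative, not a default):

```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m  # illustrative: wait 1 minute after a node becomes consolidatable
```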
@@ -169,10 +165,22 @@ Karpenter will add the `Drifted` status condition on NodeClaims if the NodeClaim
 
 ## Automated Forceful Methods
 
-Automated forceful methods will begin draining nodes as soon as the condition is met. Note that these methods blow past NodePool Disruption Budgets, and do not wait for a pre-spin replacement node to be healthy for the pods to reschedule, unlike the graceful methods mentioned above. Use Pod Disruption Budgets and `do-not-disrupt` on your nodes to rate-limit the speed at which your applications are disrupted.
+Automated forceful methods will begin draining nodes as soon as the condition is met.
+Unlike the graceful methods mentioned above, these methods cannot be rate-limited using [NodePool Disruption Budgets](#nodepool-disruption-budgets), and they do not wait for a pre-spun replacement node to become healthy before pods are rescheduled.
+Pod disruption budgets may be used to rate-limit application disruption.
 
 ### Expiration
-Karpenter will disrupt nodes as soon as they're expired after they've lived for the duration of the NodePool's `spec.template.spec.expireAfter`. You can use expiration to periodically recycle nodes due to security concern.
+
+A node is expired once its lifetime exceeds the duration set in the owning NodeClaim's `spec.expireAfter` field.
+Changes to `spec.template.spec.expireAfter` on the owning NodePool will not update the field for existing NodeClaims - it will induce NodeClaim drift, and the replacements will have the updated value.
+Expiration can be used, in conjunction with [`terminationGracePeriod`](#termination-grace-period), to enforce a maximum Node lifetime.
+By default, `expireAfter` is set to `720h` (30 days).
+
+{{% alert title="Warning" color="warning" %}}
+Misconfigured PDBs and pods with the `karpenter.sh/do-not-disrupt` annotation may block draining indefinitely.
+For this reason, it is not recommended to set `expireAfter` without also setting `terminationGracePeriod` **if** your cluster has pods with the `karpenter.sh/do-not-disrupt` annotation.
+Doing so can result in partially drained nodes stuck in the cluster, driving up cluster cost and potentially requiring manual intervention to resolve.
+{{% /alert %}}
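As a sketch, a NodePool that overrides the default lifetime might set the field like this (the value is illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      expireAfter: 168h  # illustrative: recycle nodes weekly; copied to each NodeClaim's spec.expireAfter
```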
 
 ### Interruption
 
@@ -197,13 +205,13 @@ Karpenter enables this feature by watching an SQS queue which receives critical
 
 To enable interruption handling, configure the `--interruption-queue` CLI argument with the name of the interruption queue provisioned to handle interruption events.
 
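For example, when installing Karpenter with its Helm chart, the queue is typically wired in through the `settings.interruptionQueue` value, which translates to this CLI argument (the queue name below is illustrative):

```yaml
# Helm values.yaml sketch - becomes the --interruption-queue CLI argument
settings:
  interruptionQueue: Karpenter-my-cluster  # illustrative queue name
```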
 ### Node Auto Repair
 
 <i class="fa-solid fa-circle-info"></i> <b>Feature State: </b> Karpenter v1.1.0 [alpha]({{<ref "../reference/settings#feature-gates" >}})
 
 Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, helping to maintain overall cluster health. Nodes can experience various types of failures affecting their hardware, file systems, or container environments. These failures may be surfaced through node conditions such as network unavailability, disk pressure, memory pressure, or other conditions reported by node diagnostic agents. When Karpenter detects these unhealthy conditions, it automatically replaces the affected nodes based on cloud provider-defined repair policies. Once a node has been in an unhealthy state beyond its configured toleration duration, Karpenter will forcefully terminate the node and its corresponding NodeClaim, bypassing the standard drain and grace period procedures to ensure swift replacement of problematic nodes. To prevent cascading failures, Karpenter includes safety mechanisms: it will not perform repairs if more than 20% of nodes in a NodePool are unhealthy, and for standalone NodeClaims, it evaluates this threshold against all nodes in the cluster. This ensures your cluster remains in a healthy state with minimal manual intervention, even in scenarios where normal node termination procedures might be impacted by the node's unhealthy state.
 
 To enable Node Auto Repair:
 1. Ensure you have a [Node Monitoring Agent](https://docs.aws.amazon.com/en_us/eks/latest/userguide/node-health.html) deployed or any agent that will add status conditions to nodes that are supported (e.g., Node Problem Detector)
 2. Enable the feature flag: `NodeRepair=true`
 3. Node AutoRepair will automatically terminate nodes when they have unhealthy status conditions based on your cloud provider's repair policies
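Assuming the Helm chart's `settings.featureGates` block is used to set feature gates, enabling it might look like this sketch (confirm the exact key against the feature gates reference):

```yaml
# Helm values.yaml sketch - enables the NodeRepair feature gate
settings:
  featureGates:
    nodeRepair: true
```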
@@ -214,36 +222,58 @@ Karpenter monitors nodes for the following node status conditions when initiatin
 
 #### Kubelet Node Conditions
 
 | Type  | Status  | Toleration Duration |
 | ----- | ------- | ------------------- |
 | Ready | False   | 30 minutes          |
 | Ready | Unknown | 30 minutes          |
 
 #### Node Monitoring Agent Conditions
 
 | Type                     | Status | Toleration Duration |
 | ------------------------ | ------ | ------------------- |
 | AcceleratedHardwareReady | False  | 10 minutes          |
 | StorageReady             | False  | 30 minutes          |
 | NetworkingReady          | False  | 30 minutes          |
 | KernelReady              | False  | 30 minutes          |
 | ContainerRuntimeReady    | False  | 30 minutes          |
 
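For context, an unhealthy condition as it might appear in a Node's status, matching the table above (a sketch; the reason and timestamp are illustrative):

```yaml
# Sketch: a Node status condition that would trigger repair after its toleration duration
status:
  conditions:
  - type: NetworkingReady
    status: "False"
    reason: InterfaceNotReady  # illustrative reason from a node monitoring agent
    lastTransitionTime: "2025-01-01T00:00:00Z"
```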
 To enable the feature flag, refer to the [Feature Gates]({{<ref "../reference/settings#feature-gates" >}}).
 
 ## Controls
 
 ### TerminationGracePeriod
 
-You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively based on its TerminationGracePeriodSeconds before this terminationGracePeriod ends to give as much time to cleanup as possible. Note that if your pod's terminationGracePeriodSeconds is larger than this terminationGracePeriod, Karpenter may forcibly delete the pod before it has its full terminationGracePeriod to cleanup.
+To configure a maximum termination duration, `terminationGracePeriod` should be used.
+It is configured through a NodePool's [`spec.template.spec.terminationGracePeriod`]({{<ref "../concepts/nodepools/#spectemplatespecterminationgraceperiod" >}}) field, and is persisted to created NodeClaims (`spec.terminationGracePeriod`).
+Changes to the [`spec.template.spec.terminationGracePeriod`]({{<ref "../concepts/nodepools/#spectemplatespecterminationgraceperiod" >}}) field on the NodePool will not result in a change for existing NodeClaims - it will induce NodeClaim drift, and the replacements will have the updated `terminationGracePeriod`.
 
-This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
+Once a node is disrupted, via either a [graceful](#automated-graceful-methods) or [forceful](#automated-forceful-methods) disruption method, Karpenter will begin draining the node.
+At this point, the countdown for `terminationGracePeriod` begins.
+Once the `terminationGracePeriod` elapses, remaining pods will be forcibly deleted and the underlying instance will be terminated.
+A node may be terminated before the `terminationGracePeriod` has elapsed if all disruptable pods have been drained.
+
+In conjunction with `expireAfter`, `terminationGracePeriod` can be used to enforce an absolute maximum node lifetime.
+The node will begin to drain once its `expireAfter` has elapsed, and it will be forcibly terminated once its `terminationGracePeriod` has elapsed, making the maximum node lifetime the sum of the two fields.
+
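As a sketch, a NodePool enforcing an absolute maximum node lifetime of 30 days plus a bounded one-hour drain might look like this (values are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      expireAfter: 720h           # node begins draining after 30 days
      terminationGracePeriod: 1h  # and is forcibly terminated at most 1 hour later
```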
+Additionally, configuring `terminationGracePeriod` changes the eligibility criteria for disruption via `Drift`.
+When configured, a node may be disrupted via drift even if there are pods with blocking PDBs or the `karpenter.sh/do-not-disrupt` annotation scheduled to it.
+This enables cluster administrators to ensure crucial updates (e.g. AMI updates addressing CVEs) can't be blocked by misconfigured applications.
+
+{{% alert title="Warning" color="warning" %}}
+To ensure that the `terminationGracePeriodSeconds` value for draining pods is respected, pods will be preemptively deleted before the Node's `terminationGracePeriod` has elapsed.
+This includes pods with blocking [pod disruption budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) or the [`karpenter.sh/do-not-disrupt` annotation]({{<ref "#pod-level-controls" >}}).
 
-For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow it's full `terminationGracePeriodSeconds` to cleanup. If no pods are blocking draining, Karpenter will cleanup the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish.
+Consider the following example: a Node with a 1 hour `terminationGracePeriod` has been disrupted and begins to drain.
+A pod with the `karpenter.sh/do-not-disrupt` annotation and a 300 second (5 minute) `terminationGracePeriodSeconds` is scheduled to it.
+If the pod is still running 55 minutes after the Node begins to drain, the pod will be deleted to ensure its `terminationGracePeriodSeconds` value is respected.
+
+If a pod's `terminationGracePeriodSeconds` value exceeds the `terminationGracePeriod` of the Node it is scheduled to, Karpenter will prioritize the Node's `terminationGracePeriod`.
+The pod will be deleted as soon as the Node begins to drain, and it will not receive its full `terminationGracePeriodSeconds`.
+{{% /alert %}}
 
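A sketch of the pod from this example (names and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: long-running-job  # illustrative name
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  terminationGracePeriodSeconds: 300  # deleted ~55m into the node's 1h drain window
  containers:
  - name: main
    image: public.ecr.aws/docker/library/busybox:latest  # illustrative image
    command: ["sleep", "infinity"]
```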
 ### NodePool Disruption Budgets
 
 You can rate limit Karpenter's disruption through the NodePool's `spec.disruption.budgets`. If undefined, Karpenter will default to one budget with `nodes: 10%`. Budgets will consider nodes that are actively being deleted for any reason, and will only block Karpenter from disrupting nodes voluntarily through drift, emptiness, and consolidation. Note that NodePool Disruption Budgets do not prevent Karpenter from terminating expired nodes.
 
 #### Reasons
 Karpenter allows specifying if a budget applies to any of `Drifted`, `Underutilized`, or `Empty`. When a budget has no reasons, it's assumed that it applies to all reasons. When calculating allowed disruptions for a given reason, Karpenter will take the minimum of the budgets that have listed the reason or have left reasons undefined.
@@ -256,29 +286,26 @@ If the budget is configured with a percentage value, such as `20%`, Karpenter wi
 For example, the following NodePool with three budgets defines the following requirements:
 - The first budget will only allow 20% of nodes owned by that NodePool to be disrupted if it's empty or drifted. For instance, if there were 19 nodes owned by the NodePool, 4 empty or drifted nodes could be disrupted, rounding up from `19 * .2 = 3.8`.
 - The second budget acts as a ceiling to the previous budget, only allowing 5 disruptions when there are more than 25 nodes.
 - The last budget only blocks disruptions during the first 10 minutes of the day, where 0 disruptions are allowed, only applying to underutilized nodes.
 
 ```yaml
 apiVersion: karpenter.sh/v1
 kind: NodePool
 metadata:
   name: default
 spec:
-  template:
-    spec:
-      expireAfter: 720h # 30 * 24h = 720h
   disruption:
     consolidationPolicy: WhenEmptyOrUnderutilized
     budgets:
     - nodes: "20%"
       reasons:
       - "Empty"
       - "Drifted"
     - nodes: "5"
     - nodes: "0"
       schedule: "@daily"
       duration: 10m
       reasons:
       - "Underutilized"
 ```
 
@@ -307,8 +334,18 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa
 
 ### Pod-Level Controls
 
-You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod.
-
+You can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod.
+You can treat this annotation as a single-pod, permanently blocking PDB.
+This has the following consequences:
+- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{<ref "#consolidation" >}}), and conditionally excluded from [Drift]({{<ref "#drift" >}}).
+  - If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{<ref "#terminationgraceperiod" >}}) configured, it will still be eligible for disruption via drift.
+- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).
+  Karpenter will not be able to complete termination of the node until one of the following conditions is met:
+  - All pods with the `karpenter.sh/do-not-disrupt` annotation are removed.
+  - All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`).
+  - The owning NodeClaim's [`terminationGracePeriod`]({{<ref "#terminationgraceperiod" >}}) has elapsed.
+
+This is useful for pods that you want to run from start to finish without disruption.
 Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted.
 
 ```yaml
@@ -322,20 +359,16 @@ spec:
 ```
 
 {{% alert title="Note" color="primary" %}}
-This annotation will be ignored for [terminating pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) and [terminal pods](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (Failed/Succeeded).
-{{% /alert %}}
-
-Examples of voluntary node removal that will be prevented by this annotation include:
-- [Consolidation]({{<ref "#consolidation" >}})
-- [Drift]({{<ref "#drift" >}})
-
-{{% alert title="Note" color="primary" %}}
-Voluntary node removal does not include [Interruption]({{<ref "#interruption" >}}) or manual deletion initiated through `kubectl delete node`. Both of these are considered involuntary events, since node removal cannot be delayed.
+The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), [Node Repair]({{<ref "#node-repair" >}}), and manual deletion (e.g. `kubectl delete node ...`).
+While both interruption and node repair have implicit upper bounds on termination time, expiration and manual termination do not.
+Manual intervention may be required to unblock node termination, by removing pods with the `karpenter.sh/do-not-disrupt` annotation.
+For this reason, it is not recommended to use the `karpenter.sh/do-not-disrupt` annotation with `expireAfter` **if** you have not also configured `terminationGracePeriod`.
 {{% /alert %}}
 
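To make the "single-pod, permanently blocking PDB" analogy above concrete, the annotation behaves roughly like this hypothetical PDB scoped to a single pod (names and labels are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: block-my-pod  # illustrative name
spec:
  maxUnavailable: 0   # never allow a voluntary eviction
  selector:
    matchLabels:
      app: my-app     # illustrative label selecting the single pod
```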
### Node-Level Controls
 
-You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node. This will prevent disruption actions on the node.
+You can block Karpenter from voluntarily choosing to disrupt certain nodes by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the node.
+This will prevent voluntary disruption actions against the node.
 
 ```yaml
 apiVersion: v1
 kind: Node
 metadata:
   annotations:
     karpenter.sh/do-not-disrupt: "true"
 ```
