# docs: document NoExecute taint risks and add admission warning (#120)
````diff
@@ -1,6 +1,10 @@
 ## Getting Started

 This guide covers creating and configuring `NodeReadinessRule` resources.

 > **Prerequisites**: Node Readiness Controller must be installed. See [Installation](./installation.md).

 ### API Spec

 #### Example: Storage Readiness Rule (Bootstrap-only)
````
````diff
@@ -11,17 +15,18 @@ This rule ensures nodes have working storage before removing the storage readiness taint:
 apiVersion: readiness.node.x-k8s.io/v1alpha1
 kind: NodeReadinessRule
 metadata:
-  name: storage-readiness-rule
+  name: nfs-storage-readiness-rule
 spec:
-  conditions:
-  - type: "storage.kubernetes.io/CSIReady"
-    requiredStatus: "True"
-  - type: "storage.kubernetes.io/VolumePluginReady"
-    requiredStatus: "True"
+  conditions:
+  - type: "csi.example.net/NodePluginRegistered"
+    requiredStatus: "True"
+  - type: "csi.example.net/BackendReachable"
+    requiredStatus: "True"
+  - type: "DiskPressure"
+    requiredStatus: "False"
   taint:
-    key: "readiness.k8s.io/StorageReady"
+    key: "readiness.k8s.io/vendor.com/nfs-unhealthy"
     effect: "NoSchedule"
     value: "pending"
   enforcementMode: "bootstrap-only"
   nodeSelector:
     matchLabels:
````
````diff
@@ -43,82 +48,19 @@ spec:
 | `nodeSelector` | Label selector to target specific nodes | No |
 | `dryRun` | Preview changes without applying them | No |
````
````diff
-### Deployment
-
-#### Option 1: Install official release images
-
-Node-Readiness Controller offers two variants of the container image to support different cluster architectures.
-
-Released container images are available for:
-* **x86_64** (AMD64)
-* **Arm64** (AArch64)
-
-The controller image is available in the Kubernetes staging registry:
-
-```sh
-REPO="us-central1-docker.pkg.dev/k8s-staging-images/node-readiness-controller/node-readiness-controller"
-TAG=$(skopeo list-tags docker://$REPO | jq .Tags[-1] | tr -d '"')
-docker pull $REPO:$TAG
-```
-
-#### Option 2: Deploy Using Make Commands
-
-**Build and push your image to the location specified by `IMG_PREFIX`:`IMG_TAG`:**
-
-```sh
-make docker-build docker-push IMG_PREFIX=<some-registry>/nrr-controller IMG_TAG=tag
-```
-
-```sh
-# Install the CRDs
-make install
-
-# Deploy the controller
-make deploy IMG_PREFIX=<some-registry>/nrr-controller IMG_TAG=tag
-
-# Create sample rules
-kubectl apply -k examples/network-readiness-rule.yaml
-```
-
-#### Option 3: Deploy Using Kustomize Directly
-
-```sh
-# Install CRDs
-kubectl apply -k config/crd
-
-# Deploy controller and RBAC
-kubectl apply -k config/default
-
-# Create sample rules
-kubectl apply -f examples/network-readiness-rule.yaml
-```
-
-### Uninstallation
-
-> **Important**: Follow this order to avoid stuck resources due to finalizers.
-
-The controller adds a finalizer (`readiness.node.x-k8s.io/cleanup-taints`) to each `NodeReadinessRule` to ensure node taints are cleaned up before the rule is deleted. This means you must delete CRs **while the controller is still running**.
-
-```sh
-# 1. Delete all rule instances first (while controller is running)
-kubectl delete nodereadinessrules --all
-
-# 2. Delete the controller
-make undeploy
-
-# 3. Delete the CRDs
-make uninstall
-```
-
-#### Recovering from Stuck Resources
-
-If you deleted the controller before removing the CRs, the finalizer will block CR deletion. To recover, manually remove the finalizer:
-
-```sh
-kubectl patch nodereadinessrule <rule-name> -p '{"metadata":{"finalizers":[]}}' --type=merge
-```
+### Enforcement Modes
+
+#### Bootstrap-only Mode
+- Removes the bootstrap taint when conditions are first satisfied
+- Marks completion with a node annotation
+- Stops monitoring after successful removal (fail-safe)
+- Ideal for one-time setup conditions, e.g., installing node daemons such as a security agent, or applying a kernel-module update
+
+#### Continuous Mode
+- Continuously monitors conditions
+- Adds the taint when any condition becomes unsatisfied
+- Removes the taint when all conditions become satisfied
+- Ideal for ongoing health monitoring (network connectivity, resource availability)
````
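The two modes differ in what happens after the first success: bootstrap-only latches and never re-taints, while continuous tracks condition health in both directions. A toy state sketch of that difference (illustrative only — this is not the controller's actual reconciliation code, and all names here are hypothetical):

```go
package main

import "fmt"

// reconcile applies one observation of condition health to the taint state.
// bootstrapDone models the bootstrap-only "stop monitoring" latch described above.
func reconcile(mode string, healthy, tainted, bootstrapDone bool) (newTainted, newDone bool) {
	switch mode {
	case "bootstrap-only":
		if bootstrapDone {
			return tainted, true // fail-safe: never re-taint after first success
		}
		if healthy {
			return false, true // remove bootstrap taint, mark completion
		}
		return tainted, false
	default: // "continuous"
		return !healthy, false // taint tracks condition health continuously
	}
}

func main() {
	// bootstrap-only: once conditions pass, a later failure does not re-taint.
	tainted, done := reconcile("bootstrap-only", true, true, false)
	fmt.Println(tainted, done) // false true
	tainted, done = reconcile("bootstrap-only", false, tainted, done)
	fmt.Println(tainted, done) // false true

	// continuous: the taint follows health in both directions.
	tainted, _ = reconcile("continuous", false, false, false)
	fmt.Println(tainted) // true
	tainted, _ = reconcile("continuous", true, tainted, false)
	fmt.Println(tainted) // false
}
```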
## Operations
````diff
@@ -151,7 +93,7 @@ Test rules safely before applying:
 spec:
   dryRun: true  # Enable dry run mode
   conditions:
-  - type: "storage.kubernetes.io/CSIReady"
+  - type: "csi.example.net/NodePluginRegistered"
     requiredStatus: "True"
   # ... rest of spec
 ```
````
````diff
@@ -162,28 +104,73 @@ Check dry run results:
 kubectl get nodereadinessrule <rule-name> -o jsonpath='{.status.dryRunResults}'
 ```
````
````diff
-### Enforcement Modes
+### Rule Validation and Constraints

-#### Bootstrap-only Mode
-- Removes bootstrap taint when conditions are first satisfied
-- Marks completion with node annotation
-- Stops monitoring after successful removal (fail-safe)
-- Ideal for one-time setup conditions (storage, installing node daemons e.g: security agent or kernel-module update)
+#### NoExecute Taint Effect Warning

-#### Continuous Mode
-- Continuously monitors conditions
-- Adds taint when any condition becomes unsatisfied
-- Removes taint when all conditions become satisfied
-- Ideal for ongoing health monitoring (network connectivity, resource availability)
+**`NoExecute` with `continuous` enforcement mode will evict existing workloads when conditions fail.**

-## Configuration
+If a rule is configured with a `NoExecute` taint and a readiness condition on the node fails even temporarily (e.g., the component restarted), all pods without matching tolerations are immediately evicted from the node. Use `NoSchedule` to prevent new scheduling without disrupting running workloads.
+
+The admission webhook warns when using `NoExecute`.
+
+See the [Kubernetes taints documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) for taint behavior details.
````

> **Contributor:** Did you get a chance to test your changes w/ warnings? You may need to follow this PR: #122 for testing the webhook. nit / optional: if you get a chance to verify it, you could also add the warning here as the response to kubectl.
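Before enabling a `NoExecute` taint, it can help to check which pod tolerations would actually match the readiness taint, since only those pods survive eviction. The sketch below is illustrative only — plain structs instead of the Kubernetes API types — but the matching logic follows the documented taint/toleration semantics (an `Exists` toleration with an empty key matches every taint, and an empty effect matches every effect):

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes Taint and Toleration types.
type Taint struct{ Key, Value, Effect string }
type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether a toleration matches a taint.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return tol.Key == "" || tol.Key == taint.Key
	default: // "Equal" is the default operator
		return tol.Key == taint.Key && tol.Value == taint.Value
	}
}

func main() {
	taint := Taint{Key: "readiness.k8s.io/vendor.com/nfs-unhealthy", Effect: "NoExecute"}

	// A pod that explicitly tolerates the readiness taint survives eviction.
	daemonTol := Toleration{Key: "readiness.k8s.io/vendor.com/nfs-unhealthy", Operator: "Exists", Effect: "NoExecute"}
	fmt.Println(tolerates(daemonTol, taint)) // true

	// A pod with no matching toleration would be evicted.
	otherTol := Toleration{Key: "example.com/other", Operator: "Exists"}
	fmt.Println(tolerates(otherTol, taint)) // false
}
```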
````diff
+#### Avoiding Taint Key Conflicts
+
+The admission webhook prevents multiple rules from using the same `taint.key` and `taint.effect` on overlapping node selectors.
+
+**Example conflict:**
+```yaml
+# Rule 1
+spec:
+  conditions:
+  - type: "device.gpu-vendor.net/DevicePluginRegistered"
+    requiredStatus: "True"
+  nodeSelector:
+    matchLabels:
+      feature.node.kubernetes.io/pci-10de.present: "true"
+  taint:
+    key: "readiness.k8s.io/vendor.com/gpu-not-ready"
+    effect: "NoSchedule"
+
+# Rule 2 - This will be REJECTED
+spec:
+  conditions:
+  - type: "cniplugin.example.net/rdma/NetworkReady"
+    requiredStatus: "True"
+  nodeSelector:
+    matchLabels:
+      feature.node.kubernetes.io/pci-10de.present: "true"
+  taint:
+    key: "readiness.k8s.io/vendor.com/gpu-not-ready" # Same (taint-key + effect) but different conditions = conflict
+    effect: "NoSchedule"
+```
+
+Use unique, descriptive taint keys for different readiness checks.
````

> **Contributor** (on the example): let us also include conditions to show the risks / conflicts
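The conflict rule — same taint key and effect on overlapping selectors — can be sketched with plain maps. This is a simplification under stated assumptions: the `Rule` type and `conflicts` helper are hypothetical, and selectors "overlap" here only when their `matchLabels` are identical, whereas the real webhook compares full label selectors:

```go
package main

import (
	"fmt"
	"reflect"
)

// Rule captures just the fields relevant to conflict detection.
type Rule struct {
	TaintKey, TaintEffect string
	MatchLabels           map[string]string
}

// conflicts reports whether two rules would fight over the same taint
// on the same set of nodes.
func conflicts(a, b Rule) bool {
	return a.TaintKey == b.TaintKey &&
		a.TaintEffect == b.TaintEffect &&
		reflect.DeepEqual(a.MatchLabels, b.MatchLabels)
}

func main() {
	gpu := map[string]string{"feature.node.kubernetes.io/pci-10de.present": "true"}
	r1 := Rule{TaintKey: "readiness.k8s.io/vendor.com/gpu-not-ready", TaintEffect: "NoSchedule", MatchLabels: gpu}
	r2 := Rule{TaintKey: "readiness.k8s.io/vendor.com/gpu-not-ready", TaintEffect: "NoSchedule", MatchLabels: gpu}
	r3 := Rule{TaintKey: "readiness.k8s.io/vendor.com/network-not-ready", TaintEffect: "NoSchedule", MatchLabels: gpu}

	fmt.Println(conflicts(r1, r2)) // true: same taint key+effect on the same nodes -> rejected
	fmt.Println(conflicts(r1, r3)) // false: distinct taint keys can coexist
}
```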
````diff
+#### Taint Key Naming
+
+Follow the [Kubernetes naming conventions](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/).
+
+Taint keys must have the `readiness.k8s.io/` prefix to clearly identify readiness-related taints and avoid conflicts with other controllers.
+
-### Security
-
-The controller requires the following RBAC permissions:
-- **Nodes**: `get`, `list`, `watch`, `patch`, `update` (for taint management)
-- **NodeReadinessRules**: Full CRUD access
-- **Events**: `create` (for status reporting)
+**Valid:**
+```yaml
+taint:
+  key: "readiness.k8s.io/vendor.com/network-not-ready"
+  key: "readiness.k8s.io/vendor.com/gpu-not-ready"
+```
+
+**Invalid:**
+```yaml
+taint:
+  key: "network-ready"            # Missing prefix
+  key: "node.kubernetes.io/ready" # Wrong prefix
+```
````
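The prefix requirement above is easy to check mechanically. The `validTaintKey` helper below is hypothetical (not the webhook's actual validation, which may also apply standard Kubernetes name rules to the rest of the key); it checks only the documented `readiness.k8s.io/` prefix:

```go
package main

import (
	"fmt"
	"strings"
)

// readinessTaintPrefix is the prefix the docs require on all rule taint keys.
const readinessTaintPrefix = "readiness.k8s.io/"

// validTaintKey reports whether a taint key carries the required prefix
// and has a non-empty remainder.
func validTaintKey(key string) bool {
	return strings.HasPrefix(key, readinessTaintPrefix) && len(key) > len(readinessTaintPrefix)
}

func main() {
	for _, key := range []string{
		"readiness.k8s.io/vendor.com/network-not-ready", // valid
		"network-ready",            // missing prefix
		"node.kubernetes.io/ready", // wrong prefix
	} {
		fmt.Printf("%-46s %v\n", key, validTaintKey(key))
	}
}
```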
## Configuration

### Performance and Scalability
````diff
@@ -211,38 +198,3 @@ conditions:
 - type: "readiness.k8s.io/mycompany.example.com/CacheWarmed"
   requiredStatus: "True"
 ```
-
-#### With Cluster Autoscaler
-NodeReadinessController work well with cluster autoscaling:
-- New nodes start with restrictive taints
-- Controller removes taints once conditions are satisfied
-- Autoscaler can safely scale knowing nodes are truly ready
-
-## Development
-
-### Building from Source
-
-```sh
-# Clone the repository
-git clone https://sigs.k8s.io/node-readiness-controller.git
-cd node-readiness-controller
-
-# Run tests
-make test
-
-# Build binary
-make build
-
-# Generate manifests
-make manifests
-```
-
-### Running Locally
-
-```sh
-# Install CRDs
-make install
-
-# Run against cluster (requires KUBECONFIG)
-make run
-```
````
---
````diff
@@ -6,8 +6,6 @@ Follow this guide to install the Node Readiness Controller in your Kubernetes cluster.

-### Option 1: Install Official Release (Recommended)
-
 The easiest way to get started is by applying the official release manifests.

 First, to install the CRDs, apply the `crds.yaml` manifest:

 ```sh
````

> **Contributor** (on the removed heading): we could keep this as well
````diff
@@ -34,11 +32,16 @@ If it gets evicted during resource pressure, nodes can't transition to Ready state.

 This is the priority class used by other critical cluster components (eg: core-dns).

-**Images**: The official releases use multi-arch images (AMD64, Arm64).
-
-### Option 2: Deploy Using Kustomize
-
-If you have cloned the repository and want to deploy from source, you can use Kustomize.
+#### Images
+
+The official releases use multi-arch images (AMD64, Arm64) and are available at `registry.k8s.io/node-readiness-controller/node-readiness-controller`.
+
+```sh
+REPO="registry.k8s.io/node-readiness-controller/node-readiness-controller"
+TAG=$(skopeo list-tags docker://$REPO | jq '.Tags[-1]' | tr -d '"')
+docker pull $REPO:$TAG
+```
+
+### Option 2: Deploy Using Kustomize

 ```sh
 # 1. Install Custom Resource Definitions (CRDs)
````

> **Contributor** (on `### Option 2: Deploy Using Kustomize`): please revert to existing, let us keep the kustomize option

> **Contributor:** we already have installation flows captured: simple consume from official release images, and using kustomize for advanced setup. I think documenting 3 different installation paths would just be confusing the reader. Maybe we could hold on to this for now and come back to updating the installation section for #94.
>
> **Author:** makes sense, I've simplified the installation method for now, we can re-visit later as per your suggestion.

> **Contributor** (on the skopeo snippet): I think we could skip this detail
---
```diff
@@ -20,6 +20,7 @@ import (
 	"context"
 	"fmt"

+	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/apimachinery/pkg/util/validation/field"
@@ -165,6 +166,35 @@ func (w *NodeReadinessRuleWebhook) nodSelectorsOverlap(selector1, selector2 metav1.LabelSelector) bool {
 	return sel1.String() == sel2.String()
 }

+// generateNoExecuteWarnings generates admission warnings for NoExecute taint usage.
+// NoExecute taints cause immediate pod eviction, which can be disruptive when
+// used with continuous enforcement mode.
+func (w *NodeReadinessRuleWebhook) generateNoExecuteWarnings(spec readinessv1alpha1.NodeReadinessRuleSpec) admission.Warnings {
+	var warnings admission.Warnings
+
+	if spec.Taint.Effect != corev1.TaintEffectNoExecute {
+		return warnings
+	}
+
+	// NoExecute with continuous mode is particularly risky
+	if spec.EnforcementMode == readinessv1alpha1.EnforcementModeContinuous {
+		warnings = append(warnings,
+			"CAUTION: Using NoExecute taint effect with continuous enforcement mode. "+
+				"This configuration will evict existing pods when conditions fail, which may cause "+
+				"workload disruption if conditions are unstable. Consider using NoSchedule "+
+				"effect instead, or bootstrap-only enforcement mode. "+
+				"See: https://node-readiness-controller.sigs.k8s.io/user-guide/getting-started.html")
+	} else {
+		// NoExecute with bootstrap-only is less risky but still worth noting
+		warnings = append(warnings,
+			"NOTE: Using NoExecute taint effect. This will evict existing pods that do not "+
+				"tolerate this taint when applied. Ensure critical system pods have appropriate tolerations. "+
+				"See: https://node-readiness-controller.sigs.k8s.io/user-guide/getting-started.html")
+	}
+
+	return warnings
+}
```

> **Contributor** (on the CAUTION message): Thanks for also handling API warnings. Though these are nice to have, there seem to be length restrictions on processing the admission warnings. ref: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#response. Could you check if this satisfies the criteria?
```diff
 // SetupWithManager sets up the webhook with the manager.
 func (w *NodeReadinessRuleWebhook) SetupWithManager(mgr ctrl.Manager) error {
 	return ctrl.NewWebhookManagedBy(mgr).
@@ -185,7 +215,10 @@ func (w *NodeReadinessRuleWebhook) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
 	if allErrs := w.validateNodeReadinessRule(ctx, rule, false); len(allErrs) > 0 {
 		return nil, fmt.Errorf("validation failed: %v", allErrs)
 	}
-	return nil, nil
+
+	// Generate warnings for NoExecute taint usage
+	warnings := w.generateNoExecuteWarnings(rule.Spec)
+	return warnings, nil
 }

 func (w *NodeReadinessRuleWebhook) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
@@ -197,7 +230,10 @@ func (w *NodeReadinessRuleWebhook) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
 	if allErrs := w.validateNodeReadinessRule(ctx, rule, true); len(allErrs) > 0 {
 		return nil, fmt.Errorf("validation failed: %v", allErrs)
 	}
-	return nil, nil
+
+	// Generate warnings for NoExecute taint usage
+	warnings := w.generateNoExecuteWarnings(rule.Spec)
+	return warnings, nil
 }

 func (w *NodeReadinessRuleWebhook) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
```
> I think Operations + Monitoring could go onto its own page including the metrics improvement we handled recently. We could leave this for a follow up #94