feat(controller): re-apply runtime-required taint on reboot when autoTaintNewNodes=true (#273)
When REAPPLY_ON_REBOOT=true, runtimeRequired=true, and autoTaintNewNodes=true
are all set, a rebooted node must be re-tainted so workloads cannot schedule on
it until Skyhook finishes re-applying. Previously, node.Reset() cleared the
nodeState/cordon/status annotations but left the autoTaint annotation intact,
so HasSkyhookAnnotations() remained true and HandleAutoTaint never re-tainted
the node.
The fix appends the runtime-required taint to the node inside the existing
ReapplyOnReboot block, after node.Reset() and before the StrategicMerge patch,
so the taint re-application and state reset land in one atomic apiserver write.
Taint removal on completion is unchanged (HandleRuntimeRequiredTaint).
Closes #180
Signed-off-by: Brian Lockwood <lockwobr@gmail.com>
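Below is a minimal, dependency-free model of the ordering this commit message describes. Everything in it is hypothetical. `fakeNode`, `reset`, `hasSkyhookAnnotations`, `onReboot`, and the `example.com/...` keys stand in for Skyhook's real node wrapper, `node.Reset()`, `HasSkyhookAnnotations()`, and its actual taint and annotation keys, none of which are shown here. The model shows why a new-node check alone skips a rebooted node, and why the fix appends the taint after the reset inside the reboot path.

```go
package main

import "fmt"

// Hypothetical keys for illustration; Skyhook's real keys are not shown here.
const (
	autoTaintAnnotation = "example.com/autotaint"
	nodeStateAnnotation = "example.com/nodeState"
	runtimeTaintKey     = "example.com/runtime-required"
)

type taint struct {
	Key, Value, Effect string
}

// fakeNode is a hypothetical stand-in for the controller's node wrapper.
type fakeNode struct {
	Annotations map[string]string // only Skyhook annotations in this model
	Taints      []taint
}

// reset models the described node.Reset(): state annotations are cleared
// (cordon/status omitted here), but the auto-taint annotation is kept.
func (n *fakeNode) reset() {
	delete(n.Annotations, nodeStateAnnotation)
}

// hasSkyhookAnnotations models the new-node check: any Skyhook annotation
// means the node has been seen before.
func (n *fakeNode) hasSkyhookAnnotations() bool {
	return len(n.Annotations) > 0
}

// onReboot models the fixed ReapplyOnReboot block: reset first, then append
// the taint, so both changes are in the object that is patched once.
func onReboot(n *fakeNode, reapplyOnReboot, runtimeRequired, autoTaintNewNodes bool) {
	if !reapplyOnReboot {
		return
	}
	n.reset()
	if runtimeRequired && autoTaintNewNodes {
		// Value and effect are assumptions; NoSchedule matches "cannot schedule".
		n.Taints = append(n.Taints, taint{Key: runtimeTaintKey, Value: "true", Effect: "NoSchedule"})
	}
}

func main() {
	n := &fakeNode{Annotations: map[string]string{
		autoTaintAnnotation: "true",
		nodeStateAnnotation: "complete",
	}}
	onReboot(n, true, true, true)
	// The auto-taint annotation survived reset, so a new-node check skips it...
	fmt.Println("looks new:", !n.hasSkyhookAnnotations())
	// ...which is why the reboot path adds the taint itself.
	fmt.Println("taints:", len(n.Taints))
}
```

Running it prints `looks new: false` and then `taints: 1`. The surviving auto-taint annotation hides the node from a new-node check, so the reboot path has to add the taint itself. According to the commit message, the real controller then sends the reset and the taint to the API server in a single StrategicMerge patch.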
docs/runtime_required.md (2 additions, 0 deletions)
@@ -37,6 +37,8 @@ When enabled, the operator automatically applies the runtime-required taint to n
A node is considered "new" if it has no Skyhook annotations. This works for both initial cluster setup (day 0) and nodes joining an existing cluster (day 2+). Nodes that have already been processed by Skyhook (and had their taint removed after completion) will not be re-tainted because they retain their Skyhook annotations.
+**Exception: reboot with `REAPPLY_ON_REBOOT=true`.** When the operator is configured with `REAPPLY_ON_REBOOT=true` and a Skyhook has both `runtimeRequired: true` and `autoTaintNewNodes: true`, a node whose boot ID changes is treated as new for taint purposes. The runtime-required taint is re-applied alongside the state reset in the same atomic operation, ensuring no workloads can schedule on the rebooted node before Skyhook finishes re-applying. The taint is removed again by the normal completion path once all runtime-required Skyhooks finish on that node.
## What runtimeRequired: true will NOT do
1. Without `autoTaintNewNodes: true`, it will NOT add the taint to any nodes targeted by a SCR with `runtimeRequired: true`
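The tainting rule in this docs change can be expressed as a small predicate. This is a sketch under stated assumptions. `taintDecision` and `shouldTaint` are hypothetical names, not part of Skyhook's API. The sketch also assumes that both `runtimeRequired` and `autoTaintNewNodes` must be set for the taint to apply, in the new-node case as well as the reboot case.

```go
package main

import "fmt"

// taintDecision gathers the inputs the docs mention. The struct and field
// names are hypothetical, not Skyhook's API.
type taintDecision struct {
	HasSkyhookAnnotations bool // node already processed by Skyhook
	BootIDChanged         bool // node rebooted since it was last seen
	ReapplyOnReboot       bool // operator REAPPLY_ON_REBOOT setting
	RuntimeRequired       bool // Skyhook spec: runtimeRequired
	AutoTaintNewNodes     bool // Skyhook spec: autoTaintNewNodes
}

// shouldTaint reports whether the runtime-required taint should be applied.
// Assumption: both spec flags are required in every case.
func shouldTaint(d taintDecision) bool {
	if !d.RuntimeRequired || !d.AutoTaintNewNodes {
		return false
	}
	if !d.HasSkyhookAnnotations {
		return true // day-0 or day-2+ node that Skyhook has never processed
	}
	// Exception: a rebooted node is treated as new when reapply-on-reboot is on.
	return d.ReapplyOnReboot && d.BootIDChanged
}

func main() {
	cases := []struct {
		name string
		d    taintDecision
	}{
		{"fresh node", taintDecision{RuntimeRequired: true, AutoTaintNewNodes: true}},
		{"processed, no reboot", taintDecision{HasSkyhookAnnotations: true, ReapplyOnReboot: true, RuntimeRequired: true, AutoTaintNewNodes: true}},
		{"processed, rebooted, reapply on", taintDecision{HasSkyhookAnnotations: true, BootIDChanged: true, ReapplyOnReboot: true, RuntimeRequired: true, AutoTaintNewNodes: true}},
		{"processed, rebooted, reapply off", taintDecision{HasSkyhookAnnotations: true, BootIDChanged: true, RuntimeRequired: true, AutoTaintNewNodes: true}},
		{"autoTaintNewNodes off", taintDecision{RuntimeRequired: true}},
	}
	for _, c := range cases {
		fmt.Printf("%-34s taint=%v\n", c.name, shouldTaint(c.d))
	}
}
```

Only two cases return `true`: the fresh node, and the rebooted node with reapply enabled. A node that Skyhook has already finished and that has not rebooted stays untainted, which matches the "new node" definition above.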
r.recorder.Eventf(node.GetNode(), nil, EventTypeNormal, EventsReasonNodeReboot, "ResetNodeState", "detected reboot, resetting node for [%s] to be reapplied", node.GetSkyhook().Name)
node.Reset()
+// Re-apply the runtime-required taint so workloads cannot schedule on the
+// rebooted node until Skyhook finishes re-applying. The original auto-taint
+// annotation survives Reset() and remains the record that this taint is
+// operator-managed; no annotation update is needed.
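The excerpt ends at the comment, so the line that performs the append is not shown. A rebooted node could still carry the taint, for example if it rebooted before Skyhook finished its first pass. In that case a plain append would produce a duplicate, and Kubernetes validation rejects a Node whose taints repeat a key/effect pair. The helper below, `appendTaintIfMissing`, and its local `taint` type are hypothetical. It is one way to guard against that case, and whether the actual change includes such a guard is not visible in this excerpt.

```go
package main

import "fmt"

// taint mirrors the identifying fields of a Kubernetes core/v1 Taint
// (hypothetical local type so the sketch has no dependencies).
type taint struct {
	Key    string
	Value  string
	Effect string
}

// appendTaintIfMissing adds t unless a taint with the same key and effect is
// already present, and reports whether it appended anything.
func appendTaintIfMissing(taints []taint, t taint) ([]taint, bool) {
	for _, existing := range taints {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			return taints, false
		}
	}
	return append(taints, t), true
}

func main() {
	// Illustrative key only; not Skyhook's actual taint key.
	rt := taint{Key: "example.com/runtime-required", Value: "true", Effect: "NoSchedule"}
	var taints []taint
	taints, added := appendTaintIfMissing(taints, rt)
	fmt.Println(len(taints), added)
	taints, added = appendTaintIfMissing(taints, rt) // second call is a no-op
	fmt.Println(len(taints), added)
}
```

The first call appends and prints `1 true`. The second call finds the existing key/effect pair and prints `1 false`, so repeating the reboot path leaves a single taint.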