Description
I'm proposing an optimization that should help a cluster complete a full reboot cycle faster when blocker conditions are present.
Currently, the logic is as follows when using both `--blocking-pod-selector` and `--prefer-no-schedule-taint`:
```go
var rebootRequiredBlockCondition string
if rebootBlocked(blockCheckers...) {
	rebootRequiredBlockCondition = ", but blocked at this time"
	continue
}
log.Infof("Reboot required%s", rebootRequiredBlockCondition)

if !holding(lock, &nodeMeta, concurrency > 1) && !acquire(lock, &nodeMeta, TTL, concurrency) {
	// Prefer to not schedule pods onto this node to avoid draining the same pod multiple times.
	preferNoScheduleTaint.Enable()
	continue
}
```
By this point we know the Node requires a reboot, but if a blocker exists (e.g. a Prometheus alert is firing, or Pods matching the blocking selector are running on the node), the main loop just goes back to sleep and waits for the next tick without tainting the node. The problem is that more blocker Pods can be scheduled onto this node while we wait for the next tick, so it can take an arbitrarily long time before the Node happens to be free of blocker Pods and can be rebooted.
My proposal is to add a flag so that Nodes are tainted `PreferNoSchedule` as soon as they're detected as requiring a reboot, and the block checker then continues as normal (a rough sketch is below). This way there is a good chance that blocker Pods will be scheduled onto other nodes instead, as long as the scheduler can accommodate them elsewhere. Once all of the blocking conditions have cleared, the Node reboots as normal.
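
To make the idea concrete, here is a rough sketch of how the main loop could change. The flag and variable name (`--taint-when-blocked` / `taintWhenBlocked`) are placeholders I've made up for illustration, not an existing kured option; everything else reuses the identifiers from the snippet above:

```go
var rebootRequiredBlockCondition string
if rebootBlocked(blockCheckers...) {
	rebootRequiredBlockCondition = ", but blocked at this time"
	if taintWhenBlocked {
		// Hypothetical flag: taint as soon as a reboot is required, even while blocked,
		// so the scheduler prefers to place new blocker Pods on other nodes while we wait.
		preferNoScheduleTaint.Enable()
	}
	continue
}
log.Infof("Reboot required%s", rebootRequiredBlockCondition)

if !holding(lock, &nodeMeta, concurrency > 1) && !acquire(lock, &nodeMeta, TTL, concurrency) {
	// Prefer to not schedule pods onto this node to avoid draining the same pod multiple times.
	preferNoScheduleTaint.Enable()
	continue
}
```

Since `PreferNoSchedule` is only a soft preference, existing workloads keep running and the scheduler can still place Pods on the node if nothing else fits, so this shouldn't cause problems for clusters that are already tight on capacity.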
I'm happy to submit a PR to create this flag if it's agreed to!