Image I'm using:
v1.4.0
Issue:
We're seeing our nodes get stuck with the taint node.kubernetes.io/unschedulable:NoSchedule after an update. This doesn't happen to every node that updates, only some of them. When a node hits this issue, it successfully performs the update and reboots, but is never uncordoned when it comes back up.
The last thing in the controller logs is the event RebootedIntoUpdate, and I see the node reports the new version of Bottlerocket:
status: Some(BottlerocketShadowStatus { current_version: \"1.24.0\", target_version: \"1.25.0\", current_state: StagedAndPerformedUpdate, crash_count: 0, state_transition_failure_timestamp: None })
The last event I can see in the agent logs is "Bottlerocket node is terminated by reboot signal".
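For anyone trying to confirm which nodes are affected, here is a minimal sketch that lists nodes still carrying the unschedulable taint. It assumes the input is the parsed output of `kubectl get nodes -o json` and relies only on the standard `spec.taints` field of the Node object. The function name `cordoned_nodes` and the sample node names are hypothetical, not part of the operator.

```python
# Taint key reported in this issue; the effect is NoSchedule.
UNSCHEDULABLE_TAINT = "node.kubernetes.io/unschedulable"


def cordoned_nodes(node_list):
    """Return names of nodes that carry the unschedulable NoSchedule taint.

    `node_list` is assumed to be the dict parsed from
    `kubectl get nodes -o json` (a NodeList with an "items" array).
    """
    stuck = []
    for node in node_list.get("items", []):
        # spec.taints may be missing or null on untainted nodes.
        taints = node.get("spec", {}).get("taints") or []
        if any(
            t.get("key") == UNSCHEDULABLE_TAINT and t.get("effect") == "NoSchedule"
            for t in taints
        ):
            stuck.append(node["metadata"]["name"])
    return stuck


# Hypothetical sample data, shaped like a trimmed NodeList.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "spec": {
                "unschedulable": True,
                "taints": [{"key": UNSCHEDULABLE_TAINT, "effect": "NoSchedule"}],
            },
        },
        {"metadata": {"name": "node-b"}, "spec": {}},
    ]
}
print(cordoned_nodes(sample))  # ['node-a']
```

In practice the dict could be built with `json.load` from the piped `kubectl` output. This only reports which nodes are cordoned. It does not tell you whether the operator or something else applied the taint.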
**Helm Values**
scheduler_cron_expression: "0 0 8 * * Sun"
logging:
  formatter: json
prometheus:
  controller:
    serviceMonitor:
      enabled: true