Nodes never uncordoned

**Image I'm using:**
v1.4.0

**Issue:**
We're seeing nodes get stuck with the taint `node.kubernetes.io/unschedulable:NoSchedule` after an update. This doesn't happen to every node that updates, only some of them. When a node does hit this issue, it successfully performs the update and reboots, but is never uncordoned when it comes back up.
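
For anyone trying to check whether they are affected, here is a rough sketch of how the stuck nodes could be picked out. It assumes the input is the parsed JSON from `kubectl get nodes -o json`; the function name `cordoned_nodes` and the sample node names are made up for illustration and are not part of the update operator.

```python
# Taint key that stays on the affected nodes after the update reboot.
TAINT_KEY = "node.kubernetes.io/unschedulable"


def cordoned_nodes(node_list):
    """Return the names of nodes that carry the unschedulable NoSchedule taint.

    `node_list` is expected to look like the parsed output of
    `kubectl get nodes -o json` (a NodeList with an `items` array).
    """
    stuck = []
    for node in node_list.get("items", []):
        # `spec.taints` is absent on nodes without taints, so default to [].
        taints = node.get("spec", {}).get("taints") or []
        if any(
            t.get("key") == TAINT_KEY and t.get("effect") == "NoSchedule"
            for t in taints
        ):
            stuck.append(node["metadata"]["name"])
    return stuck


# Hypothetical sample data: node-a is schedulable, node-b is still cordoned.
sample = {
    "items": [
        {"metadata": {"name": "node-a"}, "spec": {}},
        {
            "metadata": {"name": "node-b"},
            "spec": {
                "unschedulable": True,
                "taints": [{"key": TAINT_KEY, "effect": "NoSchedule"}],
            },
        },
    ]
}

print(cordoned_nodes(sample))
```

To run it against a real cluster, the sample dict can be swapped for `json.load(sys.stdin)` with `kubectl get nodes -o json` piped in.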

The last thing in the controller logs is the event `RebootedIntoUpdate`, and I see the node reports the new version of Bottlerocket:
```
status: Some(BottlerocketShadowStatus { current_version: \"1.24.0\", target_version: \"1.25.0\", current_state: StagedAndPerformedUpdate, crash_count: 0, state_transition_failure_timestamp: None })
```

The last event I can see in the agent logs is `Bottlerocket node is terminated by reboot signal`.
 
**Helm values:**
```
scheduler_cron_expression: "0 0 8 * * Sun"
logging:
  formatter: json
prometheus:
  controller:
    serviceMonitor:
      enabled: true
```


