Nodes never uncordoned #691

Open
@arcdigital

Image I'm using:
v1.4.0

Issue:
We're seeing our nodes get stuck with the taint node.kubernetes.io/unschedulable:NoSchedule after an update. This doesn't happen to every node that updates, only some of them. When it does, the node successfully performs the update and reboots, but is never uncordoned when it comes back up.
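
For anyone trying to spot affected nodes, here is a minimal sketch that lists nodes that are still cordoned. It assumes `kubectl` is on the PATH and pointed at the affected cluster; the function names `cordoned_nodes` and `fetch_node_list` are hypothetical helpers, not part of the update operator. It only reads the standard Node fields `spec.unschedulable` and `spec.taints`, and it cannot tell whether a node was cordoned by the operator or by something else.

```python
from __future__ import annotations

import json
import subprocess

# Taint key reported in this issue.
UNSCHEDULABLE_TAINT = "node.kubernetes.io/unschedulable"


def cordoned_nodes(node_list: dict) -> list[str]:
    """Return names of nodes that are cordoned or carry the unschedulable NoSchedule taint."""
    stuck = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        # spec.taints may be missing or null on untainted nodes.
        tainted = any(
            t.get("key") == UNSCHEDULABLE_TAINT and t.get("effect") == "NoSchedule"
            for t in spec.get("taints") or []
        )
        if spec.get("unschedulable") or tainted:
            stuck.append(node["metadata"]["name"])
    return stuck


def fetch_node_list() -> dict:
    """Hypothetical helper: read all nodes as JSON via kubectl (assumes cluster access)."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(raw)
```

Calling `cordoned_nodes(fetch_node_list())` returns the node names to inspect; each can then be checked and, if appropriate, released manually with `kubectl uncordon <node-name>`.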

The last thing in the controller logs is the event RebootedIntoUpdate, and I can see that the node reports the new version of Bottlerocket.

status: Some(BottlerocketShadowStatus { current_version: "1.24.0", target_version: "1.25.0", current_state: StagedAndPerformedUpdate, crash_count: 0, state_transition_failure_timestamp: None })

The last event I can see in the agent logs is "Bottlerocket node is terminated by reboot signal".

Helm values:

scheduler_cron_expression: "0 0 8 * * Sun"
logging:
  formatter: json
prometheus:
  controller:
    serviceMonitor:
      enabled: true
