Summary
I'm running a 3-node cluster on 1.34. When I take a host down, I get one of three outcomes:
- All nodes remain reported as Ready even though one host is not accessible
- The node I took offline goes NotReady, but so does another, healthy one (i.e. two nodes reported as NotReady)
- Only the node I took offline is marked NotReady
This is reproducible on different systems, on both Red Hat and Ubuntu.
I'm also seeing other undesirable behaviour, such as pods not being (re)scheduled.
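To capture which of the three outcomes occurs, node status can be watched from a surviving host while the target host goes down; a minimal sketch (assuming kubectl points at the cluster, e.g. via microk8s kubectl):

# Watch node status transitions live
kubectl get nodes -w

# Or poll with timestamps to record exactly when each node flips
while true; do date -u; kubectl get nodes --no-headers; sleep 10; done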
dmhost1 taken offline:
labuser@dmhost2:~$ kubectl get no
NAME      STATUS     ROLES   AGE   VERSION
dmhost1   NotReady           9d    v1.34.1
dmhost2   Ready              9d    v1.34.1
dmhost3   NotReady           9d    v1.34.1
When I brought dmhost1 back online, it returned to Ready; dmhost3, however, did not (and it should never have been NotReady in the first place).
dmhost3 node conditions at this point:
Conditions:
Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
NetworkUnavailable   False     Thu, 23 Oct 2025 10:45:20 +0000   Thu, 23 Oct 2025 10:45:20 +0000   CalicoIsUp          Calico is running on this node
MemoryPressure       Unknown   Thu, 23 Oct 2025 10:51:25 +0000   Thu, 23 Oct 2025 10:50:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
DiskPressure         Unknown   Thu, 23 Oct 2025 10:51:25 +0000   Thu, 23 Oct 2025 10:50:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
PIDPressure          Unknown   Thu, 23 Oct 2025 10:51:25 +0000   Thu, 23 Oct 2025 10:50:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Ready                Unknown   Thu, 23 Oct 2025 10:51:25 +0000   Thu, 23 Oct 2025 10:50:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
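The Unknown conditions only say that the control plane stopped receiving heartbeats from dmhost3; they don't show whether dmhost3's kubelet is actually unhealthy. A few checks that can be run on dmhost3 itself (the snap.microk8s.daemon-kubelite unit name is an assumption based on a standard MicroK8s snap install):

# On dmhost3, which was never taken offline:
microk8s status                       # datastore / HA health as seen from this node
sudo snap services microk8s           # is daemon-kubelite (kubelet et al.) still running?
sudo journalctl -u snap.microk8s.daemon-kubelite -n 100 --no-pager    # recent kubelite logs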
I've also seen other strange behaviour, such as the cluster not reacting to the NotReady state at all.
What Should Happen Instead?
The failed node, and only the failed node, should be marked NotReady.
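For context, the speed and accuracy of that marking is governed by the kubelet's --node-status-update-frequency and the controller manager's --node-monitor-grace-period. In MicroK8s these can be checked in the args files under /var/snap/microk8s/current/args/ (path assumed from a standard snap install), which may help rule out a timing misconfiguration:

# Check heartbeat/grace-period overrides on each host (no output means the default is in use)
grep node-status-update-frequency /var/snap/microk8s/current/args/kubelet
grep node-monitor-grace-period /var/snap/microk8s/current/args/kube-controller-manager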
Reproduction Steps
- Set up a 3-node cluster
- Take one host down and watch node status from the surviving hosts (see the sketch below)
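A concrete version of those steps, using the hostnames from this report (the shutdown method is an assumption; any way of making the host unreachable should reproduce it):

# All three hosts (dmhost1-3) run MicroK8s 1.34 joined into one cluster
# On dmhost1: take the host down
sudo shutdown -h now

# On dmhost2 (or dmhost3): watch how the remaining nodes report status
kubectl get nodes -w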
Introspection Report
dmhost1 offline
inspection-report-20251023_113929.tar.gz
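The report file name matches the output of MicroK8s' built-in inspection tool; an equivalent report can be generated on any node with:

# Collects logs and writes an inspection-report-<timestamp>.tar.gz
sudo microk8s inspect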