Sealos Version
v5.0.1
How to reproduce the bug?
I tested how Sealos recovers after the master0 node fails and found an issue: on control plane nodes other than master0, the hosts file on the host machine and the hosts file inside the pods are inconsistent. For example, when master0 becomes unavailable, the controller-manager pod fails because it still resolves the API server to master0. The same behavior is observed in other control plane components as well.
Reproduce
The cluster is configured with three control plane nodes: master1, master2, and master3, and one worker node node1. The cluster was started with the following command:
root@master1:~# sealos gen registry.cn-shanghai.aliyuncs.com/labring/kubernetes:v1.29.9 registry.cn-shanghai.aliyuncs.com/labring/helm:v3.9.4 registry.cn-shanghai.aliyuncs.com/labring/cilium:v1.13.4 \
--masters 192.168.64.15,192.168.64.16,192.168.64.17 \
--nodes 192.168.64.18 \
-u root --pk='/root/.ssh/multipass_key' \
--output Clusterfile
sealos apply -f Clusterfile
Then shut down master1. Below is the output from controller-manager on master2 after master1 (master0) went offline. As you can see, apiserver.cluster.local still resolves to master1 (master0, 192.168.64.15).
root@master2:~# kubectl logs -n kube-system kube-controller-manager-master2 | tail -n10
E0526 13:58:46.446458 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:58:52.590254 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:58:58.736356 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:01.811818 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:04.879391 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:11.023681 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:14.104059 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:17.166370 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:20.240788 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
E0526 13:59:26.395983 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: Get "https://apiserver.cluster.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 192.168.64.15:6443: connect: no route to host
By directly inspecting the hosts file under /var/lib/kubelet/pods, I confirmed this was indeed the case.
root@master2:~# grep -i "apiserver.cluster.local" -r /var/lib/kubelet/pods/
/var/lib/kubelet/pods/56a1a6061487d03b440de1b2e6d4cba5/etc-hosts:192.168.64.15 apiserver.cluster.local
/var/lib/kubelet/pods/e669d082174b1b8e93a3b80fa1a4a2b9/etc-hosts:192.168.64.15 apiserver.cluster.local
/var/lib/kubelet/pods/14812d81b7c93b918d4faed7ae4a6dcf/etc-hosts:192.168.64.15 apiserver.cluster.local
/var/lib/kubelet/pods/a27a8174-6ac0-4f5d-81fe-17a05a525c43/etc-hosts:192.168.64.15 apiserver.cluster.local
/var/lib/kubelet/pods/a27a8174-6ac0-4f5d-81fe-17a05a525c43/volumes/kubernetes.io~configmap/kube-proxy/..2025_05_26_02_51_19.923862820/kubeconfig.conf: server: https://apiserver.cluster.local:6443
/var/lib/kubelet/pods/a9d78507a0d74897c9c340c682a3413f/etc-hosts:192.168.64.15 apiserver.cluster.local
/var/lib/kubelet/pods/cfb54c6d-8bd4-4533-b2e1-8b9b74d47e43/etc-hosts:192.168.64.16 apiserver.cluster.local
root@master2:~# ls /var/lib/kubelet/pods/*/containers
/var/lib/kubelet/pods/14812d81b7c93b918d4faed7ae4a6dcf/containers:
kube-scheduler
/var/lib/kubelet/pods/362d9138-58c5-4961-b8b9-39d9aa57f8d7/containers:
coredns
/var/lib/kubelet/pods/56a1a6061487d03b440de1b2e6d4cba5/containers:
kube-controller-manager
/var/lib/kubelet/pods/a27a8174-6ac0-4f5d-81fe-17a05a525c43/containers:
kube-proxy
/var/lib/kubelet/pods/a9d78507a0d74897c9c340c682a3413f/containers:
etcd
/var/lib/kubelet/pods/cfb54c6d-8bd4-4533-b2e1-8b9b74d47e43/containers:
apply-sysctl-overwrites cilium-agent clean-cilium-state config install-cni-binaries mount-bpf-fs mount-cgroup
/var/lib/kubelet/pods/e669d082174b1b8e93a3b80fa1a4a2b9/containers:
kube-apiserver
/var/lib/kubelet/pods/e9afef92-76ae-461a-9fde-f105f5521e07/containers:
coredns
We can see that pod 56a1a6061487d03b440de1b2e6d4cba5 is kube-controller-manager, and its hosts file contains the entry 192.168.64.15 apiserver.cluster.local.
However, the host machine’s /etc/hosts file looks like this:
root@master2:~# cat /etc/hosts
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
# /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 master2 master2
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.64.15 sealos.hub
192.168.64.16 apiserver.cluster.local
Not all pods on the control plane nodes have their apiserver.cluster.local entry pointing to master1 (master0); some point to the local node instead. For example, the cilium pod cfb54c6d-8bd4-4533-b2e1-8b9b74d47e43 above has 192.168.64.16.
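The manual comparison above could be automated with a small Go sketch like the following. It is not part of Sealos; `lookupHost` is a hypothetical helper, and the sketch assumes that the first matching entry in a hosts file is the one that takes effect. It reads the host's /etc/hosts, then reports every pod etc-hosts file under /var/lib/kubelet/pods whose apiserver.cluster.local entry differs.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// lookupHost returns the IP of the first entry in hosts-file content that
// names the given domain. Comments and blank lines are ignored.
// Assumption: the first matching entry is the effective one.
func lookupHost(content, domain string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip trailing comment
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		for _, name := range fields[1:] {
			if name == domain {
				return fields[0], true
			}
		}
	}
	return "", false
}

func main() {
	const domain = "apiserver.cluster.local"

	hostData, err := os.ReadFile("/etc/hosts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read /etc/hosts:", err)
		os.Exit(1)
	}
	hostIP, _ := lookupHost(string(hostData), domain)
	fmt.Printf("host: %s -> %s\n", domain, hostIP)

	// Each pod directory holds the hosts file that the kubelet mounts into it.
	paths, err := filepath.Glob("/var/lib/kubelet/pods/*/etc-hosts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "glob:", err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // pod may have been removed in the meantime
		}
		if podIP, ok := lookupHost(string(data), domain); ok && podIP != hostIP {
			fmt.Printf("MISMATCH %s: %s\n", p, podIP)
		}
	}
}
```

Run as root on a control plane node, it should list the same pods that the grep output above shows with 192.168.64.15.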
After reading parts of Sealos' source code, I suspect this inconsistency is caused by a race condition. During the initialization of control plane nodes, Sealos modifies the host machine's /etc/hosts file twice. The first modification happens during the init phase: Sealos checks whether the node is a master, and if so, points apiserver.cluster.local to master0. This makes sense because during join master, the node locates master0 through apiserver.cluster.local. The second modification happens after the kubeadm join command completes. However, kubeadm join only waits for the API server to become ready, not for all control plane components, so some pods may already have started with the old entry while others start after the second modification.
The first host modification in init stage
defaultInitializers = append(defaultInitializers, &registryHostApplier{}, &registryApplier{}, &defaultCRIInitializer{}, &apiServerHostApplier{}, &lvscareHostApplier{}, &defaultInitializer{})
func (a *apiServerHostApplier) Apply(ctx Context, host string) error {
	if slices.Contains(ctx.GetCluster().GetMasterIPAndPortList(), host) {
		if err := ctx.GetRemoter().HostsAdd(host, ctx.GetCluster().GetMaster0IP(), constants.DefaultAPIServerDomain); err != nil {
			return fmt.Errorf("failed to add hosts: %v", err)
		}
		return nil
	}
	if err := ctx.GetRemoter().HostsAdd(host, ctx.GetCluster().GetVIP(), constants.DefaultAPIServerDomain); err != nil {
		return fmt.Errorf("failed to add hosts: %v", err)
	}
	return nil
}
The second host modification after kubeadm join
func (k *KubeadmRuntime) joinMasters(masters []string) error {
	// ...
	err = k.sshCmdAsync(master, joinCmd)
	if err != nil {
		return fmt.Errorf("exec kubeadm join in %s failed %v", master, err)
	}
	err = k.execHostsAppend(master, master, k.getAPIServerDomain())
	if err != nil {
		return fmt.Errorf("add master0 apiserver domain hosts in %s failed %v", master, err)
	}
	// ...
}
The possible cases
Case 1: the pod's apiserver.cluster.local points to master0: first host modification --> pod start --> second host modification
Case 2: the pod's apiserver.cluster.local points to the host machine: first host modification --> second host modification --> pod start
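The two orderings above can be illustrated with a minimal Go model. This is a sketch, not Sealos or kubelet code: the `node` type and `simulate` function are hypothetical, and the model assumes that a pod's etc-hosts is a snapshot of the host's /etc/hosts taken when the pod starts and is not refreshed afterwards, which is what the observations above suggest.

```go
package main

import "fmt"

const domain = "apiserver.cluster.local"

// node models only the parts of a joining master relevant to this issue.
type node struct {
	hosts map[string]string // host /etc/hosts: domain -> IP
	pods  map[string]string // pod name -> IP captured at pod start
}

// setHost models a Sealos write to the host's /etc/hosts.
func (n *node) setHost(ip string) { n.hosts[domain] = ip }

// startPod models the pod receiving a copy of the host entry at start time.
func (n *node) startPod(name string) { n.pods[name] = n.hosts[domain] }

// simulate returns the IP that kube-controller-manager ends up with on
// master2, depending on whether it starts before the second modification.
func simulate(podStartsBeforeSecondWrite bool) string {
	n := &node{hosts: map[string]string{}, pods: map[string]string{}}
	n.setHost("192.168.64.15") // first modification: point to master0
	if podStartsBeforeSecondWrite {
		n.startPod("kube-controller-manager")
		n.setHost("192.168.64.16") // second modification: point to this node
	} else {
		n.setHost("192.168.64.16")
		n.startPod("kube-controller-manager")
	}
	return n.pods["kube-controller-manager"]
}

func main() {
	fmt.Println("case1:", simulate(true))  // pod keeps the master0 address
	fmt.Println("case2:", simulate(false)) // pod gets the local address
}
```

In case 1 the pod keeps 192.168.64.15 even though the host file is later updated, which matches the stale entries found under /var/lib/kubelet/pods.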
The kubeadm documentation describes this behavior of kubeadm join as follows:
Without the feature gate enabled, kubeadm will only wait for the kube-apiserver on a control plane node to become ready.
The wait process starts right after the kubelet on the host is started by kubeadm.
You are advised to enable this feature gate in case you wish to observe a ready state from all control plane components
during the kubeadm init or kubeadm join command execution.
What is the expected behavior?
No response
What do you see instead?
No response
Operating environment
- Sealos version:
- Docker version:
- Kubernetes version:
- Operating system:
- Runtime environment:
- Cluster size:
- Additional information:
Additional information
No response