-
Environmental Info:
Node(s) CPU architecture, OS, and Version:

Cluster Configuration:
3 servers, 1 agent

Describe the bug:
One of the server nodes became corrupted, but after re-installing it, joining the existing cluster fails. I can see the bootstrap succeeding, but later I see:
from Etcd. Is the snapshot too large?

Steps To Reproduce:
export K3S_TOKEN=<blah>
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.29 sh -s - \
--disable traefik \
--disable local-storage \
--disable servicelb \
--server https://kube-api.service.<blah>:6443 \
--tls-san kube-api.service.<blah> \
--flannel-backend wireguard-native \
--kubelet-arg=allowed-unsafe-sysctls=net.ipv4.conf.all.src_valid_mark,net.ipv4.ip_forward,net.ipv6.conf.all.forwarding

Expected behavior:

Actual behavior:

Additional context / logs:
On a healthy node, I see
(confirms networking is good)
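For what it's worth, the health of the embedded etcd cluster can also be checked directly from a healthy server node. This is a rough sketch, not from the original report: it assumes the default K3s data-dir and a separately installed etcdctl binary.

# Query the embedded etcd over its client port using the certs K3s generates.
# Assumes the default data-dir /var/lib/rancher/k3s and etcdctl installed separately.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key /var/lib/rancher/k3s/server/tls/etcd/client.key \
  endpoint health --cluster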
-
You should not be seeing this message. What version are your other nodes on? K3s has not written bootstrap data in a format that would need migration since K3s v1.22, back in 2021 (#3398). The fact that you are seeing this message indicates that either your other nodes are on a VERY old version, or the data in etcd on the other nodes is also corrupt. If the nodes are not on a very old version, I would suggest restoring from an etcd snapshot. Before doing that - and this is a long shot, since I don't know exactly what happened to your node - you might try running
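On the snapshot-restore side, the usual K3s procedure is a cluster reset from a snapshot on one server, after which the other servers rejoin. A rough sketch, with the snapshot filename as a placeholder and the default data-dir assumed:

# Stop K3s on all server nodes, then on the node that holds the snapshot:
systemctl stop k3s
k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
# When the reset finishes, start K3s normally again and rejoin the remaining servers.
systemctl start k3s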
Yeah... something's definitely wrong here; the bootstrap data in the datastore is somehow incomplete, so the nodes are generating their own CA data. What I'd probably try is:
k3s certificate rotate-ca --force --path /var/lib/rancher/k3s/server
to force a write of the content on disk to the datastore.
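As a quick sanity check afterwards (not from the thread above, and assuming the default data-dir): once the servers agree on the bootstrap data, the cluster CA files should be identical on every server node, so comparing their hashes across nodes shows whether the rotate-ca write actually took effect.

# Run on each server node and compare the output; the hashes should match on all servers.
sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt \
          /var/lib/rancher/k3s/server/tls/client-ca.crt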