
# Restoring Cluster Configuration with kubeadm

pixiake edited this page Aug 18, 2025 · 2 revisions

## Background

In kubekey 3.x, if /etc/kubernetes/admin.conf cannot be found on the master node while adding nodes, `kubeadm reset` is triggered on the master node. `kubeadm reset` wipes the cluster configuration on the master, including the configuration and certificate files under /etc/kubernetes and /var/lib/kubelet/. Relevant code: https://github.com/kubesphere/kubekey/blob/d7dac556c09609570bb702be29a5ca54b721155b/cmd/kk/pkg/kubernetes/tasks.go#L56

If you have hit this problem, or have manually deleted some of these files, you can use kubeadm to restore the cluster by following the steps below.
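Before adding nodes with kubekey 3.x, it may be worth verifying that admin.conf is still present on every master, so this reset path cannot trigger. A minimal sketch; the `MASTERS` list is an assumption and must be replaced with your actual master hostnames, and it assumes passwordless SSH access (which kubekey already requires):

```shell
# Assumed hostnames -- replace with your real master nodes
MASTERS="node1 node2 node3"
for m in $MASTERS; do
  # BatchMode avoids hanging on a password prompt; short timeout for dead hosts
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$m" test -f /etc/kubernetes/admin.conf; then
    echo "$m: admin.conf present"
  else
    echo "$m: admin.conf MISSING - do not run add nodes yet" >&2
  fi
done
```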

## Restore Steps

### Initialize the first master node

All of the following steps are executed on the first master node, and restore the cluster control plane.

#### Create the configuration file

The configuration file is the key to the restore.

Based on the actual cluster information, create /etc/kubernetes/kubeadm-config.yaml following the example below:

```yaml
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
    - https://192.168.6.4:2379
    - https://192.168.6.5:2379
    - https://192.168.6.6:2379
    caFile: /etc/ssl/etcd/ssl/ca.pem
    certFile: /etc/ssl/etcd/ssl/node-node1.pem
    keyFile: /etc/ssl/etcd/ssl/node-node1-key.pem
dns:
  imageRepository: coredns
  imageTag: 1.8.6
imageRepository: kubesphere
kubernetesVersion: v1.23.17
certificatesDir: /etc/kubernetes/pki
clusterName: cluster.local
controlPlaneEndpoint: lb.kubesphere.local:6443
networking:
  dnsDomain: cluster.local
  podSubnet: 10.233.64.0/18
  serviceSubnet: 10.233.0.0/18
apiServer:
  extraArgs:
    bind-address: 0.0.0.0
    feature-gates: RotateKubeletServerCertificate=true,TTLAfterFinished=true
  certSANs:
    - "kubernetes"
    - "kubernetes.default"
    - "kubernetes.default.svc"
    - "kubernetes.default.svc.cluster.local"
    - "localhost"
    - "127.0.0.1"
    - "lb.kubesphere.local"
    - "192.168.6.4"
    - "node1"
    - "node1.cluster.local"
    - "node2"
    - "node2.cluster.local"
    - "192.168.6.5"
    - "node3"
    - "node3.cluster.local"
    - "192.168.6.6"
    - "10.233.0.1"
controllerManager:
  extraArgs:
    node-cidr-mask-size: "24"
    bind-address: 0.0.0.0
    cluster-signing-duration: 87600h
    feature-gates: RotateKubeletServerCertificate=true,TTLAfterFinished=true
  extraVolumes:
  - name: host-time
    hostPath: /etc/localtime
    mountPath: /etc/localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
    feature-gates: RotateKubeletServerCertificate=true,TTLAfterFinished=true

---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.6.4
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    cgroup-driver: systemd
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: 10.233.64.0/18
iptables:
    masqueradeAll: false
    masqueradeBit: 14
    minSyncPeriod: 0s
    syncPeriod: 30s
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
clusterDNS:
    - 169.254.25.10
clusterDomain: cluster.local
evictionHard:
    memory.available: 5%
    pid.available: 10%
evictionMaxPodGracePeriod: 120
evictionPressureTransitionPeriod: 30s
evictionSoft:
    memory.available: 10%
evictionSoftGracePeriod:
    memory.available: 2m
featureGates:
    RotateKubeletServerCertificate: true
    TTLAfterFinished: true
kubeReserved:
    cpu: 200m
    memory: 250Mi
maxPods: 110
podPidsLimit: 10000
rotateCertificates: true
systemReserved:
    cpu: 200m
    memory: 250Mi
```

#### Create the certificate files

```shell
kubeadm init phase certs all --config kubeadm-config.yaml
```
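Before moving on, it may be worth sanity-checking the regenerated certificates. A minimal sketch, assuming `kubeadm` (>= 1.20, for the `certs check-expiration` subcommand) and `openssl` are on the PATH:

```shell
# List expiration dates of all certificates managed by kubeadm
kubeadm certs check-expiration

# Inspect the regenerated CA directly
openssl x509 -noout -subject -enddate -in /etc/kubernetes/pki/ca.crt
```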

#### Create the kubeconfig files

```shell
kubeadm init phase kubeconfig all --config kubeadm-config.yaml
```

#### Create the manifests

```shell
kubeadm init phase control-plane all --config kubeadm-config.yaml
```

#### Start kubelet

```shell
kubeadm init phase kubelet-start --config kubeadm-config.yaml
```

#### Check that kube-apiserver is running

```shell
docker ps  # or crictl ps
curl -k https://lb.kubesphere.local:6443
```

#### Update the kubeconfig and test

Note: do not use `mv` here; this problem is usually caused by running `mv` in the first place.

```shell
cp /etc/kubernetes/admin.conf ~/.kube/config
```

#### Update the certificate data in the cluster-info ConfigMap in kube-public

Other nodes fetch the certificate data from this ConfigMap when they connect to the cluster control plane.

```shell
# Base64-encode /etc/kubernetes/pki/ca.crt and replace the corresponding data in the ConfigMap
cat /etc/kubernetes/pki/ca.crt | base64 -w 0
kubectl edit cm -n kube-public cluster-info
```
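If you prefer a non-interactive edit over `kubectl edit`, the replacement can be scripted. A sketch, assuming the kubeconfig stored in cluster-info embeds the CA under the usual `certificate-authority-data` field:

```shell
# Base64-encode the restored CA and substitute it into the cluster-info ConfigMap
CA_DATA=$(base64 -w 0 < /etc/kubernetes/pki/ca.crt)
kubectl get cm -n kube-public cluster-info -o yaml \
  | sed "s|certificate-authority-data: .*|certificate-authority-data: ${CA_DATA}|" \
  | kubectl apply -f -
```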

#### Create the node join token and the join command

```shell
kubeadm init phase upload-certs --upload-certs --config /etc/kubernetes/kubeadm-config.yaml

kubeadm token create --print-join-command --certificate-key <certificate-key generated by the command above>
```
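The two commands can also be chained so the certificate key never has to be copied by hand. A sketch, assuming the key is printed as the last line of the upload-certs output (its usual format):

```shell
# Capture the certificate key from upload-certs and feed it to token creation
CERT_KEY=$(kubeadm init phase upload-certs --upload-certs --config /etc/kubernetes/kubeadm-config.yaml | tail -n1)
kubeadm token create --print-join-command --certificate-key "$CERT_KEY"
```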

#### Sync the /etc/kubernetes/pki directory to the other master nodes
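The sync can be done over the same SSH access kubekey already uses. A sketch; `node2` and `node3` are placeholders for your other master hostnames, and `rsync` on both ends is assumed:

```shell
# Copy the restored PKI directory to every other master node
for node in node2 node3; do
  rsync -av /etc/kubernetes/pki/ "root@${node}:/etc/kubernetes/pki/"
done
```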

### Restore the other master nodes in the control plane

#### Update kubeadm-config.yaml

Complete kubeadm-config.yaml with the token and certificateKey generated in the steps above.

#### Generate the kubeconfig

```shell
# Use the join command generated in the steps above; adjust it to match the information
# in the join command you actually generated. The main change is inserting
# "phase control-plane-prepare kubeconfig" after "join" in the original command.
kubeadm join phase control-plane-prepare kubeconfig lb.kubesphere.local:6443 --token jla473.n6jrmgk0gqcxkafe --discovery-token-ca-cert-hash sha256:57b33c003cd9ffc1fe58a55ed9f87362fcb552e6d0d6b80fe3e24303b4bf2e6c --control-plane --certificate-key 88c41cedd863c93a3d33f72e09e5b802c68d8c536151678a921c391da3052a53
```

#### Generate the manifests

```shell
kubeadm join phase control-plane-prepare control-plane --config kubeadm-config.yaml
```

#### Start kubelet

```shell
kubeadm join phase kubelet-start --config kubeadm-config.yaml
```

#### Update the kubeconfig and test

Note: do not use `mv` here; this problem is usually caused by running `mv` in the first place.

```shell
cp /etc/kubernetes/admin.conf ~/.kube/config
```