Get the token and the CA certificate hash
root@master0:~# kubeadm token create --print-join-command
kubeadm join 10.238.67.100:6443 --token tzmx95.jqu8kp4olv41cwps --discovery-token-ca-cert-hash sha256:3f1090762f1e92da82843285d17ff24720632174c5b058c5f114d346e4ebdb6f
Get the control-plane certificate key
root@master0:~# kubeadm init phase upload-certs --upload-certs
I0812 16:22:02.105583 14476 version.go:254] remote version is much newer: v1.24.3; falling back to: stable-1.20
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
dab8068ca363d20a4ecdd1844aa253fa18642704de587165a21c5f34a12f9eb9
Run the following on the new master node
kubeadm join 10.238.67.100:6443 --token tzmx95.jqu8kp4olv41cwps \
--discovery-token-ca-cert-hash sha256:3f1090762f1e92da82843285d17ff24720632174c5b058c5f114d346e4ebdb6f \
--control-plane --certificate-key dab8068ca363d20a4ecdd1844aa253fa18642704de587165a21c5f34a12f9eb9
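The two outputs above can be combined in a script. The following is a minimal sketch, assuming the join command and the certificate key have already been captured as strings; the function name build_control_plane_join is hypothetical, and the function only prints the final command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: append the control-plane flags to a join command.
# $1: join command as printed by `kubeadm token create --print-join-command`
# $2: certificate key as printed by `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join() {
    join_cmd=$1
    cert_key=$2
    # Assumption: the key is lowercase hex, as in the output shown above.
    case $cert_key in
        *[!0-9a-f]*|'')
            echo "invalid certificate key" >&2
            return 1
            ;;
    esac
    printf '%s --control-plane --certificate-key %s\n' "$join_cmd" "$cert_key"
}

# Possible usage on an existing master (not run here):
#   JOIN_CMD=$(kubeadm token create --print-join-command)
#   build_control_plane_join "$JOIN_CMD" "<certificate key from upload-certs>"
```

Printing the command instead of executing it keeps the step reviewable before it is pasted onto the new master.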
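Method 1:
Get the join token on the master
kubeadm token create --print-join-command --ttl=0 (--ttl=0 means the token never expires; without this flag the token expires after 24 hours by default)
The command prints a ready-to-use join command: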
root@k8s-master1:~# kubeadm token create --print-join-command --ttl=0
kubeadm join 10.238.67.100:6443 --token 00n4f5.cwr1dpzx7ojorsmu --discovery-token-ca-cert-hash sha256:3f1090762f1e92da82843285d17ff24720632174c5b058c5f114d346e4ebdb6f
On the worker node:
root@k8s-node1:~# kubeadm join 10.238.67.100:6443 --token 00n4f5.cwr1dpzx7ojorsmu --discovery-token-ca-cert-hash sha256:3f1090762f1e92da82843285d17ff24720632174c5b058c5f114d346e4ebdb6f
Method 2:
On the master node:
root@master0:~# kubeadm token create
rsv0a5.pba45ov0kdg34qth # the generated token
Then run:
root@master0:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
43e1fe29b90eab70840d47a98f9f07dd4efb3b002df16ccfc7e252777d4104cb # the generated token-ca-cert-hash
On the worker node:
root@k8s-node1:~# kubeadm join 10.238.67.100:6443 --token rsv0a5.pba45ov0kdg34qth --discovery-token-ca-cert-hash sha256:43e1fe29b90eab70840d47a98f9f07dd4efb3b002df16ccfc7e252777d4104cb
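Method 2 can also be wrapped in a small script. This is a sketch under stated assumptions: the API server endpoint is supplied by the caller, the function names ca_cert_hash and build_join are hypothetical, and the openssl pipeline is the same one used above (it expects an RSA CA key).

```shell
#!/bin/sh
# Hypothetical helper: SHA-256 hash of the CA public key (same pipeline as above).
# $1: path to the cluster CA certificate, e.g. /etc/kubernetes/pki/ca.crt
ca_cert_hash() {
    openssl x509 -pubkey -in "$1" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //'
}

# Hypothetical helper: assemble the worker join command.
# $1: API server endpoint  $2: bootstrap token  $3: CA cert hash (hex, no prefix)
build_join() {
    # Bootstrap tokens have the form [a-z0-9]{6}.[a-z0-9]{16}.
    if ! printf '%s\n' "$2" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
        echo "malformed token: $2" >&2
        return 1
    fi
    printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' "$1" "$2" "$3"
}

# Possible usage on the master (not run here):
#   build_join 10.238.67.100:6443 "$(kubeadm token create)" "$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
```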
This problem appeared when I re-joined a master node that had previously been part of the cluster.
Error example
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.238.67.101:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
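The key phrase "error execution phase check-etcd" suggests that the failure happened while the node was joining etcd, which prevents the master from re-joining the original Kubernetes cluster.
Analysis
The cluster was built with kubeadm, and etcd runs as a container alongside each master, so every master node hosts an etcd instance. When a master node is removed, the etcd cluster does not drop that node's member entry, so the entry stays in the etcd member list. The health check during join then tries to dial the dead endpoint and times out.
So we need to go into etcd and remove the stale member entry by hand.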
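One way to spot such a leftover entry is to compare the names in the etcd member list with the nodes the cluster still knows about. The sketch below assumes that etcd member names match node names, which is the kubeadm default but not guaranteed; stale_members is a hypothetical helper that takes both lists as plain text.

```shell
#!/bin/sh
# Hypothetical helper: print "ID name" for etcd members that are not current nodes.
# $1: output of `etcdctl member list`
#     (comma-separated: ID, status, name, peer URLs, client URLs, learner flag)
# $2: whitespace-separated node names,
#     e.g. from `kubectl get nodes -o name | sed 's|^node/||'`
stale_members() {
    printf '%s\n' "$1" | while IFS=, read -r id _status name _rest; do
        # Fields keep their leading space after splitting on commas.
        name=$(printf '%s' "$name" | tr -d ' ')
        [ -n "$name" ] || continue
        found=no
        for node in $2; do
            if [ "$node" = "$name" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            printf '%s %s\n' "$id" "$name"
        fi
    done
}
```

Any member it prints is only a candidate for removal; it is worth confirming by hand before deleting anything from etcd.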
Solution
1. List the etcd pods
[root@master2 ~]# kubectl get pods -n kube-system | grep etcd
etcd-master1 1/1 Running 19 375d
etcd-master2 1/1 Running 20 375d
2. Enter an etcd container and remove the member entry
Pick either of the two etcd pods listed above and open a shell inside it with kubectl
[root@k8s-master01 ~]# kubectl exec -it -n kube-system etcd-master1 -- sh
sh-5.0#
Inside the container, run the following steps
## Set up the environment
# export ETCDCTL_API=3
# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
## List the etcd cluster members
# etcdctl member list
16ef717fe7811db9, started, master1, https://10.238.67.102:2380, https://10.238.67.102:2379, false
de36a6ad7b39ca1a, started, master2, https://10.238.67.103:2380, https://10.238.67.103:2379, false
ec799f30ae579d1d, started, master0, https://10.238.67.101:2380, https://10.238.67.101:2379, false
## Remove the stale etcd member (master0 in the list above)
# etcdctl member remove ec799f30ae579d1d
Member ec799f30ae579d1d removed from cluster 9ace53a6b5b20c7b
## Exit the container
# exit
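The lookup of the member ID can also be scripted instead of read off the list by eye. A minimal sketch, assuming the comma-separated output format shown above; member_id_by_name is a hypothetical helper, and the commented kubectl lines are one possible non-interactive variant of the same steps, not something verified against this cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID of the member whose name equals $1 (prints nothing if no match).
member_id_by_name() {
    awk -F', *' -v name="$1" '$3 == name { print $1 }'
}

# Possible non-interactive usage from a machine with kubectl access (not run here):
#   ETCDCTL="etcdctl --endpoints=https://127.0.0.1:2379 \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key"
#   id=$(kubectl exec -n kube-system etcd-master1 -- sh -c "ETCDCTL_API=3 $ETCDCTL member list" \
#          | member_id_by_name master0)
#   kubectl exec -n kube-system etcd-master1 -- sh -c "ETCDCTL_API=3 $ETCDCTL member remove $id"
```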
3. Try joining the cluster again
Re-join the removed master node with kubeadm. Before doing so, log in to that node and run kubeadm's reset command:
kubeadm reset
Then run the join command for the Kubernetes cluster again; this time the join succeeds.