Steps to recover an etcd cluster after a node goes down

Background

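In my lab k8s cluster I accidentally used the KVM snapshot feature to roll the VM of one etcd cluster node back to an earlier snapshot. The result was predictable: that member no longer matched the rest of the cluster and dropped out.
Below are the steps I used to recover the failed etcd node.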

I won't go over the steps for redeploying etcd itself here.

Recovering the cluster

# Remove the failed node's entry from the cluster
export ETCDCTL_API=3
# 10.0.249.162:2379 is one of the healthy nodes
etcdctl --endpoints=10.0.249.162:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key member list
# Find the ID of the failed node in the output, then remove that member
etcdctl --endpoints=10.0.249.162:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key member remove b78f844a88f77df
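# Delete the leftover data on the failed node (run this on the failed node itself)
rm -rf /var/lib/etcd/*
# Start the service
systemctl enable --now etcd
# The service appeared to start without problems, but the member list showed the node as unstarted
etcdctl --endpoints=10.0.249.162:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key member list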
457b7335683cd9fd, started, k8s-m3-163, https://10.0.249.163:2380, https://10.0.249.163:2379
5a2061af7b8d7465, started, k8s-m2-162, https://10.0.249.162:2380, https://10.0.249.162:2379
678f844a88f77df9, unstarted, , https://10.0.249.161:2380,
# Changing new to existing in the etcd startup config file (the initial-cluster-state setting) and starting etcd again fixed it
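
The last step is terse, so here is a minimal sketch of that edit. It assumes the setting lives in a plain text file in one of the usual spellings (`ETCD_INITIAL_CLUSTER_STATE="new"`, `initial-cluster-state: new`, or `--initial-cluster-state=new`). The function name and the path in the usage note are hypothetical, not taken from the original setup.

```shell
#!/usr/bin/env bash
# Sketch: switch etcd's initial cluster state from "new" to "existing".
# Assumption: the config file uses one of these spellings:
#   ETCD_INITIAL_CLUSTER_STATE="new"
#   initial-cluster-state: new
#   --initial-cluster-state=new
set_cluster_state_existing() {
  local conf="$1"
  # -i.bak edits in place and keeps the original as "$conf.bak".
  # Only the value that directly follows the setting name is rewritten,
  # so other occurrences of "new" in the file are left alone.
  sed -E -i.bak \
    's/((INITIAL_CLUSTER_STATE|initial-cluster-state)[=: ]+"?)new/\1existing/' \
    "$conf"
}
```

On the failed node this might be used as `set_cluster_state_existing /etc/etcd/etcd.conf && systemctl restart etcd` (the path is an assumption). The `existing` value tells etcd to join the cluster already running on the other two nodes instead of bootstrapping a new one. That presumes the node is registered in the cluster, typically via `etcdctl member add`, which the `unstarted` entry in the member list suggests had already happened.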

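For the "find the ID of the failed node" step, reading the ID off the screen is easy to get wrong. The following is a small sketch that pulls it out of the default `member list` output by peer URL, and lists any members still marked `unstarted`. Both function names are hypothetical helpers, and the parsing assumes the comma-separated layout shown in the output above.

```shell
#!/usr/bin/env bash
# Sketch: helpers that read `etcdctl member list` output (default format)
# from stdin. Assumed columns: ID, status, name, peer URL, client URL.

# Print the ID of the member whose peer URL equals $1.
member_id_by_peer_url() {
  local peer_url="$1"
  awk -F', *' -v url="$peer_url" '$4 == url { print $1 }'
}

# Print the IDs of members that are registered but have not joined yet.
unstarted_member_ids() {
  awk -F', *' '$2 == "unstarted" { print $1 }'
}
```

The helpers can be fed from the same `member list` command used above, for example by piping its output into `member_id_by_peer_url https://10.0.249.161:2380`, and the printed ID can then be passed to `member remove`.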