etcd Error Notes

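Problem description:
This morning the physical host running the master failed, so the virtual machine was migrated to another host. After the migration etcd stopped working properly. The affected containers look like this: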

[root@region-master2 ~]# kubectl get po -nkube-system -owide |grep 32.45
etcd-region-master1                      0/1     CrashLoopBackOff   69         4m35s    10.39.32.45       region-master1                  
kube-apiserver-region-master1            0/1     CrashLoopBackOff   56         4m24s    10.39.32.45       region-master1                  
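
When the pod list is long, the failing pods can be picked out by their STATUS column instead of by IP. A minimal sketch, assuming the column order NAME, READY, STATUS shown in the output above; the function name crashlooping_pods is made up for illustration:

```shell
# Hypothetical helper: print the names of pods whose STATUS column is
# CrashLoopBackOff. Reads `kubectl get po -n <namespace> -owide` output on
# stdin and assumes the columns NAME READY STATUS ... in that order.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (requires a working kubectl context):
#   kubectl get po -nkube-system -owide | crashlooping_pods
```

The log of a crashed container can usually be fetched as text with kubectl logs and its --previous flag, for example kubectl logs -nkube-system etcd-region-master1 --previous.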

The etcd error log was captured in a screenshot (image.png).

Solution

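Adding and removing etcd members
Remove the member (remove the faulty node so that it can rejoin the cluster and resync its data). In this example the member to remove is https://192.168.1.73:2379.
member list prints every member together with its member ID.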
$ ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.71:2379,https://192.168.1.72:2379,https://192.168.1.73:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list -w table
 
Run member remove with the ID of the faulty member.
$ ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.71:2379,https://192.168.1.72:2379,https://192.168.1.73:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member remove f926bd1d34241ce0
Member f926bd1d34241ce0 removed from cluster 6294eac8c3e80ca
 
Running member list again now shows that the member is gone from the cluster. The etcd process on the removed node exits.
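
Looking up the member ID by hand between member list and member remove is easy to get wrong. A minimal sketch of doing it with awk, assuming member list is run without -w table so that it prints its default comma-separated format (ID, status, name, peer URLs, client URLs, ...), and that each member has a single peer URL; the function name member_id_for_peer is hypothetical:

```shell
# Hypothetical helper: print the ID of the member whose peer URL equals $1.
# Reads `etcdctl member list` output in the default comma-separated format
# (no `-w table`) on stdin. Field 1 is the member ID, field 4 the peer URL.
# Assumes one peer URL per member.
member_id_for_peer() {
  awk -F', ' -v peer="$1" '$4 == peer { print $1 }'
}

# Usage, with the same --endpoints and certificate flags as above:
#   ETCDCTL_API=3 etcdctl <flags> member list | member_id_for_peer https://192.168.1.73:2380
```

The printed ID can then be passed to member remove.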
 
Add the member back
Clear the etcd data directory (run this on the host where etcd is failing).
$ rm -rf /var/lib/etcd/*
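
A more cautious variant is to move the old data aside rather than delete it, in case it is needed for later inspection. This is only a sketch and not part of the original procedure; the function name move_etcd_data and the backup path in the usage line are assumptions:

```shell
# Hypothetical helper: move everything in data directory $1 into backup
# directory $2 (created if missing), leaving $1 empty.
# Like `rm -rf dir/*`, it does not touch hidden files.
move_etcd_data() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for entry in "$src"/*; do
    [ -e "$entry" ] || continue   # skip the unexpanded pattern if src is empty
    mv "$entry" "$dest"/ || return 1
  done
}

# Usage on the faulty host (illustrative backup path):
#   move_etcd_data /var/lib/etcd /var/lib/etcd-backup
```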
 
Check the following three parameters under spec.containers.command in /etc/kubernetes/manifests/etcd.yaml:
1. --initial-cluster-state is set to existing
2. --initial-cluster lists every member of the cluster
3. --name is the member name

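# Note: do not keep a backup copy of etcd.yaml inside /etc/kubernetes/manifests/, otherwise two etcd instances will be started. On startup kubelet loads every manifest file in this directory.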
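
The presence of the three parameters can be checked with grep before going further. A minimal sketch, assuming the manifest writes each flag in the --flag=value form; the function name check_etcd_manifest is hypothetical. It only checks that the flags exist, so whether --initial-cluster really lists every member still has to be confirmed by eye:

```shell
# Hypothetical helper: report which of the three flags are missing from the
# manifest file given as $1. Returns non-zero if any flag is missing.
# Only presence is checked, not whether the values are correct.
check_etcd_manifest() {
  manifest=$1
  status=0
  for flag in '--initial-cluster-state=existing' '--initial-cluster=' '--name='; do
    # -F: fixed string, -e: needed because the pattern starts with a dash
    if ! grep -qF -e "$flag" "$manifest"; then
      echo "missing: $flag"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_etcd_manifest /etc/kubernetes/manifests/etcd.yaml
```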
 
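Once the manifest is correct, run the following command on a node where etcd is running normally. --endpoints should list only the members currently in the cluster, and the argument after member add is the new member's --name value.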
$ ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.71:2379,https://192.168.1.72:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member add 192.168.1.73 --peer-urls=https://192.168.1.73:2380
Member 2226f8cff2cbbfa9 added to cluster 6294eac8c3e80ca
 
ETCD_NAME="192.168.1.73"
ETCD_INITIAL_CLUSTER="192.168.1.71=https://192.168.1.71:2380,192.168.1.73=https://192.168.1.73:2380,192.168.1.72=https://192.168.1.72:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.73:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
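
The ETCD_* lines printed by member add correspond to the command-line flags checked in etcd.yaml: etcd maps each environment variable ETCD_FOO_BAR to the flag --foo-bar. A small sketch of that conversion, useful for comparing the printed values with the manifest; the function name etcd_env_to_flags is hypothetical:

```shell
# Hypothetical helper: convert ETCD_NAME="value" style lines on stdin into the
# equivalent --name=value flags. Lines that do not start with ETCD_ are ignored.
etcd_env_to_flags() {
  awk -F= '/^ETCD_[A-Z_]+=/ {
    name = substr($1, 6)                 # drop the "ETCD_" prefix
    value = substr($0, length($1) + 2)   # everything after the first "="
    sub(/^"/, "", value)                 # strip the surrounding quotes
    sub(/"$/, "", value)
    name = tolower(name)
    gsub(/_/, "-", name)
    print "--" name "=" value
  }'
}

# Usage: paste the output of member add into a file (illustrative name) and run
#   etcd_env_to_flags < member-add-output.txt
```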
 
Restart kubelet so that it brings the etcd pod back up.
$ systemctl restart kubelet
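
After the restart it can take a while for the member to start and catch up. A minimal retry sketch for polling it, assuming etcdctl endpoint health is used as the check; the function name wait_until and the attempt and delay values in the usage line are illustrative:

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# after each failed attempt. Returns 0 as soon as the command succeeds,
# 1 if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: poll the re-added member every 5 seconds, at most 30 times
#   wait_until 30 5 env ETCDCTL_API=3 etcdctl --endpoints=https://192.168.1.73:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health
```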
