In previous articles this k8s demo cluster got CI/CD and monitoring set up. For day-to-day operations, backing up the cluster's control-plane data is just as essential, so this article walks through a backup and restore drill of etcd on the same k8s demo cluster.
Add the following configuration to /etc/profile:
export ETCDCTL_API=3
alias etcdctl='etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key '
Check etcd's status:
[root@VM-12-8-centos ~]# source /etc/profile
# Check endpoint status
[root@VM-12-8-centos ~]# etcdctl endpoint status -w table
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://[127.0.0.1]:2379 | fc2c7cf28a83253f | 3.3.10 | 5.8 MB | true | 5 | 22421398 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
# Health check
[root@VM-12-8-centos ~]# etcdctl endpoint health -w table
https://[127.0.0.1]:2379 is healthy: successfully committed proposal: took = 1.073883ms
# List etcd cluster members
[root@VM-12-8-centos ~]# etcdctl member list -w table
+------------------+---------+----------------+------------------------+------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+----------------+------------------------+------------------------+
| fc2c7cf28a83253f | started | vm-12-8-centos | https://10.0.12.8:2380 | https://10.0.12.8:2379 |
+------------------+---------+----------------+------------------------+------------------------+
Check etcd's configuration (its static pod manifest) to find the etcd data directory:
[root@VM-12-8-centos manifests]# pwd
/etc/kubernetes/manifests
[root@VM-12-8-centos manifests]# cat etcd.yaml
......
- --data-dir=/var/lib/etcd
......
Inspect the etcd data directory:
[root@VM-12-8-centos etcd]# pwd
/var/lib/etcd
[root@VM-12-8-centos etcd]# tree member
member
├── snap
│ ├── 0000000000000005-0000000001555f8e.snap
│ ├── 0000000000000005-000000000155869f.snap
│ ├── 0000000000000005-000000000155adb0.snap
│ ├── 0000000000000005-000000000155d4c1.snap
│ ├── 0000000000000005-000000000155fbd2.snap
│ └── db
└── wal
├── 000000000000010e-00000000015074f9.wal
├── 000000000000010f-000000000151acf4.wal
├── 0000000000000110-000000000152e4f4.wal
├── 0000000000000111-0000000001541cef.wal
├── 0000000000000112-00000000015554ef.wal
└── 0.tmp
Create a directory /data/etcd_bak for etcd backups, then take a snapshot:
[root@VM-12-8-centos etcd_bak]# etcdctl snapshot save /data/etcd_bak/etcd-snapshot-`date +%Y%m%d`.db
Snapshot saved at /data/etcd_bak/etcd-snapshot-20230401.db
[root@VM-12-8-centos etcd_bak]# ll
total 5656
-rw-r--r-- 1 root root 5787680 Apr 1 15:35 etcd-snapshot-20230401.db
The backup command above can be put into a shell script and scheduled with crontab so that a snapshot is taken automatically every day in the early morning.
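A minimal sketch of such a script, assuming the snapshots go to /data/etcd_bak and using the same endpoint and certificate paths as the alias above; the script name, the 7-day retention, and the 02:00 schedule are my own choices. The flags are spelled out in full because the alias defined in /etc/profile is not available inside cron:

#!/bin/bash
# etcd_backup.sh -- daily etcd snapshot (hypothetical script, adjust paths as needed)
export ETCDCTL_API=3
BACKUP_DIR=/data/etcd_bak

etcdctl --endpoints=https://[127.0.0.1]:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save ${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db

# Assumed retention: keep only the last 7 days of snapshots
find ${BACKUP_DIR} -name "etcd-snapshot-*.db" -mtime +7 -delete

And a corresponding crontab entry, for example running at 02:00 every day:

0 2 * * * /bin/bash /data/etcd_bak/etcd_backup.sh >> /data/etcd_bak/backup.log 2>&1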
Rename /var/lib/etcd/member to simulate etcd data loss:
mv /var/lib/etcd/member /var/lib/etcd/member.bk
Verify:
[root@VM-12-8-centos etcd]# etcdctl member list -w table
Error: context deadline exceeded
[root@VM-12-8-centos etcd]# etcdctl endpoint health -w table
127.0.0.1:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
[root@VM-12-8-centos etcd]# kubectl get po
No resources found.
As shown above, shortly after the member directory is renamed, etcdctl can no longer query etcd's status and kubectl can no longer retrieve any cluster data: the cluster is effectively down.
Note that the restore must follow this order:
stop kube-apiserver --> stop etcd --> restore data --> start etcd --> start kube-apiserver
Since my k8s cluster was installed with kubeadm, etcd and kube-apiserver are not system services; they run as static pods. To restart them I use the static pod mechanism: move the kube-apiserver and etcd yaml files out of /etc/kubernetes/manifests and kill the two processes, then, after the etcd data has been restored, move the etcd yaml and then the kube-apiserver yaml back in, which restarts each of them in turn. A sketch of the stop phase follows.
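A minimal sketch of that stop phase, assuming the manifests are parked one level up in /etc/kubernetes (which matches the mv commands used below to bring them back). Once the manifests are removed, kubelet tears down the static pods by itself, so the pkill is only a safety check that both processes have really exited:

# Move the static pod manifests out so kubelet stops the pods
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/
# Make sure both processes are really gone before restoring
pkill -f kube-apiserver
pkill -f 'etcd --advertise-client-urls'
ps -ef | grep -E 'kube-apiserver|etcd --advertise' | grep -v grep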
Restore etcd from the backup:
[root@VM-12-8-centos etcd_bak]# etcdctl snapshot restore etcd-snapshot-20230401.db --data-dir=/var/lib/etcd
2023-04-01 16:04:03.430332 I | mvcc: restore compact to 19663311
2023-04-01 16:04:03.441067 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
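As the restore output shows, without extra flags etcdctl rewrites cluster membership with etcd's defaults, which is why the restored member advertises http://localhost:2380 below. On this single-node kubeadm cluster that is harmless, but etcdctl snapshot restore also accepts flags to keep the original member name and peer URL; a sketch using this node's values:

ETCDCTL_API=3 etcdctl snapshot restore etcd-snapshot-20230401.db \
  --name=vm-12-8-centos \
  --initial-cluster=vm-12-8-centos=https://10.0.12.8:2380 \
  --initial-advertise-peer-urls=https://10.0.12.8:2380 \
  --data-dir=/var/lib/etcd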
Start etcd and kube-apiserver by moving their manifests back:
[root@VM-12-8-centos kubernetes]# mv etcd.yaml manifests/
[root@VM-12-8-centos etcd_bak]# ps -ef | grep etcd
root 19873 19853 1 16:04 ? 00:00:00 etcd --advertise-client-urls=https://10.0.12.8:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls=https://10.0.12.8:2380 --initial-cluster=vm-12-8-centos=https://10.0.12.8:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://10.0.12.8:2379 --listen-peer-urls=https://10.0.12.8:2380 --name=vm-12-8-centos --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
[root@VM-12-8-centos kubernetes]# mv kube-apiserver.yaml manifests/
[root@VM-12-8-centos etcd_bak]# ps -ef | grep apiserver
root 20878 20858 99 16:05 ? 00:00:03 kube-apiserver --advertise-address=10.0.12.8 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.1.0.0/16 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
[root@VM-12-8-centos etcd_bak]# etcdctl member list -w table
+------------------+---------+----------------+-----------------------+------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+----------------+-----------------------+------------------------+
| 8e9e05c52164694d | started | vm-12-8-centos | http://localhost:2380 | https://10.0.12.8:2379 |
+------------------+---------+----------------+-----------------------+------------------------+
[root@VM-12-8-centos etcd_bak]# etcdctl endpoint health -w table
https://[127.0.0.1]:2379 is healthy: successfully committed proposal: took = 737.506µs
[root@VM-12-8-centos etcd_bak]# etcdctl endpoint status -w table
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://[127.0.0.1]:2379 | 8e9e05c52164694d | 3.3.10 | 5.8 MB | true | 2 | 611 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
[root@VM-12-8-centos etcd_bak]# kubectl get ns
NAME STATUS AGE
default Active 156d
kube-node-lease Active 156d
kube-public Active 156d
kube-system Active 156d
kube-users Active 142d
monitoring Active 138d
[root@VM-12-8-centos etcd_bak]# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bccdc95cf-kwgjx 1/1 Running 0 156d
coredns-bccdc95cf-s52jg 1/1 Running 0 156d
etcd-vm-12-8-centos 1/1 Running 0 156d
kube-apiserver-vm-12-8-centos 1/1 Running 0 151d
kube-controller-manager-vm-12-8-centos 1/1 Running 7 137d