Official documentation: Operating etcd clusters for Kubernetes | Backing up an etcd cluster
etcd is a consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data.
Run etcd as a cluster with an odd number of members.
etcd is a leader-based distributed system. Ensure that the leader periodically sends heartbeats on time to all followers to keep the cluster stable.
Ensure that no resource starvation occurs.
Performance and stability of the cluster are sensitive to network and disk I/O. Any resource starvation can lead to heartbeat timeouts, causing instability of the cluster. An unstable etcd indicates that no leader is elected; in that state the cluster cannot make any changes to its current state, which means no new pods can be scheduled.
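A quick way to see whether a leader is currently elected is to query the endpoint status directly. A minimal sketch, using the kubeadm default certificate paths that also appear in the transcript below:

# Check endpoint status; the IS LEADER column shows the elected leader
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  endpoint status -w table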
All Kubernetes objects are stored in etcd. Periodically backing up the etcd cluster data is important for recovering a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes. The snapshot file contains all the Kubernetes state and critical information. To keep sensitive Kubernetes data safe, encrypt the snapshot files.
Backing up an etcd cluster can be accomplished in two ways: etcd built-in snapshot and volume snapshot.
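For the built-in snapshot method, the core commands are etcdctl snapshot save and etcdctl snapshot status, which the walkthrough below demonstrates. A minimal sketch (the output path is illustrative; the TLS flags match the kubeadm defaults):

# Take a built-in snapshot of the keyspace
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  snapshot save /tmp/backup/etcd-snapshot.db

# Inspect the snapshot (hash, revision, total keys, size)
ETCDCTL_API=3 etcdctl snapshot status /tmp/backup/etcd-snapshot.db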
[root@k8s1 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bdc44d9f-frm2b 1/1 Running 0 3m8s
coredns-bdc44d9f-xhlbg 1/1 Running 0 3m8s
etcd-k8s1 1/1 Running 1 3m28s
kube-apiserver-k8s1 1/1 Running 1 3m28s
kube-controller-manager-k8s1 1/1 Running 2 3m26s
kube-flannel-ds-59ndm 1/1 Running 0 2m42s
kube-flannel-ds-jmtfd 1/1 Running 0 98s
kube-flannel-ds-njlz7 1/1 Running 0 83s
kube-proxy-9qb29 1/1 Running 0 3m8s
kube-proxy-cl5w6 1/1 Running 0 83s
kube-proxy-t5mlj 1/1 Running 0 98s
kube-scheduler-k8s1 1/1 Running 2 3m28s
[root@k8s1 ~]# yum install -y etcd
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list
22f19206c985d9c7, started, k8s1, https://172.25.21.1:2380, https://172.25.21.1:2379
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
+------------------+---------+------+--------------------------+--------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+------+--------------------------+--------------------------+
| 22f19206c985d9c7 | started | k8s1 | https://172.25.21.1:2380 | https://172.25.21.1:2379 |
+------------------+---------+------+--------------------------+--------------------------+
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status -w table
+----------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 22f19206c985d9c7 | 3.5.0 | 2.6 MB | true | 2 | 1553 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt get / --prefix
[root@k8s1 ~]# kubectl run demo --image=nginx
pod/demo created
[root@k8s1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
demo 1/1 Running 0 59s
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt get /registry/pods --prefix --keys-only
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save /tmp/backup/`hostname`-etcd-`date +%Y%m%d%H%M`.db
Snapshot saved at /tmp/backup/k8s1-etcd-202204232220.db
A problem showed up during backup and restore.
After restoring the backed-up etcd data, the etcd service would not start.
After some analysis, the likely cause was a mismatch between the kubeadm cluster version and etcd version 3.3.
I therefore upgraded the kubeadm cluster from its previous version, 1.22.2, to 1.23.1.
With etcd at version 3.3, the kubeadm cluster version should be 1.23.1 or higher.
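Before restoring, it is worth comparing the version of the etcdctl client you run against the etcd version the cluster actually uses (the endpoint status output above already reports 3.5.0). A sketch, assuming the kubeadm static pod is named etcd-k8s1 as listed earlier:

# Version of the locally installed client (3.3.11 when installed from the CentOS extras repo)
ETCDCTL_API=3 etcdctl version

# Version of the etcd that the cluster itself runs (3.5.x for kubeadm v1.22+)
kubectl -n kube-system exec etcd-k8s1 -- etcd --version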
[root@k8s1 ~]# yum list etcd
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Installed Packages
etcd.x86_64 3.3.11-2.el7.centos @extras
[root@k8s1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s1 Ready control-plane,master 2d v1.22.2
k8s2 Ready <none> 2d v1.22.2
k8s3 Ready <none> 2d v1.22.2
[root@k8s4 ~]# docker-compose start
ERROR:
Can't find a suitable configuration file in this directory or any
parent. Are you in the right directory?
Supported filenames: docker-compose.yml, docker-compose.yaml
[root@k8s4 ~]# cd harbor/
[root@k8s4 harbor]# ls
common common.sh docker-compose.yml harbor.v1.10.1.tar.gz harbor.yml install.sh LICENSE prepare
[root@k8s4 harbor]# docker-compose start
Starting log ... done
Starting registry ... done
Starting registryctl ... done
Starting postgresql ... done
Starting portal ... done
Starting redis ... done
Starting core ... done
Starting jobservice ... done
Starting proxy ... done
Log in with docker login where needed, then pull, retag, and push the v1.23.1 images:
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kubeadm:v1.23.1
Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/kubeadm, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
[root@k8s4 ~]# docker login reg.westos.org
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-proxy:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.1 reg.westos.org/k8s/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1 reg.westos.org/k8s/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.1 reg.westos.org/k8s/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-proxy:v1.23.1 reg.westos.org/k8s/kube-proxy:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-proxy:v1.23.1
[root@k8s1 ~]# yum install -y kubeadm-1.23.1-0
[root@k8s1 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:39:51Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s1 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.22.1
[upgrade/versions] kubeadm version: v1.23.1
[upgrade/versions] Target version: v1.23.6
[upgrade/versions] Latest version in the v1.22 series: v1.22.9
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT TARGET
kubelet 3 x v1.22.2 v1.22.9
Upgrade to the latest version in the v1.22 series:
COMPONENT CURRENT TARGET
kube-apiserver v1.22.1 v1.22.9
kube-controller-manager v1.22.1 v1.22.9
kube-scheduler v1.22.1 v1.22.9
kube-proxy v1.22.1 v1.22.9
CoreDNS v1.8.4 v1.8.6
etcd 3.5.0-0 3.5.1-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.22.9
_____________________________________________________________________
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT TARGET
kubelet 3 x v1.22.2 v1.23.6
Upgrade to the latest stable version:
COMPONENT CURRENT TARGET
kube-apiserver v1.22.1 v1.23.6
kube-controller-manager v1.22.1 v1.23.6
kube-scheduler v1.22.1 v1.23.6
kube-proxy v1.22.1 v1.23.6
CoreDNS v1.8.4 v1.8.6
etcd 3.5.0-0 3.5.1-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.23.6
Note: Before you can perform this upgrade, you have to update kubeadm to v1.23.6.
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
[root@k8s1 ~]# kubeadm upgrade apply v1.23.1
......
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.23.1". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
[root@k8s1 ~]# kubectl drain k8s1 --ignore-daemonsets
node/k8s1 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-59ndm, kube-system/kube-proxy-t4zpd
node/k8s1 drained
[root@k8s1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s1 Ready,SchedulingDisabled control-plane,master 2d v1.22.2
k8s2 Ready <none> 2d v1.22.2
k8s3 Ready <none> 2d v1.22.2
[root@k8s1 ~]# yum install -y kubelet-1.23.1-0 kubectl-1.23.1-0
[root@k8s1 ~]# systemctl daemon-reload
[root@k8s1 ~]# systemctl restart kubelet.service
[root@k8s1 ~]# kubectl uncordon k8s1
node/k8s1 uncordoned
[root@k8s1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s1 Ready control-plane,master 2d v1.23.1
k8s2 Ready <none> 2d v1.22.2
k8s3 Ready <none> 2d v1.22.2
[root@k8s2 ~]# yum install -y kubeadm-1.23.1-0
[root@k8s2 ~]# kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
[root@k8s1 ~]# kubectl drain k8s2 --ignore-daemonsets
node/k8s2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-jmtfd, kube-system/kube-proxy-fhnvc
evicting pod kube-system/coredns-7b56f6bc55-dgh4p
pod/coredns-7b56f6bc55-dgh4p evicted
node/k8s2 drained
[root@k8s2 ~]# yum install -y kubectl-1.23.1-0 kubelet-1.23.1-0
[root@k8s2 ~]# systemctl daemon-reload
[root@k8s2 ~]# systemctl restart kubelet.service
[root@k8s1 ~]# kubectl uncordon k8s2
node/k8s2 uncordoned
[root@k8s1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s1 Ready control-plane,master 2d v1.23.1
k8s2 Ready <none> 2d v1.23.1
k8s3 Ready <none> 2d v1.23.1
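k8s3 was upgraded the same way as k8s2. The per-node pattern, condensed into a sketch (drain and uncordon run from the control-plane node, everything else on the worker itself):

# On the worker node: upgrade kubeadm and apply the node configuration
yum install -y kubeadm-1.23.1-0
kubeadm upgrade node

# On the control-plane node: evacuate the worker
kubectl drain k8s3 --ignore-daemonsets

# Back on the worker node: upgrade kubelet/kubectl and restart kubelet
yum install -y kubelet-1.23.1-0 kubectl-1.23.1-0
systemctl daemon-reload
systemctl restart kubelet.service

# On the control-plane node: make the worker schedulable again
kubectl uncordon k8s3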
systemctl start docker
Run the etcd 3.5.1-0 image interactively so a matching etcdctl binary can be copied out of it:
[root@k8s1 ~]# docker run -it --rm reg.westos.org/k8s/etcd:3.5.1-0 sh
sh-5.1# sh-5.1#
[root@k8s1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d6c7021afde3 reg.westos.org/k8s/etcd:3.5.1-0 "sh" 42 seconds ago Up 40 seconds 2379-2380/tcp, 4001/tcp, 7001/tcp crazy_goodall
[root@k8s1 ~]# docker cp d6c7021afde3:/usr/local/bin/etcdctl .
[root@k8s1 ~]# ll etcdctl
-r-xr-xr-x 1 root root 17981440 Nov 3 12:14 etcdctl
[root@k8s1 ~]# docker run -it --rm reg.westos.org/k8s/etcd:3.5.1-0 sh
sh-5.1# sh-5.1# ^C
sh-5.1# exit
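The copied binary can be sanity-checked before use; a sketch (moving it onto PATH is optional and not shown in the original transcript):

# Confirm this is the 3.5.x client rather than the 3.3 one installed via yum
./etcdctl version

# Optionally install it ahead of the yum-provided etcdctl
install -m 0755 ./etcdctl /usr/local/bin/etcdctl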
[root@k8s1 ~]# kubectl run test --image=nginx
pod/test created
[root@k8s1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 94s
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save /tmp/backup/`hostname`-etcd-`date +%Y%m%d%H%M`.db
Snapshot saved at /tmp/backup/k8s1-etcd-202204232220.db
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl snapshot status /tmp/backup/k8s1-etcd-202204232220.db
f99920fc, 9452, 1431, 3.8 MB
[root@k8s1 ~]# kubectl delete --force pod test
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "test" force deleted
[root@k8s1 ~]# cd /etc/kubernetes/
[root@k8s1 kubernetes]# ls
admin.conf controller-manager.conf kubelet.conf manifests pki scheduler.conf tmp
[root@k8s1 kubernetes]# mv manifests/ manifests.bak
[root@k8s1 kubernetes]# ls
admin.conf controller-manager.conf kubelet.conf manifests.bak pki scheduler.conf tmp
[root@k8s1 kubernetes]# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
00a2c92eab979 a4ca41631cc7a 35 minutes ago Running coredns 0 39c40f3e94642
3dd14316876bf b46c42588d511 42 minutes ago Running kube-proxy 0 d7f3f39d341b0
a353f5362f207 9247abf086779 About an hour ago Running kube-flannel 1 f060899b44344
[root@k8s1 kubernetes]# cd /var/lib/
[root@k8s1 lib]# mv etcd etcd.bak
[root@k8s1 lib]# ETCDCTL_API=3 etcdctl snapshot restore /tmp/backup/k8s1-etcd-202204232220.db --data-dir=/var/lib/etcd
2022-04-23 22:28:22.580862 I | mvcc: restore compact to 8778
2022-04-23 22:28:22.632531 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
[root@k8s1 lib]# cd /etc/kubernetes/
[root@k8s1 kubernetes]# mv manifests.bak manifests
[root@k8s1 kubernetes]# systemctl restart kubelet.service
[root@k8s1 kubernetes]# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
d73374d629173 b6d7abedde399 30 seconds ago Running kube-apiserver 0 a6ce154f8ed39
9644257cc5b79 71d575efe6283 30 seconds ago Running kube-scheduler 0 69a86c32d82c2
9d2c7fdfae301 25f8c7f3da61c 30 seconds ago Running etcd 0 a156d32b4bfca
89501643a9ba6 f51846a4fd288 30 seconds ago Running kube-controller-manager 0 e2f0acade1880
00a2c92eab979 a4ca41631cc7a 38 minutes ago Running coredns 0 39c40f3e94642
3dd14316876bf b46c42588d511 46 minutes ago Running kube-proxy 0 d7f3f39d341b0
a353f5362f207 9247abf086779 About an hour ago Running kube-flannel 1 f060899b44344
[root@k8s1 kubernetes]# kubectl get pod
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 19m
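Condensed, the restore just performed comes down to the following sequence (paths as used above):

# Stop the control-plane static pods by moving their manifests aside
mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak

# Preserve the old data directory, then restore the snapshot into a fresh one
mv /var/lib/etcd /var/lib/etcd.bak
ETCDCTL_API=3 etcdctl snapshot restore /tmp/backup/k8s1-etcd-202204232220.db \
  --data-dir=/var/lib/etcd

# Bring the static pods back and restart kubelet
mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests
systemctl restart kubelet.service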
tips:
Second issue: single-node restore
[[[Exit the SSH session on the master node and return to the base environment]]]
(just two commands)
Back up first:
ETCDCTL_API=3 ... snapshot save to the path under /tmp (don't type the path by hand; copy it from the page, typing wastes time)
cd /var/lib
Restore etcd.bak back to etcd
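In concrete terms, and assuming the current /var/lib/etcd has already been moved aside as in the walkthrough above, those notes amount to roughly this sketch:

# 1) Back up first (copy the full command rather than typing it by hand)
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  snapshot save /tmp/backup/$(hostname)-etcd-$(date +%Y%m%d%H%M).db

# 2) Restore: put the saved data directory back in place
cd /var/lib
mv etcd.bak etcd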