Course objectives
- Add a node to an etcd cluster
- Remove a node from an etcd cluster
- Back up and restore an etcd cluster
- Upgrade an etcd cluster
1. Environment
IP | HOST |
---|---|
10.0.11.36 | infra0 |
10.0.12.21 | infra1 |
10.0.13.126 | infra2 |
10.0.12.157 | infra3 |
10.0.13.118 | infra4 |
infra0, infra1, and infra2 already form a running etcd cluster; see here for how it was set up.
2. Adding nodes to an etcd cluster
The official documentation does not recommend casually reconfiguring a production cluster: adding members costs cluster performance, and extra members do not increase throughput or capacity. In general, avoid scaling an etcd cluster up or down, and do not configure any automatic scaling policy for etcd. For production we recommend a five-member cluster; see the upstream guide on GitHub for the procedure.
That said, if there is a special requirement we can also run seven members, for example a two-region, three-data-center architecture with 2/2/3 members spread across the three DCs.
2.1. Adding a normal (voting) etcd member
Generally, adding a member involves the following steps:
- Register the new member via the HTTP members API, the gRPC members API, or the etcdctl member add command. In this course we will of course use etcdctl (a hedged HTTP example is sketched after this list).
- Start the new node with the new cluster configuration.
- Once the new member has synced, restart the existing members with the updated configuration.
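For reference, the HTTP route exposed by etcd's gRPC gateway can register a member just like etcdctl member add. This is only a minimal sketch: it assumes etcd 3.4's gateway prefix /v3, the "peerURLs" JSON field of MemberAddRequest, and that the same client certificates used for etcdctl are accepted here.

# hypothetical example: register infra3 over HTTP instead of etcdctl
$ curl --cacert /etc/kubernetes/pki/etcd/ca.pem \
       --cert /etc/kubernetes/pki/etcd/members.pem \
       --key /etc/kubernetes/pki/etcd/members-key.pem \
       -X POST https://10.0.11.36:2379/v3/cluster/member/add \
       -d '{"peerURLs": ["https://10.0.12.157:2380"]}'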
Note 1: if the existing etcd cluster uses TLS certificates that were signed with an explicit list of host addresses, those certificates cannot be reused as-is. For example:
cfssl certinfo -csr server.csr
.
.
"IPAddresses": [
"127.0.0.1",
"10.0.11.36",
"10.0.12.21",
"10.0.13.126"
]
}
A certificate like this will not allow normal communication once the new member is added; the certificate must be re-signed with the new node's address included (a re-signing sketch follows after these notes).
Note 2: because the signed certificates are only used to secure transport (not to encrypt the stored data), it is fine to re-sign them, even with a new CA. However, if the cluster was originally started without TLS, switching it to TLS later will fail with errors.
Note 3: in the CSR JSON file you cannot use hosts: [""] to match all addresses (it sounds plausible in theory, but in practice it fails with an error).
Note 4: in the CSR JSON file a domain name can be used to match a class of hosts; this should work in theory, but it has not been tested here.
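A minimal re-signing sketch with the new members listed in hosts. The CSR file name, CN, and cfssl profile below are assumptions; match them to however the original certificates were generated, and repeat for the peer certificate (members.pem) if it uses a separate CSR.

$ cat server-csr.json
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.0.11.36",
    "10.0.12.21",
    "10.0.13.126",
    "10.0.12.157",
    "10.0.13.118"
  ],
  "key": { "algo": "rsa", "size": 2048 }
}
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
    -config=ca-config.json -profile=etcd \
    server-csr.json | cfssljson -bare server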
2.1.1. Environment preparation
See the earlier sections on environment preparation and on downloading and extracting the binaries.
2.1.2. Adding the node
-
Check the status of the cluster members (a quick health probe is also sketched after this step)
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
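Besides listing the members, a liveness probe can be run against every member. A small sketch; the --cluster flag tells etcdctl to query all endpoints from the member list:

$ etcdctl endpoint health --cluster \
    --endpoints=https://127.0.0.1:2379 \
    --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem"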
-
Run the following on any existing etcd node
$ etcdctl member add infra3 --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" \
    --peer-urls=https://10.0.12.157:2380
Member 6a5e9d82c1c1c471 added to cluster f36e4227f36fdc03

ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER="infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra1=https://10.0.12.21:2380,infra0=https://10.0.11.36:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.12.157:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
-
Check the cluster membership again
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, unstarted, , https://10.0.12.157:2380, , false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
-
Create the configuration file /etc/etcd/etcd.conf on the new node
DATA_DIR=/data/etcd
HOST_NAME=infra3
HOST_IP=10.0.12.157
CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380
CLUSTER_STATE=existing
TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
-
Edit the systemd unit file /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/data/etcd
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /opt/etcd/etcd \
  --data-dir ${DATA_DIR} \
  --name ${HOST_NAME} \
  --initial-advertise-peer-urls https://${HOST_IP}:2380 \
  --listen-peer-urls https://${HOST_IP}:2380 \
  --advertise-client-urls https://${HOST_IP}:2379 \
  --listen-client-urls https://127.0.0.1:2379,https://${HOST_IP}:2379 \
  --listen-metrics-urls=http://127.0.0.1:2381 \
  --initial-cluster ${CLUSTER} \
  --initial-cluster-state ${CLUSTER_STATE} \
  --initial-cluster-token ${TOKEN} \
  --client-cert-auth \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --cert-file=/etc/kubernetes/pki/etcd/server.pem \
  --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --peer-cert-file=/etc/kubernetes/pki/etcd/members.pem \
  --peer-key-file=/etc/kubernetes/pki/etcd/members-key.pem"
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
-
Copy the etcd PKI files to the new machine with scp (a sketch follows below) and fix their permissions
$ chmod 400 /etc/kubernetes/pki/etcd/*
$ chown -R etcd:adm /etc/kubernetes/pki/etcd/
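The copy itself might look like the following, run from an existing member that already holds the certificates. The remote user, host, and target directory are assumptions; adjust to your environment.

$ ssh root@10.0.12.157 "mkdir -p /etc/kubernetes/pki"
$ scp -r /etc/kubernetes/pki/etcd root@10.0.12.157:/etc/kubernetes/pki/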
-
Start etcd on the new node
$ systemctl start etcd
If etcd fails to start, delete the data directory /data/etcd/member and start it again.
-
Now list the members again
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
Note: /etc/etcd/etcd.conf on every node must be updated to that node's corresponding configuration; the sketch below shows infra0 as an example.
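For example, on the existing member infra0 the file would now look like this (a sketch following the same format as above; only HOST_NAME and HOST_IP differ per node), followed by a restart to pick up the change:

DATA_DIR=/data/etcd
HOST_NAME=infra0
HOST_IP=10.0.11.36
CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380
CLUSTER_STATE=existing
TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838

$ systemctl restart etcd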
2.1.3. Adding a learner etcd node
Since etcd 3.4, a new node can be added as a learner (a member that does not vote). This makes adding members safer and reduces the risk of cluster downtime while the new member catches up; we recommend keeping a new node as a learner until it has fully synced its data. The procedure for adding a node therefore becomes:
- Register the new member via the HTTP members API, the gRPC members API, or the etcdctl member add --learner command.
- Start the new node with the new cluster configuration.
- Restart the existing members with the updated configuration.
- Use the gRPC members API or the etcdctl member promote command to turn the learner into a voting member. etcd validates the promote request to make sure the operation is safe: only once it confirms that the learner has caught up with the leader's data does it grant the member voting rights.
2.1.4. Adding a learner etcd node step by step
-
Check the status of the cluster members
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
-
Run the following on any existing etcd node
$ etcdctl member add infra4 --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" \
    --peer-urls=https://10.0.13.118:2380 --learner
Member d11c0954aee6f056 added to cluster f36e4227f36fdc03

ETCD_NAME="infra4"
ETCD_INITIAL_CLUSTER="infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra1=https://10.0.12.21:2380,infra4=https://10.0.13.118:2380,infra0=https://10.0.11.36:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.13.118:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
-
Check the cluster membership again
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
249badc16d370eb0, unstarted, , https://10.0.13.118:2380, , true
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
-
Create the configuration file /etc/etcd/etcd.conf on the new node
DATA_DIR=/data/etcd
HOST_NAME=infra4
HOST_IP=10.0.13.118
CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra4=https://10.0.13.118:2380
CLUSTER_STATE=existing
TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
-
Edit the systemd unit file /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/data/etcd
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /opt/etcd/etcd \
  --data-dir ${DATA_DIR} \
  --name ${HOST_NAME} \
  --initial-advertise-peer-urls https://${HOST_IP}:2380 \
  --listen-peer-urls https://${HOST_IP}:2380 \
  --advertise-client-urls https://${HOST_IP}:2379 \
  --listen-client-urls https://127.0.0.1:2379,https://${HOST_IP}:2379 \
  --listen-metrics-urls=http://127.0.0.1:2381 \
  --initial-cluster ${CLUSTER} \
  --initial-cluster-state ${CLUSTER_STATE} \
  --initial-cluster-token ${TOKEN} \
  --client-cert-auth \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --cert-file=/etc/kubernetes/pki/etcd/server.pem \
  --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --peer-cert-file=/etc/kubernetes/pki/etcd/members.pem \
  --peer-key-file=/etc/kubernetes/pki/etcd/members-key.pem"
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
-
Copy the etcd PKI files to the new machine with scp and fix their permissions
$ chmod 400 /etc/kubernetes/pki/etcd/*
$ chown -R etcd:adm /etc/kubernetes/pki/etcd/
-
Start etcd
$ systemctl start etcd
-
Check the cluster membership again; infra4's learner flag is now true
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, true
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
-
Check the endpoint status; the columns of the output are listed below (a table-formatted variant is sketched after the example)
The items in the lists are endpoint, ID, version, db size, is leader, is learner, raft term, raft index, raft applied index, errors.
$ etcdctl endpoint status --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" \
    --endpoints=https://127.0.0.1:2379 --cluster
https://10.0.13.118:2379, 12d20db544264be2, 3.4.9, 16 kB, false, true, 4232, 26, 26,
https://10.0.13.126:2379, 466035bfe5ac7a64, 3.4.9, 20 kB, false, false, 4232, 26, 26,
https://10.0.12.157:2379, 6a5e9d82c1c1c471, 3.4.9, 20 kB, false, false, 4232, 26, 26,
https://10.0.12.21:2379, 784e0050552d81cd, 3.4.9, 20 kB, false, false, 4232, 26, 26,
https://10.0.11.36:2379, f7d6895384ae86d2, 3.4.9, 20 kB, true, false, 4232, 26, 26,
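For readability, the same information can be printed with column headers by asking etcdctl for table output (the IS LEARNER and RAFT APPLIED INDEX columns are the ones to watch before promoting); the exact table rendering is not reproduced here:

$ etcdctl endpoint status --cluster -w table \
    --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" \
    --endpoints=https://127.0.0.1:2379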
-
Once the learner has fully synced, we can promote it
$ etcdctl member promote 12d20db544264be2 \
    --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem"
Member 12d20db544264be2 promoted in cluster f36e4227f36fdc03
-
Check the cluster membership again
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, false
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
Note: as before, /etc/etcd/etcd.conf on every node must be updated to that node's corresponding configuration.
2.2. If adding a node goes wrong
The example here is a membership mismatch during sync, for instance:
$ etcd --name infra3 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state existing
etcdserver: assign ids error: the member count is unequal
exit 1
We need to wipe the data directory and then correct the member information (node name and IP):
$ etcd --name infra4 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra4=http://10.0.1.14:2380 \
--initial-cluster-state existing
etcdserver: assign ids error: unmatched member while checking PeerURLs
exit 1
2.3. If adding a learner node goes wrong
- Having more than one learner in the cluster is an error:
$ etcdctl member add infra4 --peer-urls=http://10.0.1.14:2380 --learner
Error: etcdserver: too many learner members in cluster
- Promoting a learner that has not yet caught up is an error:
$ etcdctl member promote 9bf1b35fc7761a23
Error: etcdserver: can only promote a learner member which is in sync with leader
- Promoting a member that is not a learner is an error:
$ etcdctl member promote 9bf1b35fc7761a23
Error: etcdserver: can only promote a learner member
- Promoting a learner member that does not exist is an error:
$ etcdctl member promote 12345abcde
Error: etcdserver: member not found
3. Removing a node
-
Check the cluster membership
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, false
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
-
Remove node infra4 from the cluster
$ etcdctl member remove 12d20db544264be2 \
    --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem"
Member 12d20db544264be2 removed from cluster f36e4227f36fdc03
-
Check the membership again
$ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
    --cert="/etc/kubernetes/pki/etcd/members.pem" \
    --key="/etc/kubernetes/pki/etcd/members-key.pem" list
466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
Note: /etc/etcd/etcd.conf on each remaining node must be updated accordingly (drop infra4 from CLUSTER). A cleanup sketch for the removed machine follows.
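On the machine that was removed (infra4 here), it is also a good idea to stop the service and clear its data directory so it cannot rejoin with stale state. This is a suggested cleanup, not part of the original steps:

# run on infra4
$ systemctl stop etcd
$ systemctl disable etcd
$ rm -rf /data/etcd/member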