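Since we work on different modules, I am reposting a colleague's article here in the hope that it helps anyone who needs it.
## Environment preparation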
192.168.244.11
192.168.244.12
192.168.244.13
## Install etcd on each node
yum -y install etcd
## Build the cluster
## Run on node 11:
etcd --name etcd01 --initial-advertise-peer-urls http://192.168.244.11:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.11:2380 \
--listen-client-urls http://192.168.244.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.11:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state new >> /tmp/etcd.log 2>&1 &
## Run on node 12:
etcd --name etcd02 --initial-advertise-peer-urls http://192.168.244.12:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.12:2380 \
--listen-client-urls http://192.168.244.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.12:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state new >> /tmp/etcd.log 2>&1 &
## Run on node 13:
etcd --name etcd03 --initial-advertise-peer-urls http://192.168.244.13:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.13:2380 \
--listen-client-urls http://192.168.244.13:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.13:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state new >> /tmp/etcd.log 2>&1 &
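The three commands above share the same `--initial-cluster` value, and a typo in that string is an easy way to end up with a member that cannot join. As a minimal sketch, the value can be generated from the list of IPs instead of typed three times. The function name `build_initial_cluster` is hypothetical, and it assumes members are named etcd01, etcd02, ... in argument order with peer port 2380, as in this article.

```shell
#!/bin/sh
# Build the --initial-cluster value from a list of node IPs.
# Assumption: members are named etcd01, etcd02, ... in argument order
# and every peer listens on port 2380, as in the commands above.
build_initial_cluster() {
    result=""
    index=1
    for ip in "$@"; do
        name=$(printf 'etcd%02d' "$index")
        entry="${name}=http://${ip}:2380"
        if [ -z "$result" ]; then
            result="$entry"
        else
            result="${result},${entry}"
        fi
        index=$((index + 1))
    done
    printf '%s\n' "$result"
}

build_initial_cluster 192.168.244.11 192.168.244.12 192.168.244.13
```

The printed string can be passed unchanged to `--initial-cluster` on every node.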
An etcd service started this way writes its data as the root account. To run it under the etcd account, start it as a systemctl service instead.
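For the systemctl route, the command-line flags have to be moved into an environment file. etcd accepts each flag as an `ETCD_`-prefixed environment variable, so the following sketch prints such a file for one member. The function name `render_etcd_conf` is hypothetical, and the target path mentioned below is an assumption about how the yum package is laid out; check it against the installed unit file.

```shell
#!/bin/sh
# Print an environment file for one member, mirroring the flags used above.
# Arguments: member name, node IP, --initial-cluster value.
render_etcd_conf() {
    name=$1
    ip=$2
    initial_cluster=$3
    cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${ip}:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_CLUSTER="${initial_cluster}"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
}

render_etcd_conf etcd01 192.168.244.11 \
    "etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380"
```

Assuming the package reads /etc/etcd/etcd.conf, the output would be redirected there before running `systemctl start etcd`. If the data directory was created earlier by a root-owned process, its ownership probably needs to be changed to the etcd account first.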
# ETCDCTL_API=2 etcdctl member list
3875d2e31fd10372: name=etcd03 peerURLs=http://192.168.244.13:2380 clientURLs=http://192.168.244.13:2379 isLeader=false
397b52ecac7810c7: name=etcd01 peerURLs=http://192.168.244.11:2380 clientURLs=http://192.168.244.11:2379 isLeader=true
60192088bf0f1cbc: name=etcd02 peerURLs=http://192.168.244.12:2380 clientURLs=http://192.168.244.12:2379 isLeader=false
# ETCDCTL_API=2 etcdctl cluster-health
member 3875d2e31fd10372 is healthy: got healthy result from http://192.168.244.13:2379
member 397b52ecac7810c7 is healthy: got healthy result from http://192.168.244.11:2379
member 60192088bf0f1cbc is healthy: got healthy result from http://192.168.244.12:2379
cluster is healthy
# ETCDCTL_API=3 etcdctl --endpoints 192.168.244.11:2379,192.168.244.12:2379,192.168.244.13:2379 member list
3875d2e31fd10372, started, etcd03, http://192.168.244.13:2380, http://192.168.244.13:2379
397b52ecac7810c7, started, etcd01, http://192.168.244.11:2380, http://192.168.244.11:2379
60192088bf0f1cbc, started, etcd02, http://192.168.244.12:2380, http://192.168.244.12:2379
# ETCDCTL_API=3 etcdctl --endpoints 192.168.244.11:2379,192.168.244.12:2379,192.168.244.13:2379 endpoint status --write-out="table"
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.244.11:2379 | 397b52ecac7810c7 | 3.3.11 | 20 kB | true | 120 | 13 |
| 192.168.244.12:2379 | 60192088bf0f1cbc | 3.3.11 | 16 kB | false | 120 | 13 |
| 192.168.244.13:2379 | 3875d2e31fd10372 | 3.3.11 | 20 kB | false | 120 | 13 |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
# ETCDCTL_API=3 etcdctl --endpoints 192.168.244.11:2379,192.168.244.12:2379,192.168.244.13:2379 endpoint health --write-out="table"
192.168.244.13:2379 is healthy: successfully committed proposal: took = 7.764781ms
192.168.244.12:2379 is healthy: successfully committed proposal: took = 7.589569ms
192.168.244.11:2379 is healthy: successfully committed proposal: took = 8.199871ms
etcdctl member remove 397b52ecac7810c7
etcdctl member add etcd01 http://192.168.244.11:2380
The following is a concrete demonstration:
- Kill the etcd01 node
# ETCDCTL_API=3 etcdctl --endpoints 192.168.244.11:2379,192.168.244.12:2379,192.168.244.13:2379 endpoint status --write-out="table"
Failed to get the status of endpoint 192.168.244.11:2379 (context deadline exceeded)
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.244.12:2379 | 60192088bf0f1cbc | 3.3.11 | 16 kB | true | 121 | 64 |
| 192.168.244.13:2379 | 3875d2e31fd10372 | 3.3.11 | 20 kB | false | 121 | 64 |
+---------------------+------------------+---------+---------+-----------+-----------+------------+
# ETCDCTL_API=2 etcdctl member list
3875d2e31fd10372: name=etcd03 peerURLs=http://192.168.244.13:2380 clientURLs=http://192.168.244.13:2379 isLeader=false
397b52ecac7810c7: name=etcd01 peerURLs=http://192.168.244.11:2380 clientURLs=http://192.168.244.11:2379 isLeader=false
60192088bf0f1cbc: name=etcd02 peerURLs=http://192.168.244.12:2380 clientURLs=http://192.168.244.12:2379 isLeader=true
Remove the etcd01 node
#etcdctl member remove 397b52ecac7810c7
Removed member 397b52ecac7810c7 from cluster
Add the node
#etcdctl member add etcd01 http://192.168.244.11:2380
Added member named etcd01 with ID a25253c213bdca83 to cluster
ETCD_NAME="etcd01"
ETCD_INITIAL_CLUSTER="etcd03=http://192.168.244.13:2380,etcd02=http://192.168.244.12:2380,etcd01=http://192.168.244.11:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Start the etcd01 node, paying attention to the hints printed by the member add command above
#etcd --name etcd01 --initial-advertise-peer-urls http://192.168.244.11:2380 --data-dir /var/lib/etcd/default.etcd --listen-peer-urls http://192.168.244.11:2380 --listen-client-urls http://192.168.244.11:2379,http://127.0.0.1:2379 --advertise-client-urls http://192.168.244.11:2379 --initial-cluster-token etcd-cluster --initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 --initial-cluster-state existing >> /tmp/etcd.log 2>&1
The start fails
with the following error:
2019-05-10 11:30:07.141141 E | etcdserver: the member has been permanently removed from the cluster
2019-05-10 11:30:07.141195 I | etcdserver: the data-dir used by this member must be removed.
Delete the data directory, then start the node again
The start succeeds:
# ETCDCTL_API=2 etcdctl cluster-health
member 3875d2e31fd10372 is healthy: got healthy result from http://192.168.244.13:2379
member 60192088bf0f1cbc is healthy: got healthy result from http://192.168.244.12:2379
member a25253c213bdca83 is healthy: got healthy result from http://192.168.244.11:2379
cluster is healthy
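The data-directory cleanup in the walkthrough above is destructive, so it can be worth moving the old directory aside rather than deleting it straight away. This is only a sketch: the function name `reset_data_dir` and the `.removed.<timestamp>` suffix are hypothetical choices, not something etcd requires.

```shell
#!/bin/sh
# Move a member's old data directory aside instead of deleting it outright.
# Prints the new location, or nothing if the directory does not exist.
reset_data_dir() {
    data_dir=$1
    [ -d "$data_dir" ] || return 0
    backup="${data_dir}.removed.$(date +%Y%m%d%H%M%S)"
    mv "$data_dir" "$backup" && printf '%s\n' "$backup"
}

# Example (run only on the removed member, with etcd stopped):
# reset_data_dir /var/lib/etcd/default.etcd
```

Once the re-added member is healthy again, the renamed directory can be deleted by hand.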
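Take a three-node cluster with two nodes down as an example
Bring down etcd01 and etcd02 (kill the processes directly). Once more than half of the nodes are down, the cluster is unavailable.
## The cluster contents can no longer be queried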
# ETCDCTL_API=3 etcdctl get foo
Error: context deadline exceeded
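The timeout above follows from Raft's majority rule: a cluster of N members needs floor(N/2) + 1 of them running to elect a leader and serve requests. The small sketch below just does that arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# Quorum size for a cluster of N members: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that may fail while the cluster stays available.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Succeeds when the running members still form a quorum.
# Arguments: total members, running members.
has_quorum() {
    members=$1
    running=$2
    [ "$running" -ge "$(quorum "$members")" ]
}

for n in 1 3 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With three members the quorum is two, so one surviving node out of three cannot answer even a read such as `get foo`.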
Now start the etcd03 node as a single-node cluster:
#etcd --name etcd03 --initial-advertise-peer-urls http://192.168.244.13:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.13:2380 \
--listen-client-urls http://192.168.244.13:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.13:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd03=http://192.168.244.13:2380 \
--initial-cluster-state=new \
--force-new-cluster >> /tmp/etcd.log 2>&1 &
# ETCDCTL_API=2 etcdctl cluster-health
member 3875d2e31fd10372 is healthy: got healthy result from http://192.168.244.13:2379
cluster is healthy
# ETCDCTL_API=2 etcdctl member list
3875d2e31fd10372: name=etcd03 peerURLs=http://192.168.244.13:2380 clientURLs=http://192.168.244.13:2379 isLeader=true
# ETCDCTL_API=3 etcdctl get foo
foo
1
# ETCDCTL_API=3 etcdctl get name
name
hello world
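- Note the --force-new-cluster flag used here: it resets the cluster ID and all of the cluster's membership information.
- Once started as a single-node cluster, etcd03 serves requests normally again.
On the etcd03 node, add the other nodes: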
# etcdctl member add etcd01 http://192.168.244.11:2380
Added member named etcd01 with ID 6dd47e3c8257f639 to cluster
ETCD_NAME="etcd01"
ETCD_INITIAL_CLUSTER="etcd03=http://192.168.244.13:2380,etcd01=http://192.168.244.11:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
# etcdctl member add etcd02 http://192.168.244.12:2380
client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://192.168.244.13:2379 has no leader
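Add the nodes one at a time: wait until the node you just added has started successfully before adding the next one. Otherwise the next add fails, as shown above. After etcd01 is added but before it is started, the cluster has two members with only one running, so it has no quorum and no leader. Adding further members at that point puts the cluster back in the state where more than half of its members are down.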
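One way to enforce the one-at-a-time rule in a script is to poll the cluster's health before issuing the next `member add`. The helper below is a generic retry loop; `wait_until` is a hypothetical name, and the commented usage assumes that `etcdctl cluster-health` exits non-zero while the cluster is unhealthy.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt is still to come.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example, using the v2 health check shown earlier:
# etcdctl member add etcd01 http://192.168.244.11:2380
# (start etcd01 on 192.168.244.11 with --initial-cluster-state existing)
# wait_until 30 2 env ETCDCTL_API=2 etcdctl cluster-health &&
#     etcdctl member add etcd02 http://192.168.244.12:2380
```

The second `member add` then runs only after the two-member cluster reports healthy, which avoids the "has no leader" error.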
Start the newly added etcd01 node:
## First delete the old data
# cd /var/lib/etcd
# rm -rf default.etcd/
## Start the node
# etcd --name etcd01 --initial-advertise-peer-urls http://192.168.244.11:2380 --data-dir /var/lib/etcd/default.etcd --listen-peer-urls http://192.168.244.11:2380 --listen-client-urls http://192.168.244.11:2379,http://127.0.0.1:2379 --advertise-client-urls http://192.168.244.11:2379 --initial-cluster-token etcd-cluster --initial-cluster etcd01=http://192.168.244.11:2380,etcd03=http://192.168.244.13:2380 --initial-cluster-state existing
Then add etcd02 and start the etcd02 node in the same way.
Once that is done, the cluster is back to normal:
# ETCDCTL_API=2 etcdctl member list
3875d2e31fd10372: name=etcd03 peerURLs=http://192.168.244.13:2380 clientURLs=http://192.168.244.13:2379 isLeader=true
6dd47e3c8257f639: name=etcd01 peerURLs=http://192.168.244.11:2380 clientURLs=http://192.168.244.11:2379 isLeader=false
994b3d27fdf1df2c: name=etcd02 peerURLs=http://192.168.244.12:2380 clientURLs=http://192.168.244.12:2379 isLeader=false
# ETCDCTL_API=2 etcdctl cluster-health
member 3875d2e31fd10372 is healthy: got healthy result from http://192.168.244.13:2379
member 6dd47e3c8257f639 is healthy: got healthy result from http://192.168.244.11:2379
member 994b3d27fdf1df2c is healthy: got healthy result from http://192.168.244.12:2379
cluster is healthy
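If the whole cluster goes down, it can only be recovered from a backup.
Because the data model and API that k8s uses for its data in etcd are v3, only v3 backup and restore are covered here.
v2 and v3 differ in API and in storage, and their data is isolated from each other. After an upgrade from v2 to v3, v2 data must still be accessed through the v2 API, and v3 data can only be accessed through the v3 API.
## Back up etcd (if the whole cluster goes down, use the file from the regular backup)
## The endpoint you specify determines which machine's data is backed up, even though the nodes converge on the same data; without the --endpoints flag, the local node's data is backed up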
ETCDCTL_API=3 etcdctl snapshot --endpoints="192.168.244.11:2379" save /tmp/etcd_backup/etcdback.db
## To guard against one of the nodes being faulty, several IPs can be listed in endpoints:
ETCDCTL_API=3 etcdctl snapshot --endpoints=http://192.168.244.11:2379,http://192.168.244.12:2379,http://192.168.244.13:2379 save /tmp/etcd_backup/etcdback.db
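Since recovery depends on a routine backup, it helps to give each snapshot a dated file name and to check the file right after saving it. The sketch below assumes a daily naming scheme; `snapshot_path` and `backup_etcd` are hypothetical helper names, while `snapshot save` and `snapshot status` are the v3 etcdctl subcommands.

```shell
#!/bin/sh
# Build a dated snapshot path so that daily backups do not overwrite each other.
# Usage: snapshot_path BACKUP_DIR [DATE]
snapshot_path() {
    backup_dir=$1
    stamp=${2:-$(date +%Y%m%d)}
    printf '%s/etcdback-%s.db\n' "$backup_dir" "$stamp"
}

# Save a snapshot from one endpoint, then print its hash, revision and size.
# Usage: backup_etcd ENDPOINT BACKUP_DIR
backup_etcd() {
    endpoint=$1
    target=$(snapshot_path "$2")
    mkdir -p "$2" &&
        ETCDCTL_API=3 etcdctl --endpoints="$endpoint" snapshot save "$target" &&
        ETCDCTL_API=3 etcdctl snapshot status "$target" --write-out=table
}

# Example:
# backup_etcd 192.168.244.11:2379 /tmp/etcd_backup
```

Running `backup_etcd` from cron once a day would leave one file per date in the backup directory.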
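Prepare three fresh machines, install etcd with yum, and delete the contents of the /var/lib/etcd/ directory.
## Restore the data by running the matching command on each of the three machines (the IPs and node names here are the earlier ones; adjust them as needed):
## If the cluster information is left out of the restore, the node starts as a single-member cluster and only that one node's membership is restored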
ETCDCTL_API=3 etcdctl --name=etcd01 --endpoints="http://192.168.244.11:2379" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=http://192.168.244.11:2380 --initial-cluster=etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 --data-dir=/var/lib/etcd/default.etcd snapshot restore /tmp/etcd_backup/etcdback.db
ETCDCTL_API=3 etcdctl --name=etcd02 --endpoints="http://192.168.244.12:2379" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=http://192.168.244.12:2380 --initial-cluster=etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 --data-dir=/var/lib/etcd/default.etcd snapshot restore /tmp/etcd_backup/etcdback.db
ETCDCTL_API=3 etcdctl --name=etcd03 --endpoints="http://192.168.244.13:2379" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=http://192.168.244.13:2380 --initial-cluster=etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 --data-dir=/var/lib/etcd/default.etcd snapshot restore /tmp/etcd_backup/etcdback.db
Start the nodes. --initial-cluster-state is existing for all of them:
etcd --name etcd01 --initial-advertise-peer-urls http://192.168.244.11:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.11:2380 \
--listen-client-urls http://192.168.244.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.11:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state existing >> /tmp/etcd.log 2>&1 &
etcd --name etcd02 --initial-advertise-peer-urls http://192.168.244.12:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.12:2380 \
--listen-client-urls http://192.168.244.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.12:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state existing >> /tmp/etcd.log 2>&1 &
etcd --name etcd03 --initial-advertise-peer-urls http://192.168.244.13:2380 \
--data-dir /var/lib/etcd/default.etcd \
--listen-peer-urls http://192.168.244.13:2380 \
--listen-client-urls http://192.168.244.13:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.244.13:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 \
--initial-cluster-state existing >> /tmp/etcd.log 2>&1 &
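The cluster recovery is now complete.
When a node is shut down and later started again by hand, --initial-cluster-state stays new, because the node was never removed from the cluster and re-added.
The following compares restoring a single node with restoring a cluster, using the etcdctl snapshot restore command: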
# ETCDCTL_API=3 etcdctl --data-dir=/var/lib/etcd/default.etcd snapshot restore /tmp/etcd_backup/etcdback.db
2019-05-10 17:19:45.016910 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
# ETCDCTL_API=3 etcdctl --name=etcd03 --endpoints="http://192.168.244.13:2379" --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=http://192.168.244.13:2380 --initial-cluster=etcd01=http://192.168.244.11:2380,etcd02=http://192.168.244.12:2380,etcd03=http://192.168.244.13:2380 --data-dir=/var/lib/etcd/default.etcd snapshot restore /tmp/etcd_backup/etcdback.db
2019-05-10 17:24:27.743383 I | etcdserver/membership: added member 3875d2e31fd10372 [http://192.168.244.13:2380] to cluster f3c226181d7ed864
2019-05-10 17:24:27.743634 I | etcdserver/membership: added member 397b52ecac7810c7 [http://192.168.244.11:2380] to cluster f3c226181d7ed864
2019-05-10 17:24:27.743686 I | etcdserver/membership: added member 60192088bf0f1cbc [http://192.168.244.12:2380] to cluster f3c226181d7ed864
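The difference between the two restores shows up in the "added member" log lines: one member in a default cluster versus three members in the named cluster. As a rough sketch, those lines can be summarised with awk. The function name `summarise_restore_log` is hypothetical, and the parsing assumes the log format shown above, where the cluster ID is the last field.

```shell
#!/bin/sh
# Summarise `etcdctl snapshot restore` log lines read from stdin:
# print each cluster ID followed by the number of members added to it.
summarise_restore_log() {
    awk '/etcdserver\/membership: added member/ {
             count[$NF]++
         }
         END {
             for (id in count) print id, count[id]
         }'
}

# Example with the three log lines from the cluster restore above:
summarise_restore_log <<'EOF'
2019-05-10 17:24:27.743383 I | etcdserver/membership: added member 3875d2e31fd10372 [http://192.168.244.13:2380] to cluster f3c226181d7ed864
2019-05-10 17:24:27.743634 I | etcdserver/membership: added member 397b52ecac7810c7 [http://192.168.244.11:2380] to cluster f3c226181d7ed864
2019-05-10 17:24:27.743686 I | etcdserver/membership: added member 60192088bf0f1cbc [http://192.168.244.12:2380] to cluster f3c226181d7ed864
EOF
```

A count lower than the intended cluster size suggests that the restore was run without the `--initial-cluster` information.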