A Complete Guide to Building a Highly Available etcd Cluster with CFSSL (Including TLS Certificate Management)

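Abstract: This article walks through issuing TLS certificates with the CFSSL toolchain and deploying a production-grade, highly available etcd cluster. It covers certificate lifecycle management, cluster configuration tuning, and security hardening, and applies to Kubernetes and other distributed systems.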

1. Environment Planning and Architecture

1.1 Node Information

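Node IP          Role          Hostname   Certificate SANs
192.167.14.228   ETCD Master   etcd-1     IP: 192.167.14.228, 192.167.14.229, 192.167.14.246
192.167.14.229   ETCD Backup   etcd-2     DNS: etcd-cluster.local
192.167.14.246   ETCD Backup   etcd-3

A single certificate carrying all of the SANs above is shared by the three nodes. The Master/Backup labels describe the deployment plan only: etcd elects its leader through Raft, so any member can become the leader.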
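Three members is the smallest cluster size that survives the loss of a node, because Raft needs a majority (quorum) of members to commit a write. The arithmetic can be sketched as follows; `quorum` and `fault_tolerance` are illustrative helper names, not etcd commands.

```shell
#!/usr/bin/env bash
# Quorum arithmetic for an etcd (Raft) cluster of N members.
# quorum = floor(N/2) + 1; tolerated failures = N - quorum.

quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

fault_tolerance() {
  local members=$1
  echo $(( members - $(quorum "$members") ))
}

# Compare a few common cluster sizes.
for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For the three-node layout above this gives a quorum of 2 and one tolerated failure, which is why an even member count adds no extra fault tolerance.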

1.2 Port Planning

Port   Protocol   Purpose
2379   HTTPS      Client traffic
2380   HTTPS      Peer traffic between members

2. Certificate Management with CFSSL

2.1 Installing the CFSSL Toolchain

wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 \
     https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 \
     https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64

chmod +x cfssl* && mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo

2.2 Generating the Root Certificate Authority (CA)

mkdir -p ~/etcd_tls && cd ~/etcd_tls

# CA signing configuration (profiles and lifetimes for issued certificates)
cat > ca-config.json <<EOF
{
  "signing": {
    "default": {"expiry": "876000h"},
    "profiles": {
      "kubernetes": {
        "expiry": "876000h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
EOF

# CA certificate signing request (CSR)
cat > ca-csr.json <<EOF
{
  "CN": "Kubernetes",
  "key": {"algo": "rsa", "size": 2048},
  "names": [{"C": "CN", "L": "Xi'an", "O": "k8s", "OU": "Cluster"}]
}
EOF

# Generate the CA certificate and private key (ca.pem, ca-key.pem)
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
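cfssl expresses certificate lifetimes in hours, which makes values such as `876000h` hard to read at a glance. The conversion below is a small sketch; `hours_to_days` and `hours_to_years` are illustrative helpers, not cfssl commands, and the year figure ignores leap days.

```shell
#!/usr/bin/env bash
# Convert a cfssl "expiry" value (in hours) to days and approximate years.

hours_to_days()  { echo $(( $1 / 24 )); }
hours_to_years() { echo $(( $1 / 24 / 365 )); }

# 8760h, 87600h and 876000h are one, ten and one hundred years respectively.
for h in 8760 87600 876000; do
  echo "${h}h = $(hours_to_days "$h") days ~ $(hours_to_years "$h") years"
done
```

The `876000h` used in ca-config.json is therefore roughly 100 years for every certificate signed with that profile. As far as I am aware, ca-config.json only governs certificates the CA signs, while the CA's own lifetime comes from the CSR passed to `-initca`; it is worth checking the generated ca.pem with `cfssl-certinfo -cert ca.pem` rather than assuming it matches.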

2.3 Issuing the etcd Server Certificate

cat > etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "192.167.14.228",
    "192.167.14.229",
    "192.167.14.246",
    "etcd-cluster.local"
  ],
  "key": {"algo": "rsa", "size": 2048},
  "names": [{"C": "CN", "L": "Xi'an", "O": "k8s", "OU": "ETCD"}]
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=kubernetes \
  etcd-csr.json | cfssljson -bare etcd

3. Deploying the etcd Cluster

3.1 Installing the etcd Binaries

ETCD_VER=v3.5.9
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz

tar -zxvf etcd-${ETCD_VER}-linux-amd64.tar.gz
mkdir -p /opt/etcd/{bin,cfg,ssl}
mv etcd-${ETCD_VER}-linux-amd64/{etcd,etcdctl} /opt/etcd/bin/
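The service definition later in this section reads its certificates from /opt/etcd/ssl, so the files generated in ~/etcd_tls have to be copied there on every node. The sketch below assumes root SSH access to each node and that the /opt/etcd directories are created on all three machines; `run` and `distribute_certs` are hypothetical helpers, and with `DRY_RUN=1` (the default here) the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Copy the CA and etcd certificates from ~/etcd_tls to /opt/etcd/ssl on every node.
# Assumptions: root SSH access; DRY_RUN=1 prints the commands instead of running them.

CERT_DIR=${CERT_DIR:-$HOME/etcd_tls}
NODES=(192.167.14.228 192.167.14.229 192.167.14.246)
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

distribute_certs() {
  local node
  for node in "${NODES[@]}"; do
    run ssh "root@${node}" "mkdir -p /opt/etcd/ssl"
    run scp "${CERT_DIR}/ca.pem" "${CERT_DIR}/etcd.pem" "${CERT_DIR}/etcd-key.pem" \
      "root@${node}:/opt/etcd/ssl/"
  done
}

distribute_certs
```

Review the printed commands first, then rerun with `DRY_RUN=0` to perform the copy. The CA private key (ca-key.pem) is deliberately left out, since the nodes only need ca.pem to verify peers and clients.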

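3.2 Node Configuration Template (etcd-1 as the Example)

cat > /opt/etcd/cfg/etcd.conf <<EOF
# systemd's EnvironmentFile accepts only KEY=VALUE lines,
# and etcd reads its settings from ETCD_-prefixed variables.

# [Member]
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://192.167.14.228:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.167.14.228:2379,https://127.0.0.1:2379"

# [Cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.167.14.228:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.167.14.228:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://192.167.14.228:2380,etcd-2=https://192.167.14.229:2380,etcd-3=https://192.167.14.246:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF

On etcd-2 and etcd-3, change ETCD_NAME and the node's own IP in the listen and advertise URLs. ETCD_INITIAL_CLUSTER, the token, and the cluster state are identical on all three nodes.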
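Since only the member name and the node's own IP differ between members, the three files can be rendered from one template. This is a minimal sketch that assumes the ETCD_-prefixed environment-variable form etcd reads from systemd's EnvironmentFile; `node_ip`, `initial_cluster`, and `render_etcd_env` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Render the etcd environment file for one member of the three-node cluster.

# Map a member name to its IP (values from the node table in section 1.1).
node_ip() {
  case "$1" in
    etcd-1) echo 192.167.14.228 ;;
    etcd-2) echo 192.167.14.229 ;;
    etcd-3) echo 192.167.14.246 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Build the ETCD_INITIAL_CLUSTER value, identical on every node.
initial_cluster() {
  local name out=""
  for name in etcd-1 etcd-2 etcd-3; do
    out+="${out:+,}${name}=https://$(node_ip "$name"):2380"
  done
  echo "$out"
}

# Print the environment file for the given member name.
render_etcd_env() {
  local name=$1 ip
  ip=$(node_ip "$name") || return 1
  cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379,https://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
ETCD_INITIAL_CLUSTER="$(initial_cluster)"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
}

render_etcd_env etcd-2
```

On each node the output would be redirected to /opt/etcd/cfg/etcd.conf, for example `render_etcd_env etcd-3 > /opt/etcd/cfg/etcd.conf` on 192.167.14.246.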

3.3 systemd Service Configuration

cat > /usr/lib/systemd/system/etcd.service <<EOF
[Unit]
Description=ETCD KeyValue Store
Documentation=https://etcd.io
After=network.target

[Service]
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
  --cert-file=/opt/etcd/ssl/etcd.pem \
  --key-file=/opt/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/opt/etcd/ssl/etcd.pem \
  --peer-key-file=/opt/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/opt/etcd/ssl/ca.pem \
  --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

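4. Cluster Initialization and Verification

4.1 Starting the Cluster

# Run on all three nodes; the cluster becomes available once a majority of members have started
systemctl daemon-reload
systemctl enable --now etcd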

4.2 Cluster Health Check

ETCDCTL_API=3 /opt/etcd/bin/etcdctl \
  --cacert=/opt/etcd/ssl/ca.pem \
  --cert=/opt/etcd/ssl/etcd.pem \
  --key=/opt/etcd/ssl/etcd-key.pem \
  --endpoints="https://192.167.14.228:2379,https://192.167.14.229:2379,https://192.167.14.246:2379" \
  endpoint health --write-out=table

Expected output:

+-----------------------------+--------+-------------+-------+
|          ENDPOINT           | HEALTH |    TOOK     | ERROR |
+-----------------------------+--------+-------------+-------+
| https://192.167.14.228:2379 |   true | 14.567345ms |       |
| https://192.167.14.229:2379 |   true | 15.234512ms |       |
| https://192.167.14.246:2379 |   true | 16.789123ms |       |
+-----------------------------+--------+-------------+-------+
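Repeating the TLS flags on every etcdctl call gets tedious. etcdctl also accepts its flags as ETCDCTL_-prefixed environment variables, so one option is to export them once per shell session. The sketch below reuses the paths and IPs from this article; `join_endpoints` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Export etcdctl's TLS and endpoint settings once, so later commands stay short.
# Source this file (". ./etcdctl-env.sh") so the variables persist in the current shell.

NODE_IPS=(192.167.14.228 192.167.14.229 192.167.14.246)

# Join the node IPs into a comma-separated list of client URLs.
join_endpoints() {
  local ip out=""
  for ip in "$@"; do
    out+="${out:+,}https://${ip}:2379"
  done
  echo "$out"
}

export ETCDCTL_API=3
export ETCDCTL_CACERT=/opt/etcd/ssl/ca.pem
export ETCDCTL_CERT=/opt/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/opt/etcd/ssl/etcd-key.pem
export ETCDCTL_ENDPOINTS="$(join_endpoints "${NODE_IPS[@]}")"

echo "$ETCDCTL_ENDPOINTS"

# With the variables exported, the checks reduce to:
#   /opt/etcd/bin/etcdctl endpoint health --write-out=table
#   /opt/etcd/bin/etcdctl endpoint status --write-out=table
#   /opt/etcd/bin/etcdctl member list --write-out=table
```

`endpoint status` additionally shows which member is currently the leader and the database size of each member. Avoid passing the same setting both as a flag and as a variable, since etcdctl may reject the conflicting combination.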

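5. Production Optimization Recommendations

5.1 Security Hardening

# Require clients to present a certificate signed by the trusted CA
--client-cert-auth=true

# Check the certificate validity window and rotate regularly (for example yearly), well before notAfter
openssl x509 -in /opt/etcd/ssl/etcd.pem -noout -dates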
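The openssl command above only prints the validity dates, so someone still has to read them. A small check along the following lines could run from cron and warn ahead of expiry. It is a sketch that assumes GNU date (for `date -d`) and openssl are available; `days_until` is a hypothetical helper and the 30-day threshold is an arbitrary example.

```shell
#!/usr/bin/env bash
# Warn when the etcd certificate is close to expiry.
# Assumes GNU date and openssl; WARN_DAYS is an example threshold.

CERT=${CERT:-/opt/etcd/ssl/etcd.pem}
WARN_DAYS=${WARN_DAYS:-30}

# Whole days between "now" (or a given epoch, second argument) and a notAfter timestamp.
days_until() {
  local not_after=$1 now_epoch=${2:-$(date +%s)}
  local end_epoch
  end_epoch=$(date -d "$not_after" +%s) || return 1
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

if [ -r "$CERT" ]; then
  # "openssl x509 -enddate" prints a line such as "notAfter=Jan  1 00:00:00 2030 GMT"
  not_after=$(openssl x509 -in "$CERT" -noout -enddate | cut -d= -f2)
  left=$(days_until "$not_after")
  if [ "$left" -lt "$WARN_DAYS" ]; then
    echo "WARNING: $CERT expires in $left days"
  else
    echo "OK: $CERT is valid for another $left days"
  fi
fi
```

When the warning fires, reissue the certificate with the `cfssl gencert` command from section 2.3 and restart the members one at a time so that quorum is never lost.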

5.2 Performance Tuning

# Raise the backend storage quota to 8 GiB
--quota-backend-bytes=8589934592

# Log only warnings and above, using the zap logger
--log-level=warn
--logger=zap
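The quota flag takes a raw byte count, which is easy to mistype. The value can be derived rather than typed by hand; `gib_to_bytes` below is an illustrative helper, not an etcd tool.

```shell
#!/usr/bin/env bash
# Derive the --quota-backend-bytes value from a size in GiB.

gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# 8 GiB, the value used in this article.
echo "--quota-backend-bytes=$(gib_to_bytes 8)"
```

When the database grows past the quota, etcd raises a NOSPACE alarm and rejects further writes. Recovery typically involves compacting old revisions, running `etcdctl defrag`, and then clearing the alarm with `etcdctl alarm disarm`, so it is worth monitoring the database size reported by `etcdctl endpoint status`.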

6. Firewall Policy (Required in Production)

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.167.14.0/24" port port="2379-2380" protocol="tcp" accept'
firewall-cmd --reload

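7. Troubleshooting Guide

Symptom                        Diagnostic command              Resolution
Node cannot join the cluster   journalctl -u etcd -f           Check that the certificate SANs match the node IPs
Client connection times out    telnet <node-IP> 2379           Verify the firewall and SELinux policies
Storage space exhausted        du -sh /var/lib/etcd/member/    Clean up old snapshots or expand storage
Certificate expired            cfssl-certinfo -cert etcd.pem   Reissue the certificate and restart the members one at a time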

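Recommended Tools

  • etcd-browser: web management UI
  • etcd-backup-operator: automated backup tool

This article has covered how to build and maintain an etcd cluster for enterprise use. Run disaster recovery drills regularly to confirm that the cluster stays highly available.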
