etcd connection error: does not match any of DNSNames

Environment

Kubernetes v1.18.20
Etcd 3.4.3
etcd nodes: 3

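Problem description

After the etcd certificates were renewed, the etcd nodes could no longer connect to each other, so no leader could be elected and the whole cluster became unavailable. The error log is shown below.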

[master node] 192.16.0.1 etcd log
2023-05-24 02:17:33.760448 I | embed: rejected connection from "192.16.0.3:41464" (error "tls: "192.16.0.3" does not match any of DNSNames ["host1" "localhost"]", ServerName "", IPAddresses ["192.16.0.1" "127.0.0.1" "::1"], DNSNames ["host1" "localhost"])
2023-05-24 02:17:37.498451 I | embed: rejected connection from "192.16.0.2:36448" (error "tls: "192.16.0.2" does not match any of DNSNames ["host1" "localhost"]", ServerName "", IPAddresses ["192.16.0.1" "127.0.0.1" "::1"], DNSNames ["host1" "localhost"])
2023-05-24 02:17:36.357622 W | rafthttp: health check for peer c6198c3c2a184417 could not connect: x509: certificate is valid for 192.16.0.1, 127.0.0.1, ::1, not 192.16.0.3
2023-05-24 02:17:36.359115 W | rafthttp: health check for peer cde1c9316d25ba89 could not connect: x509: certificate is valid for 192.16.0.1, 127.0.0.1, ::1, not 192.16.0.2
2023-05-24 02:17:51.359023 W | rafthttp: health check for peer c6198c3c2a184417 could not connect: dial tcp 192.16.0.3:2380: connect: connection refused
2023-05-24 02:17:51.359663 W | rafthttp: health check for peer cde1c9316d25ba89 could not connect: dial tcp 192.16.0.2:2380: connect: connection refused
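
The "rejected connection" lines above already contain everything needed for a first diagnosis: the remote address and the IPAddresses/DNSNames lists from the certificate that was checked. As a rough sketch (the function name parse_rejection and the returned keys are my own, not part of etcd, and the regular expression assumes the etcd 3.4 log format shown above with IPv4 peers), the fields can be pulled out and compared like this:

```python
import re

# Matches the etcd 3.4 "rejected connection" line shown in the log above.
# Assumption: the line ends with the IPAddresses [...] and DNSNames [...] lists.
REJECT_RE = re.compile(
    r'rejected connection from "(?P<remote>[^"]+)".*'
    r'IPAddresses \[(?P<ips>[^\]]*)\], DNSNames \[(?P<dns>[^\]]*)\]\)\s*$'
)


def parse_rejection(line):
    """Return the remote IP and the certificate SANs from one log line, or None."""
    m = REJECT_RE.search(line)
    if m is None:
        return None
    # "192.16.0.3:41464" -> "192.16.0.3" (IPv4 "host:port" form assumed)
    remote_ip = m.group("remote").rsplit(":", 1)[0]
    cert_ips = re.findall(r'"([^"]+)"', m.group("ips"))
    cert_dns = re.findall(r'"([^"]+)"', m.group("dns"))
    return {
        "remote_ip": remote_ip,
        "cert_ips": cert_ips,
        "cert_dns": cert_dns,
        # False means the certificate does not list the connecting IP.
        "ip_in_cert": remote_ip in cert_ips,
    }


if __name__ == "__main__":
    sample = (
        '2023-05-24 02:17:33.760448 I | embed: rejected connection from '
        '"192.16.0.3:41464" (error "tls: "192.16.0.3" does not match any of '
        'DNSNames ["host1" "localhost"]", ServerName "", IPAddresses '
        '["192.16.0.1" "127.0.0.1" "::1"], DNSNames ["host1" "localhost"])'
    )
    print(parse_rejection(sample))
```

For the sample line, ip_in_cert is False: the connection comes from 192.16.0.3, but the certificate only lists 192.16.0.1, 127.0.0.1 and ::1.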

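Analysis

  1. According to the official etcd documentation (Transport security model | etcd), the most likely cause is that the IP address of the TLS client, here the connecting peer, does not match what is configured in the certificate (*.crt).
  2. Check the SAN entries configured in the etcd certificate: openssl x509 -text -in /etc/kubernetes/pki/etcd/peer.crt -noout | grep -A1 'Subject Alternative Name'
  3. The names and IPs in the certificate did not match the host's hostname and IP. Fortunately a backup had been made before the renewal, and the backed-up certificate was correct. The etcd certificate file was most likely replaced with the wrong one when kubeadm alpha certs renew all was run. Restoring the backup to /etc/kubernetes/pki/etcd/peer.crt and running the certificate renewal again fixed the problem.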
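
The check in step 2 can also be scripted so that it is easy to repeat on every etcd node. The following is only a sketch: parse_san and san_covers are hypothetical helper names, SAMPLE is an illustrative excerpt built from the SANs seen in the log (not real command output from this cluster), and host2 is a made-up hostname for a second node. It assumes the usual openssl x509 -text layout, where the SAN entries are on the line after the "X509v3 Subject Alternative Name" header.

```python
import ipaddress


def parse_san(openssl_text):
    """Extract (dns_names, ip_addresses) from `openssl x509 -text` output."""
    lines = openssl_text.splitlines()
    for i, line in enumerate(lines):
        if "Subject Alternative Name" in line and i + 1 < len(lines):
            dns, ips = [], []
            # The SAN entries are comma-separated on the following line.
            for entry in lines[i + 1].split(","):
                entry = entry.strip()
                if entry.startswith("DNS:"):
                    dns.append(entry[len("DNS:"):])
                elif entry.startswith("IP Address:"):
                    # ip_address() normalizes forms such as 0:0:0:0:0:0:0:1.
                    ips.append(ipaddress.ip_address(entry[len("IP Address:"):]))
            return dns, ips
    return [], []


def san_covers(openssl_text, hostname, ip):
    """True if the certificate lists both this node's hostname and its IP."""
    dns, ips = parse_san(openssl_text)
    return hostname in dns and ipaddress.ip_address(ip) in ips


# Illustrative excerpt only, modeled on the SANs reported in the etcd log.
SAMPLE = (
    "            X509v3 Subject Alternative Name: \n"
    "                DNS:host1, DNS:localhost, IP Address:192.16.0.1, "
    "IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1\n"
)

if __name__ == "__main__":
    print(san_covers(SAMPLE, "host1", "192.16.0.1"))  # certificate on its own node
    print(san_covers(SAMPLE, "host2", "192.16.0.2"))  # same certificate on another node
```

Running the same comparison on each node with that node's own hostname and IP shows quickly which peer.crt ended up on the wrong host.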
