Troubleshooting a kubelet startup error on a Kubernetes node

Scenario:

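While installing a Kubernetes cluster, the kubelet service was being deployed on a node. Running systemctl start kubelet and following the log with tail -f /var/log/messages showed a flood of certificate verification errors.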

Error output:

May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.583305    5336 feature_gate.go:206] feature gates: &{map[]}
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589637    5336 mount_linux.go:180] Detected OS with systemd
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589680    5336 server.go:407] Version: v1.13.4
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589732    5336 feature_gate.go:206] feature gates: &{map[]}
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589825    5336 feature_gate.go:206] feature gates: &{map[]}
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589899    5336 plugins.go:103] No cloud provider specified.
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589916    5336 server.go:523] No cloud provider specified: "" from the config file: ""
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.589938    5336 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.593022    5336 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
May  5 22:23:40 kubnode-01 kubelet: I0505 22:23:40.612493    5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
May  5 22:23:42 kubnode-01 kubelet: I0505 22:23:42.909358    5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
May  5 22:23:45 kubnode-01 kubelet: I0505 22:23:45.036663    5336 bootstrap.go:239] Failed to connect to apiserver: Get https://172.20.101.157:6443/healthz?timeout=1s: x509: certificate signed by unknown authority
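"x509: certificate signed by unknown authority" means the kubelet could not verify the apiserver's serving certificate against the CA it trusts. When the log is long, it can help to first confirm that every bootstrap failure targets the same endpoint with the same cause. A minimal sketch, assuming the syslog line format shown above; the function name summarize_bootstrap_errors is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize kubelet bootstrap failures from syslog text on stdin.
# Assumes the "Failed to connect to apiserver: Get <url>: <error>" format shown above.
# Prints one line per distinct (apiserver endpoint, error) pair, prefixed by its count.
summarize_bootstrap_errors() {
  # Keep only "Failed to connect to apiserver" lines, capturing the
  # https://host:port part of the URL and the error text after it.
  sed -n 's#.*Failed to connect to apiserver: Get \(https://[^/]*\)/[^ ]*: \(.*\)$#\1 \2#p' |
    sort | uniq -c
}
```

Usage on the node: summarize_bootstrap_errors < /var/log/messages. A single output line means one endpoint and one cause, so the fix can focus there.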

Solution:

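On the master node, bind the kubelet-bootstrap user (the user the kubelet authenticates as during bootstrapping) to the system:node-bootstrapper cluster role:

[root@kubm-01 ~]# kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap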
clusterrolebinding "kubelet-bootstrap" created
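The same binding can also be kept as a manifest and applied declaratively, which makes the step repeatable: kubectl create fails with AlreadyExists on a second run, while kubectl apply does not. A sketch, assuming it runs on the master with cluster-admin credentials; the function name apply_kubelet_bootstrap_binding is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: declarative equivalent of
#   kubectl create clusterrolebinding kubelet-bootstrap \
#     --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
# Binds the user kubelet-bootstrap to the built-in ClusterRole
# system:node-bootstrapper, which allows the kubelet to create CSRs.
apply_kubelet_bootstrap_binding() {
  kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-bootstrap
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubelet-bootstrap
EOF
}
```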

Start the kubelet service on the node:

[root@k8s-node01 ~]# systemctl start kubelet

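Once the kubelet is running on the node, it submits a certificate signing request (CSR) to the master, and that request has to be approved on the master.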

On the master, list the CSRs; the new request's condition is Pending:

[root@kubm-01 ~]# kubectl get csr
NAME                                                   AGE     REQUESTOR           CONDITION
node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI   4m11s   kubelet-bootstrap   Pending

Approve the request on the master:

[root@kubm-01 ~]# kubectl certificate approve node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI
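When several nodes join at once, approving CSRs one name at a time gets tedious. A minimal sketch that approves every CSR still in the Pending condition, assuming the default kubectl get csr table layout shown above (CONDITION as the last column); the function names pending_csrs and approve_pending_csrs are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume CONDITION is the last column of "kubectl get csr".

# Read "kubectl get csr" output on stdin; print the names of Pending CSRs.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Run on the master: approve each Pending CSR individually.
approve_pending_csrs() {
  kubectl get csr | pending_csrs | while read -r name; do
    kubectl certificate approve "$name"
  done
}
```

This approves every pending request without inspecting who sent it, so it is only appropriate when all pending CSRs are known to come from your own nodes.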

Checking again on the master after approval, the CSR's condition has changed to Approved,Issued:

[root@kubm-01 ~]# kubectl get csr
NAME                                                   AGE     REQUESTOR           CONDITION
node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI   5m39s   kubelet-bootstrap   Approved,Issued

Verification on the node

The node's ssl directory now contains four new kubelet certificate files:

[root@kubnode-02 kubernetes]# ls /kubernetes/ssl/kubelet*
/kubernetes/ssl/kubelet-client-2019-05-05-22-15-53.pem  /kubernetes/ssl/kubelet-client-current.pem  /kubernetes/ssl/kubelet.crt  /kubernetes/ssl/kubelet.key
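Of these files, kubelet-client-current.pem is normally a symlink that the kubelet points at the newest timestamped client certificate, while kubelet.crt and kubelet.key are the kubelet's own serving certificate and key. A small sketch for checking this on a node, assuming the /kubernetes/ssl directory used in this article (pass another path if your kubelet --cert-dir differs); the function name check_kubelet_certs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the kubelet certificate files exist on a node.
# Default directory is the article's /kubernetes/ssl; override with the first argument.
check_kubelet_certs() {
  local dir="${1:-/kubernetes/ssl}" f missing=0
  for f in kubelet.crt kubelet.key kubelet-client-current.pem; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  # Show which timestamped client certificate the "current" symlink points to.
  if [ -L "$dir/kubelet-client-current.pem" ]; then
    echo "client cert -> $(readlink "$dir/kubelet-client-current.pem")"
  fi
}
```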

Delete the CSR (optional):

[root@kubm-01 ~]# kubectl delete csr node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI
certificatesigningrequest.certificates.k8s.io "node-csr-mgZK4Cqvb7kZA7tDqVmszNQYLq27Yydia5LCqKJnnEI" deleted
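Deleting a CSR object only removes the request record; the certificate already written to the node is unaffected. To clean up all issued requests at once, a sketch along the same lines, assuming CONDITION is the last column of kubectl get csr; the function name delete_issued_csrs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every CSR whose CONDITION is "Approved,Issued".
# Assumes CONDITION is the last column of "kubectl get csr" output.
delete_issued_csrs() {
  kubectl get csr | awk 'NR > 1 && $NF == "Approved,Issued" { print $1 }' |
    while read -r name; do
      kubectl delete csr "$name"
    done
}
```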

Verify the deletion:

kubectl get csr

No CSRs are listed any more.

Tracking this one down involved a few pitfalls.

Reference:

https://www.liuyalei.top/1433.html