k8s v1.13.2 Installation Troubleshooting Log

This post records the various problems I hit while installing k8s v1.13.2, together with their solutions. It will be updated from time to time for later reference. For the normal installation steps, see:

Kubernetes Practice Guide: installing a K8s v1.13.2 cluster with kubeadm


1. kubelet logs warnings at startup: W0203 MemoryAccounting/CPUAccounting not enabled for pid...

[root@k8s-node2 ~]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 日 2019-02-03 11:35:52 CST; 1h 49min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 9766 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─9766 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/con...

2月 03 13:10:53 k8s-node2 kubelet[9766]: W0203 13:10:53.182621    9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:10:53 k8s-node2 kubelet[9766]: W0203 13:10:53.182630    9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183017    9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9085
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183056    9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9085
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183156    9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183161    9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184116    9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9085
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184155    9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9085
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184237    9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184243    9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766

First check memory usage with free -h: memory is not actually short. The warning simply means that systemd CPU/memory accounting is not enabled for those processes. Solution: add a configuration file that explicitly enables DefaultCPUAccounting and DefaultMemoryAccounting:

 # mkdir -p /etc/systemd/system.conf.d
 # cat <<EOF >/etc/systemd/system.conf.d/kubernetes-accounting.conf
 [Manager]
 DefaultCPUAccounting=yes
 DefaultMemoryAccounting=yes
 EOF
# systemctl daemon-reload && systemctl restart kubelet
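
As a narrower alternative (my own sketch, not part of the original fix; the drop-in file name is arbitrary), accounting can be enabled for the kubelet unit alone via a per-unit systemd drop-in instead of globally:

 # mkdir -p /etc/systemd/system/kubelet.service.d
 # cat <<EOF >/etc/systemd/system/kubelet.service.d/11-accounting.conf
 [Service]
 CPUAccounting=true
 MemoryAccounting=true
 EOF
 # systemctl daemon-reload && systemctl restart kubelet

Both the global [Manager] DefaultCPUAccounting/DefaultMemoryAccounting switches and the per-unit [Service] CPUAccounting/MemoryAccounting switches are documented systemd options; the global variant simply covers every unit at once.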

2. kubectl get all on a Kubernetes Node fails: The connection to the server localhost:8080 was refused

[root@k8s-node2 ~]#  kubectl get all
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Checking with netstat -ntlp shows that nothing is listening on localhost:8080. On the Master node, kubectl works without error, yet port 8080 is not listened on there either.

In fact, kubectl manages the cluster through the kube-apiserver API. The command works on the Master node because kube-apiserver is running there:

[root@k8s-master ~]# docker ps | grep apiserver
269a09fc31ce        177db4b8e93a           "kube-apiserver --..."   20 hours ago        Up 20 hours                             k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_e65c58fe4249c7d1554ca017bda21943_0
dcf07ff997a1        k8s.gcr.io/pause:3.1   "/pause"                 20 hours ago        Up 20 hours                             k8s_POD_kube-apiserver-k8s-master_kube-system_e65c58fe4249c7d1554ca017bda21943_0

Meanwhile, on the Nodes only kube-proxy and kubelet (plus the calico CNI containers) are at work:

[root@k8s-node1 ~]# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
fa14d993436a        142953928206           "/install-cni.sh"        20 hours ago        Up 20 hours                             k8s_install-cni_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
4e77ea62ac14        01cfa56edcfc           "/usr/local/bin/ku..."   20 hours ago        Up 20 hours                             k8s_kube-proxy_kube-proxy-nzfvg_kube-system_ac5f6294-26d2-11e9-9274-000c29d747fb_0
2bb208e1573d        e537e5882f91           "start_runit"            20 hours ago        Up 20 hours                             k8s_calico-node_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
8490970048da        k8s.gcr.io/pause:3.1   "/pause"                 20 hours ago        Up 20 hours                             k8s_POD_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
f8eb0bb6693b        k8s.gcr.io/pause:3.1   "/pause"                 20 hours ago        Up 20 hours                             k8s_POD_kube-proxy-nzfvg_kube-system_ac5f6294-26d2-11e9-9274-000c29d747fb_0

So kubectl is not really meant for the Node hosts; it should run on a client host, for example as a non-root user on the K8s Master node. After kubeadm init succeeds, the output tells us how to set up the admin.conf file on the client host:

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.1.120:6443 --token oe50fb.0pt36rwvz2utey4d --discovery-token-ca-cert-hash sha256:60bd336002b8f5d269996f1daf324c0a71814d6a25d82ab7b1d17ddeddd68860

Looking at the /etc/kubernetes/admin.conf file:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjR
YRFRFNU1ESXdNakE1TlRZMU5sb1hEVEk1TURFek1EQTVOVFkxTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHFwCnRScTl4Smk5cz
NTdUVsVXljNmMwcGhhWSs4OHlQcUpYQnBsZk1YOFpJcmJVWDdHTFB5ZDVzZlBrS0lrblJ6dUgKeTZxb091NUVVbWtYZ1dldlNzK1JITGdYbHNuUFBhSHhCK0o5Y1pxNjg5cnQrd3huMDl6OVpNT0ROc0ZMTHRVMgoxUEFoY3lRZ
TNOZVBPSUdseHQvckZRRlBUV05KQTErbmJCSk9sZEhlVUhmWjNaaVcwbVFHM0IrWk1SUUpWdkM0CmIrdHRVaUpaK3FQL09SaUZKR3VUYmJzS2tsUlNIaG9xMnVtSExxYmhLTVJNQXRRbTIxZWMzaXVxVVp4QWl4MlcKdnR1Uzgr
ZUV0U3lIQW8xTm00bzd2dFh3eGVrTkYzT2lVOUZ5T1VvS3NxdVRKenVhdk9UdVJoYjd1REpQZERoaApFRzZzMlZvUjZyRDB2UjFmZUZVQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZ
NQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFBSytla09IT1dQTGhsVzJva2g0bTlRNTRJY3oKOEJPU1VEYnJsSk9iSXFUaWNvWktsOGNNMjM3OTlDcXUrVDh2WHA3YXRQc0xtd2xRK2VVK2lUVUNZVGk3d013Lwo1M1
lxWjNCSHVQS2F0RDNoVGpFRlVIbzFZVHMyYmZqVHZ5Z2hLbGhDVnBGL1k4NmFHOVFUVUxmc0g5VXpwbWtjCk5DZzU3T0tUWjFNc3FQUmIrM1hRSEFCWHVaR1RNVG4zaGVZR2dnYklVaC9vdTJyM2RhdFY0ZWdTaDhveFBJcmoKa
FdhU0JOcmVaaE45a1VsVmNoT3RsZ2lvcDJzR1A0V2RLQisxc2kxU2x2YUI5aGR6VklpTHFGWnlhY3I5ZUlvaAp1ckVib2lZYXovU2hGeSs1UCs1SWViZ0h5QWtuWm5EbXFKT3ZXbjducUNhc3RmYi81bERHYVZCcmxtZz0KLS0t
LS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://192.168.1.120:6443
  name: kubernetes

This shows that when a client runs kubectl with this config file, it talks to port 6443 on the Master to fetch data (port 6443 on the Master is in the LISTEN state), instead of localhost:8080, which is the default kubectl falls back to when no config file is found (and why the Node reported the error above). The client can live on any other host, even a Node: just add the config file to it the way the kubeadm output suggests. Here we use scp to send the file to the target host:

 # scp -r .kube/ 192.168.1.110:/root    # here I simply send the whole /root/.kube directory to the target host

After that, kubectl on the target host can reach the Master:

[root@localhost .kube]# kubectl get no
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   18h   v1.13.2
k8s-node1    Ready    <none>   18h   v1.13.2
k8s-node2    Ready    <none>   18h   v1.13.2

In other words, the normal workflow of submitting pods to the Master is also done from the client, rather than on a Node or on the Master itself.
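
As an aside (standard kubectl behavior, not specific to this cluster), the config does not have to live at $HOME/.kube/config; kubectl can be pointed at any copy explicitly:

 # kubectl --kubeconfig=/root/.kube/config get nodes    # explicit path for a single command
 # export KUBECONFIG=/root/.kube/config                 # or set it for the whole shell session
 # kubectl get nodes

Both forms are equivalent to placing the file in the default location.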


3. kubelet startup error: E0208 node "k8s-master" not found

2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.068126    6164 kubelet.go:2266] node "k8s-master" not found
2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.169675    6164 kubelet.go:2266] node "k8s-master" not found
2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.238707    6164 kubelet_node_status.go:94] Unable to register node "k8s-master" with API server: Post https://192.168.1.120:6443/api/v1/nodes: dial tcp 192.168.1.120:6443: connect: connection refused

kubeadm installs kubelet on the Master node too; by default the Master just takes no workload. The error itself is fairly clear: the node's kubelet cannot connect to the Master's kube-apiserver. This is not because port 6443 on the Master is closed, but because I had changed the Master's IP address afterwards, so the old address no longer matches. One fix is simply to reinstall after kubeadm reset (see the sketch below); here we try instead to adjust the already-installed kubernetes configuration so that it runs correctly.
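
For reference, the reinstall route would look roughly like this (a sketch, destructive, and not what I ran verbatim):

 # kubeadm reset                                                # on every node: wipes the kubeadm-installed state
 # kubeadm init --apiserver-advertise-address=192.168.111.120   # on the Master, advertising the new IP

then re-join each worker with the new kubeadm join command that init prints. Sticking with the in-place route instead, first look at the config files: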

[root@k8s-master ~]# cd /etc/kubernetes && ls 
总用量 36
-rw-------  1 root root 5455 2月   8 16:05 admin.conf
-rw-------  1 root root 5487 2月   8 16:05 controller-manager.conf
-rw-------  1 root root 5483 2月   8 16:06 kubelet.conf
drwxr-xr-x. 2 root root  113 2月   8 16:08 manifests
drwxr-xr-x. 3 root root 4096 2月   2 17:56 pki
-rw-------  1 root root 5435 2月   8 16:08 scheduler.conf

Replace the old IP address (192.168.1.120) in the .conf files with the new one (192.168.111.120) and save; note that the subdirectories contain conf files referencing the IP as well.
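
A quick way to rewrite every occurrence at once (a sketch; it assumes the old IP string appears nowhere that should stay unchanged):

 # grep -rl '192.168.1.120' /etc/kubernetes/ | xargs sed -i 's/192\.168\.1\.120/192.168.111.120/g'

Then reload systemd, restart kubelet, and watch its journal: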

[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart kubelet && journalctl -xefu kubelet
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.188505   19409 kubelet.go:2266] node "k8s-master" not found
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.290432   19409 kubelet.go:2266] node "k8s-master" not found
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.326230   19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.111.120:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.356546   19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.111.120:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.362324   19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://192.168.111.120:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120

Judging from the error log, the kubelet on the Master, while talking to the kube-apiserver on the same node, finds that the TLS certificate the apiserver presents is valid for 192.168.1.120 (and 10.96.0.1), not for an apiserver at 192.168.111.120, so it reports the error. To solve this, we need to generate a certificate for the new IP address. Reference links:
Generating the apiserver certificate (section 3.2)
stackoverflow invalid-x509
(Fine, this problem ends here, because I would have had to generate too many certificates myself. In the end I simply reinstalled on every node.)
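
For completeness, with kubeadm v1.13 regenerating the serving certificate would look roughly like this (a sketch distilled from the references above, not something I ran; verify the flags against your kubeadm version):

 # mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.bak    # set the old pair aside
 # mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.bak
 # kubeadm init phase certs apiserver --apiserver-advertise-address=192.168.111.120 --apiserver-cert-extra-sans=192.168.111.120

kubeadm reuses existing certificates rather than regenerating them, hence moving the old pair aside first; other certificates bound to the address may need the same treatment.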


4. Installing k8s without disabling swap (using kubeadm as an example)

1. Start kubelet with the extra flag --fail-swap-on=false and restart it (via KUBELET_EXTRA_ARGS in /etc/sysconfig/kubelet; see the sketch after this list).
2. Run kubeadm init --ignore-preflight-errors=Swap (the same --ignore-preflight-errors=Swap has to be added manually to kubeadm join as well).
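
A minimal sketch of step 1 (the RPM-packaged kubelet unit reads KUBELET_EXTRA_ARGS from /etc/sysconfig/kubelet):

 # cat /etc/sysconfig/kubelet
 KUBELET_EXTRA_ARGS=--fail-swap-on=false
 # systemctl daemon-reload && systemctl restart kubelet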


5. kubelet error: unknown container "/system.slice/kubelet.service"

kubelet reports at runtime:

Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"

Fix: start kubelet with the extra flags --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice (see the sketch below).
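
A sketch of where to put them, reusing the same KUBELET_EXTRA_ARGS mechanism as in section 4 (assuming the RPM layout):

 # cat /etc/sysconfig/kubelet
 KUBELET_EXTRA_ARGS=--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
 # systemctl daemon-reload && systemctl restart kubelet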
