Check the logs; the key lines are as follows:
Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/blkio: no space left on device
or
Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/cpu,cpuacct: no space left on device
Solution (reference: https://blog.csdn.net/xiaofang2015/article/details/80649548): check the current inotify watch limit and raise it.
[root@node6 ~]# cat /proc/sys/fs/inotify/max_user_watches
8196
[root@node6 ~]# sysctl fs.inotify.max_user_watches=1048576
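A value set with sysctl on the command line applies only to the running kernel and is lost on reboot. The following is a minimal sketch of persisting it, assuming the node reads drop-in files from /etc/sysctl.d; the file name 99-inotify.conf and the helper name write_inotify_conf are hypothetical and not taken from the referenced article.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that raises the inotify watch limit.
#   $1 = target directory (assumed to be /etc/sysctl.d on the node)
#   $2 = desired value for fs.inotify.max_user_watches
write_inotify_conf() {
    conf_dir="$1"
    limit="$2"
    # The file name 99-inotify.conf is an arbitrary choice for this sketch.
    printf 'fs.inotify.max_user_watches=%s\n' "$limit" > "$conf_dir/99-inotify.conf"
}

# Usage on the node, as root:
#   write_inotify_conf /etc/sysctl.d 1048576
#   sysctl --system    # reload all sysctl configuration files
```

Taking the directory as a parameter lets the helper be tried against a scratch directory before it touches /etc.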
The symptom is as follows:
Solution (reference: https://serverfault.com/questions/712928/systemctl-commands-timeout-when-ran-as-root): force a reboot.
[root@node7 ~]# systemctl --force --force reboot
Rebooting.
packet_write_wait: Connection to 10.180.3.107 port 22: Broken pipe
Warning SystemOOM 34m (x8 over 34m) kubelet, node5 System OOM encountered
Sep 30 18:36:22 node5 kubelet[134096]: E0930 18:36:22.037042 134096 kubelet_node_status.go:106] Unable to register node "node5" with API server: Post https://localhost:6443/api/v1/nodes: dial tcp 127.0.0.1:6443: getsockopt: connection refused
Sep 30 18:36:21 node5 kubelet[134096]: E0930 18:36:21.898997 134096 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dnode5&limit=500&resourceVersion=0: dial tcp 127.0.0.1:6443: getsockopt: connection refused
Just reboot the machine; the root cause is unknown.
Symptom:
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Thu, 10 Oct 2019 20:38:34 +0800  Mon, 08 Oct 2018 23:08:22 +0800  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Thu, 10 Oct 2019 20:38:34 +0800  Fri, 23 Aug 2019 21:12:52 +0800  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Thu, 10 Oct 2019 20:38:34 +0800  Wed, 15 May 2019 16:12:45 +0800  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           False   Thu, 10 Oct 2019 20:38:34 +0800  Thu, 10 Oct 2019 20:27:48 +0800  KubeletNotReady             PLEG is not healthy: pleg was last seen active 13m51.370099044s ago; threshold is 3m0s
Events:
  Type     Reason             Age                 From             Message
  ----     ------             ----                ----             -------
  Normal   NodeNotReady       10m (x2 over 137d)  kubelet, node37  Node node37 status is now: NodeNotReady
  Warning  ContainerGCFailed  2m (x4 over 11m)    kubelet, node37  rpc error: code = DeadlineExceeded desc = context deadline exceeded
Solution:
systemctl daemon-reexec
systemctl restart docker (yes, docker needs to be restarted as well)
Symptom:
The node is NotReady, and the describe output shows no event information.
Check the kubelet logs on the node with systemctl status kubelet.service -l, as follows:
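Solution:
Restart kubelet on the node. Sometimes a restart is not enough; in that case check the usage of resources such as disk, CPU, memory, and file descriptors.
On one occasion the root filesystem was 99% full: after kubelet was restarted the node recovered briefly and then went NotReady again, but the describe output did not report insufficient disk space, so disk space was not considered as a cause.
On another occasion the cause was insufficient memory.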
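The resource checks mentioned above can be sketched as a few shell helpers to run on the node. This is only an illustration: the function names and the 90% warning threshold are hypothetical, and it assumes a Linux node whose /proc/meminfo exposes MemAvailable.

```shell
#!/bin/sh
# Extract the usage percentage (without the % sign) from `df -P` output on stdin.
parse_df_pct() {
    awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Usage percentage of the filesystem holding the given path (default: /).
disk_usage_pct() {
    df -P "${1:-/}" | parse_df_pct
}

# Available memory in MiB (assumes the kernel reports MemAvailable).
mem_available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Allocated and maximum file handles, from /proc/sys/fs/file-nr.
file_handles() {
    awk '{ print $1 " allocated, " $3 " max" }' /proc/sys/fs/file-nr
}

# Print a short report; the 90% threshold is an arbitrary choice for this sketch.
check_node_resources() {
    pct=$(disk_usage_pct /)
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: root filesystem is ${pct}% full"
    else
        echo "root filesystem is ${pct}% full"
    fi
    echo "available memory: $(mem_available_mib) MiB"
    echo "file handles: $(file_handles)"
}

# Usage on the node:
#   check_node_resources
```

Running a check like this before restarting kubelet helps catch a full disk or low memory even when the describe output reports no pressure.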