node1 in NotReady State After a Cluster Restart: Troubleshooting and Fix (caused by mismatched kubelet and Docker cgroup drivers)

After the cluster was restarted, the node1 node was found to be in a NotReady state.
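On the master, the symptom is visible in the node list, e.g.:

[root@master ~]# kubectl get nodes

node1 showed NotReady in the STATUS column.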
Troubleshooting:
1. Check the server's physical environment:
free -h / df -h
2. Check whether memory is exhausted and whether there is enough disk space; both were within normal range.
3. Check CPU usage with top; also within normal range.
4. Check the master components (scheduler, controller-manager, apiserver); all were running normally (a quick check is shown below).
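One way to run that check on a kubeadm cluster, where the control-plane components run as static pods in the kube-system namespace (the kubeadm default, consistent with the cri-socket annotation in the describe output below):

[root@master ~]# kubectl get pods -n kube-system -o wide

The kube-apiserver, kube-controller-manager, and kube-scheduler pods should all show Running.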
5. Check the node details:
[root@master ~]# kubectl describe nodes node1

Name:               node1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    disk=ssd
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"76:06:85:be:2e:f1"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.213.183
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Nov 2022 10:18:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:     <unset>
  RenewTime:       Thu, 09 Feb 2023 14:30:01 +0800
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Wed, 08 Feb 2023 14:38:51 +0800   Wed, 08 Feb 2023 14:38:51 +0800   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown   Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.213.139
  Hostname:    node1
Capacity:
  cpu:                2
  ephemeral-storage:  17394Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4002416Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  16415037823
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3900016Ki
  pods:               110
System Info:
  Machine ID:                 7f16913a43d84397bd33fc081680947a
  System UUID:                41fc4d56-275d-b583-b585-db862b9a5cc8
  Boot ID:                    e3856941-2c2c-4afc-ae83-572b98bb1c82
  Kernel Version:             5.4.221-1.el7.elrepo.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.15
  Kubelet Version:            v1.21.0
  Kube-Proxy Version:         v1.21.0
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (4 in total)
  Namespace                   Name                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                      ------------  ----------  ---------------  -------------  ---
  default                     nginx-6799fc88d8-j2f5v    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
  default                     nginx-6799fc88d8-xstkz    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
  kube-system                 kube-flannel-ds-kvx26     100m (5%)     100m (5%)   50Mi (1%)        50Mi (1%)      79d
  kube-system                 kube-proxy-gj29x          0 (0%)        0 (0%)      0 (0%)           0 (0%)         79d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (5%)  100m (5%)
  memory             50Mi (1%)  50Mi (1%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>

From this we can see that the kubelet has stopped working and is no longer reporting the node's status to the master.
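As a shortcut, the Ready condition alone can be pulled with jsonpath instead of scanning the whole describe output (an optional convenience, using standard kubectl jsonpath):

[root@master ~]# kubectl get node node1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'

While the kubelet is down this prints Unknown, matching the Conditions table above.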
6. Log in to the node machine.
Check the kubelet status:
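The standard systemd status check is used here:

[root@node1 ~]# systemctl status kubelet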
Although the service showed as active, the log entries underneath made clear that it had actually failed to start.
Check the logs:
[root@node1 ~]# journalctl -u kubelet
The following error was found:

"Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

This tells us that the cgroup driver used by the kubelet (systemd) differs from the one used by Docker (cgroupfs), which is why the kubelet failed to start.
The fix is to change Docker's cgroup driver so that it matches the kubelet's.
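Both drivers can be confirmed directly before changing anything (assuming a kubeadm install, where the kubelet config lives at /var/lib/kubelet/config.yaml; this matches the kubeadm cri-socket annotation seen above):

[root@node1 ~]# docker info -f '{{.CgroupDriver}}'
cgroupfs
[root@node1 ~]# grep cgroupDriver /var/lib/kubelet/config.yaml
cgroupDriver: systemd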
Edit the configuration file:
[root@node1 ~]# vim /etc/docker/daemon.json
Add the following configuration:
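The key setting is exec-opts. If daemon.json already contains other options (registry mirrors, log settings, and so on), keep them and add only this key:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}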
Finally, restart docker and kubelet:
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl restart docker
[root@node1 ~]# systemctl restart kubelet
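After the restart, Docker should report the new driver and the kubelet should stay active:

[root@node1 ~]# docker info -f '{{.CgroupDriver}}'
systemd
[root@node1 ~]# systemctl is-active kubelet
active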
Back on the master node, verify:
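The verification is the same node-list check as at the start:

[root@master ~]# kubectl get nodes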
node1 is now in the Ready state.
