K8s Cluster Maintenance Notes

1. Calico pods not ready in a k8s cluster

1. The pod's events show that the Calico readiness probe is failing with the following error:
 Warning  Unhealthy  6m22s (x499 over 89m)  kubelet, k8s-master  (combined from similar events): Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 172.16.64.232,172.16.64.235
2020-02-12 12:36:36.591 [INFO][12943] health.go 156: Number of node(s) with BGP peering established = 0
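To see which peers the probe is complaining about, the peer IPs can be pulled out of the event message. This is a minimal sketch: bgp_peers_down is a hypothetical helper name, and it assumes the comma-separated IPv4 list ends its line, as in the message above.

```shell
# Hypothetical helper: read a calico/node readiness message on stdin and print,
# one per line, the peer IPs listed after "BGP not established with".
# Assumption: the comma-separated IPv4 list is the last thing on its line.
bgp_peers_down() {
  sed -n 's/.*BGP not established with \([0-9.,]*\).*/\1/p' | tr ',' '\n'
}
```

For example, kubectl -n kube-system describe pod <calico-node-pod> | bgp_peers_down lists the peers that still have no BGP session, which tells you which node IPs to check.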
Solution:
Change how the Calico network plugin detects the node's network interface by setting the value of IP_AUTODETECTION_METHOD.

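The official manifest does not set the IP autodetection method (IP_AUTODETECTION_METHOD), so it defaults to first-found. first-found can register an address on the wrong or an unreachable interface as the node IP, which breaks the node-to-node BGP mesh.
Switch to the can-reach or interface method instead: can-reach=<destination> picks the address the node uses to reach the given destination (for example the IP of a node that is already Ready), while interface=<regex> picks an address on an interface whose name matches the regex.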
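can-reach=<destination> roughly corresponds to asking the kernel which source address it would use to reach that destination. To preview this on a node before editing calico.yaml, the src field of ip route get can be extracted. A sketch under that assumption: route_src_addr is a hypothetical helper, and it approximates rather than reproduces Calico's own detection logic.

```shell
# Hypothetical helper: read the output of `ip route get <dest>` on stdin and
# print the source address the kernel would use to reach <dest>
# (the token that follows "src").
route_src_addr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

For example, ip route get 172.16.64.232 | route_src_addr on the affected node shows the local address used to reach that peer. If it differs from the IP Calico registered for the node (typically recorded in the node annotation projectcalico.org/IPv4Address), can-reach=172.16.64.232 is a candidate setting.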

Add the following two lines to the env list of the calico-node container in calico.yaml:

- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"  # replace eth with the prefix of your actual NIC names
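Since the regex has to match your real NIC names, it helps to check which interface names a candidate pattern hits. A rough sketch: match_ifaces is a hypothetical helper, and grep -E matches anywhere in a name (so eth.* would also hit names like veth1a2b), which may differ from Calico's own matching; treat the result as a preview only.

```shell
# Hypothetical helper: read interface names (one per line) on stdin and print
# those matching the extended regex given as $1.
# Note: grep -E is unanchored; add ^ to the pattern for a stricter preview.
match_ifaces() {
  grep -E -- "$1"
}
```

For example, ls /sys/class/net | match_ifaces '^eth.*' lists the interfaces on the node whose names start with eth.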
The resulting configuration looks like this:
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
  value: "k8s,bgp"
# Specify interface
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"

Re-apply calico.yaml:
kubectl apply -f calico.yaml
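Changing the env of the calico-node DaemonSet typically rolls its pods, so it is worth confirming they all come back Ready. A minimal sketch: not_ready_pods is a hypothetical helper that reads kubectl get pods --no-headers output, and it assumes the calico-node pods run in kube-system with the label k8s-app=calico-node, as in the standard manifest.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the names of pods whose READY column (e.g. 0/1) is not fully ready.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

For example, after kubectl -n kube-system rollout status daemonset/calico-node finishes, kubectl -n kube-system get pods -l k8s-app=calico-node --no-headers | not_ready_pods should print nothing.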

2. kubelet logs unknown container "/system.slice/docker.service" errors

May 27 03:50:19 master1 kubelet: E0527 03:50:19.606463    4333 summary.go:102] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
May 27 03:50:19 master1 kubelet: E0527 03:50:19.607274    4333 summary.go:102] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
May 27 03:50:20 master1 kubelet: E0527 03:50:20.029785    4333 summary.go:102] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
May 27 03:50:20 master1 kubelet: E0527 03:50:20.029824    4333 summary.go:102] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"

Solution:

Append "--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice" to "KUBELET_CGROUP_ARGS" in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
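The append step can also be scripted so it is safe to run more than once. A sketch under stated assumptions: add_cgroup_flags is a hypothetical helper, it relies on GNU sed (-i with no suffix argument), and it expects an existing Environment="KUBELET_CGROUP_ARGS=..." line that ends with a double quote.

```shell
# Hypothetical helper: append the two cgroup flags to the existing
# KUBELET_CGROUP_ARGS line of a kubeadm drop-in file, unless already present.
# Assumes GNU sed and a line of the form Environment="KUBELET_CGROUP_ARGS=...".
add_cgroup_flags() {
  local file="$1"
  if ! grep -q '^Environment="KUBELET_CGROUP_ARGS=' "$file"; then
    echo "no KUBELET_CGROUP_ARGS line in $file" >&2
    return 1
  fi
  # Replace the closing quote with the flags plus the quote, but only on the
  # KUBELET_CGROUP_ARGS line and only if --runtime-cgroups is not there yet.
  sed -i '/^Environment="KUBELET_CGROUP_ARGS=/{/--runtime-cgroups/!s|"$| --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"|;}' "$file"
}
```

Run it as root against /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, then run systemctl daemon-reload and systemctl restart kubelet. The flags only take effect because ExecStart in that file passes $KUBELET_CGROUP_ARGS to kubelet.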

# vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"

# added lines: start
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
# added lines: end

Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
# sudo systemctl daemon-reload
# sudo systemctl restart kubelet
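After the restart, you can confirm that the running kubelet actually received both flags. A minimal sketch: has_cgroup_flags is a hypothetical helper that reads a kubelet command line on stdin and succeeds only if both flags from the fix are present.

```shell
# Hypothetical helper: read a kubelet command line on stdin and return 0 only
# if both --runtime-cgroups and --kubelet-cgroups point at /systemd/system.slice.
has_cgroup_flags() {
  local args
  args="$(cat)"
  case "$args" in
    *--runtime-cgroups=/systemd/system.slice*) ;;
    *) return 1 ;;
  esac
  case "$args" in
    *--kubelet-cgroups=/systemd/system.slice*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, ps -o args= -C kubelet | has_cgroup_flags && echo "cgroup flags active". After that, journalctl -u kubelet --since "10min ago" | grep -c 'unknown container' should stop growing.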
