Kubernetes QA

1.QA

Q:   Controlling whether the Master node is schedulable
A:   Allow scheduling:  kubectl taint nodes --all node-role.kubernetes.io/master-
     Forbid scheduling: kubectl taint nodes centos-master-1 node-role.kubernetes.io/master=true:NoSchedule
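To verify the result, the current taints on a node can be listed (node name taken from the example above):
kubectl describe node centos-master-1 | grep -i taint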
Q:   Jan 11 09:42:40 k8s78 kubelet[517]: E0111 09:42:40.935017     517 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
A:   Append the following flags to the kubelet configuration:
--runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice
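A minimal sketch of wiring these flags in via a systemd drop-in, assuming kubelet runs as kubelet.service and picks up $KUBELET_EXTRA_ARGS (as kubeadm-style setups do); the drop-in file name is an assumption:
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/20-cgroups.conf <<'EOF'
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice"
EOF
systemctl daemon-reload && systemctl restart kubelet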

Q:   Failed to list *v1.Node: nodes is forbidden
A:   Add Node to the authorization modes and NodeRestriction to the admission plugins:
--authorization-mode=Node,RBAC \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds,NodeRestriction 

Q:   REJECT rules for Services are added to iptables automatically
A:   The KUBE-FIREWALL chain in the iptables rules does not affect Services. The actual cause is a rule of the kind below, which is generated dynamically and not controlled manually.
This is Kubernetes' internal circuit-breaking mechanism (similar to nginx health checks; Kubernetes Services work the same way): when all backends of a Service (only Services that have endpoints) become unreachable, a REJECT rule is added automatically as internal protection (see the check below).
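A quick way to list these auto-generated rules; the "has no endpoints" comment is what kube-proxy writes for Services without ready endpoints:
iptables-save -t filter | grep 'has no endpoints'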

Q:   Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
A:   A bug in recent cAdvisor releases; workaround recorded below:
mount -o remount,rw '/sys/fs/cgroup'
ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu

Q:   Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-26637.scope: no space left on device
A:   cat /proc/sys/fs/inotify/max_user_watches # default is 8192
sudo sysctl fs.inotify.max_user_watches=1048576 # increase to 1048576
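To keep the higher limit across reboots (assuming /etc/sysctl.d is applied at boot):
echo 'fs.inotify.max_user_watches = 1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system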

Q:   unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
A:   [ips@ips81 bin]$ curl 10.254.16.15:18082/apis/metrics/v1alpha1/nodes
      curl: (7) Failed connect to 10.254.16.15:18082; Connection refused
     [ips@ips81 bin]$ ./kubectl top node
     Error from server (ServiceUnavailable): the server is currently unable to handle the request (get services http:heapster:)

    Possible causes:
    1. The flannel network is not reachable
    2. Run ./kubectl -n kube-system get ep to check whether the endpoints/ports are correct (additional checks below)
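A few extra checks; the service/endpoint names are assumptions matching a typical heapster or metrics-server deployment in kube-system:
    ./kubectl -n kube-system get pods -o wide                         # is the metrics backend Pod running?
    ./kubectl -n kube-system get svc,ep | grep -E 'heapster|metrics'  # does the Service have endpoints?
    ./kubectl get apiservices | grep metrics                          # is the aggregated metrics API registered and Available?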

Q:   Metric-server: x509: subject with cn=front-proxy-client is not in the allowed list: [aggregator]
A:   The request-header client identity (CN) is not in the allowed list; add the following to the kube-apiserver configuration:
     --requestheader-allowed-names=aggregator,front-proxy-client 
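To confirm which CN the aggregator actually presents, inspect the proxy client certificate passed to --proxy-client-cert-file (the path below is the kubeadm default and an assumption here):
     openssl x509 -noout -subject -in /etc/kubernetes/pki/front-proxy-client.crt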

Q:  failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted

2.Related issues and troubleshooting notes

  • High availability via VIP + keepalived: if master and node run on the same host, then after a master failover the node on the original master cannot reach the new master behind the VIP, i.e. the floating VIP leaves the existing connection invalid (https://github.com/kubernetes/kubernetes/issues/48638)
Symptom: Error updating node status, will retry: error getting node "10.1.235.82": Get https://10.1.235.7:8443/api/v1/nodes/10.1.235.82?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Analysis: the initial suspicion was a long-lived TCP connection that kubelet never tore down, and the observation matched: the connection stayed established, the port never changed, and the source address was the virtual IP.
Cause: kubelet connected with the virtual IP as its source address, so after failover the VIP still existed, only on another host, and the stale connection pointed at the wrong machine. kubelet must use the host's own IP, not the VIP, as the source; this is fixed by adjusting the keepalived configuration.
Solution (a verification sketch follows the before/after comparison):
# keepalived before the change:
virtual_ipaddress {
    10.1.235.7
}

# keepalived after the change: always specify the subnet mask that matches the network
virtual_ipaddress {
    10.1.235.7/24
}


# NIC addresses before the change:
eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
       valid_lft 62206361sec preferred_lft 62206361sec
    inet 10.1.235.7/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee6:a9c8/64 scope link 
       valid_lft forever preferred_lft forever

# NIC addresses after the change:
eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
       valid_lft 62198673sec preferred_lft 62198673sec
    inet 10.1.235.7/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee6:a9c8/64 scope link 
       valid_lft forever preferred_lft forever

# kubelet-to-apiserver connection before the change:
tcp        0      0 10.1.235.7:44134       10.1.235.7:8443         ESTABLISHED 23559/kubelet

# kubelet-to-apiserver connection after the change:
tcp        0      0 10.1.235.82:44134       10.1.235.7:8443         ESTABLISHED 23559/kubelet
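After editing the prefix, reload keepalived and confirm that the VIP shows up as a secondary /24 address; kubelet may also need a restart so it reconnects using the host IP as source (commands assume systemd-managed services):
systemctl restart keepalived
ip addr show eth0
systemctl restart kubelet
netstat -antp | grep 8443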
  • Containers were not removed before Docker was stopped
Q:  rm: cannot remove ‘work/kubernetes/kubelet/pods/736b274c-d68a-11e8-8c3b-001b21992e84/volumes/kubernetes.io~secret/calico-node-token-9kss7’: Device or resource busy
A:  cat /proc/mounts |grep "kube" |awk '{print $2}' |xargs umount
  • Allowing Calico containers to reach the external network
iptables -t nat -A POSTROUTING -s 192.168.0.0/24  -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 ! -d 192.168.89.0/16 -j MASQUERADE

  • During a Deployment rolling update, load balancing is disrupted and some requests are dropped
Analysis: while a Pod is Terminating, the iptables rules on some nodes have not been refreshed yet, so part of the traffic is still routed to the Terminating Pod and those requests fail.
Solution: use the Kubernetes preStop hook to give every Pod an exit delay, so that on receiving the termination signal the Pod waits a while before exiting by default (see the sketch below).
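A sketch of adding such a preStop delay with a JSON patch; the Deployment name my-app and the 15-second sleep are hypothetical, and terminationGracePeriodSeconds should be at least as large as the sleep:
kubectl patch deployment my-app --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/lifecycle",
   "value": {"preStop": {"exec": {"command": ["sh", "-c", "sleep 15"]}}}}
]'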
  • Before Kubernetes 1.9, the built-in kubernetes Endpoints object was not updated when an apiserver went down, so some requests failed
Analysis: before Kubernetes 1.9, once an apiserver had started successfully its entry in the kubernetes Endpoints object was never updated again and had to be maintained by hand.
Solution: after upgrading to Kubernetes 1.10, set --endpoint-reconciler-type=lease (a quick check of the Endpoints object follows below)
Use an endpoint reconciler (master-count, lease, none)
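To see which apiserver addresses are currently registered in the built-in Endpoints object (stale entries are what caused the failed requests):
kubectl -n default get endpoints kubernetes -o yaml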

  • The pids cgroup subsystem cannot be mounted [version 1.5.1]

Jul 31 11:12:08 node1 kubelet[16285]: F0731 11:12:08.727594   16285 kubelet.go:1370] Failed to start ContainerManager failed to initialize top level QOS containers: failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids
# The operating system does not support the pids cgroup subsystem; upgrade the OS (kernel)
[root@node1 /]# uname -r
3.10.0-327.el7.x86_64
[root@node1 /]# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  3       4       1
cpu     6       59      1
cpuacct 6       59      1
memory  4       59      1
devices 9       59      1
freezer 5       4       1
net_cls 7       4       1
blkio   8       59      1
perf_event      2       4       1
hugetlb 10      4       1
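After the OS upgrade, the pids subsystem should appear here (a quick check):
grep -w pids /proc/cgroups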

3.To be resolved

Open issue: a large number of run-*.scope entries accumulate under /sys/fs/cgroup/memory/system.slice
