How to fix CoreDNS and dashboard pods stuck in ContainerCreating

This post is a memo of the analysis and resolution of a situation where the Kubernetes CoreDNS and dashboard pods could not start normally and stayed stuck in ContainerCreating.

Symptom

[root@localhost ansible]# kubectl get pods -n kube-system
NAME                                    READY   STATUS              RESTARTS   AGE
coredns-b7d8c5745-4qxnh                 0/1     ContainerCreating   0          168m
kubernetes-dashboard-7d75c474bb-pq5fc   0/1     ContainerCreating   0          168m
[root@localhost ansible]# 

Analysis

First, use the describe command to check the pod's details. The key error turns out to be:
starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\""
The full details are as follows:

[root@localhost ansible]# kubectl describe pod coredns-b7d8c5745-4qxnh -n kube-system
Name:           coredns-b7d8c5745-4qxnh
Namespace:      kube-system
Priority:       0
Node:           192.168.211.200/192.168.211.200
Start Time:     Wed, 14 Aug 2019 17:54:13 +0800
Labels:         k8s-app=kube-dns
                pod-template-hash=b7d8c5745
Annotations:    seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:         Pending
IP:             
Controlled By:  ReplicaSet/coredns-b7d8c5745
Containers:
  coredns:
    Container ID:  
    Image:         k8s.gcr.io/coredns:1.2.6
    Image ID:      
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-8m4wv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-8m4wv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-8m4wv
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                      Message
  ----     ------                  ----                   ----                      -------
  Warning  FailedCreatePodSandBox  16m (x7551 over 157m)  kubelet, 192.168.211.200  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "coredns-b7d8c5745-4qxnh": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\"": unknown
  Normal   SandboxChanged          69s (x9193 over 171m)  kubelet, 192.168.211.200  Pod sandbox changed, it will be killed and re-created.
[root@localhost ansible]# 

Checking the other pod, the key error message is as follows:

Warning  FailedCreatePodSandBox  31s (x825 over 15m)  kubelet, 10.0.2.15  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kubernetes-dashboard-7d75c474bb-j9tmq": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"write /proc/self/attr/keycreate: permission denied\"": unknown

Next I checked the kubelet side but found no further detail there. While verifying the status of each component, however, I found that flannel had not taken effect properly: as shown below, docker0 was still configured with its default address range instead of one inside the flannel network.

[root@localhost ansible]# ip addr show docker0
4: docker0:  mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:92:86:ff:1e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
[root@localhost ansible]# ip addr show flannel.1
3: flannel.1:  mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 66:1b:3e:2e:7d:af brd ff:ff:ff:ff:ff:ff
    inet 10.254.104.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
[root@localhost ansible]# 

Even after manually configuring flannel and restarting the flannel and docker services so that they worked correctly, the problem remained.
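The manual alignment step can be sketched as follows, assuming the standard flanneld behaviour of writing its lease to /run/flannel/subnet.env. The file contents below are an illustrative sample, not taken from the node above; on a real node, source the actual file as root and feed the generated options to the docker service (e.g. via a systemd drop-in), then restart docker after flannel:

```shell
# Illustrative sample of what flanneld writes to /run/flannel/subnet.env
# (values made up for this sketch; use the real file on an actual node)
cat > /tmp/subnet.env <<'EOF'
FLANNEL_NETWORK=10.254.0.0/16
FLANNEL_SUBNET=10.254.104.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

# Build the dockerd options that put docker0 inside the flannel subnet
. /tmp/subnet.env
echo "DOCKER_OPTS=--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}" > /tmp/docker-flannel.opts
cat /tmp/docker-flannel.opts
```

With these options applied and docker restarted, docker0 moves into the flannel-assigned range; as noted above, in this case that alone did not fix the pods.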

Fix

The following two issues on docker's GitHub are related to this problem, most closely through the write /proc/self/attr/keycreate: permission denied message, which points at SELinux:

  • https://github.com/moby/moby/issues/39109
  • https://github.com/haxorof/ansible-role-docker-ce/issues/107

To summarize, there are two candidate fixes:

  1. Turn SELinux on consistently (set the host to enforcing, and change selinux=false to true in the dockerd service options)
  2. Upgrade the SELinux-related packages
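Fix 1 can be sketched like this. dockerd reads the `selinux-enabled` key from /etc/docker/daemon.json (equivalently, it accepts `--selinux-enabled` on the command line); the snippet flips the flag on a temporary copy with an assumed minimal file content, so apply the same edit to the real file as root and restart docker:

```shell
# Assumed minimal daemon.json with SELinux support turned off
cat > /tmp/daemon.json <<'EOF'
{
  "selinux-enabled": false
}
EOF

# Flip the flag so dockerd cooperates with an enforcing host
sed -i 's/"selinux-enabled": false/"selinux-enabled": true/' /tmp/daemon.json
cat /tmp/daemon.json
```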

Note: setting SELinux to disabled saves a lot of trouble in the early stages of learning a feature or verifying functionality. setenforce had also been used here, but since the problem appeared to be related to SELinux actually taking effect, I first tried the following approach, which is not one of the two fixes above:

  1. With SELinux kept disabled (set SELINUX=disabled in /etc/selinux/config), reboot the machine and reinstall.
  • After the reboot and reinstall, the problem was resolved.
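The permanent-disable step can be sketched as follows, again demonstrated on a temporary copy of /etc/selinux/config (the stock CentOS file format is assumed); edit the real file as root, then reboot:

```shell
# Assumed stock /etc/selinux/config contents
cat > /tmp/selinux-config <<'EOF'
SELINUX=enforcing
SELINUXTYPE=targeted
EOF

# setenforce 0 only changes the running state (permissive); SELINUX=disabled
# in the config file is what survives a reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /tmp/selinux-config
grep '^SELINUX=' /tmp/selinux-config
```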

Summary

This is almost certainly an SELinux-related problem. However, SELinux had already been set to disabled, setenforce 0 had also been run, and the same Ansible playbook worked without any issue in other almost identical environments, yet the problem still occurred here; in different environments there may well be other contributing causes.
