This post is a memo on analyzing and working around one case in which the Kubernetes coredns and dashboard pods fail to start and stay stuck in ContainerCreating.
[root@localhost ansible]# kubectl get pods -n kube-system
NAME                                    READY   STATUS              RESTARTS   AGE
coredns-b7d8c5745-4qxnh                 0/1     ContainerCreating   0          168m
kubernetes-dashboard-7d75c474bb-pq5fc   0/1     ContainerCreating   0          168m
[root@localhost ansible]#
First, use the describe command to check the pod details. The key error message is:
starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\""
The full output is as follows:
[root@localhost ansible]# kubectl describe pod coredns-b7d8c5745-4qxnh -n kube-system
Name:               coredns-b7d8c5745-4qxnh
Namespace:          kube-system
Priority:           0
Node:               192.168.211.200/192.168.211.200
Start Time:         Wed, 14 Aug 2019 17:54:13 +0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=b7d8c5745
Annotations:        seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:             Pending
IP:
Controlled By:      ReplicaSet/coredns-b7d8c5745
Containers:
  coredns:
    Container ID:
    Image:          k8s.gcr.io/coredns:1.2.6
    Image ID:
    Ports:          53/UDP, 53/TCP, 9153/TCP
    Host Ports:     0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:     100m
      memory:  70Mi
    Liveness:  http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-8m4wv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-8m4wv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-8m4wv
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 16m (x7551 over 157m) kubelet, 192.168.211.200 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "coredns-b7d8c5745-4qxnh": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:303: getting the final child's pid from pipe caused \"EOF\"": unknown
Normal SandboxChanged 69s (x9193 over 171m) kubelet, 192.168.211.200 Pod sandbox changed, it will be killed and re-created.
[root@localhost ansible]#
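With thousands of repeated events, the describe output is easier to read once it is reduced to the Warning rows. The following is a minimal sketch and not part of the original troubleshooting; warning_events is a hypothetical helper name, and it assumes the Events section starts with a line beginning with "Events:" as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# only the Warning rows that appear after the "Events:" header.
warning_events() {
    awk '/^Events:/ { in_events = 1; next }
         in_events && $1 == "Warning"'
}

# Possible usage on the node from the transcript above:
#   kubectl describe pod coredns-b7d8c5745-4qxnh -n kube-system | warning_events
```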
Checking the other pod shows the following main error message:
Warning FailedCreatePodSandBox 31s (x825 over 15m) kubelet, 10.0.2.15 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kubernetes-dashboard-7d75c474bb-j9tmq": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"write /proc/self/attr/keycreate: permission denied\"": unknown
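The two events carry different inner messages, so it can help to pick out the distinguishing fragment before digging further. Below is a minimal sketch under stated assumptions: classify_sandbox_error and its labels are hypothetical names chosen for illustration, and the two patterns are simply the fragments quoted in the events above, not an exhaustive list of runtime errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label a FailedCreatePodSandBox message by the
# fragment it contains. Both patterns come from the events quoted above.
classify_sandbox_error() {
    local msg="$1"
    case "$msg" in
        *"/proc/self/attr/keycreate: permission denied"*)
            # This fragment is the one tied to SELinux later in this memo.
            echo "selinux-keycreate" ;;
        *"getting the final child's pid from pipe"*)
            # Generic failure while the runtime starts the container process.
            echo "runc-init-eof" ;;
        *)
            echo "unknown" ;;
    esac
}
```

Passing the dashboard event text to this function would print selinux-keycreate, which is the hint that leads toward SELinux rather than networking.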
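Next I checked the kubelet, but it gave no more detailed information. While checking the status of each component, I found that flannel had not taken effect: as shown below, docker0 was still using the default address range (172.17.0.1/16) instead of a subnet allocated by flannel.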
[root@localhost ansible]# ip addr show docker0
4: docker0: mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:92:86:ff:1e brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
[root@localhost ansible]# ip addr show flannel.1
3: flannel.1: mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 66:1b:3e:2e:7d:af brd ff:ff:ff:ff:ff:ff
inet 10.254.104.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
[root@localhost ansible]#
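The mismatch between docker0 and flannel can also be checked mechanically. This is a minimal sketch, assuming flannel records its lease in a subnet.env-style file with a FLANNEL_SUBNET=... line (commonly /run/flannel/subnet.env, which may differ in this setup) and that docker is meant to use exactly that value as its bridge address; docker0_uses_flannel_subnet is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the docker0 address equals the
# FLANNEL_SUBNET value recorded in a flannel subnet.env-style file.
# Assumption: docker is started with --bip set to FLANNEL_SUBNET.
docker0_uses_flannel_subnet() {
    local env_file="$1"      # e.g. /run/flannel/subnet.env (assumed path)
    local docker0_cidr="$2"  # the inet value reported for docker0
    local flannel_subnet
    flannel_subnet="$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file" | head -n 1)"
    [ -n "$flannel_subnet" ] && [ "$docker0_cidr" = "$flannel_subnet" ]
}

# Possible usage on a node:
#   cidr="$(ip -4 -o addr show docker0 | awk '{print $4}')"
#   docker0_uses_flannel_subnet /run/flannel/subnet.env "$cidr" \
#       || echo "docker0 is not on the flannel subnet"
```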
I configured flannel manually and restarted the flannel and docker services. Even after flannel was working correctly, the problem remained.
On Docker's GitHub I found two issues related to this problem. The message write /proc/self/attr/keycreate: permission denied is the closest match, and it points to SELinux.
To summarize, there are two ways to handle it:
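Note: setting SELinux to disabled saves a lot of trouble in the early stages of learning a feature or validating functionality. Although setenforce was also used, this problem may be tied to SELinux actually taking effect, so I first tried the following approach, which is not among those mentioned above.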
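I am fairly confident this is an SELinux-related problem. However, SELinux had already been set to disabled, the same Ansible playbook runs without any problem in other, nearly identical environments, and setenforce 0 had also been applied, yet the problem persisted. Other causes may therefore be at play in this particular environment.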
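One thing that may be worth ruling out is a difference between the configured and the running SELinux mode: a change to /etc/selinux/config only applies after a reboot, and setenforce 0 switches to permissive rather than disabled. The sketch below is an illustration, not the fix used here; both helper names are hypothetical, and the running mode is passed in as an argument (on a real host it would be the output of getenforce).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SELINUX= value from an SELinux config file.
selinux_configured_mode() {
    local config_file="${1:-/etc/selinux/config}"
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$config_file" | head -n 1
}

# Hypothetical helper: succeed if the configured mode differs from the
# running mode, ignoring letter case (getenforce prints e.g. "Permissive").
selinux_mode_mismatch() {
    local configured running
    configured="$(selinux_configured_mode "$1")"
    running="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    [ "$configured" != "$running" ]
}

# Possible usage on a node:
#   selinux_mode_mismatch /etc/selinux/config "$(getenforce)" \
#       && echo "configured and running SELinux modes differ"
```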