启动 Pod 会调 CRI 接口创建容器,容器运行时创建容器时通常会在数据目录下为新建的容器创建一些目录和文件,如果数据目录所在的磁盘空间满了就会创建失败并报错:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 2m (x4307 over 16h) kubelet, 10.179.80.31 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "apigateway-6dc48bf8b6-l8xrw": Error response from daemon: mkdir /var/lib/docker/aufs/mnt/1f09d6c1c9f24e8daaea5bf33a4230de7dbc758e3b22785e8ee21e3e3d921214-init: no space left on device
如果节点上内存碎片化严重,缺少大页内存,会导致即使总的剩余内存较多,但还是会申请内存失败,
如果 limit 设置过小以至于不足以成功运行 Sandbox 也会造成这种状态,常见的是因为 memory limit 单位设置不对造成的 limit 过小,比如误将 memory 的 limit 单位像 request 一样设置为小 m,这个单位在 memory 不适用,会被 k8s 识别成 byte, 应该用 Mi 或 M。,
举个例子: 如果 memory limit 设为 1024m 表示限制 1.024 Byte,这么小的内存, pause 容器一起来就会被 cgroup-oom kill 掉,导致 pod 状态一直处于 ContainerCreating。
这种情况通常会报下面的 event:
Pod sandbox changed, it will be killed and re-created。
kubelet 报错:
to start sandbox container for pod ... Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: killed\"": unknown
镜像拉取失败也分很多情况,这里列举下:
发生 CNI 网络错误, 通常需要检查下网络插件的配置和运行状态,如果没有正确配置或正常运行通常表现为:
查看 master 上 kube-controller-manager 状态,异常的话尝试重启。
如果节点上本身有 docker 或者没删干净,然后又安装 docker,比如在 centos 上用 yum 安装:
yum install -y docker
这样可能会导致 dockerd 创建容器一直不成功,从而 Pod 状态一直 ContainerCreating,查看 event 报错:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 18m (x3583 over 83m) kubelet, 192.168.4.5 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-7db9fccd9b-2j6dh": Error response from daemon: ttrpc: client shutting down: read unix @->@/containerd-shim/moby/de2bfeefc999af42783115acca62745e6798981dff75f4148fae8c086668f667/shim.sock: read: connection reset by peer: unknown
Normal SandboxChanged 3m12s (x4420 over 83m) kubelet, 192.168.4.5 Pod sandbox changed, it will be killed and re-created.
可能是因为重复安装 docker 版本不一致导致一些组件之间不兼容,从而导致 dockerd 无法正常创建容器。
如果节点上已有同名容器,创建 sandbox 就会失败,event:
Warning FailedCreatePodSandBox 2m kubelet, 10.205.8.91 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "lomp-ext-d8c8b8c46-4v8tl": operation timeout: context deadline exceeded
Warning FailedCreatePodSandBox 3s (x12 over 2m) kubelet, 10.205.8.91 Failed create pod sandbox