Pod stuck in ContainerCreating: the reported error is a mount failure, MountVolume.SetUp failed for volume

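Background: I was setting up a Redis cluster with NFS-backed volumes. At some point I apparently moved the files off the mounted disk, and the next time I started the pods the mount failed.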

[root@master redis-cluster-sts]# kubectl apply -f redis-sts.yml
configmap/redis-cluster created
statefulset.apps/redis-cluster created
[root@master redis-cluster-sts]# kubectl get pods
NAME                                      READY   STATUS              RESTARTS   AGE
nfs-client-provisioner-78cbf94495-4zwqj   1/1     Running             4          39h
redis-cluster-0                           0/1     ContainerCreating   0          4s
[root@master redis-cluster-sts]# kubectl describe pods redis-cluster-0
Name:           redis-cluster-0
Namespace:      default
Priority:       0
Node:           node3/172.31.17.120
Start Time:     Fri, 15 May 2020 10:03:14 +0800
Labels:         app=redis
                appCluster=redis-cluster
                controller-revision-hash=redis-cluster-75f6c9b7c8
                statefulset.kubernetes.io/pod-name=redis-cluster-0
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/redis-cluster
Containers:
  redis:
    Container ID:
    Image:         redis:latest
    Image ID:
    Ports:         6379/TCP, 16379/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /conf/update-node.sh
      redis-server
      /conf/redis.conf
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POD_IP:          (v1:status.podIP)
      METADATA_NAME:  redis-cluster-0 (v1:metadata.name)
    Mounts:
      /conf from conf (rw)
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jpxkg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-redis-cluster-0
    ReadOnly:   false
  conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      redis-cluster
    Optional:  false
  default-token-jpxkg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jpxkg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age        From               Message
  ----     ------       ----       ----               -------
  Normal   Scheduled    <unknown>  default-scheduler  Successfully assigned default/redis-cluster-0 to node3
  Warning  FailedMount  16s        kubelet, node3     MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/d73302e3-da43-4d4f-b218-68e0a6cbbcb6/volumes/kubernetes.io~nfs/pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 --scope -- mount -t nfs vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 /var/lib/kubelet/pods/d73302e3-da43-4d4f-b218-68e0a6cbbcb6/volumes/kubernetes.io~nfs/pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
Output: Running scope as unit run-18199.scope.
mount.nfs: mounting vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 failed, reason given by server: No such file or directory
  Warning  FailedMount  15s  kubelet, node3  MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32

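Troubleshooting:
1. I first assumed NFS itself was broken, so I created a new directory outside the cluster and mounted vol:/share/nfs on it. The mount succeeded.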

# List the directories exported by vol
[root@master redis-cluster-sts]# showmount -e vol
Export list for vol:
/share                    172.31.0.0/16
/data/opv                 172.31.0.0/16
/home/ttebdadmin/nfs/data 172.31.0.0/16
# Create a directory to use as the mount point
[root@master redis-cluster-sts]# mkdir share
# Mount vol:/share/nfs on that directory
[root@master redis-cluster-sts]# mount vol:/share/nfs share/
# The mount succeeded
[root@master redis-cluster-sts]# mount | grep vol:/share/nfs
vol:/share/nfs on /usr/local/ttebd/k8s-base-v16.2/redis-cluster-sts/share type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.31.17.51,local_lock=none,addr=172.31.17.54)
# The contents of the NFS directory are now visible
[root@master redis-cluster-sts]# ls share/
default-redis-cluster-data-redis-app-0-pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad
kafka-datadir-zk-0-pvc-f4dd3352-f8ba-4977-b154-882ea509e7af
kafka-datadir-zk-1-pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3
kafka-datadir-zk-2-pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9
nodes.conf
update-node.sh

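This shows that NFS itself is fine. The error above says that a path cannot be found. Reading the message again, the missing path is vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80, and the default-data-XXX directory is indeed absent from the NFS directory. I suspected a cache, so I changed the mount path and rebooted the host, but nothing changed: the mount kept looking for the same path. It finally turned out that the path comes from the PVC and PV, so check both: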

[root@master redis-cluster-sts]# kubectl get pvc
NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-redis-cluster-0             Bound    pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80   1Gi        RWO            nfs-storage    36h
data-redis-cluster-1             Bound    pvc-527fe5c7-0cb4-4816-825d-44e471299206   1Gi        RWO            nfs-storage    36h
data-redis-cluster-2             Bound    pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1   1Gi        RWO            nfs-storage    36h
data-redis-cluster-3             Bound    pvc-40f40264-0457-4b50-8f80-fa57571d0acf   1Gi        RWO            nfs-storage    36h
data-redis-cluster-4             Bound    pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4   1Gi        RWO            nfs-storage    36h
data-redis-cluster-5             Bound    pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a   1Gi        RWO            nfs-storage    36h
redis-cluster-data-redis-app-0   Bound    pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad   1G         RWO,RWX        nfs-storage    36h
[root@master redis-cluster-sts]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-40f40264-0457-4b50-8f80-fa57571d0acf   1Gi        RWO            Retain           Bound    default/data-redis-cluster-3             nfs-storage             36h
pvc-527fe5c7-0cb4-4816-825d-44e471299206   1Gi        RWO            Retain           Bound    default/data-redis-cluster-1             nfs-storage             36h
pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a   1Gi        RWO            Retain           Bound    default/data-redis-cluster-5             nfs-storage             36h
pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80   1Gi        RWO            Retain           Bound    default/data-redis-cluster-0             nfs-storage             36h
pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3   1G         RWX            Retain           Bound    kafka/datadir-zk-1                       nfs-storage             35h
pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9   1G         RWX            Retain           Bound    kafka/datadir-zk-2                       nfs-storage             35h
pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1   1Gi        RWO            Retain           Bound    default/data-redis-cluster-2             nfs-storage             36h
pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad   1G         RWO,RWX        Retain           Bound    default/redis-cluster-data-redis-app-0   nfs-storage             36h
pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4   1Gi        RWO            Retain           Bound    default/data-redis-cluster-4             nfs-storage             36h
pvc-f4dd3352-f8ba-4977-b154-882ea509e7af   1G         RWX            Retain           Bound    kafka/datadir-zk-0                       nfs-storage             35h

Look closely at the error above again:

  Warning  FailedMount  16s        kubelet, node3     MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32

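The failure happens while mounting the PV, so inspect a PV (the output below is for the PV bound to data-redis-cluster-3; the failing one has the same layout):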

[root@master redis-cluster-sts]# kubectl describe pv pvc-40f40264-0457-4b50-8f80-fa57571d0acf
Name:            pvc-40f40264-0457-4b50-8f80-fa57571d0acf
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: fuseim.pri/ifs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    nfs-storage
Status:          Bound
Claim:           default/data-redis-cluster-3
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    vol
    Path:      /share/nfs/default-data-redis-cluster-3-pvc-40f40264-0457-4b50-8f80-fa57571d0acf
    ReadOnly:  false
Events:        <none>
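
The directory names on the export appear to follow the pattern namespace-claim-PV name. This is inferred from the listings above, not taken from the provisioner's documentation. A minimal sketch that builds the expected path for a claim so it can be compared with the Path field of the PV; expected_nfs_dir is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: build the path the provisioner seems to use, assuming
# the pattern <namespace>-<claim>-<pv name> observed in the listings above.
expected_nfs_dir() {
    base=$1; namespace=$2; claim=$3; pv=$4
    printf '%s/%s-%s-%s\n' "$base" "$namespace" "$claim" "$pv"
}

# Example with the values of the failing pod
expected_nfs_dir /share/nfs default data-redis-cluster-0 \
    pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
```

The path actually recorded in a PV can be read with kubectl get pv NAME -o jsonpath='{.spec.nfs.path}' and compared with this value.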

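When the pod starts, it looks up its PV and then tries to mount the path recorded in that PV. The directory does not exist, so the mount fails. There are two ways to fix this.
1. Recreate the missing directory, for example create default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 under /share/nfs on vol.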

# On vol: create the directory
[root@vol nfs]# mkdir default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
# Grant permissions
[root@vol nfs]# chmod 777 default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
[root@vol nfs]# ls
default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
default-redis-cluster-data-redis-app-0-pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad
kafka-datadir-zk-0-pvc-f4dd3352-f8ba-4977-b154-882ea509e7af
kafka-datadir-zk-1-pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3
kafka-datadir-zk-2-pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9
nodes.conf
update-node.sh

# On master, check the pod status: redis-cluster-0 is now running. To bring up the other pods, their directories must be recreated as well.
[root@master redis-cluster-sts]# kubectl get pods
NAME                                      READY   STATUS              RESTARTS   AGE
nfs-client-provisioner-78cbf94495-4zwqj   1/1     Running             4          40h
redis-cluster-0                           1/1     Running             0          44m
redis-cluster-1                           0/1     ContainerCreating   0          15m
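
Recreating the remaining directories one by one is repetitive, so the same mkdir and chmod steps can be wrapped in a loop. This is only a sketch: ensure_volume_dirs is a hypothetical helper, and the directory names in the usage comment are derived from the PV list under the assumption that every directory is named namespace-claim-PV name.

```shell
#!/bin/sh
# Hypothetical helper: create each listed directory under a base path if it is
# missing, with the same permissions as the manual fix. Existing directories
# are left untouched.
ensure_volume_dirs() {
    base=$1
    shift
    for name in "$@"; do
        if [ ! -d "$base/$name" ]; then
            mkdir -p "$base/$name" && chmod 777 "$base/$name"
            echo "created $base/$name"
        fi
    done
}

# Usage on vol (names assumed from the PV list, see the lead-in):
# ensure_volume_dirs /share/nfs \
#     default-data-redis-cluster-1-pvc-527fe5c7-0cb4-4816-825d-44e471299206 \
#     default-data-redis-cluster-2-pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1
```

The function prints a line only for directories it had to create, so running it a second time produces no output.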

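The first approach is tedious when the original files cannot be found. The simplest fix is to delete the PVCs and PVs and let the provisioner create the directories again.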

2. Delete the PVCs and PVs, then start the pods again.

[root@master redis-cluster-sts]# kubectl delete -f redis-sts.yml
configmap "redis-cluster" deleted
statefulset.apps "redis-cluster" deleted

# Delete the PVC; repeat for all of 0-5
[root@master redis-cluster-sts]# kubectl delete pvc data-redis-cluster-0

# If a PVC stays stuck in Terminating, remove its finalizers with patch so the deletion can complete
[root@master redis-cluster-sts]# kubectl patch pvc data-redis-cluster-0  -p '{"metadata":{"finalizers":null}}' -n default
persistentvolumeclaim/data-redis-cluster-0 patched

# Check the PVs
[root@master redis-cluster-sts]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                    STORAGECLASS   REASON   AGE
pvc-40f40264-0457-4b50-8f80-fa57571d0acf   1Gi        RWO            Retain           Released   default/data-redis-cluster-3             nfs-storage             36h
pvc-527fe5c7-0cb4-4816-825d-44e471299206   1Gi        RWO            Retain           Released   default/data-redis-cluster-1             nfs-storage             36h
pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a   1Gi        RWO            Retain           Released   default/data-redis-cluster-5             nfs-storage             36h
pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80   1Gi        RWO            Retain           Released   default/data-redis-cluster-0             nfs-storage             36h
pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1   1Gi        RWO            Retain           Released   default/data-redis-cluster-2             nfs-storage             36h
pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4   1Gi        RWO            Retain           Released   default/data-redis-cluster-4             nfs-storage             36h

# Delete the PV; repeat for the PVs of all claims 0-5
[root@master redis-cluster-sts]# kubectl delete pv pvc-527fe5c7-0cb4-4816-825d-44e471299206
persistentvolume "pvc-527fe5c7-0cb4-4816-825d-44e471299206" deleted
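
Since six PVCs and six PVs have to be deleted, the PV names can be picked out of the table instead of being copied by hand. The sketch below is one possible way to do it; released_pvs_for is a hypothetical helper, and the column positions it relies on are taken from the kubectl get pv output shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pv --no-headers` output on stdin and
# print the names of PVs that are Released and whose claim starts with the
# given prefix. Columns: 1 = NAME, 5 = STATUS, 6 = CLAIM.
released_pvs_for() {
    awk -v prefix="$1" '$5 == "Released" && index($6, prefix) == 1 { print $1 }'
}

# Usage on master:
# for i in 0 1 2 3 4 5; do kubectl delete pvc "data-redis-cluster-$i"; done
# kubectl get pv --no-headers | released_pvs_for default/data-redis-cluster- |
#     while read -r pv; do kubectl delete pv "$pv"; done
```

Filtering on the Released status and the claim prefix keeps the PVs that belong to other workloads, such as the kafka ones, out of the delete list.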

Apply redis-sts.yml again and check that the pods are running normally:

[root@master redis-cluster-sts]# kubectl apply -f redis-sts.yml
configmap/redis-cluster created
statefulset.apps/redis-cluster created
[root@master redis-cluster-sts]# kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
nfs-client-provisioner-78cbf94495-4zwqj   1/1     Running   4          40h
redis-cluster-0                           1/1     Running   0          70s
redis-cluster-1                           1/1     Running   0          66s
redis-cluster-2                           1/1     Running   0          63s
redis-cluster-3                           1/1     Running   0          59s
redis-cluster-4                           1/1     Running   0          55s
redis-cluster-5                           1/1     Running   0          52s
