Background: while setting up a Redis cluster on Kubernetes, I used NFS-backed volumes. At some point I moved the files off the NFS export directory, and when I started the pods again they failed with mount errors.
[root@master redis-cluster-sts]# kubectl apply -f redis-sts.yml
configmap/redis-cluster created
statefulset.apps/redis-cluster created
[root@master redis-cluster-sts]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-78cbf94495-4zwqj 1/1 Running 4 39h
redis-cluster-0 0/1 ContainerCreating 0 4s
[root@master redis-cluster-sts]# kubectl describe pods redis-cluster-0
Name: redis-cluster-0
Namespace: default
Priority: 0
Node: node3/172.31.17.120
Start Time: Fri, 15 May 2020 10:03:14 +0800
Labels: app=redis
appCluster=redis-cluster
controller-revision-hash=redis-cluster-75f6c9b7c8
statefulset.kubernetes.io/pod-name=redis-cluster-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/redis-cluster
Containers:
redis:
Container ID:
Image: redis:latest
Image ID:
Ports: 6379/TCP, 16379/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/conf/update-node.sh
redis-server
/conf/redis.conf
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
POD_IP: (v1:status.podIP)
METADATA_NAME: redis-cluster-0 (v1:metadata.name)
Mounts:
/conf from conf (rw)
/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jpxkg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-redis-cluster-0
ReadOnly: false
conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: redis-cluster
Optional: false
default-token-jpxkg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-jpxkg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/redis-cluster-0 to node3
Warning FailedMount 16s kubelet, node3 MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/d73302e3-da43-4d4f-b218-68e0a6cbbcb6/volumes/kubernetes.io~nfs/pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 --scope -- mount -t nfs vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 /var/lib/kubelet/pods/d73302e3-da43-4d4f-b218-68e0a6cbbcb6/volumes/kubernetes.io~nfs/pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
Output: Running scope as unit run-18199.scope.
mount.nfs: mounting vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 failed, reason given by server: No such file or directory
Warning FailedMount 15s kubelet, node3 MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32
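Exit status 32 from mount.nfs is just a generic mount failure; the real clue is the server's reply, "No such file or directory". To confirm the problem is a missing path on the server rather than anything on the kubelet side, the mount command from the event above can be replayed by hand on node3 (a sketch; /mnt is an arbitrary mount point):
# (sketch) replay the failing mount by hand on node3
mount -t nfs vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 /mnt
# expect the same server-side error: mount.nfs: ... reason given by server: No such file or directory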
Troubleshooting:
1. I first suspected NFS itself was broken, so from the master I created a fresh directory and mounted vol:/share/nfs onto it; the mount succeeded.
# list the directories exported by vol
[root@master redis-cluster-sts]# showmount -e vol
Export list for vol:
/share 172.31.0.0/16
/data/opv 172.31.0.0/16
/home/ttebdadmin/nfs/data 172.31.0.0/16
# create a local mount point
[root@master redis-cluster-sts]# mkdir share
# mount vol:/share/nfs onto it
[root@master redis-cluster-sts]# mount vol:/share/nfs share/
# the mount succeeds
[root@master redis-cluster-sts]# mount | grep vol:/share/nfs
vol:/share/nfs on /usr/local/ttebd/k8s-base-v16.2/redis-cluster-sts/share type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.31.17.51,local_lock=none,addr=172.31.17.54)
# the contents of the nfs directory are now visible
[root@master redis-cluster-sts]# ls share/
default-redis-cluster-data-redis-app-0-pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad
kafka-datadir-zk-0-pvc-f4dd3352-f8ba-4977-b154-882ea509e7af
kafka-datadir-zk-1-pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3
kafka-datadir-zk-2-pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9
nodes.conf
update-node.sh
So NFS itself is fine, and the error really does mean the path cannot be found: vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80, i.e. the default-data-XXX directory is missing, and indeed it is not present under the nfs directory. I suspected some kind of cache, so I switched to a different mount path and even rebooted the host, but the pod kept asking for the same path. In the end the path turns out to come from the PVC/PV: the nfs-client provisioner records a backing directory named ${namespace}-${pvcName}-${pvName} in each PV it creates. Check the PVCs and PVs:
[root@master redis-cluster-sts]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-redis-cluster-0 Bound pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 1Gi RWO nfs-storage 36h
data-redis-cluster-1 Bound pvc-527fe5c7-0cb4-4816-825d-44e471299206 1Gi RWO nfs-storage 36h
data-redis-cluster-2 Bound pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1 1Gi RWO nfs-storage 36h
data-redis-cluster-3 Bound pvc-40f40264-0457-4b50-8f80-fa57571d0acf 1Gi RWO nfs-storage 36h
data-redis-cluster-4 Bound pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4 1Gi RWO nfs-storage 36h
data-redis-cluster-5 Bound pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a 1Gi RWO nfs-storage 36h
redis-cluster-data-redis-app-0 Bound pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad 1G RWO,RWX nfs-storage 36h
[root@master redis-cluster-sts]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-40f40264-0457-4b50-8f80-fa57571d0acf 1Gi RWO Retain Bound default/data-redis-cluster-3 nfs-storage 36h
pvc-527fe5c7-0cb4-4816-825d-44e471299206 1Gi RWO Retain Bound default/data-redis-cluster-1 nfs-storage 36h
pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a 1Gi RWO Retain Bound default/data-redis-cluster-5 nfs-storage 36h
pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 1Gi RWO Retain Bound default/data-redis-cluster-0 nfs-storage 36h
pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3 1G RWX Retain Bound kafka/datadir-zk-1 nfs-storage 35h
pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9 1G RWX Retain Bound kafka/datadir-zk-2 nfs-storage 35h
pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1 1Gi RWO Retain Bound default/data-redis-cluster-2 nfs-storage 36h
pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad 1G RWO,RWX Retain Bound default/redis-cluster-data-redis-app-0 nfs-storage 36h
pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4 1Gi RWO Retain Bound default/data-redis-cluster-4 nfs-storage 36h
pvc-f4dd3352-f8ba-4977-b154-882ea509e7af 1G RWX Retain Bound kafka/datadir-zk-0 nfs-storage 35h
Now look at the error message again:
Warning FailedMount 16s kubelet, node3 MountVolume.SetUp failed for volume "pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80" : mount failed: exit status 32
It fails while mounting the PV, so inspect the PV:
[root@master redis-cluster-sts]# kubectl describe pv pvc-40f40264-0457-4b50-8f80-fa57571d0acf
Name: pvc-40f40264-0457-4b50-8f80-fa57571d0acf
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: fuseim.pri/ifs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: nfs-storage
Status: Bound
Claim: default/data-redis-cluster-3
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 1Gi
Node Affinity: <none>
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: vol
Path: /share/nfs/default-data-redis-cluster-3-pvc-40f40264-0457-4b50-8f80-fa57571d0acf
ReadOnly: false
Events: <none>
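The Source Path above is exactly the directory kubelet asks the NFS server for. A quick one-liner sketch (plain kubectl jsonpath, nothing custom) to print the server and path behind the failing claim:
[root@master redis-cluster-sts]# kubectl get pv $(kubectl get pvc data-redis-cluster-0 -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.nfs.server}:{.spec.nfs.path}{"\n"}'
vol:/share/nfs/default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80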
So when the pod starts, kubelet resolves its PVC to the bound PV and tries to mount the Path recorded there; that path no longer exists on the server, hence the error. There are two ways to fix it:
1. Recreate the missing directories, e.g. create default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 under /share/nfs on vol.
# on vol: create the directory
[root@vol nfs]# mkdir default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
# grant permissions
[root@vol nfs]# chmod 777 default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
[root@vol nfs]# ls
default-data-redis-cluster-0-pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80
default-redis-cluster-data-redis-app-0-pvc-e2069f3b-d370-4ad1-aeca-ac64f8c426ad
kafka-datadir-zk-0-pvc-f4dd3352-f8ba-4977-b154-882ea509e7af
kafka-datadir-zk-1-pvc-701b3432-aa3b-440e-8cac-81b1e6de69a3
kafka-datadir-zk-2-pvc-8bbd84fb-fdfc-4aae-ab98-29fa971d19b9
nodes.conf
update-node.sh
# back on master: redis-cluster-0 is Running again; to get the others running, the remaining directories must be recreated too (see the loop sketch after this output)
[root@master redis-cluster-sts]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-78cbf94495-4zwqj 1/1 Running 4 40h
redis-cluster-0 1/1 Running 0 44m
redis-cluster-1 0/1 ContainerCreating 0 15m
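Recreating the other five directories one by one is tedious. A sketch that automates it, run on master and assuming passwordless SSH to vol as root (an assumption; adjust to however you reach the NFS server): it reads each PV's NFS path from the API and recreates it on the server.
# sketch: recreate the backing directory for every redis-cluster claim
# assumes passwordless SSH as root@vol (adjust as needed)
for i in 0 1 2 3 4 5; do
  pv=$(kubectl get pvc data-redis-cluster-$i -o jsonpath='{.spec.volumeName}')
  path=$(kubectl get pv "$pv" -o jsonpath='{.spec.nfs.path}')
  ssh root@vol "mkdir -p '$path' && chmod 777 '$path'"
done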
Recovering that way is painful when the original files are gone for good. The simplest fix is to delete the PVCs and PVs and let the provisioner create fresh directories from scratch.
2. Delete the PVCs and PVs, then recreate the pods.
[root@master redis-cluster-sts]# kubectl delete -f redis-sts.yml
configmap "redis-cluster" deleted
statefulset.apps "redis-cluster" deleted
# delete the PVCs; note that all of 0-5 must be deleted
[root@master redis-cluster-sts]# kubectl delete pvc data-redis-cluster-0
# if a PVC hangs in Terminating (its kubernetes.io/pvc-protection finalizer blocks deletion while a pod still references it), clear the finalizers with patch to force it through
[root@master redis-cluster-sts]# kubectl patch pvc data-redis-cluster-0 -p '{"metadata":{"finalizers":null}}' -n default
persistentvolumeclaim/data-redis-cluster-0 patched
# check the PVs: with the claims gone they show as Released
[root@master redis-cluster-sts]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-40f40264-0457-4b50-8f80-fa57571d0acf 1Gi RWO Retain Released default/data-redis-cluster-3 nfs-storage 36h
pvc-527fe5c7-0cb4-4816-825d-44e471299206 1Gi RWO Retain Released default/data-redis-cluster-1 nfs-storage 36h
pvc-659e52ec-bf6e-4d51-b549-91dfbf0a941a 1Gi RWO Retain Released default/data-redis-cluster-5 nfs-storage 36h
pvc-6cea9fce-006d-40f9-8e4a-3c27f323ef80 1Gi RWO Retain Released default/data-redis-cluster-0 nfs-storage 36h
pvc-bad20305-2b9c-4c81-8383-5b2a011c18c1 1Gi RWO Retain Released default/data-redis-cluster-2 nfs-storage 36h
pvc-eafb5c30-b432-4d8a-9cb4-9b52de34b0d4 1Gi RWO Retain Released default/data-redis-cluster-4 nfs-storage 36h
# delete the PVs; again, all of 0-5 must be deleted
[root@master redis-cluster-sts]# kubectl delete pv pvc-527fe5c7-0cb4-4816-825d-44e471299206
persistentvolume "pvc-527fe5c7-0cb4-4816-825d-44e471299206" deleted
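To avoid six manual deletes, a one-pass sketch (run on master; the awk field index assumes default kubectl column order): remove the remaining claims, then every PV left in Released state.
# sketch: clean up the remaining PVCs and all Released PVs in one pass
for i in 1 2 3 4 5; do kubectl delete pvc data-redis-cluster-$i; done
kubectl get pv --no-headers | awk '$5=="Released"{print $1}' | xargs -r kubectl delete pv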
Re-apply redis-sts.yml and confirm the pods all come up:
[root@master redis-cluster-sts]# kubectl apply -f redis-sts.yml
configmap/redis-cluster created
statefulset.apps/redis-cluster created
[root@master redis-cluster-sts]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-78cbf94495-4zwqj 1/1 Running 4 40h
redis-cluster-0 1/1 Running 0 70s
redis-cluster-1 1/1 Running 0 66s
redis-cluster-2 1/1 Running 0 63s
redis-cluster-3 1/1 Running 0 59s
redis-cluster-4 1/1 Running 0 55s
redis-cluster-5 1/1 Running 0 52s
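As a final check (assuming the vol:/share/nfs mount from the troubleshooting step is still in place on master), the provisioner should have created fresh backing directories, one per claim, carrying the new PV UIDs:
[root@master redis-cluster-sts]# ls share/ | grep default-data-redis-cluster
# expect six default-data-redis-cluster-N-pvc-<new-uid> directories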