mount failed: mount failed: exit status 1

A Kubernetes pod mounts a PVC, and the PVC is bound to a PV backed by GlusterFS. The pod stays in the ContainerCreating state, and kubectl describe pod shows the following error:

Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/tfworkflow-glusterfs-pv/busybox-q46jq-glusterfs.log,backup-volfile-servers=172.16.1.1 172.16.1.1:gv-test /var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv
Output: Running scope as unit run-22169.scope.
Mount failed. Please check the log file for more details.

the following error information was pulled from the glusterfs log to help diagnose this issue: 
[2018-05-30 09:19:09.574623] I [fuse-bridge.c:5834:fini] 0-fuse: Unmounting '/var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv'.
[2018-05-30 09:19:09.574633] I [fuse-bridge.c:5839:fini] 0-fuse: Closing fuse connection to '/var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv'.
Warning FailedMount 30m kubelet, st3-aknode-28.prod.yiran.com MountVolume.SetUp failed for volume "tfworkflow-glusterfs-pv" : mount failed: mount failed: exit status 1

Note: the pod is scheduled on node-2 (172.16.1.2).

 

The describe pod output contains no useful error information, so I ran the mount command by hand:

systemd-run --description=Kubernetes transient mount for /var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv --scope -- mount -t glusterfs -o log-level=ERROR,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/tfworkflow-glusterfs-pv/busybox-q46jq-glusterfs.log,backup-volfile-servers=172.16.1.1 172.16.1.1:gv-test /var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv

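It fails with the error below. The --description value contains spaces and was pasted without quotes, so systemd-run treats the word "transient" as the command to execute; this error comes from the copy-paste, not from the mount itself: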

Failed to find executable transient: No such file or directory
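
To reproduce the kubelet mount by hand without tripping over that word splitting, the description has to be passed as a single quoted argument. A minimal sketch, assuming the same paths and addresses as in the describe output; the manual_mount function and the DRY_RUN switch are hypothetical helpers for illustration, not part of kubelet or systemd:

```shell
# Paths copied from the kubectl describe output above.
POD_DIR='/var/lib/kubelet/pods/6e95525e-63ea-11e8-9cf1-5254000c4f61/volumes/kubernetes.io~glusterfs/tfworkflow-glusterfs-pv'
LOG_FILE='/var/lib/kubelet/plugins/kubernetes.io/glusterfs/tfworkflow-glusterfs-pv/busybox-q46jq-glusterfs.log'

# Hypothetical helper: rebuild the kubelet mount command with proper quoting.
manual_mount() {
    # Quoting keeps the whole description in one argument, so systemd-run
    # no longer mistakes "transient" for the command to execute.
    set -- systemd-run \
        "--description=Kubernetes transient mount for $POD_DIR" \
        --scope -- \
        mount -t glusterfs \
        -o "log-level=ERROR,log-file=$LOG_FILE,backup-volfile-servers=172.16.1.1" \
        172.16.1.1:gv-test "$POD_DIR"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # DRY_RUN is an assumption of this sketch: print one argument per line.
        printf '%s\n' "$@"
    else
        "$@"
    fi
}
```

Running DRY_RUN=1 manual_mount first shows exactly which arguments systemd-run would receive; running manual_mount as root on the node then performs the real mount attempt.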

The directories and files referenced in the command all exist. I then happened to look at the log file /var/lib/kubelet/plugins/kubernetes.io/glusterfs/tfworkflow-glusterfs-pv/busybox-q46jq-glusterfs.log, which contains the following errors:

[2018-05-30 09:18:36.778752] I [MSGID: 101190] [event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-05-30 09:18:36.778845] W [MSGID: 101174] [graph.c:360:_log_if_unknown_option] 0-gv0-readdir-ahead: option 'parallel-readdir' is not recognized
[2018-05-30 09:18:36.778908] W [MSGID: 101174] [graph.c:360:_log_if_unknown_option] 0-gv0-client-0: option 'transport.socket.keepalive-count' is not recognized
[2018-05-30 09:18:36.778928] I [MSGID: 114020] [client.c:2356:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
[2018-05-30 09:18:36.780784] E [MSGID: 101075] [common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2018-05-30 09:18:36.780795] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-gv0-client-0: DNS resolution failed on host node-1

Note: GlusterFS runs on node-1 (172.16.1.1).
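
Since the useful lines are buried among I and W messages, it can help to filter the GlusterFS client log down to error-level entries only. A small sketch, assuming the log format shown above (timestamp in brackets followed by a one-letter level); gluster_errors is a hypothetical helper name:

```shell
# Hypothetical helper: print only the error-level (E) lines of a GlusterFS log.
gluster_errors() {
    # Lines look like "[timestamp] LEVEL [source] message"; keep level E.
    grep -E '^\[[^]]+\] E ' "$1"
}
```

Called with the busybox-q46jq-glusterfs.log path from above, this should leave just the getaddrinfo and DNS resolution lines for the excerpt shown.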

 

Once the "Name or service not known" error turns up, the problem is easy to track down.

 

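Cause: on node-2 (172.16.1.2), ping 172.16.1.1 succeeds, but ping node-1 fails, meaning the hostname node-1 cannot be resolved on node-2 even though the network path is fine.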
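
The same check can be made without ping by asking the system resolver directly. A minimal sketch using getent hosts, which looks the name up through NSS (the /etc/hosts file, DNS, and the nscd cache if it is running); check_resolve is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a hostname resolves on this machine.
check_resolve() {
    host=$1
    if addr=$(getent hosts "$host"); then
        # getent prints "ADDRESS  NAME..."; keep only the address field.
        echo "OK: $host -> ${addr%% *}"
    else
        echo "FAIL: $host does not resolve"
        return 1
    fi
}
```

On node-2, check_resolve node-1 would be expected to report FAIL while the problem persists, matching the DNS resolution error in the GlusterFS log.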

Fix: manually invalidate the nscd caches:

nscd -i passwd
nscd -i hosts
nscd -i group
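
The three invalidations can be wrapped together with a lookup that confirms the hostname resolves afterwards. This is only a sketch: flush_and_recheck and the NSCD_BIN override are hypothetical names introduced here, and flushing helps only when a stale cache entry was the culprit. If node-1 is missing from /etc/hosts and DNS altogether, the lookup will still fail.

```shell
# Hypothetical helper: invalidate the nscd caches from the fix above,
# then look the host up again to confirm it resolves.
flush_and_recheck() {
    host=$1
    for db in passwd hosts group; do
        # NSCD_BIN is an assumption of this sketch, used only for dry runs.
        "${NSCD_BIN:-nscd}" -i "$db" || return 1
    done
    getent hosts "$host"
}
```

Run as root on node-2, flush_and_recheck node-1 should print the address of node-1 once resolution works again.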

 

 

Reference: https://blog.csdn.net/lufeisan/article/details/53416122
