Fixing a mount process supervised by systemd (systemctl) that fails to restart after being OOM-killed

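When systemd (managed with systemctl) is used to supervise a mount process, the mount script that the unit's ExecStart line points to should run umount before it runs the mount command. Once the process serving a mounted directory has been killed by the OOM killer, the mount point is still registered as mounted, so it has to be unmounted before it can be mounted again.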

The mount script:

root@DAILAPGDBUP001:~# cat /root/mountdatadomaindir.sh
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true

The ExecStart line of the service unit points to this script. The unit file:

root@DAILAPGDBUP001:~# vim /usr/lib/systemd/system/mountdatadomaindir.service
[Unit]
Description=mountdatadomaindir
After=network.target

[Service]
User=root
Group=root
Type=forking
ExecStart=/bin/bash /root/mountdatadomaindir.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
root@DAILAPGDBUP001:~# systemctl enable mountdatadomaindir

At one point an OOM kill occurred. Although the unit already had Restart=on-failure, /mnt/datadomaindir did not come back mounted, and /var/log/syslog contained the following entries:

Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: A process of this unit has been killed by the OOM killer.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Main process exited, code=killed, status=9/KILL
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'oom-kill'.
Oct 15 01:01:02 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 1.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Consumed 36min 37.125s CPU time.
Oct 15 01:01:03 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:04 DAILAPGDBUP001 bash[1896219]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Failed with result 'exit-code'.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Failed to start mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Scheduled restart job, restart counter is at 2.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Stopped mountdatadomaindir.
Oct 15 01:01:04 DAILAPGDBUP001 systemd[1]: Starting mountdatadomaindir...
Oct 15 01:01:06 DAILAPGDBUP001 bash[1896286]: Not able to access the mount point /mnt/datadomaindir
Oct 15 01:01:06 DAILAPGDBUP001 systemd[1]: mountdatadomaindir.service: Control process exited, code=exited, status=1/FAILURE

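So systemd did restart the service, as the increasing restart counter shows, but every attempt failed because boostfs could not access the mount point. In addition, ll /mnt/ reported "ls: cannot access 'datadomaindir': Transport endpoint is not connected", and every attribute of the mounted directory was shown as a question mark: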

root@DAILAPGDBUP001:~# ll /mnt/
ls: cannot access 'datadomaindir': Transport endpoint is not connected
total 8
drwxr-xr-x  3 root root 4096 Sep 16 04:16 ./
drwxr-xr-x 20 root root 4096 Aug 31 06:36 ../
d?????????  ? ?    ?       ?            ? datadomaindir/
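These symptoms are consistent with a FUSE-based mount (BoostFS for Linux is one) whose user-space daemon has died: the kernel still lists /mnt/datadomaindir as mounted, but every access to it fails with "Transport endpoint is not connected". A minimal sketch of a check for that state follows, assuming the standard Linux mount table at /proc/self/mounts and the coreutils stat command; the function name is_stale_mount is made up for illustration and is not part of BoostFS.

```bash
#!/bin/bash
# Hypothetical helper (not part of BoostFS): succeeds (exit 0) when the
# mount point is still listed in the kernel mount table but cannot be
# stat'ed, which is how a FUSE mount looks after its daemon was killed
# (ENOTCONN, "Transport endpoint is not connected").
# Usage: is_stale_mount MOUNT_POINT [MOUNT_TABLE_FILE]
# The optional second argument replaces /proc/self/mounts (useful for testing).
is_stale_mount() {
    local mp="$1" table="${2:-/proc/self/mounts}"
    # Field 2 of each mount table line is the mount point. Reading the table
    # does not touch the dead FUSE connection itself.
    # Paths containing spaces appear octal-escaped (\040) in the table and
    # are not handled by this simple comparison.
    awk -v mp="$mp" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "$table" || return 1
    # Listed as mounted, yet stat on it fails: treat it as stale.
    ! stat "$mp" >/dev/null 2>&1
}
```

A script could then call, for example, is_stale_mount /mnt/datadomaindir && umount /mnt/datadomaindir so that it only unmounts in exactly this situation.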

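Solution: add umount /mnt/datadomaindir to /root/mountdatadomaindir.sh before the boostfs mount command. This works because a mount point whose serving process was OOM-killed remains registered as mounted, and it must be unmounted before it can be mounted again.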

root@DAILAPGDBUP001:~# vim /root/mountdatadomaindir.sh
umount /mnt/datadomaindir
/opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true
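With this fix, the umount line prints a "not mounted" error whenever there is nothing to unmount (for example on a clean boot); the script still carries on, because it does not stop on errors and its exit status is that of the last command. A slightly more careful variant, sketched below under the assumption that /proc/self/mounts is available, unmounts only when the path is still listed in the mount table. The helper name remount_clean is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper for /root/mountdatadomaindir.sh: unmount MOUNT_POINT
# only if the kernel mount table still lists it (e.g. a FUSE mount whose
# daemon was OOM-killed), then run the given mount command.
# Usage: remount_clean MOUNT_POINT COMMAND [ARGS...]
remount_clean() {
    local mp="$1"
    shift
    # Lines in /proc/self/mounts look like "device mountpoint fstype opts 0 0",
    # so searching for the path with a space on each side finds it in the
    # mount point column. -F: fixed string, -q: quiet, -s: no error messages.
    if grep -qsF -- " $mp " /proc/self/mounts; then
        # A leftover entry blocks a new mount on the same path; remove it first.
        # If umount fails, the mount command below will report the error.
        umount "$mp"
    fi
    # Run the real mount command; its exit status becomes the function's,
    # which is what systemd sees as the result of ExecStart.
    "$@"
}
```

With the helper defined at the top of the script, the umount and boostfs lines would become a single call:

remount_clean /mnt/datadomaindir /opt/emc/boostfs/bin/boostfs mount /mnt/datadomaindir -d DAILADD01.dai.netdai.com -s daipostgres -o allow-others=true

Alternatively, the script can keep only the boostfs line and let systemd do the unmount with ExecStartPre=-/bin/umount /mnt/datadomaindir in the [Service] section; the leading - tells systemd to treat a non-zero exit status from that command as success, so a start with nothing to unmount still proceeds.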
