MFS Distributed File System: the Master Fails to Start After an Abnormal Shutdown, and How to Fix It

Problem:

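    After deploying the master, I went on to configure the chunkservers. Something came up, so I stopped the experiment there

     and simply shut the machine down. The next time I started moosefs-master, the service would not come up.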

Error:

[root@server1 mfs]# systemctl start moosefs-master
Job for moosefs-master.service failed because the control process exited with error code. See "systemctl status moosefs-master.service" and "journalctl -xe" for details.
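The message above points to systemctl status and journalctl -xe. Instead of emptying /var/log/messages, the unit's own entries can be read with journalctl -u moosefs-master -b --no-pager. The helper below is a hypothetical filter (its name and the patterns it matches are assumptions based on the log lines quoted in this article) that keeps only the mfsmaster failure lines and flags the missing-metadata case.

```shell
#!/bin/sh
# Hypothetical helper: read syslog/journal text on stdin and print only the
# mfsmaster lines that report a failure, plus a hint for the metadata error.
show_mfsmaster_errors() (
    # Keep the whole input so it can be scanned twice.
    log=$(cat)
    # Lines like "mfsmaster[2167]: can't ..." or "mfsmaster: init: ... failed".
    printf '%s\n' "$log" | grep -E "mfsmaster(\[[0-9]+\])?: .*(can't|failed|error)"
    # The error discussed in this article: no metadata.mfs to load.
    if printf '%s\n' "$log" | grep -q "can't find metadata.mfs"; then
        echo "hint: metadata.mfs is missing; mfsmaster suggests starting with -a"
    fi
)
```

On a live host this would be used as journalctl -u moosefs-master -b --no-pager | show_mfsmaster_errors, which leaves the system log intact.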

Cause:

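After looking it up online, I learned that because the machine was shut down abnormally right after the sync on server2 had finished, mfsmaster and the chunkservers (moosefs-chunkserver) were left out of sync, which is why the mfsmaster service could not start. The log in Method one shows the concrete symptom: mfsmaster cannot find metadata.mfs in its working directory /var/lib/mfs.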
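To confirm that symptom before changing anything, it helps to look at what is actually in /var/lib/mfs. As far as I understand MooseFS's behavior, a running master keeps its last image as metadata.mfs.back and only writes a fresh metadata.mfs on a clean stop, so an abrupt power-off typically leaves metadata.mfs.back plus changelog.*.mfs files but no metadata.mfs; the -a option then replays those change logs. Treat that interpretation as an assumption to check against your version. The function name below is hypothetical; it only reports which files exist.

```shell
#!/bin/sh
# Hypothetical helper: report which MooseFS master metadata files exist in a
# working directory (default /var/lib/mfs, as shown in the log).
check_mfs_metadata() (
    dir=${1:-/var/lib/mfs}
    if [ -f "$dir/metadata.mfs" ]; then
        echo "metadata.mfs present"
    else
        echo "metadata.mfs missing"
    fi
    if [ -f "$dir/metadata.mfs.back" ]; then
        echo "metadata.mfs.back present"
    fi
    # Count change logs, which '-a' is assumed to replay on top of the image.
    n=0
    for f in "$dir"/changelog.*.mfs; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "changelog files: $n"
)
```

If the output says metadata.mfs is missing while metadata.mfs.back and change logs are present, that matches the "can't find metadata.mfs" error in the log.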

Solution:

Method one:

1. If the service will not start, first empty the log with > /var/log/messages, then start the service again and read the error:

[root@server1 mfs]# > /var/log/messages 
[root@server1 mfs]# systemctl start moosefs-master
Job for moosefs-master.service failed because the control process exited with error code. See "systemctl status moosefs-master.service" and "journalctl -xe" for details.
[root@server1 mfs]# cat /var/log/messages 
Dec 20 03:30:14 server1 systemd: Starting MooseFS Master server...
Dec 20 03:30:14 server1 mfsmaster[2167]: open files limit has been set to: 16384
Dec 20 03:30:14 server1 mfsmaster: open files limit has been set to: 16384
Dec 20 03:30:14 server1 mfsmaster[2167]: set gid to 994
Dec 20 03:30:14 server1 mfsmaster: working directory: /var/lib/mfs
Dec 20 03:30:14 server1 mfsmaster: lockfile created and locked
Dec 20 03:30:14 server1 mfsmaster: initializing mfsmaster modules ...
Dec 20 03:30:14 server1 mfsmaster: exports file has been loaded
Dec 20 03:30:14 server1 mfsmaster: topology file has been loaded
Dec 20 03:30:14 server1 mfsmaster[2167]: set uid to 996
Dec 20 03:30:14 server1 mfsmaster[2167]: out of memory killer disabled
Dec 20 03:30:14 server1 mfsmaster[2167]: monotonic clock function: clock_gettime
Dec 20 03:30:14 server1 mfsmaster[2167]: monotonic clock speed: 163133 ops / 10 mili seconds
Dec 20 03:30:14 server1 mfsmaster[2167]: exports file has been loaded
Dec 20 03:30:14 server1 mfsmaster[2167]: topology file has been loaded
Dec 20 03:30:14 server1 mfsmaster[2167]: can't find metadata.mfs - try using option '-a'
Dec 20 03:30:14 server1 systemd: moosefs-master.service: control process exited, code=exited status=1
Dec 20 03:30:14 server1 mfsmaster: loading metadata ...
Dec 20 03:30:14 server1 mfsmaster: can't find metadata.mfs - try using option '-a'
Dec 20 03:30:14 server1 mfsmaster: init: metadata manager failed !!!
Dec 20 03:30:14 server1 mfsmaster: error occurred during initialization - exiting
Dec 20 03:30:14 server1 mfsmaster[2167]: init: metadata manager failed !!!
Dec 20 03:30:14 server1 systemd: Failed to start MooseFS Master server.
Dec 20 03:30:14 server1 mfsmaster[2167]: exititng ...
Dec 20 03:30:14 server1 systemd: Unit moosefs-master.service entered failed state.
Dec 20 03:30:14 server1 mfsmaster[2167]: process exited successfully (status:1)
Dec 20 03:30:14 server1 systemd: moosefs-master.service failed

2. The error message is:
can't find metadata.mfs - try using option '-a'

3. Start the service by running the mfsmaster binary directly with the -a option: /usr/sbin/mfsmaster start -a

[root@server1 mfs]# /usr/sbin/mfsmaster start -a
open files limit has been set to: 16384
working directory: /var/lib/mfs
lockfile created and locked
initializing mfsmaster modules ...
exports file has been loaded
topology file has been loaded
loading metadata ...
metadata file has been loaded
no charts data file - initializing empty charts
master <-> metaloggers module: listen on *:9419
master <-> chunkservers module: listen on *:9420
main master server module: listen on *:9421
mfsmaster daemon initialized properly

4. Check the ports: 9419, 9420 and 9421 are now all open.
5. However, moosefs-master.service is still in the failed state:

[root@server1 mfs]# systemctl status moosefs-master
● moosefs-master.service - MooseFS Master server
   Loaded: loaded (/usr/lib/systemd/system/moosefs-master.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2019-12-20 03:47:02 EST; 7min ago
  Process: 2175 ExecStart=/usr/sbin/mfsmaster start (code=exited, status=1/FAILURE)

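6. Open the system unit file for the service:
vim /usr/lib/systemd/system/moosefs-master.service
7. Change the start command in the unit file to:

ExecStart=/usr/sbin/mfsmaster start -a

8. Reload the unit files:
systemctl daemon-reload
9. Start the service:
systemctl start moosefs-master (this reports an error)
10. Check the log; the error message is:
can't start: lockfile is already locked by another process
11. The cause is that mfsmaster was already started once by running it directly (step 3), so systemctl start moosefs-master now fails. Stopping that instance with /usr/sbin/mfsmaster stop -a does not work either; it fails with:
can't find process to terminate
12. So find the process with ps aux and kill it directly: kill -9 <mfsmaster PID>
13. Run again: systemctl restart moosefs-master
14. The restart succeeds.

Checking the master's status now shows it running.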
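Steps 11 and 12 kill the directly started mfsmaster with kill -9. A SIGKILL gives the master no chance to save metadata.mfs, which is the same state an abrupt power-off leaves behind; it recovers here only because the unit now starts with -a. A gentler sketch sends SIGTERM first and escalates to SIGKILL only after a timeout, assuming mfsmaster treats SIGTERM as a normal shutdown (which, as far as I know, is what its own stop action relies on). The function name and the 30-second default are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM to a PID, wait up to TIMEOUT seconds
# (default 30) for it to exit, and only then fall back to SIGKILL.
stop_gracefully() (
    pid=$1
    timeout=${2:-30}
    # Nothing to do if the process is already gone.
    kill -TERM "$pid" 2>/dev/null || exit 0
    i=0
    # kill -0 only checks that the PID still exists; it sends no signal.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$timeout" ]; then
            echo "pid $pid still running after ${timeout}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null || true
            break
        fi
        sleep 1
        i=$((i + 1))
    done
)
```

Usage, assuming exactly one master process matches: stop_gracefully "$(pgrep -x mfsmaster)", followed by systemctl restart moosefs-master.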

Method two:

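Once you have started the service by running /usr/sbin/mfsmaster start directly, do not
start it again with systemctl start moosefs-master, or the two instances will conflict over the service ports.
After a direct start, systemctl status moosefs-master does not show the service as started, even though its ports are already open (check with netstat -antlp).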
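To see which master ports are held, the startup output in Method one lists them: 9419 (metaloggers), 9420 (chunkservers) and 9421 (the main master module, which clients mount through). The hypothetical helper below reads ss -ltn or netstat -antl output on stdin and reports each port; it only matches a ":PORT" suffix followed by whitespace in listening lines, which is an assumption about the output layout.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` / `netstat -antl` output on stdin and
# report whether each MooseFS master port has a listening socket.
mfs_master_ports() (
    # Keep only listening sockets (ss and netstat both print LISTEN).
    listening=$(grep -i 'listen' || true)
    for port in 9419 9420 9421; do
        if printf '%s\n' "$listening" | grep -Eq "[:.]$port[[:space:]]"; then
            echo "$port listening"
        else
            echo "$port not listening"
        fi
    done
)
```

If ss -ltn | mfs_master_ports shows the ports listening while systemctl is-active moosefs-master is not "active", a directly started master is most likely holding them.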

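If you do want to start the service with systemctl start moosefs-master, first shut down the instance you started directly.
Stopping it with mfsmaster stop does not work, so kill its process instead. Then open the system unit file
vim /usr/lib/systemd/system/moosefs-master.service
change the start command to ExecStart=/usr/sbin/mfsmaster start -a
and run systemctl daemon-reload; systemctl start moosefs-master will then start the service successfully.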
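Both methods edit the packaged unit under /usr/lib/systemd/system directly, which a package upgrade may overwrite. A standard systemd alternative is a drop-in override in /etc/systemd/system/moosefs-master.service.d/. Because a non-oneshot service may have only one ExecStart, the override first clears the packaged command with an empty ExecStart= line. The function name is hypothetical; the command /usr/sbin/mfsmaster start -a is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that makes moosefs-master
# start with '-a'. The base directory is a parameter so the result can be
# inspected elsewhere first; on a real host it is /etc/systemd/system.
write_master_override() (
    base=${1:-/etc/systemd/system}
    dir="$base/moosefs-master.service.d"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before replacing it.
ExecStart=
ExecStart=/usr/sbin/mfsmaster start -a
EOF
)
```

On the master host this would be followed by systemctl daemon-reload and systemctl restart moosefs-master; systemctl edit moosefs-master creates the same override.conf interactively.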

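#A note on shutting down: when the servers are shut down or rebooted normally, the master, metalogger, chunkserver and client programs all stop cleanly and start normally afterwards, with no repair needed. In the whole MFS system, only the master may fail to start after a sudden power loss, and it then has to be repaired with mfsmetarestore -a before it will start.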
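The note above names mfsmetarestore -a, while the error in this article asks for mfsmaster's own -a option. As far as I know, newer MooseFS releases removed mfsmetarestore and folded metadata recovery into mfsmaster -a; under that assumption, the hypothetical helper below simply prints whichever repair command matches the installed tools.

```shell
#!/bin/sh
# Hypothetical helper: print the metadata-repair command that fits the
# installed MooseFS tools. Assumes older releases ship mfsmetarestore and
# newer ones rely on 'mfsmaster ... -a' instead.
mfs_restore_cmd() (
    if command -v mfsmetarestore >/dev/null 2>&1; then
        echo "mfsmetarestore -a"
    else
        echo "mfsmaster start -a"
    fi
)
```

The printed command is meant to be run on the master host while no other mfsmaster instance is running.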

#Cleanly unmounting the moosefs-chunkserver mounts before shutting down, then starting the services again after the machine
boots, also works.
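The advice about unmounting cleanly before powering off fits into a broader shutdown order. The order printed below (clients, chunkservers, metalogger, master) is what I understand to be the usual MooseFS recommendation, not something taken from this article. The function name and the /mnt/mfs mount point are hypothetical, and moosefs-chunkserver / moosefs-metalogger are assumed to be the unit names installed by the MooseFS packages. Nothing is executed; each line is meant for the host holding that role.

```shell
#!/bin/sh
# Hypothetical helper: print a clean MooseFS shutdown order as
# "role: command" lines. Each command is run on the host(s) with that role.
mfs_shutdown_plan() (
    mountpoint=${1:-/mnt/mfs}   # assumed client mount point
    echo "client: umount $mountpoint"
    echo "chunkserver: systemctl stop moosefs-chunkserver"
    echo "metalogger: systemctl stop moosefs-metalogger"
    echo "master: systemctl stop moosefs-master"
)
```

Stopping the master last means it is shut down cleanly after everything that talks to it is gone, so it has the chance to write metadata.mfs.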
