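For the past two days Nagios has kept alerting that the inotifywait process count is 0.
The host express_1 runs two rsync scripts that sync from express_1 to express_2. When they are running, there are two inotifywait processes.
The processes die every few hours and have to be restarted by hand. That is too tedious: in a single night Nagios sent more than a dozen alerts.
So I decided to use monit to supervise the inotifywait process.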
Create the startup script
vi /manage/express_monit.sh
#!/bin/bash
# Init-style wrapper so that monit can start and stop the express sync.
PIDFILE=/var/run/express.pid

start() {
    echo "Starting express..."
    /manage/rsync/rsync_express.sh &
    sleep 1
    # Record the PID of the first inotifywait process that watches express.
    ps aux | grep inotifywait | grep express | head -1 | awk '{print $2}' > "$PIDFILE"
}

stop() {
    echo "Stopping express..."
    kill -9 "$(cat "$PIDFILE")"
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        sleep 1
        echo
        start
        ;;
    *)
        # $0 is the script name; the original $prog was never defined.
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac
exit 0
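The stop branch above goes straight to kill -9. A gentler variant, sketched here as an assumption rather than what the script does, sends SIGTERM first and only falls back to SIGKILL if the process is still alive. The helper name stop_from_pidfile is hypothetical, and the PID file path is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper: stop the process whose PID is stored in a PID file.
# Usage: stop_from_pidfile /var/run/express.pid
stop_from_pidfile() {
    local pidfile=$1 pid
    # Refuse to continue if the PID file is missing or empty.
    [ -s "$pidfile" ] || { echo "no usable pid file: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"                     # polite SIGTERM first
        sleep 1
        # Still alive after one second: fall back to SIGKILL.
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid" 2>/dev/null
        fi
    fi
    rm -f "$pidfile"
    return 0
}
```

Removing the PID file after stopping keeps a stale PID from being signalled later, when that number may belong to an unrelated process.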
Set permissions
chmod 755 /manage/express_monit.sh
Install monit. Installing the RPM package is preferable; building from the source tarball caused problems.
yum install -y monit
Edit the configuration file
vim /etc/monit.conf
Change the check interval to 3 seconds, set the id file path, and enable logging
set daemon 3 # check services at 3-second intervals
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log
set idfile /var/.monit.id
set statefile /var/.monit.state
Comment out the third line from the end
# set daemon mode timeout to 1 minute
#set daemon 60
Enter the configuration directory
cd /etc/monit.d/
Add monitoring for the express sync process
vi express
check process express with pidfile /var/run/express.pid
start program = "/manage/express_monit.sh start"
stop program = "/manage/express_monit.sh stop"
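Before relying on the start and stop programs, it may be worth running the script under a stripped-down environment. This sketch assumes that monit launches programs with an emptied environment and a short PATH; the PATH value used here is an assumption based on monit's documentation, not something read from this host, and run_with_minimal_env is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: run a command with an emptied environment and a
# minimal PATH, roughly imitating how a supervisor may launch it.
# The PATH value is an assumption, not read from monit itself.
run_with_minimal_env() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}

# Report any tool used by the start script that this PATH cannot find.
for tool in ps grep awk head; do
    run_with_minimal_env sh -c 'command -v "$1" >/dev/null' sh "$tool" \
        || echo "missing: $tool"
done
```

The same helper could be used as run_with_minimal_env /manage/express_monit.sh start to see whether the script itself still works without the variables of an interactive root shell.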
Start monit
/etc/init.d/monit start
Kill the inotifywait process
pkill inotifywait
Watch the monit log
tail -f /var/log/monit
[CST Apr 20 10:41:07] error : 'express' process is not running
[CST Apr 20 10:41:07] info : 'express' trying to restart
[CST Apr 20 10:41:07] info : 'express' start: /manage/express_monit.sh
[CST Apr 20 10:41:12] info : 'express' process is running with pid 14139
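Since the process was dying every few hours, the log lines above can also be used to see how often monit had to step in. A minimal sketch, assuming the log keeps the 'trying to restart' wording shown here; count_restarts is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count how many times monit tried to restart a
# service, based on the "trying to restart" lines in its log.
# Usage: count_restarts <logfile> <service>
count_restarts() {
    local logfile=$1 service=$2
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c "'$service' trying to restart" "$logfile" || true
}
```

For example, count_restarts /var/log/monit express would print the number of restart attempts recorded for the express check in that log file.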
Check whether the process has started
[root@iZ23vu75locZ ~]# ps -aux | grep ino
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 14306 0.0 0.0 6344 776 ? S 10:41 0:00 /usr/local/inotify/bin/inotifywait -mrq --timefmt %d/%m/%y %H:%M --format %T %w%f -e modify,delete,create,attrib /www/express/
root 17537 0.0 0.0 103252 864 pts/2 S+ 10:47 0:00 grep ino
The test passed.