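Requirement: ping a server that needs to be monitored. If the packet loss rate is 100%, the machine is considered down and an alert email is sent. (This requires a mailbox account with SMTP service enabled; the alert emails are sent from that mailbox.)
Create the Python script that sends the email (the shell scripts below call it as /data/mail.py):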
#!/usr/bin/python
#coding:utf-8
import smtplib
import sys
from email.mime.text import MIMEText
from email.utils import formataddr

# Sender address
mail_user = '[email protected]'
# SMTP authorization password for the sender address
mail_pass = 'xxxxxxxx'

def send_mail(to_list, subject, content):
    # Display name plus sender address for the From header
    me = formataddr(("Mail Alert", mail_user))
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = me
    msg['To'] = to_list
    try:
        # SMTP server address provided by NetEase 163 Mail
        s = smtplib.SMTP("smtp.163.com", 25)
        s.login(mail_user, mail_pass)
        # Use the bare address as the envelope sender
        s.sendmail(mail_user, to_list, msg.as_string())
        s.quit()
        return True
    except Exception as e:
        print(str(e))
        return False

if __name__ == "__main__":
    # Usage: mail.py <recipient> <subject> <content>
    send_mail(sys.argv[1], sys.argv[2], sys.argv[3])
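To inspect the headers such a script produces without contacting an SMTP server, the message construction can be tried on its own. The following is a minimal sketch, not part of the original script: `build_message` is a hypothetical helper, and the example.com addresses are placeholders.

```python
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(sender, recipient, subject, content):
    """Build a plain-text UTF-8 message like the one send_mail() sends.

    Hypothetical helper for illustration; it only constructs the message.
    """
    msg = MIMEText(content, "plain", "utf-8")
    msg["Subject"] = subject
    # formataddr joins a display name and an address into one header value
    msg["From"] = formataddr(("Mail Alert", sender))
    msg["To"] = recipient
    return msg


if __name__ == "__main__":
    # Placeholder addresses; nothing is sent over the network
    demo = build_message("sender@example.com", "admin@example.com",
                         "Mail Alert", "192.168.234.125 packet loss: 100%")
    print(demo.as_string())
```

Printing the message this way shows the exact headers and the encoded body that would be handed to the SMTP server.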
Assume the monitored machine's IP is 192.168.234.125. Ping that address:
[root@linux ~]# ping -c5 192.168.234.125
PING 192.168.234.125 (192.168.234.125) 56(84) bytes of data.
64 bytes from 192.168.234.125: icmp_seq=1 ttl=64 time=0.304 ms
64 bytes from 192.168.234.125: icmp_seq=2 ttl=64 time=0.982 ms
64 bytes from 192.168.234.125: icmp_seq=3 ttl=64 time=0.837 ms
64 bytes from 192.168.234.125: icmp_seq=4 ttl=64 time=0.863 ms
64 bytes from 192.168.234.125: icmp_seq=5 ttl=64 time=0.382 ms
--- 192.168.234.125 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 0.304/0.673/0.982/0.276 ms
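# Only the packet loss figure on the second-to-last line matters
Script approach: after ruling out an empty or non-numeric packet loss value, send an alert email when the packet loss rate is 100%.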
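#!/bin/bash
ip=192.168.234.125
# Recipient address for alerts
m='[email protected]'
# Extract the number in front of "% packet loss"
n=$(ping -c5 "$ip" | grep "packet loss" | awk -F '%' '{print $1}' | awk '{print $NF}')
if [ -z "$n" ]
then
    echo "Script execution error"
    /usr/bin/python /data/mail.py "$m" "Host liveness check script: $0 error" "Could not get a value for the packet loss variable"
    exit 1
else
    # Strip all digits; anything left over is a non-numeric character
    n1=$(echo "$n" | sed 's/[0-9]//g')
    if [ -n "$n1" ]
    then
        echo "Script execution error"
        /usr/bin/python /data/mail.py "$m" "Host liveness check script: $0 error" "The packet loss variable contains non-numeric characters"
        exit 1
    fi
fi
if [ "$n" -eq 100 ]
then
    /usr/bin/python /data/mail.py "$m" "Mail Alert" "$ip packet loss: $n%"
fi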
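The extraction step (take the number in front of "% packet loss", reject anything empty or non-numeric) can also be sketched in Python. This is an illustrative alternative under stated assumptions, not the author's method: `parse_packet_loss` and `ping_loss` are hypothetical names, and the summary-line format is assumed to match the Linux ping output shown earlier.

```python
import re
import subprocess

# Matches e.g. "0% packet loss" or "33.3% packet loss" in the ping summary line
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output):
    """Return the packet loss percentage as a float, or None if not found."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(ip, count=5):
    """Run ping and return the parsed loss (assumes a Linux-style ping command)."""
    result = subprocess.run(["ping", "-c", str(count), ip],
                            capture_output=True, text=True)
    return parse_packet_loss(result.stdout)


if __name__ == "__main__":
    sample = "5 packets transmitted, 5 received, 0% packet loss, time 4002ms"
    print(parse_packet_loss(sample))
```

Returning None for unparseable output plays the role of the two error branches in the shell script: the caller can treat None as "script error" and a value of 100 as "host down".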
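Determine the state of a web service by checking whether its port is being listened on, using nginx on port 80 as an example.
Script approach: check whether port 80 is being listened on. If it is not, restart nginx and send a notification email. Run the check every 30 seconds.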
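#!/bin/bash
# Recipient address for notifications
m='[email protected]'
while :
do
    # Count listening TCP sockets on port 80
    n=$(netstat -lntp | grep -c ":80 ")
    if [ "$n" -eq 0 ]
    then
        /usr/bin/systemctl restart nginx
        /usr/bin/python /data/mail.py "$m" "Mail Notification" "Port 80 was not listening; nginx has been restarted"
    fi
    sleep 30
done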
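# Because the script loops, it has to be run in the background. Alternatively, drop the loop and set the check interval with a crontab job. The port can also be checked with the nmap command, by testing whether the STATE column for that port is closed.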
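Another way to test a port is to attempt a TCP connection to it. The sketch below takes that approach, which differs from the netstat check in the script: it probes the port from the outside instead of reading the local socket table. `port_is_listening` is a hypothetical helper name, and the 2-second timeout is an arbitrary assumption.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Checks the local nginx port from the example above
    print(port_is_listening("127.0.0.1", 80))
```

A connection test also catches the case where the process is listening but a firewall blocks the port, which a netstat check on the same host would not notice.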
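502 is one of the most common error status codes seen with nginx. It is usually caused by PHP programs exhausting the resources of the php-fpm service. The temporary fix is to restart php-fpm, then analyze the logs afterwards to find a permanent solution.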
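Log sample: (note that the status code of each request is preceded by a space; the script uses the spaces before and after the code to match more precisely)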
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET / HTTP/1.1" 200 53570 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-includes/css/dist/block-library/style.min.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-includes/css/dist/block-library/theme.min.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-content/themes/twentyseventeen/style.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-content/themes/twentyseventeen/assets/js/jquery.scrollTo.js?ver=2.1.2 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
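Script approach: read the HTTP status codes from the access log. Assuming about 100 requests arrive every 30 seconds, run the check every 30 seconds, take the last 100 request records from the log, and match the keyword 502. When more than 50 lines match (that is, 502 accounts for more than 50% of the requests), restart php-fpm. After the restart, check whether it succeeded; if it did not, send an alert email.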
#!/bin/bash
log=/data/logs/access.log
while :
do
    # Count 502 responses among the last 100 requests
    n=$(tail -n 100 "$log" | grep -c ' 502 ')
    if [ -z "$n" ]
    then
        exit 1
    fi
    if [ "$n" -gt 50 ]
    then
        /etc/init.d/php-fpm restart >/dev/null 2>/tmp/php-fpm.err
        # If no php-fpm process exists after the restart, the restart failed
        php_n=$(pgrep -l php-fpm | wc -l)
        if [ "$php_n" -eq 0 ]
        then
            /usr/bin/python /data/mail.py '[email protected]' "php-fpm restart failed" "$(head /tmp/php-fpm.err)"
            exit 1
        fi
    fi
    sleep 30
done
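The counting step can be sketched in Python as well. One difference from the grep approach is worth noting: matching ' 502 ' anywhere in a line would also count a 200 response whose body size happens to be 502, so this sketch reads the status code from its position right after the quoted request line. All function names here are hypothetical, and the regular expression assumes the log format shown in the sample above.

```python
import re
from collections import deque

# In the sample log format, the status code follows the quoted request line
STATUS_RE = re.compile(r'" (\d{3}) ')


def status_of(line):
    """Return the HTTP status code of a log line as a string, or None."""
    match = STATUS_RE.search(line)
    return match.group(1) if match else None


def count_502(lines):
    """Count the lines whose status code is exactly 502."""
    return sum(1 for line in lines if status_of(line) == "502")


def tail_lines(path, n=100):
    """Return the last n lines of a file, like `tail -n`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=n))


def should_restart(lines, threshold=50):
    """True when more than `threshold` of the given lines are 502 responses."""
    return count_502(lines) > threshold
```

A monitoring loop could call `should_restart(tail_lines("/data/logs/access.log"))` every 30 seconds and trigger the php-fpm restart when it returns True.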