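Goal: while monitoring the service on port 3306, remotely start MySQL again when the first SOFT CRITICAL state appears, or when the first HARD CRITICAL state appears because the previous attempt did not succeed. Before each remote restart, record the reported event in a database.
Technologies involved:
(1) How Nagios event handlers work
(2) Running commands over SSH with passwordless login
(3) Working with MySQL from Perl
If you are comfortable with these three topics, this article should be easy to follow.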
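##Getting to the point##
Preparation
I. Setting up passwordless SSH login
Goal: the nagios user logs in to the server without a password
Everyone has set up passwordless login for root. Today, though, I want passwordless login for an ordinary user, nagios (thanks to my colleague for the technical support).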
Role | Host_ip | Notes |
Client | 192.168.x.x | The Nagios monitoring host acts as the Client, so that it can run scripts remotely |
Server | 192.168.x.y | Holds the service start scripts, e.g. the mysql script |
Client side (192.168.x.x)
---------------------------------------------------------------------------------------------------
(1) Create the nagios user (steps omitted; the Server side needs this user too)
(2) As the nagios user (su - nagios), run:
ssh-keygen -t rsa
Press Enter at every prompt; no passphrase is needed.
(3) Copy the public key to the nagios home directory on the server:
[nagios@nagios ~]$ scp .ssh/id_rsa.pub nagios@192.168.x.y:/home/nagios/
The authenticity of host '192.168.x.y (192.168.x.y)' can't be established.
RSA key fingerprint is 66:9a:b5:86:3d:81:22:9b:f8:67:9e:af:aa:4c:4a:97.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.x.y' (RSA) to the list of known hosts.
nagios@192.168.x.y's password:
id_rsa.pub 100% 411 0.4KB/s 00:00
---------------------------------------------------------------------------------------------------
Server side (192.168.x.y)
--------------------------------------------------------------------------------------------------
(1) Log in to the server with the nagios account
(2) Create the key directory:
mkdir /home/nagios/.ssh
(3) Append the public key to the authorized_keys file:
cat /home/nagios/id_rsa.pub >> /home/nagios/.ssh/authorized_keys
(4) Set the permissions (as root, or as nagios with rights granted through visudo):
chmod 700 /home/nagios/.ssh
chmod 600 /home/nagios/.ssh/authorized_keys
Check
Permission check on the Server:
[root@centos-server nagios]# ls -la /home/nagios|grep .ssh
drwx------ 2 nagios nagios 4096 Aug 3 09:04 .ssh
[root@centos-server nagios]# ls -la /home/nagios/.ssh/
total 12
drwx------ 2 nagios nagios 4096 Aug 3 09:04 .
drwx------ 4 nagios nagios 4096 Aug 3 09:03 ..
-rw------- 1 nagios nagios 411 Aug 3 09:04 authorized_keys
Make sure the .ssh directory has mode 700, authorized_keys has mode 600, and both are owned by the nagios user.
Login test from the Client:
[nagios@nagios ~]$ ssh nagios@192.168.x.y
Last login: Wed Aug 3 09:15:59 2011 from 192.168.x.x
[nagios@centos-server ~]$
See? Logging in from 192.168.x.x to 192.168.x.y no longer asks for a password. If you do not get this result, check the permissions above first. Permissions once cost my colleague half a day of fiddling on my behalf. Haha.
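Since wrong permissions are the usual reason key login fails, the manual `ls -la` check above can be sketched as a small helper. This is only an illustration: the function name `check_ssh_perms` is made up for this article, and it assumes GNU `stat` (`stat -c`), as found on CentOS. It checks modes only, not ownership.

```shell
#!/bin/sh
# check_ssh_perms HOME_DIR   (hypothetical helper, not part of OpenSSH)
# Prints "ok" when HOME_DIR/.ssh is mode 700 and authorized_keys is mode 600.
# Otherwise prints the offending path and returns 1.
# Assumes GNU stat (stat -c '%a' prints the octal mode).
check_ssh_perms() {
    home_dir=$1
    dir_mode=$(stat -c '%a' "$home_dir/.ssh" 2>/dev/null)
    key_mode=$(stat -c '%a' "$home_dir/.ssh/authorized_keys" 2>/dev/null)
    if [ "$dir_mode" != "700" ]; then
        echo "bad mode on $home_dir/.ssh: ${dir_mode:-missing}"
        return 1
    fi
    if [ "$key_mode" != "600" ]; then
        echo "bad mode on $home_dir/.ssh/authorized_keys: ${key_mode:-missing}"
        return 1
    fi
    echo "ok"
}
```

On the Server it would be called as `check_ssh_perms /home/nagios`.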
II. Running remote commands over passwordless login
Goal: the nagios user starts the MySQL service on the server remotely
-----------------------------------------------------------------------------------------------
Server side (192.168.x.y)
------------------------------------------------------------------------------------------------
(1) Set up the MySQL control script
Run the following SQL statements to create a user (admin) with full, root-level privileges and the password controlmysql:
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' IDENTIFIED BY 'controlmysql';
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'127.0.0.1' IDENTIFIED BY 'controlmysql';
Purpose: this account is used to start and shut down the MySQL service
MySQL control script (start/stop, etc.), saved as /data0/mysql/3306/mysql
#!/bin/sh

mysql_port=3306
mysql_username="admin"
mysql_password="controlmysql"
mysql_scripts_path="/data0/mysql/3306"
mysqld_path="/usr/local/webserver/mysql"

start_mysql() {
    printf "Starting MySQL...\n"
    # Discard stdout and stderr, then run mysqld_safe in the background
    /bin/sh ${mysqld_path}/bin/mysqld_safe --defaults-file=/data0/mysql/${mysql_port}/my.cnf > /dev/null 2>&1 &
}

stop_mysql() {
    printf "Stopping MySQL...\n"
    ${mysqld_path}/bin/mysqladmin -u ${mysql_username} -p${mysql_password} -S /tmp/mysql.sock shutdown
}

restart_mysql() {
    printf "Restarting MySQL...\n"
    stop_mysql
    sleep 5
    start_mysql
}

kill_mysql() {
    # print (not printf) keeps one PID per line when several processes match
    kill -9 $(ps -ef | grep 'bin/mysqld_safe' | grep -v 'grep' | awk '{print $2}')
    kill -9 $(ps -ef | grep 'libexec/mysqld' | grep -v 'grep' | awk '{print $2}')
}

if [ "$1" = "start" ]; then
    start_mysql
elif [ "$1" = "stop" ]; then
    stop_mysql
elif [ "$1" = "restart" ]; then
    restart_mysql
elif [ "$1" = "kill" ]; then
    kill_mysql
else
    printf "Usage: ${mysql_scripts_path}/mysql {start|stop|restart|kill}\n"
fi
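The kill branch of the control script depends on pulling PIDs out of `ps -ef` output. A minimal sketch of that step as a separate filter is shown here, so it can be tried on sample text without killing anything. The name `pids_matching` is hypothetical. Note that awk's `print $2` emits one PID per line, while `printf $2` would glue several PIDs together into one invalid number.

```shell
#!/bin/sh
# pids_matching PATTERN   (hypothetical helper)
# Reads `ps -ef` style lines on stdin and prints the PID (column 2) of every
# line that matches PATTERN, one PID per line. Lines belonging to grep itself
# are skipped, as in the control script.
pids_matching() {
    grep -- "$1" | grep -v 'grep' | awk '{ print $2 }'
}
```

Typical use would be `ps -ef | pids_matching 'bin/mysqld_safe'`, whose output can then be handed to `kill`.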
(2) Configure sudo so that the nagios user may run the script
**If sudo is not installed: yum -y install sudo**
# visudo
Add the line:
nagios ALL=(root) NOPASSWD:/data0/mysql/3306/mysql start
Check
Script test on the Server:
[root@centos-server ~]# netstat -an|grep 3306
[root@centos-server ~]#
This shows that MySQL is not running.
[root@centos-server ~]# /data0/mysql/3306/mysql start
Starting MySQL...
[root@centos-server ~]# netstat -an|grep 3306
tcp 0 0 :::3306 :::* LISTEN
[root@centos-server ~]#
The script is OK and works normally.
Test from the Client (logged in as the nagios user):
[nagios@nagios ~]$ ssh nagios@192.168.x.y "sudo /data0/mysql/3306/mysql start"
sudo: sorry, you must have a tty to run sudo
Fix: on the Server, run visudo and comment out the following line:
Defaults requiretty
Try again:
[nagios@nagios ~]$ ssh nagios@192.168.x.y "sudo /data0/mysql/3306/mysql start"
Starting MySQL...
It starts normally. Check on the Server that port 3306 is listening.
Congratulations, the groundwork is done. Now we can go and play with the Nagios configuration on the monitoring side.
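Commenting out `Defaults requiretty` removes the tty requirement for every user. A narrower alternative, offered here only as a sketch, is a per-user override: sudoers accepts `Defaults:nagios !requiretty`. The helper name and the drop-in file name below are assumptions for illustration, and a drop-in file is only read if the main sudoers file includes the directory (`#includedir /etc/sudoers.d`); otherwise the same line can be added through visudo.

```shell
#!/bin/sh
# write_requiretty_override TARGET_FILE   (hypothetical helper)
# Writes a sudoers fragment that disables requiretty for the nagios user only
# and sets the restrictive mode sudo expects for its files (0440).
write_requiretty_override() {
    target=$1
    printf '%s\n' 'Defaults:nagios !requiretty' > "$target" && chmod 0440 "$target"
}
```

As root this might be used as `write_requiretty_override /etc/sudoers.d/nagios-requiretty`, followed by `visudo -cf /etc/sudoers.d/nagios-requiretty` to syntax-check the fragment before relying on it.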
III. Nagios monitoring-side configuration
(1) The basic Nagios configuration files are as follows:
mfs_hosts.cfg
define host{
    use        mfs-server
    host_name  mfs-192.168.x.y
    alias      mfs-192.168.x.y
    address    192.168.x.y
}
mfs_hostgroups.cfg
define hostgroup{
    hostgroup_name  mfs-servers
    alias           Mfs Linux Servers
    members         mfs-192.168.x.y
}
mfs_services.cfg
define service {
    name                   mfs-services
    service_description    checkport
    check_command          check_tcp!3306
    check_period           24x7
    max_check_attempts     2
    normal_check_interval  3
    retry_check_interval   1
    notification_interval  5
    notification_period    24x7
    notification_options   w,u,c,r
    register               0
}
define service{
    use                    mfs-services
    host_name              mfs-192.168.x.y
    event_handler_enabled  1
    event_handler          restart_mysql
}
define service{
    use                  mfs-service
    host_name            mfs-192.168.x.y
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
commands.cfg
define command{
    command_name  restart_mysql
    command_line  /usr/local/nagios/libexec/restart_mysql $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
}
(2) Write /usr/local/nagios/libexec/restart_mysql
restart_mysql
#!/bin/sh
# Arguments: $1=$SERVICESTATE$  $2=$SERVICESTATETYPE$  $3=$SERVICEATTEMPT$  $4=$HOSTADDRESS$
HostAddress=$4
debug=1
if [ $debug -eq 1 ]; then
    echo "MysqlServer:${HostAddress}" >>/tmp/ReMysql.log
fi
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        1)
            # First soft CRITICAL: try to start MySQL right away
            if [ $debug -eq 1 ]; then
                echo "Restarting Mysql service (1rd soft critical state)..." >>/tmp/ReMysql.log
            fi
            /usr/bin/ssh nagios@${HostAddress} "sudo /data0/mysql/3306/mysql start"
            ;;
        esac
        ;;
    HARD)
        # Still CRITICAL after the retries: try once more
        if [ $debug -eq 1 ]; then
            echo "Restarting Mysql service..." >>/tmp/ReMysql.log
        fi
        /usr/bin/ssh nagios@${HostAddress} "sudo /data0/mysql/3306/mysql start"
        ;;
    esac
    ;;
esac
exit 0
Note: it is best to set debug to 1 while testing.
Disclaimer: for now this script only starts MySQL remotely; the part that writes to the database is added later.
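The nested `case` in the handler encodes one rule: act on CRITICAL when the state is SOFT on attempt 1, or when the state is HARD. As a rough sketch, that rule can be isolated in a function and tried by hand without touching SSH or MySQL. The function name `should_restart` is made up for this illustration and is not a Nagios facility.

```shell
#!/bin/sh
# should_restart STATE STATETYPE ATTEMPT   (hypothetical helper)
# Returns 0 (true) when the event handler should try to start MySQL:
#   - CRITICAL + SOFT on the first check attempt
#   - CRITICAL + HARD
# Returns non-zero for everything else (OK, WARNING, UNKNOWN, later soft attempts).
should_restart() {
    state=$1
    state_type=$2
    attempt=$3
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
        SOFT) [ "$attempt" = "1" ] ;;
        HARD) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `should_restart CRITICAL SOFT 1 && echo restart` prints `restart`, while `should_restart CRITICAL SOFT 2 && echo restart` prints nothing. With max_check_attempts set to 2 as in the service template, the second failed check is already a HARD state.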
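Check
Verify the Nagios configuration files:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, restart Nagios:
service nagios restart
Start MySQL and the other related services on the monitored host and make sure everything is reported as normal. Then try shutting MySQL down normally.
Goal: when the first SOFT CRITICAL state appears, try to start MySQL again.
The following four checks are enough to show that we have the effect we wanted:
(1) Check the Nagios status page on the monitoring side
(2) Check the script log on the monitoring side:
[root@nagios tmp]# tail -f ReMysql.log
MysqlServer:192.168.x.y
Restarting Mysql service (1rd soft critical state)...
(3) Check on the monitored host that the port is listening:
[root@centos-server ~]# netstat -an|grep 3306
tcp 0 0 :::3306 :::* LISTEN
(4) Check the Nagios status page on the monitoring side again
Note: at this point the first idea, restarting the service remotely, is working. Next we will record the event in MySQL.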
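The `netstat -an|grep 3306` check used throughout also matches ports such as 33060 and connections that are not in LISTEN state. A slightly stricter version is sketched below, assuming the plain `netstat -an` column layout shown above (local address in column 4, state in the last column). The name `port_listening` is hypothetical.

```shell
#!/bin/sh
# port_listening PORT   (hypothetical helper)
# Reads `netstat -an` output on stdin and succeeds only if some TCP socket
# is in LISTEN state on exactly PORT.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # The local address is column 4; the port follows the last colon
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On the monitored host this could be run as `netstat -an | port_listening 3306 && echo "mysql is listening"`.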
=======================================================
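IV. Writing the notification information to MySQL
Goal: write the Nagios error information into a MySQL DB
Role | Host_ip | Notes |
Client | 192.168.x.x | The Nagios monitoring host acts as the Client and runs the script that writes the error information to the database |
DB Server | 192.168.x.z | The DB that stores the error information |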
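On the DB Server:
-----------------------------------------------------------------------------------------------
(1) Create the database
create database nagios;
(2) Grant privileges
Run the following SQL statement to create a user (nagioslog) with the password 12345678 that can insert, update, delete and select data in the nagios database, and that is allowed to log in remotely from the Nagios monitoring host:
GRANT ALL PRIVILEGES ON nagios.* TO 'nagioslog'@'192.168.x.x' IDENTIFIED BY '12345678';
Purpose: this account is used to insert, update, delete and browse the data
(3) Log in as the nagioslog user, select the nagios database and create the log table
create table log(host_ip varchar(50),services_desc varchar(200),plugin_out varchar(500)) ENGINE=MyISAM DEFAULT CHARSET=utf8;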
On the Client:
-----------------------------------------------------------------------------------------
(1) Install the Perl modules for working with MySQL
perl -MCPAN -e "install DBI"
perl -MCPAN -e "install DBD::mysql"
(2) The script that writes to MySQL
Perl script for remote MySQL access, saved as /usr/local/nagios/libexec/insert_log_to_mysql.pl
#!/usr/bin/perl
# Last Modified by Hahazhu 2011/08/03
use strict;
use warnings;
use DBI;

########## INIT DEFINED ###########
my $remote_mysql      = "192.168.x.z";
my $remote_db         = "nagios";
my $remote_mysql_user = "nagioslog";
my $remote_mysql_pwd  = "12345678";
my $debug             = 1;
########## Receive Values #########
my $host_ip      = $ARGV[0];
my $service_desc = $ARGV[1];
my $plugin_out   = $ARGV[2];

my $dbh = DBI->connect("DBI:mysql:database=$remote_db;host=$remote_mysql",
                       $remote_mysql_user, $remote_mysql_pwd, { RaiseError => 1 });

# Placeholders let DBI quote the values, so plugin output that contains
# quote characters cannot break the statement
my $rows = $dbh->do("INSERT INTO log (host_ip, services_desc, plugin_out) VALUES (?, ?, ?)",
                    undef, $host_ip, $service_desc, $plugin_out);
if ($debug) {
    print "$rows row(s) affected \n";
    my $sth = $dbh->prepare("SELECT host_ip, services_desc, plugin_out FROM log");
    $sth->execute();
    while (my @data = $sth->fetchrow_array()) {
        print "$data[0] $data[1] $data[2]\n";
    }
}
$dbh->disconnect();
Note: set $debug to 1 before testing.
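The log table limits plugin_out to 500 characters and services_desc to 200, while plugin output can be long or span several lines. One possible guard, sketched here as an assumption rather than something the original setup does, is to flatten and clip a value in the calling shell script before passing it to the Perl script. The name `clip_field` is hypothetical, and whether awk's `substr` counts bytes or characters for non-ASCII text depends on the awk implementation and locale.

```shell
#!/bin/sh
# clip_field VALUE MAX_LEN   (hypothetical helper)
# Prints VALUE on a single line (newlines become spaces), cut to at most
# MAX_LEN characters, so it fits the matching varchar column.
clip_field() {
    printf '%s' "$1" | tr '\n' ' ' | awk -v max="$2" '{ print substr($0, 1, max) }'
}
```

A caller might write `out=$(clip_field "$plugin_out" 500)` and pass `"$out"` as the third argument of insert_log_to_mysql.pl.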
Check
On the Nagios host, run the insert script as the nagios user:
[nagios@nagios libexec]$ perl insert_log_to_mysql.pl 1.1.1.1 check_3306 "connection refused"
1 row(s) affected
1.1.1.1 check_3306 connection refused
Check on the DB Server:
mysql> select * from log;
+---------+---------------+--------------------+
| host_ip | services_desc | plugin_out         |
+---------+---------------+--------------------+
| 1.1.1.1 | check_3306    | connection refused |
+---------+---------------+--------------------+
1 row in set (0.00 sec)
OK, the script tests fine. What remains is to hook it into the Nagios configuration.
V. Adjusting the Nagios service configuration
commands.cfg
define command{
    command_name  restart_mysql
    command_line  /usr/local/nagios/libexec/restart_mysql $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $SERVICEDESC$ "$SERVICEOUTPUT$"
}
The MySQL start script restart_mysql has to be adjusted as well:
#!/bin/sh
# Arguments: $1=$SERVICESTATE$  $2=$SERVICESTATETYPE$  $3=$SERVICEATTEMPT$
#            $4=$HOSTADDRESS$   $5=$SERVICEDESC$       $6=$SERVICEOUTPUT$
HostAddress=$4
Services_desc=$5
Plugin_out=$6
debug=1
if [ $debug -eq 1 ]; then
    echo "MysqlServer:${HostAddress}" >>/tmp/ReMysql.log
fi
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        1)
            if [ $debug -eq 1 ]; then
                echo "Restarting Mysql service (1rd soft critical state)..." >>/tmp/ReMysql.log
            fi
            # Record the event first, then start MySQL; the quotes keep multi-word output as one argument
            /usr/bin/perl /usr/local/nagios/libexec/insert_log_to_mysql.pl "${HostAddress}" "${Services_desc}" "${Plugin_out}"
            /usr/bin/ssh nagios@${HostAddress} "sudo /data0/mysql/3306/mysql start"
            ;;
        esac
        ;;
    HARD)
        if [ $debug -eq 1 ]; then
            echo "Restarting Mysql service..." >>/tmp/ReMysql.log
        fi
        /usr/bin/perl /usr/local/nagios/libexec/insert_log_to_mysql.pl "${HostAddress}" "${Services_desc}" "${Plugin_out}"
        /usr/bin/ssh nagios@${HostAddress} "sudo /data0/mysql/3306/mysql start"
        ;;
    esac
    ;;
esac
exit 0
Note: it is best to set debug to 1 before debugging.
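The handler passes the plugin output to the Perl script as its third argument, and that output usually contains spaces ("Connection refused"). The short demonstration below, which is only an illustration with a made-up `count_args` function, shows why the expansion needs straight double quotes: without them the shell splits the value into several arguments and only the first word would reach plugin_out.

```shell
#!/bin/sh
# count_args prints how many arguments it received (illustration only).
count_args() {
    echo "$#"
}

plugin_out="Connection refused"

# Unquoted expansion is split on whitespace: host, description, "Connection", "refused"
unquoted=$(count_args 192.168.x.y checkport $plugin_out)

# Quoted expansion stays one argument: host, description, "Connection refused"
quoted=$(count_args 192.168.x.y checkport "$plugin_out")

echo "unquoted=$unquoted quoted=$quoted"
```

Typographic quotes (“ ”) pasted from a word processor are not shell quotes at all; they become part of the argument text, so the script must use plain `"`.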
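Check
This is the last part of the article, and I am a little excited... Let's verify whether we can reach the goal below.
Goal: whenever the MySQL service is restarted, the related log entry must be recorded in the MySQL DB on another machine.
Test: stop the MySQL service, then check the Nagios status page.
Nagios-side log:
[root@nagios ~]# tail -f /tmp/ReMysql.log
MysqlServer:192.168.x.y
Restarting Mysql service (1rd soft critical state)...
Now check the MySQL server:
[root@centos-server ~]# netstat -an|grep 3306
tcp 0 0 :::3306 :::* LISTEN
Then check what was logged:
mysql> select * from nagios.log;
+--------------+---------------+--------------------+
| host_ip      | services_desc | plugin_out         |
+--------------+---------------+--------------------+
| 192.168.x.y  | checkport     | Connection refused |
+--------------+---------------+--------------------+
1 row in set (0.00 sec)
OK, the goal is achieved. Not only was the service started remotely, the error was recorded as well.
That is the end of this article. I am sure you will have plenty of ideas for extending it...
In the next article I will walk you through Nagios distributed monitoring!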