1 Install the NRPE client on the LVS server
1.1 Install the NRPE client from RPM packages
Download: http://download.csdn.net/detail/mchdba/7493875
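[root@localhost nagios]# ll
total 768
-rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm
-rw-r--r-- 1 root root  32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm
-rw-r--r-- 1 root root  18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm
[root@localhost nagios]# rpm -ivh *.rpm --nodeps --force
Note that --nodeps --force skips rpm's dependency checks. These packages were built against an older OpenSSL (libssl.so.6 / libcrypto.so.6), which is why nrpe fails to start in section 1.3.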
1.2 Append the command definitions and the Nagios server's IP address to the end of the configuration file
[root@localhost nagios]# vim /etc/nagios/nrpe.cfg
# add by tim on 2014-06-11
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H localhost -w 3000.0,80% -c 5000.0,100% -p 5
allowed_hosts = 127.0.0.1, 10.2xx.3.xx
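NRPE only answers requests from addresses listed in allowed_hosts, so a mistyped Nagios server IP shows up later as refused connections. The function below is a minimal sketch (nrpe_allows is a hypothetical name) that checks whether a given IP is listed. It assumes plain comma-separated IP addresses and does not resolve hostnames or interpret network ranges.

```bash
#!/bin/bash
# Sketch: succeed (exit 0) if IP $2 appears in the allowed_hosts line of nrpe.cfg $1.
# Assumption: entries are plain comma-separated IPs; hostnames/ranges are not handled.
# If several uncommented allowed_hosts lines exist, this sketch only checks the last one.
nrpe_allows() {
    local cfg=$1 ip=$2 hosts entry
    local -a entries
    # Strip the "allowed_hosts =" prefix; commented-out lines do not match
    hosts=$(sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=//p' "$cfg" | tail -n 1)
    IFS=',' read -ra entries <<< "$hosts"
    for entry in "${entries[@]}"; do
        entry=${entry//[[:space:]]/}    # drop spaces around each entry
        if [ "$entry" = "$ip" ]; then
            return 0
        fi
    done
    return 1
}
# Example: nrpe_allows /etc/nagios/nrpe.cfg <nagios-server-ip> && echo allowed
```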
Check that the command works:
[root@web-9 nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15
USERS OK - 2 users currently logged in |users=2;8;15;0
[root@web-9 nrpe-2.15]#
The "USERS OK - ..." output shows that the command works.
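Testing each plugin by hand gets tedious as the list grows. Below is a sketch of a hypothetical helper, run_nrpe_commands, that runs every uncommented command[...] line of an nrpe.cfg through sh -c and prints the command name, exit code, and first line of output. It assumes no entry uses NRPE's $ARGn$ argument macros, and running it as the nagios user is closer to what nrpe itself does.

```bash
#!/bin/bash
# Sketch: run each "command[name]=cmdline" entry from nrpe.cfg $1 locally and
# print "<name> <exit code> <first output line>".
run_nrpe_commands() {
    local cfg=$1 line name cmdline out rc
    while IFS= read -r line; do
        name=${line%%\]=*}        # e.g. "command[check_users"
        name=${name#command\[}    # e.g. "check_users"
        cmdline=${line#*\]=}      # everything after the first "]="
        # stdin from /dev/null so a plugin cannot swallow the rest of the list
        if out=$(sh -c "$cmdline" 2>&1 </dev/null); then
            rc=0
        else
            rc=$?
        fi
        printf '%s %s %s\n' "$name" "$rc" "${out%%$'\n'*}"
    done < <(grep -E '^command\[[^]]+\]=' "$cfg")
}
# Example: run_nrpe_commands /etc/nagios/nrpe.cfg
```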
1.3 Starting nrpe fails with the following error:
[root@web-9 ~]# service nrpe restart
Shutting down nrpe: [FAILED]
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@web-9 ~]#
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@db-m2-slave-1 nagios_client]#
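The dynamic loader stops at the first missing library, so each failed start reveals only one of them. The hypothetical helper below is a sketch that lists all unresolved libraries at once by filtering ldd output for "not found"; run against /usr/sbin/nrpe on the hosts above, it would be expected to show libssl.so.6 and libcrypto.so.6 together.

```bash
#!/bin/bash
# Sketch: print each shared library that the loader cannot resolve for binary $1.
# ldd reports unresolved dependencies as "libfoo.so.N => not found".
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/=> not found/ {print $1}'
}
# Example: missing_libs /usr/sbin/nrpe
```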
Create a symlink:
[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libssl.so /usr/lib64/libssl.so.6
[root@db-m2-slave-1 nagios_client]#
(If libssl.so does not exist, link libssl.so.10 instead: ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6)
Starting it again gives:
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@web-10 ~]# ll /usr/lib64/libcrypto.so
lrwxrwxrwx. 1 root root 18 Oct 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0
[root@db-m2-slave-1 nagios_client]#
Create another symlink, after which nrpe starts:
[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6
(If libcrypto.so does not exist, link libcrypto.so.10 instead: ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6)
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: [OK]
[root@db-m2-slave-1 nagios_client]#
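The two manual ln -s steps can be folded into one helper. This is a sketch (link_compat_lib is a hypothetical name) that follows the same fallback order as above: <name>.so first, then <name>.so.10. The symlink only satisfies the loader's file-name lookup; the target is a newer OpenSSL than the one nrpe was built against, so binary compatibility is not guaranteed even though nrpe started here.

```bash
#!/bin/bash
# Sketch: create $1/$2.so.6 pointing at $2.so or, failing that, $2.so.10.
# Run as root when $1 is a system directory such as /usr/lib64.
link_compat_lib() {
    local dir=$1 name=$2 target
    if [ -e "$dir/$name.so.6" ]; then
        echo "$dir/$name.so.6 already exists"
        return 0
    fi
    for target in "$name.so" "$name.so.10"; do
        if [ -e "$dir/$target" ]; then
            ln -s "$dir/$target" "$dir/$name.so.6"
            echo "$dir/$name.so.6 -> $dir/$target"
            return 0
        fi
    done
    echo "neither $name.so nor $name.so.10 found in $dir" >&2
    return 1
}
# Example: for lib in libssl libcrypto; do link_compat_lib /usr/lib64 "$lib"; done
```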
1.4 Check that nrpe is working
Check from the Nagios server:
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H xx.xx3.xx
NRPE v2.12
[root@cache-2 ~]#
A reply with the NRPE version string (NRPE v2.12 here) means the connection succeeded.
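2 Monitor LVS with a shell script
2.1 The monitoring script
Nagios does not ship a plugin for checking LVS status, so I found a simple script online, check_lvs.sh. Copy it to /usr/lib/nagios/plugins/ as check_lvs (the nrpe.cfg entry in section 2.2 calls it without the .sh suffix) and make it executable for the nagios user. The script is as follows: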
#!/bin/bash
# http://www.ohlinux.com/archives/632/
# add by tim on 20140613
# Sums the ActiveConn/InActConn columns of the real servers behind the "http"
# virtual service (forwarding method "Route", i.e. LVS-DR) reported by ipvsadm.
USAGE_Method="$(basename "$0") [-h|--hostname] <ip or hostname> [-w|--warning] <integer> [-c|--critical] <integer>"
USAGE_Value="warning value must be smaller than critical value: $(basename "$0") $*"
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# A usage error must not report OK, so exit with UNKNOWN
if [ $# -lt 4 ]; then
    echo "Usage: $USAGE_Method"
    exit $STATE_UNKNOWN
fi

while [ $# -gt 0 ]; do
    case "$1" in
        -h|-H|--hostname) shift ;;    # accepted but unused: ipvsadm reads the local host
        -w|--warning) shift; warning=$1 ;;
        -c|--critical) shift; critical=$1 ;;
    esac
    shift
done

if [ -z "$warning" ] || [ -z "$critical" ] || [ "$warning" -ge "$critical" ]; then
    echo "$USAGE_Value"
    echo "Usage: $USAGE_Method"
    exit $STATE_UNKNOWN
fi

# Run ipvsadm once; on a real-server line, $5 is ActiveConn and $6 is InActConn
LVS_LINES=$(sudo ipvsadm | grep http | grep Route)
if [ -z "$LVS_LINES" ]; then
    echo "lvs critical,lvs is down now."
    exit $STATE_CRITICAL
fi
ACT_COUNT=$(echo "$LVS_LINES" | awk '{s += $5} END {print s+0}')
Inactive_count=$(echo "$LVS_LINES" | awk '{s += $6} END {print s+0}')

# Every value falls into exactly one state (the original left values equal to
# a threshold without any output or state)
if [ "$ACT_COUNT" -gt "$critical" ]; then
    echo "critical - lvs connections: $ACT_COUNT active"
    exit $STATE_CRITICAL
fi
if [ "$ACT_COUNT" -gt "$warning" ]; then
    echo "warning - lvs connections: $ACT_COUNT active"
    exit $STATE_WARNING
fi
echo "LVS OK - LVS is running (conn: $ACT_COUNT active, $Inactive_count inactive)|active=$ACT_COUNT;69999;99999;0; inactive=$Inactive_count;69999;99999;0;"
exit $STATE_OK
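To check the counting step without a live director, it can be pulled out into a function that reads ipvsadm output on standard input. The sketch below (sum_lvs_conns is a hypothetical name) uses the same http/Route filter and the same columns as the script, where a real-server line reads "-> RemoteAddress:Port Forward Weight ActiveConn InActConn". Note that grep http also matches https services.

```bash
#!/bin/bash
# Sketch: read ipvsadm output on stdin and print "<active> <inactive>",
# summing columns 5 (ActiveConn) and 6 (InActConn) of the http/Route lines.
sum_lvs_conns() {
    grep http | grep Route | awk '{a += $5; i += $6} END {print a+0, i+0}'
}
# Example: sudo ipvsadm | sum_lvs_conns
```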
2.2 Configure nrpe.cfg
Edit /etc/nagios/nrpe.cfg with vim and add a check_lvs command:
command[check_lvs]=/usr/lib/nagios/plugins/check_lvs -w 300 -c 600
Then restart nrpe:
[root@/root/nagios/check_lvs ~]# service nrpe restart
Shutting down nrpe: [OK]
Starting nrpe: [OK]
[root@/root/nagios/check_lvs ~]#
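With -w 300 -c 600, every active-connection count should map to exactly one state. This is a sketch of that comparison (lvs_state is a hypothetical name), following the common Nagios plugin convention that a value strictly above a threshold triggers it; the return value is the Nagios exit code.

```bash
#!/bin/bash
# Sketch: return 0 (OK), 1 (WARNING) or 2 (CRITICAL) for an active-connection
# count $1 against warning $2 and critical $3 (warning < critical assumed).
lvs_state() {
    local active=$1 warning=$2 critical=$3
    if [ "$active" -gt "$critical" ]; then
        return 2
    elif [ "$active" -gt "$warning" ]; then
        return 1
    fi
    return 0
}
# Example: lvs_state 450 300 600; echo $?
```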
2.3 Check from the Nagios server
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 1x.xx4.x.x5 -c check_lvs
lvs critical,lvs is down now.
[root@cache-2 ~]#
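The check reports that LVS is down. This is not a real outage: the nagios user cannot run ipvsadm yet, so the script sees no output, as the following note explains.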
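Note: check_lvs calls ipvsadm to read the LVS state, and ipvsadm can only be run by root. Since NRPE runs plugins as the nagios user, the script calls sudo ipvsadm, so the nagios user must be allowed to use sudo without a password. On CentOS there is one more obstacle: /etc/sudoers contains Defaults requiretty (enabled by default), which makes sudo refuse to run without a login terminal, and plugins started by NRPE have none. Edit /etc/sudoers (preferably with visudo), find the Defaults requiretty line, and add a Defaults:nagios !requiretty line after it so that the nagios user can call sudo without a terminal, plus a NOPASSWD rule for nagios: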
Defaults    requiretty
Defaults:nagios !requiretty
# Let nagios run the permitted commands (arguments allowed) via sudo without a password
nagios ALL=(ALL) NOPASSWD: ALL
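The rule above lets nagios run any command as root. A narrower alternative, sketched below, only allows ipvsadm. The helper name sudoers_entry, the path /sbin/ipvsadm (check with command -v ipvsadm), the drop-in file name, and the assumption that your sudoers includes /etc/sudoers.d are all assumptions, not part of the original setup.

```bash
#!/bin/bash
# Sketch: print sudoers lines that exempt user $1 from requiretty and let it
# run only command $2 as root without a password.
sudoers_entry() {
    local user=$1 cmd=$2
    printf 'Defaults:%s !requiretty\n' "$user"
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$cmd"
}
# Example (as root; validate with visudo before installing):
#   sudoers_entry nagios /sbin/ipvsadm > /tmp/nagios_lvs
#   visudo -cf /tmp/nagios_lvs && install -m 0440 /tmp/nagios_lvs /etc/sudoers.d/nagios_lvs
```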
Check again from the Nagios server; the check now works:
[root@cache-2 etc]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.xx.xx -c check_lvs
LVS OK - LVS is running (conn: 16 active, 77 inactive)|active=16;69999;99999;0; inactive=77;69999;99999;0;
[root@cache-2 etc]#
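check_nrpe passes the plugin's exit code through, and Nagios derives the service state from that code, not from the message text. A small helper for reading it when testing by hand is sketched below (nagios_state is a hypothetical name; 0 to 3 are the standard plugin return codes). For example, a "down" branch that exits with 3 is shown as UNKNOWN even if its message says critical.

```bash
#!/bin/bash
# Sketch: map a Nagios plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNKNOWN (unexpected exit code $1)" ;;
    esac
}
# Example:
#   /usr/local/nagios/libexec/check_nrpe -H <lvs-ip> -c check_lvs; nagios_state $?
```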
2.4 Add the configuration on the Nagios server
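vim services.cfg
define service{
    host_name               lvs-lan
    service_description     Check lvs
    check_command           check_nrpe!check_lvs
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          opsweb
}
This service runs check_lvs on the LVS host through NRPE, so objects/commands.cfg must define a check_nrpe command (skip this if it already exists):
vim objects/commands.cfg
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
The following check_lvs command is only used if the plugin runs directly on the Nagios server; the service above does not call it:
define command{
    command_name    check_lvs
    command_line    $USER1$/check_lvs -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}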
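If more than one LVS director is monitored, the service block above can be generated instead of copied. This is a sketch (lvs_service_defs is a hypothetical name, and host names other than lvs-lan are illustrative) that prints one definition per host with the same settings; each host must already be defined in Nagios.

```bash
#!/bin/bash
# Sketch: print a "Check lvs" service definition for each host name given.
lvs_service_defs() {
    local host
    for host in "$@"; do
        cat <<EOF
define service{
    host_name               $host
    service_description     Check lvs
    check_command           check_nrpe!check_lvs
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          opsweb
}
EOF
    done
}
# Example: lvs_service_defs lvs-lan >> services.cfg
```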
Then reload Nagios to finish setting up LVS monitoring:
[root@cache-2 etc]# service nagios reload
Running configuration check... Reloading nagios configuration... done
[root@cache-2 etc]#
Nagios is now monitoring the LVS service.
Reference: http://c20031776.blog.163.com/blog/static/684716252013627506890/