To monitor a host's internal state, the NRPE plugin on the client collects the information and the Nagios server retrieves it through check_nrpe.

I. Configure the client

[root@server nrpe-2.12]# groupadd nagios
[root@server nrpe-2.12]# useradd -g nagios -s /sbin/nologin nagios
[root@server libexec]# chown nagios:nagios /usr/local/nagios
[root@server libexec]# chown -R nagios:nagios /usr/local/nagios/libexec


1. Install NRPE
[root@server ~]# tar zxvf nrpe-2.12.tar.gz
[root@server ~]# cd nrpe-2.12
[root@server nrpe-2.12]# ./configure
[root@server nrpe-2.12]# make all
[root@server nrpe-2.12]# make install-plugin
[root@server nrpe-2.12]# make install-daemon
[root@server nrpe-2.12]# make install-daemon-config


2. Install the Nagios plugins
[root@server ~]# tar zxvf nagios-plugins-1.4.14.tar.gz
[root@server ~]# cd nagios-plugins-1.4.14
[root@server nagios-plugins-1.4.14]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@server nagios-plugins-1.4.14]# make
[root@server nagios-plugins-1.4.14]# make install

3. Configure nrpe.cfg
[root@server libexec]# vim /usr/local/nagios/etc/nrpe.cfg
# Add the IP address of the monitoring server
allowed_hosts=127.0.0.1,192.168.30.100
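
As a quick sanity check after editing the file, the sketch below tests whether a given address appears in the allowed_hosts line. The function name host_allowed is hypothetical. It only compares literal addresses, and it assumes a single allowed_hosts line with no trailing comment.

```shell
# host_allowed CFG IP: succeed if IP is listed in the allowed_hosts line of CFG.
# Hypothetical helper for illustration; exact address matches only.
host_allowed() {
    cfg=$1
    ip=$2
    # Take the first uncommented allowed_hosts line, drop the key and any spaces.
    hosts=$(sed -n 's/^allowed_hosts=//p' "$cfg" | head -n 1 | tr -d ' ')
    # Wrap both sides in commas so 192.168.30.10 does not match 192.168.30.100.
    case ",$hosts," in
        *",$ip,"*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example: host_allowed /usr/local/nagios/etc/nrpe.cfg 192.168.30.100 && echo "monitoring server is allowed"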

4. Start the NRPE daemon
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Check that nrpe is running:
[root@server ~]# netstat -nultp |grep 5666
tcp  0  0 0.0.0.0:5666     0.0.0.0:*      LISTEN      18929/nrpe
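
The same check can be scripted instead of read by eye. The sketch below is only an illustration: nrpe_listening is a hypothetical helper that reads netstat-style output on standard input and looks for a TCP socket in LISTEN state on port 5666, the port shown above.

```shell
# nrpe_listening: read `netstat -nltp` style output on stdin and succeed
# if a TCP socket is in LISTEN state on port 5666.
# Hypothetical helper; it assumes the column layout shown in the output above.
nrpe_listening() {
    awk '$1 ~ /^tcp/ && $4 ~ /:5666$/ && $6 == "LISTEN" { found = 1 }
         END { exit !found }'
}
```

For example: netstat -nltp | nrpe_listening && echo "nrpe is listening"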

5. Test NRPE
[root@server ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.12
A working setup returns the version of NRPE installed on the monitored host. If you see this output, NRPE is running correctly.
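
Besides the text it prints, a Nagios plugin reports its result through its exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The small helper below (nagios_status_name is a hypothetical name) turns that status into a label, which can help when testing checks by hand.

```shell
# nagios_status_name CODE: print the Nagios state name for a plugin exit status.
# Hypothetical helper for illustration.
nagios_status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;
    esac
}
```

For example: /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1; nagios_status_name $?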

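6. Define what to monitor
To monitor a remote server, first define the checks on that server. For example, to monitor the number of logged-in users, CPU load, disk usage, and swap usage on a remote server, define the following commands in its nrpe.cfg: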
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda5]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda5
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

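The text in square brackets after command is the command name. You can choose any name; it is the name the monitoring server later passes to check_nrpe with -c.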
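
To see which command names a given nrpe.cfg exposes, something like the following sketch can help. The function name list_nrpe_commands is hypothetical, and the sketch assumes every definition starts at the beginning of a line in the command[name]=... form shown above.

```shell
# list_nrpe_commands CFG: print the name inside each command[...] definition.
# Hypothetical helper; commented-out lines (starting with #) are skipped
# because the pattern is anchored at the start of the line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

For example: list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg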

II. Configure the server

1. Install NRPE
[root@server ~]# tar zxvf nrpe-2.12.tar.gz
[root@server ~]# cd nrpe-2.12
[root@server nrpe-2.12]# ./configure
[root@server nrpe-2.12]# make all
[root@server nrpe-2.12]# make install-plugin
[root@server nrpe-2.12]# make install-daemon
[root@server nrpe-2.12]# make install-daemon-config

Check whether check_nrpe can communicate with the client:
[root@server ~]# /usr/local/nagios/libexec/check_nrpe  -H 192.168.30.110
CHECK_NRPE: Error - Could not complete SSL handshake.
Note: this attempt fails with an SSL handshake error.
To resolve it:
1). Check that the openssl and openssl-devel packages are installed
[root@server ~]# rpm -qa |grep ssl
openssl-devel-1.0.0-20.el6_2.5.x86_64
openssl-1.0.0-20.el6_2.5.x86_64

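2). Check that /usr/local/nagios/etc/nrpe.cfg on the client is configured correctly, in particular that allowed_hosts includes the monitoring server's IP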
allowed_hosts=127.0.0.1,192.168.30.100

If check_nrpe returns the NRPE version number, communication is working:
[root@server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe  -H 192.168.30.110
NRPE v2.12

2. Define a check_nrpe command
[root@server objects]# vim commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
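
To make the macros in command_line concrete: when a service uses a check_command of the form check_nrpe!check_users, Nagios splits it at "!", treats the first part as the command_name and passes the next part as $ARG1$. The sketch below imitates that substitution for this one definition only. The function name expand_check_nrpe is hypothetical, and the default value for $USER1$ is an assumption based on the plugin path used in this article.

```shell
# expand_check_nrpe ADDRESS CHECK_COMMAND [USER1]
# Print the command line that the check_nrpe definition above would produce.
# Hypothetical helper; USER1 defaults to /usr/local/nagios/libexec (an assumption).
expand_check_nrpe() {
    address=$1
    check_command=$2
    user1=${3:-/usr/local/nagios/libexec}
    name=${check_command%%!*}   # text before the first "!" is the command_name
    rest=${check_command#*!}    # everything after the first "!"
    arg1=${rest%%!*}            # first argument only, i.e. $ARG1$
    [ "$name" = "check_nrpe" ] || return 1
    echo "$user1/check_nrpe -H $address -c $arg1"
}
```

For example, expand_check_nrpe 192.168.30.110 'check_nrpe!check_users' prints the same command that is run by hand in the next step.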

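3. Test the commands
The default command definitions in nrpe.cfg can be used as they are. Test them before use to see whether their arguments need adjusting to fit your environment.
Run on the monitoring server:
[root@server objects]#  /usr/local/nagios/libexec/check_nrpe -H 192.168.30.110 -c check_users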
USERS OK - 1 users currently logged in |users=1;5;10;0
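
The part after "|" in this output is performance data in the form label=value;warn;crit;min, so users=1;5;10;0 means a value of 1 with the warning threshold 5 and critical threshold 10 set on the client. A minimal sketch for splitting it follows; parse_perfdata is a hypothetical helper and assumes a single item with no unit suffix.

```shell
# parse_perfdata: read one line of plugin output on stdin and print the
# fields of the performance-data item that follows the "|" separator.
# Hypothetical helper; assumes exactly one item of the form label=value;warn;crit;min.
parse_perfdata() {
    sed -n 's/^[^|]*|[[:space:]]*//p' | awk -F'[=;]' '{
        printf "label=%s value=%s warn=%s crit=%s min=%s\n", $1, $2, $3, $4, $5
    }'
}
```

For example: /usr/local/nagios/libexec/check_nrpe -H 192.168.30.110 -c check_users | parse_perfdata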

4. Edit services.cfg
[root@server ~]# vim /usr/local/nagios/etc/services.cfg
define service{
        use     generic-service
        host_name       node-1
        service_description     check_users
        check_command           check_nrpe!check_users
        max_check_attempts 5
        normal_check_interval 1
}
define service{
        use     generic-service
        host_name       node-1
        service_description     check_load
        check_command           check_nrpe!check_load
        max_check_attempts 5
        normal_check_interval 1
}
define service{
        use     generic-service
        host_name      node-1
        service_description     check_sda5
        check_command           check_nrpe!check_sda5
        max_check_attempts 5
        normal_check_interval 1
}
define service{
        use     generic-service
        host_name       node-1
        service_description     check_zombie_procs
        check_command           check_nrpe!check_zombie_procs
        max_check_attempts 5
        normal_check_interval 1
}
define service{
        use     generic-service
        host_name       node-1
        service_description     check_total_procs
        check_command           check_nrpe!check_total_procs
        max_check_attempts 5
        normal_check_interval 1
}
define service{
        use     generic-service
        host_name       node-1
        service_description     check_swap
        check_command           check_nrpe!check_swap
        max_check_attempts 5
        normal_check_interval 1
}
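
Since these definitions differ only in the command name, they could also be generated rather than typed. The following is a rough sketch, not part of the original setup: emit_nrpe_services is a hypothetical function that prints one service block per name, using the same template and values as above.

```shell
# emit_nrpe_services HOST NAME...: print one service definition per NRPE
# command name, following the template used in services.cfg above.
# Hypothetical helper for illustration.
emit_nrpe_services() {
    host=$1
    shift
    for name in "$@"; do
        cat <<EOF
define service{
        use     generic-service
        host_name       $host
        service_description     $name
        check_command           check_nrpe!$name
        max_check_attempts 5
        normal_check_interval 1
}
EOF
    done
}
```

For example: emit_nrpe_services node-1 check_users check_load check_swap >> /usr/local/nagios/etc/services.cfg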

Note: every command used here must already be defined in nrpe.cfg on the client, and the names must match exactly!
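
A mismatch between the two files is easy to miss, so a cross-check such as the sketch below may be useful. It assumes you have a copy of the client's nrpe.cfg on the machine where you run it, and that services use the check_nrpe!NAME form shown above. The function name missing_nrpe_commands is hypothetical.

```shell
# missing_nrpe_commands SERVICES_CFG NRPE_CFG
# Print every NAME used as check_nrpe!NAME in SERVICES_CFG that has no
# command[NAME]= line in NRPE_CFG. Succeeds only if nothing is missing.
# Hypothetical helper for illustration.
missing_nrpe_commands() {
    services_cfg=$1
    nrpe_cfg=$2
    rc=0
    for name in $(sed -n 's/^[[:space:]]*check_command[[:space:]][[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' "$services_cfg" | sort -u); do
        if ! grep -q "^command\[$name]=" "$nrpe_cfg"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

For example: missing_nrpe_commands /usr/local/nagios/etc/services.cfg ./client-nrpe.cfg, where client-nrpe.cfg is a placeholder name for the copied file.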

5. Restart nrpe and nagios
Client:
[root@client ~]# killall nrpe
[root@client ~]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Server:
Run nagios with -v to check the configuration for errors; if there are none, restart nagios
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@server ~]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
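
The two server-side steps can be chained so that a restart only happens when the check passes. This is a minimal sketch, assuming the verification command exits with a non-zero status when it finds errors; reload_if_valid is a hypothetical helper that takes both commands as strings.

```shell
# reload_if_valid VERIFY_CMD RESTART_CMD: run RESTART_CMD only if VERIFY_CMD succeeds.
# Hypothetical helper; both arguments are shell command strings.
reload_if_valid() {
    verify_cmd=$1
    restart_cmd=$2
    if sh -c "$verify_cmd"; then
        sh -c "$restart_cmd"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

For example: reload_if_valid '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' 'service nagios restart'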

Download the related packages: http://down.51cto.com/data/699395