Monitoring multiple hosts with Nagios (NRPE)


Installing NRPE on the monitored host

 

http://nchc.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.15/nagios-plugins-1.4.15.tar.gz
http://nchc.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz

 

The Nagios plugins must be installed first.


Method 1 (build from source):


# useradd -s /sbin/nologin -M nagios

# apt-get install libssl-dev

# ln -s /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/

# tar -zxvf nagios-plugins-1.4.15.tar.gz
# cd nagios-plugins-1.4.15
# ./configure --prefix=/usr/local/nagios
# make && make install

 

# tar zxvf nrpe-2.12.tar.gz
# cd nrpe-2.12
# ./configure
# make all
# make install-plugin (installs the check_nrpe plugin)
# make install-daemon (installs the daemon)
# make install-daemon-config (installs the configuration file)

Edit the nrpe configuration file

# vi /usr/local/nagios/etc/nrpe.cfg

allowed_hosts=127.0.0.1,192.168.10.8
The default is allowed_hosts=127.0.0.1; append the address of the monitoring host (192.168.10.8 in this example).

:wq
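
The same edit can be scripted instead of made in vi. What follows is only a minimal sketch, not part of the original procedure: set_allowed_hosts is a hypothetical helper name, and it assumes the file holds at most one uncommented allowed_hosts line.

```shell
#!/bin/sh
# Hypothetical helper: set allowed_hosts in an nrpe.cfg without an editor.
# $1 = path to nrpe.cfg, $2 = comma-separated list of allowed addresses.
set_allowed_hosts() {
    cfg=$1
    hosts=$2
    tmp=$(mktemp) || return 1
    if grep -q '^allowed_hosts=' "$cfg"; then
        # Replace the existing line.
        sed "s|^allowed_hosts=.*|allowed_hosts=$hosts|" "$cfg" > "$tmp"
    else
        # No such line yet: keep the file as it is and append one.
        cat "$cfg" > "$tmp"
        printf 'allowed_hosts=%s\n' "$hosts" >> "$tmp"
    fi
    # Write back through cat so the original file keeps its owner and mode.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

For example: set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg 127.0.0.1,192.168.10.8 (use /etc/nagios/nrpe.cfg for the packaged install). nrpe reads the file when it starts, so restart it after changing the list.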

Start nrpe

# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d


Method 2 (Ubuntu packages):


# useradd -s /sbin/nologin -M nagios


# apt-get install nagios-nrpe-server nagios-plugins


Edit the nrpe configuration file
# vi /etc/nagios/nrpe.cfg


allowed_hosts=127.0.0.1,192.168.10.8
The default is allowed_hosts=127.0.0.1


:wq


Start nrpe

# service nagios-nrpe-server start



Check that NRPE is running

# netstat -nltp |grep nrpe

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 5163/nrpe
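
If this check is to go into a script, the netstat output can be tested rather than read by eye. A rough sketch, assuming the column layout shown above (netstat run as root so the PID/program column is filled in); nrpe_is_listening is a made-up name:

```shell
#!/bin/sh
# Reads `netstat -nltp` style output on stdin. Succeeds if a process whose
# name ends in "nrpe" is listening on the given TCP port (default 5666).
nrpe_is_listening() {
    port=${1:-5666}
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") && $NF ~ /\/nrpe$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Typical use: netstat -nltp | nrpe_is_listening && echo "nrpe is up"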

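Test that NRPE is working (the path below is for the source install; with the Ubuntu packages, check_nrpe comes from the nagios-nrpe-plugin package and is typically found in /usr/lib/nagios/plugins)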

 
# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.12

 

Start nrpe at boot (for the source install in method 1):

# vi /etc/rc.local

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

:wq
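
On Ubuntu, /etc/rc.local usually ends with "exit 0", so the nrpe line has to go before it. The sketch below adds the line only once and places it ahead of a final "exit 0" if there is one. add_to_rc_local is a hypothetical helper, and it assumes the line contains no backslashes.

```shell
#!/bin/sh
# Hypothetical helper: add a command to an rc.local style file exactly once.
# $1 = path to the file, $2 = the line to add.
add_to_rc_local() {
    rc=$1
    line=$2
    # Nothing to do if the exact line is already present.
    grep -qxF -e "$line" "$rc" && return 0
    tmp=$(mktemp) || return 1
    awk -v line="$line" '
        # Put the new line in front of the first "exit 0".
        !inserted && /^exit 0[ \t]*$/ { print line; inserted = 1 }
        { print }
        # No "exit 0" found: append at the end instead.
        END { if (!inserted) print line }
    ' "$rc" > "$tmp"
    # Write back through cat so the file keeps its owner and mode.
    cat "$tmp" > "$rc"
    rm -f "$tmp"
}
```

For example: add_to_rc_local /etc/rc.local "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d"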


Look at the check commands defined on the monitored host; the monitoring host refers to them by name

# vi /usr/local/nagios/etc/nrpe.cfg


command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

:wq


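Note: you can add commands of your own, or change the threshold values at the end of each line (-w is the warning threshold, -c the critical one). For check_disk, a bare number such as 20 means free space in MB, while 20% means percent free.


For example:

command[check_mapper]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup00-LogVol00   (monitors an LVM volume)

command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda1        (some systems use sda rather than hda; adjust to your disks)


command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda2        (each partition can be monitored separately)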
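
Because the monitoring host refers to these commands by the name inside the square brackets, it helps to list exactly which names a given nrpe.cfg defines. A small sketch (list_nrpe_commands is a made-up name; commented-out lines are skipped):

```shell
#!/bin/sh
# Prints the name of every active command[...] entry in an nrpe.cfg.
# $1 = path to nrpe.cfg
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' "$1"
}
```

For example: list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg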

 


Installing NRPE on the monitoring host


1. Install the check_nrpe plugin

# apt-get install libssl-dev

# ln -s /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/

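# tar -zxvf nrpe-2.12.tar.gz
# cd nrpe-2.12
# ./configure
# make all
# make install-plugin

Only this install target is needed, because the monitoring host only uses the check_nrpe plugin.

2. Test communication between the monitoring host and the nrpe daemon on the monitored host.

# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.14

NRPE v2.12

The NRPE version string is returned, which means the two hosts can talk to each other.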
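
With several monitored hosts, the same test can be run in a loop. This is only a sketch: probe_nrpe_hosts is a hypothetical name, and the CHECK_NRPE variable is an assumption added here so the plugin path can be overridden.

```shell
#!/bin/sh
# Path to the plugin installed by `make install-plugin`; override if needed.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Runs check_nrpe -H against every address given as an argument.
# Prints one line per host and returns non-zero if any host fails to answer.
probe_nrpe_hosts() {
    failed=0
    for host in "$@"; do
        if out=$("$CHECK_NRPE" -H "$host" 2>&1); then
            printf '%s OK %s\n' "$host" "$out"
        else
            printf '%s FAILED %s\n' "$host" "$out"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: probe_nrpe_hosts 192.168.1.14 (add further addresses as extra arguments). A host that fails here usually has nrpe stopped, port 5666 blocked, or the monitoring host missing from allowed_hosts.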


3. Monitor host 192.168.1.14


Add a definition for check_nrpe to commands.cfg

# vi /usr/local/nagios/etc/objects/commands.cfg

# 'check_nrpe' command definition

define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

:wq

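What each line means:

command_name check_nrpe (defines the command name check_nrpe; the service definitions in ming.cfg below refer to it by this name)
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$  (the plugin program that is actually run; the $ARG1$ after -c is the check command passed to the nrpe daemon for execution)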
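
To make the substitution concrete, here is a simplified model of what Nagios does with this command_line. It is an illustration only, not Nagios code: expand_check_nrpe is a made-up function, it handles a single argument, and it takes the value of $USER1$ (normally the plugin directory set in resource.cfg) as a parameter.

```shell
#!/bin/sh
# Simplified model of how the check_nrpe command_line gets filled in.
# $1 = value of $USER1$ (plugin directory)
# $2 = value of $HOSTADDRESS$ (address from the host definition)
# $3 = check_command as written in a service, e.g. "check_nrpe!check_load"
expand_check_nrpe() {
    user1=$1
    hostaddress=$2
    check_command=$3
    # The text after the first "!" becomes $ARG1$ (single argument assumed).
    arg1=${check_command#*!}
    printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$hostaddress" "$arg1"
}
```

For example, expand_check_nrpe /usr/local/nagios/libexec 192.168.1.14 'check_nrpe!check_load' prints the command that would be run for the check_load service on that host.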

 


# cd /usr/local/nagios/etc/objects

# cp localhost.cfg ming.cfg

# vi ming.cfg

 

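In the host definition, change host_name to ming and address to 192.168.1.14 (ming is an arbitrary name).

In the hostgroup definition, change hostgroup_name to ming and members to ming as well.

Replace the service definitions copied from localhost.cfg with ones that go through check_nrpe: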


define service {
               use                  generic-service
               host_name            ming
               service_description  check_load
               check_command        check_nrpe!check_load
}

define service {
               use                  generic-service
               host_name            ming
               service_description  check_users
               check_command        check_nrpe!check_users
}

define service {
               use                  generic-service
               host_name            ming
               service_description  check_total
               check_command        check_nrpe!check_total_procs
}

define service {
               use                  generic-service
               host_name            ming
               service_description  check_hda1
               check_command        check_nrpe!check_hda1
}

 

:wq


Note: the command named after check_nrpe! must be one defined in the monitored host's nrpe.cfg; only commands that exist there can be used.
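
Since every service block follows the same pattern, the blocks can be generated from a list of command names. A minimal sketch, assuming the generic-service template used above; make_nrpe_services is a hypothetical helper, and service_description is simply set to the command name:

```shell
#!/bin/sh
# Prints one service definition per command name.
# $1 = host_name, remaining arguments = command names from nrpe.cfg.
make_nrpe_services() {
    host=$1
    shift
    for cmd in "$@"; do
        cat <<EOF
define service {
               use                  generic-service
               host_name            $host
               service_description  $cmd
               check_command        check_nrpe!$cmd
}

EOF
    done
}
```

For example: make_nrpe_services ming check_load check_users check_total_procs check_hda1 >> ming.cfg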


# vi /usr/local/nagios/etc/nagios.cfg (add the following line anywhere)


cfg_file=/usr/local/nagios/etc/objects/ming.cfg

:wq


Restart the nagios service

# service nagios restart
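
A mistake in ming.cfg can keep nagios from starting, so it may be worth verifying the configuration before the restart. The sketch below wraps the restart in nagios -v, the configuration check of the nagios binary. restart_if_config_ok and the two variables are names introduced here, with paths assumed from the source install layout used above.

```shell
#!/bin/sh
# Assumed locations for a source install under /usr/local/nagios.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Runs the command given as arguments only if the config check passes.
restart_if_config_ok() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$@"
    else
        echo "nagios.cfg failed verification; not restarting" >&2
        return 1
    fi
}
```

For example: restart_if_config_ok service nagios restart. When verification fails, run the nagios -v command by hand to see the reported errors.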