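For deployment, follow this document; to learn Nagios itself, read the documentation from Ma Ge (马哥) and Lao Nan Hai (老男孩) instead.
This document mainly summarizes my own steps, based on Ma Ge's Nagios walkthrough.
Monitoring server:
Preparation before installation
(1) Resolve the dependencies needed to install Nagios:
The basic Nagios components depend on httpd, gcc and gd. The following command installs any of the rpm packages Nagios depends on that are not yet present: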
# yum -y install httpd gcc glibc glibc-common gd gd-devel php php-mysql mysql mysql-devel mysql-server
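Note: you can also install the packages above by compiling them from source, but then many of the file paths used later must be adjusted one by one to match your source-install configuration. You also need to start the required services, such as httpd, as needed.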
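To see which of these packages are still missing before running yum, a small check along the following lines may help. This is only a sketch: the helper name missing_from is made up for illustration, and the rpm query runs only when rpm exists on the machine.

```shell
#!/bin/bash
# missing_from INSTALLED WANTED...
# Print every WANTED package name that does not appear as a whole line in
# INSTALLED (a newline-separated list of installed package names).
# The function name is illustrative; it is not part of rpm or Nagios.
missing_from() {
    local installed=$1 pkg
    shift
    for pkg in "$@"; do
        grep -Fxq -- "$pkg" <<< "$installed" || echo "$pkg"
    done
}

# Only query the rpm database when rpm is available.
if command -v rpm >/dev/null 2>&1; then
    missing_from "$(rpm -qa --qf '%{NAME}\n')" \
        httpd gcc glibc glibc-common gd gd-devel php php-mysql \
        mysql mysql-devel mysql-server
fi
```

An empty result means every listed package is already installed.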
(2) Add the user and group that Nagios needs to run:
# groupadd nagcmd
# useradd -G nagcmd nagios
# passwd nagios
Add apache to the nagcmd group so that it has sufficient permissions when Nagios is operated through the web interface:
# usermod -a -G nagcmd apache
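A quick way to confirm the group setup is to look at the output of id -nG for both accounts. The sketch below assumes the account names used above (nagios, apache) and uses a made-up helper, has_group, to match the group name as a whole word.

```shell
#!/bin/bash
# has_group LIST GROUP
# Succeed if GROUP appears as a whole word in the space-separated LIST
# (the format printed by `id -nG`). Illustrative helper name.
has_group() {
    local g
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Report membership for the two accounts created above. `id` fails for a
# missing user; the list is then empty and the check reports NOT in nagcmd.
for u in nagios apache; do
    if has_group "$(id -nG "$u" 2>/dev/null)" nagcmd; then
        echo "$u: in nagcmd"
    else
        echo "$u: NOT in nagcmd"
    fi
done
```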
3. Compile and install Nagios:
# tar zxf nagios-3.3.1.tar.gz
# cd nagios-3.3.1
# ./configure --with-command-group=nagcmd --enable-event-broker
# make all
# make install
# make install-init
# make install-commandmode
# make install-config
Set email to the address at which you want to receive Nagios alerts; the default is the nagios user on the local host:
# vi /usr/local/nagios/etc/objects/contacts.cfg
email nagios@localhost ; this is the default setting
Create the configuration file for the Nagios web application in the httpd configuration directory (conf.d):
# make install-webconf
Create a user for logging in to the Nagios web application; this account is used later for authentication when logging in to Nagios through the web:
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
After the configuration above is finished, httpd needs to be restarted:
# service httpd restart
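4. Compile and install nagios-plugins and NRPE
When Nagios needs to monitor a service or resource on a remote Linux host:
Step 1: the Nagios server runs the check_nrpe plugin; the Nagios configuration files tell it what to check.
Step 2: the check_nrpe plugin connects over SSL to the NRPE daemon on the remote monitored Linux client.
Step 3: the NRPE daemon on the monitored Linux client runs the corresponding Nagios plugin to check a local resource or service.
Step 4: the NRPE daemon on the monitored Linux client returns the check result to the check_nrpe plugin, which hands it to Nagios for processing.
Note: the NRPE daemon needs nagios-plugins installed on the remote monitored Linux host; otherwise it cannot perform any checks. In addition, because the communication between the two sides is SSL-encrypted, both sides must be compiled with the options ./configure --enable-ssl --with-ssl-lib=/lib/, otherwise errors will occur.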
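The four steps can be tried by hand from the monitoring server once both sides are installed. The following is a sketch under assumptions: the check_nrpe path and the target address are examples, and state_name is a made-up helper that maps the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, anything else treated as UNKNOWN) to names.

```shell
#!/bin/bash
# state_name CODE
# Map a Nagios plugin exit status to its state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; other codes are treated the same here
    esac
}

CHECK_NRPE=/usr/local/nagios/libexec/check_nrpe   # assumed install path
TARGET=192.168.2.10                               # example monitored host

if [ -x "$CHECK_NRPE" ]; then
    rc=0
    # Ask the remote NRPE daemon to run its check_load command.
    "$CHECK_NRPE" -H "$TARGET" -c check_load || rc=$?
    echo "state: $(state_name "$rc")"
else
    echo "check_nrpe not found at $CHECK_NRPE; skipping"
fi
```

The remote daemon only accepts command names that are defined in its own nrpe.cfg, so check_load must exist there for this to return a real result.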
All of Nagios's monitoring work is done by plugins, so the official plugins must also be installed before starting Nagios.
# tar zxf nagios-plugins-1.4.15.tar.gz
# cd nagios-plugins-1.4.15
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make
# make install
Install NRPE (on the server side, adding a --prefix path when installing NRPE makes things more convenient)
tar -zxvf nrpe-2.12.tar.gz && cd nrpe-2.12
./configure --prefix=/usr/local/nrpe --enable-ssl --with-ssl-lib (requires openssl and openssl-devel to be installed beforehand)
make all
make install-plugin
make install-daemon
make install-daemon-config
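Configure NRPE
# Configure the NRPE settings
vi /usr/local/nrpe/etc/nrpe.cfg (the config file lands under the --prefix chosen above), then find and modify the following lines
server_address=IP of this host
# Change 192.168.1.100 to the IP of your Nagios server
allowed_hosts=192.168.1.100,127.0.0.1
5. Configure and start Nagios
(1) Add nagios as a system service and enable it at boot: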
# chkconfig --add nagios
# chkconfig nagios on
(2) Check that the syntax of the main configuration file is correct:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
(3) If the syntax check above reports no problems, the nagios service can now be started:
# service nagios start
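(4) Configure SELinux
If SELinux is enabled on your system, it blocks the Nagios web CGI programs by default. Use the following command to check whether SELinux is enabled:
# getenforce
If the command above shows that SELinux is enforcing, you can turn it off temporarily (it switches to permissive mode until the next reboot) with:
# setenforce 0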
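Because setenforce 0 only lasts until the next reboot, it can be worth comparing the running mode with the one stored in /etc/selinux/config. The sketch below only reads; it changes nothing. The helper name configured_selinux_mode is made up for illustration.

```shell
#!/bin/bash
# configured_selinux_mode FILE
# Print the value of the SELINUX= line in FILE (enforcing, permissive or
# disabled). FILE is normally /etc/selinux/config.
configured_selinux_mode() {
    sed -n 's/^SELINUX=//p' "$1" | tail -n 1
}

CONFIG=/etc/selinux/config
if [ -r "$CONFIG" ]; then
    echo "mode after next reboot: $(configured_selinux_mode "$CONFIG")"
fi
if command -v getenforce >/dev/null 2>&1; then
    echo "mode right now: $(getenforce)"
fi
```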
(5) Disable the firewall
(6) View Nagios through the web interface:
http://your_nagios_IP/nagios
Monitored host:
1) Add the nagios user first
# useradd -s /sbin/nologin nagios
2) NRPE depends on nagios-plugins, so install that first
# tar zxf nagios-plugins-1.4.15.tar.gz
# cd nagios-plugins-1.4.15
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make all
# make install
3) Install NRPE
# tar -zxvf nrpe-2.12.tar.gz
# cd nrpe-2.12
# ./configure --with-nrpe-user=nagios \
--with-nrpe-group=nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--enable-command-args \
--enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-daemon-config
4) Configure NRPE
# vim /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_address=172.16.100.11
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=172.16.100.1
command_timeout=60
connection_timeout=300
debug=0
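The configuration directives above are self-explanatory, so adjust them as needed during setup. One that deserves specific mention is allowed_hosts, which defines the IP addresses of the monitoring servers that are allowed to connect to this host.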
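A forgotten allowed_hosts entry is an easy mistake to make, so a small check like the following can be run on the monitored host. This is a sketch: allowed_host is a made-up helper, it compares entries as exact strings only, and the path and IP are the example values from above.

```shell
#!/bin/bash
# allowed_host CFG IP
# Succeed when IP is one of the comma-separated entries on the
# allowed_hosts= line of the NRPE config file CFG. Entries are compared
# as exact strings; hostnames and network ranges are not resolved.
allowed_host() {
    sed -n 's/^allowed_hosts=//p' "$1" | tr ',' '\n' | grep -Fxq -- "$2"
}

CFG=/usr/local/nagios/etc/nrpe.cfg
MONITOR_IP=172.16.100.1   # the monitoring server from the example above
if [ -r "$CFG" ]; then
    if allowed_host "$CFG" "$MONITOR_IP"; then
        echo "$MONITOR_IP may query this NRPE daemon"
    else
        echo "$MONITOR_IP is missing from allowed_hosts"
    fi
fi
```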
5) Start NRPE
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
To make the NRPE service easier to start, the following can be saved as the script /etc/init.d/nrped:
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
start)
echo -n "Starting NRPE daemon..."
$NRPE -c $NRPECONF -d
echo " done."
;;
stop)
echo -n "Stopping NRPE daemon..."
pkill -u nagios nrpe
echo " done."
;;
restart)
$0 stop
sleep 2
$0 start
;;
*)
echo "Usage: $0 start|stop|restart"
;;
esac
exit 0
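After the daemon has been started, it may be useful to confirm that something is actually listening on the NRPE port. The sketch below uses bash's /dev/tcp redirection together with the coreutils timeout command; port_open is a made-up helper name, and the check only proves that the TCP port accepts connections, not that the SSL handshake or allowed_hosts are right.

```shell
#!/bin/bash
# port_open HOST PORT
# Succeed when a TCP connection to HOST:PORT can be opened within 3 seconds.
# Relies on bash's /dev/tcp feature, so it needs bash rather than plain sh.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# 5666 is the server_port value from the nrpe.cfg shown above.
if port_open 127.0.0.1 5666; then
    echo "NRPE port 5666 is accepting connections"
else
    echo "nothing is listening on port 5666"
fi
```

If server_address is bound to a specific IP, replace 127.0.0.1 with that address.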
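Key points:
Keep one unified nrpe.cfg on the monitoring server. The default nrpe.cfg lacks many command definitions, which have to be added separately; the nrpe.cfg on every client can then be a copy of this one.
I have already downloaded a monitored-host nrpe.cfg that can be used directly; when the time comes, just adjust the following: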
server_address=192.168.2.10
allowed_hosts=192.168.2.124,192.168.2.10
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda2
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_mem]=/usr/local/nagios/libexec/check_mem -f -w 20 -c 10 -C
command[check_cpu]=/usr/local/nagios/libexec/check_cpu -w 60 -c 80
#command[check_ntp]=/usr/local/nagios/libexec/check_ntp -H 172.16.30.167 -w 0.5 -c 1
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda1 -w 5000 -c 6000
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10
command[check_ping]=/usr/local/nagios/libexec/check_ping -H 192.168.2.1 -w 100.0,20% -c 500.0,60%
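An nrpe.cfg that grows by copy and paste can end up defining the same command name twice, which makes it unclear which definition is in effect. A rough way to spot this is sketched below; dup_commands is a made-up helper, and commented-out definitions are ignored.

```shell
#!/bin/bash
# dup_commands CFG
# Print each command[...] name that is defined more than once in CFG.
# Lines starting with "#" are skipped because the pattern is anchored.
dup_commands() {
    grep -o '^command\[[^]]*]' "$1" | sort | uniq -d
}

CFG=/usr/local/nagios/etc/nrpe.cfg
if [ -r "$CFG" ]; then
    dup_commands "$CFG"
fi
```

No output means every command name in the file is unique.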
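Enable the nrpe command in commands.cfg on the monitoring server. By default
/usr/local/nagios/libexec does not contain check_nrpe;
copy /usr/local/nrpe/libexec/check_nrpe into that directory.
# Define the nrpe external component in commands.cfg
vi /usr/local/nagios/etc/nagios.cfg, and enable the following line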
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
vi /usr/local/nagios/etc/objects/commands.cfg, and add the following definition
#check nrpe
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
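A service refers to this command as check_nrpe!<name>, where the text after the "!" becomes $ARG1$. The sketch below imitates that substitution in shell to show the resulting command line. It is an illustration only, not how Nagios itself is implemented; the value of USER1 is an assumption (the usual $USER1$ setting in resource.cfg), and expand_check_nrpe is a made-up name.

```shell
#!/bin/bash
USER1=/usr/local/nagios/libexec   # assumed value of $USER1$ in resource.cfg

# expand_check_nrpe CHECK_COMMAND HOSTADDRESS
# Print the command line that results from a check_command of the form
# check_nrpe!<arg1>, following the command_line defined above.
expand_check_nrpe() {
    local arg1=${1#*!}     # drop the command name and the first "!"
    arg1=${arg1%%!*}       # keep only the first argument ($ARG1$)
    echo "$USER1/check_nrpe -H $2 -c $arg1"
}

expand_check_nrpe 'check_nrpe!check_disk' 192.168.2.10
```

Running the printed command by hand on the monitoring server is a convenient way to debug a service that stays in an unexpected state.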
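I have already packaged the libexec directories of the client and the monitoring server; they can be used directly at deployment time.
On the monitoring server, keep host and service definitions in separate files, with a separate service file for each host.
I have already downloaded these files as well; they can be used directly when needed.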
hosts.cfg
define host{
use linux-server
host_name 192.168.2.10
alias 192.168.2.10
address 192.168.2.10
}
define host{
use linux-server
host_name 192.168.2.123
alias 192.168.2.123
address 192.168.2.123
}
192.168.2.10.cfg
define service{
use generic-service
host_name 192.168.2.10
service_description check_disk
check_command check_nrpe!check_disk
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_cpu
check_command check_nrpe!check_cpu
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_iostat
check_command check_nrpe!check_iostat
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_mem
check_command check_nrpe!check_mem
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_swap
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name 192.168.2.10
service_description total_procs
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name 192.168.2.10
service_description check_zombie_procs
check_command check_nrpe!check_zombie_procs
}
#define service{
#use generic-service
#host_name 192.168.2.10
#service_description check_ntp
#check_command check_nrpe!check_ntp
#}
#define service{
#use generic-service
#host_name 192.168.2.10
#service_description check_ping
#check_command check_nrpe!check_ping
#}
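Since every host gets its own service file with nearly identical blocks, the file for a new host can be generated rather than copied and edited by hand. The following is a sketch; gen_services is a made-up helper, and it assumes each check name is also defined as command[<name>] in that host's nrpe.cfg.

```shell
#!/bin/bash
# gen_services HOST CHECK...
# Print one "define service" block per CHECK for HOST, each calling the
# NRPE command of the same name through check_nrpe.
gen_services() {
    local host=$1 check
    shift
    for check in "$@"; do
        printf 'define service{\n'
        printf 'use generic-service\n'
        printf 'host_name %s\n' "$host"
        printf 'service_description %s\n' "$check"
        printf 'check_command check_nrpe!%s\n' "$check"
        printf '}\n'
    done
}

# Example: preview the definitions for the second host.
gen_services 192.168.2.123 check_disk check_load check_users
```

Redirect the output to a file such as 192.168.2.123.cfg. Nagios only reads object files that nagios.cfg points to with a cfg_file= or cfg_dir= line, so the new file has to be listed there as well.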
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios restart
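The two commands above can be tied together so that a broken configuration never reaches the restart. A minimal sketch, assuming the install paths used throughout this document; checked_restart is a made-up function name.

```shell
#!/bin/bash
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# checked_restart
# Run the configuration check and restart Nagios only when it passes.
# On failure, print the tail of the check output and return 1.
checked_restart() {
    local log
    log=$(mktemp)
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >"$log" 2>&1; then
        rm -f "$log"
        service nagios restart
    else
        echo "configuration check failed, Nagios was not restarted:" >&2
        tail -n 20 "$log" >&2
        rm -f "$log"
        return 1
    fi
}

# Usage (as root on the monitoring server):
#   checked_restart
```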