Nagios Installation and Configuration Guide
Server (monitoring host)
#rpm -Uvh http://repo.webtatic.com/yum/el6/latest.rpm
Retrieving http://repo.webtatic.com/yum/el6/latest.rpm
warning: /var/tmp/rpm-tmp.94GrDP: Header V4 DSA/SHA1 Signature, key ID cf4c4ff9: NOKEY
Preparing... ########################################### [100%]
package webtatic-release-6-6.noarch is already installed
If the NOKEY warning above appears, the GPG key that signed the package has not been imported into the RPM database. To fix it, import the keys:
# rpm --import /etc/pki/rpm-gpg/RPM*
#yum install php54w php54w-cli php54w-common php54w-devel php54w-gd php54w-mbstring php54w-mcrypt php54w-mysql php54w-odbc php54w-pdo php54w-xml mysql55w mysql55w-devel mysql55w-server mysql55w-libs httpd memcached php-pecl-memcache gd gd-devel gcc gcc-c++ glibc glibc-devel glibc-common
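If yum reports a package conflict, look the packages up with rpm -qa | grep <package-name>. In this case the two conflicting packages were already installed, so remove them individually:
# rpm -e <package-name> --nodeps
After removing them, run the yum command again.
(The error message also suggests --skip-broken; it can be added when rerunning yum.)
If yum still fails, follow the steps shown in the original screenshot (not reproduced here).
Install the compilers; if configure failed without them, rerun it afterwards:
# yum install gcc gcc-c++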
Set up the base environment so that Nagios runs as the nagios user rather than root:
# useradd nagios -s /sbin/nologin
Install Nagios:
# tar zxf nagios-4.1.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios-4.1.1/
# ./configure --prefix=/usr/local/nagios/ --with-gd-lib=/usr/lib --with-gd-inc=/usr/include --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagios
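*** --with-gd-lib=/usr/lib --with-gd-inc=/usr/include tell configure where the GD library and headers are
# make all
# make install              // install the program files
# make install-init         // install the init script
# make install-commandmode  // set up external command mode and give the nagios user the required permissions
# make install-config       // copy the configuration files
# chown -R nagios:nagios /usr/local/nagios/
*** # make install-webconf   // install the web configuration, that is, add a Nagios configuration file to Apache (check with # ls /etc/httpd/conf.d/, which should now contain nagios.conf). This works when Apache was installed from RPM packages; otherwise it fails.
If it fails, write the Apache configuration by hand as described further below.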
================================================
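A quick look at what was installed under /usr/local/nagios/:
bin      Nagios executables; the nagios daemon itself lives here
etc      Nagios configuration files; right after installation it holds only the sample configuration
sbin     CGI programs run through the web interface, including the files needed to submit external commands
share    Web interface files
var      Log files, the PID file and other runtime data
libexec  Plugin directory; empty until the plugins are installed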
================================================
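*** # vim /usr/local/apache/conf.d/nagios.conf   // path for a source-built Apache
# vim /etc/httpd/conf.d/nagios.conf   // path for the yum-installed httpd; the file can also be written by hand, with content like the following (an example only; adjust it to your environment):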
DocumentRoot /usr/local/nagios/share
ServerName nagios.yss.com
ErrorLog logs/cacti-host.example.com-error_log
CustomLog logs/cacti-host.example.com-access_log common
Alias /nagios/cgi-bin/images/ "/usr/local/nagios/share/images/"
<Directory "/usr/local/nagios/share/images/">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/.passwd.conf
    Require valid-user
</Directory>
ScriptAlias /nagios/cgi-bin/ "/usr/local/nagios/sbin/"
<Directory "/usr/local/nagios/sbin/">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/.passwd.conf
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share/"
<Directory "/usr/local/nagios/share/">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/.passwd.conf
    Require valid-user
</Directory>
AddDefaultCharset utf-8   // prevents garbled Chinese characters on some pages
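*** In the Apache configuration file /etc/httpd/conf/httpd.conf, find:
DirectoryIndex index.html index.html.var
and change it to:
DirectoryIndex index.html index.php
Then add the following line to the same file:
AddType application/x-httpd-php .php
*** These two changes add PHP support (a source-built PHP needs them; without them .php pages are not handled).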
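Create the authentication file for the Nagios web interface:
# htpasswd -bc /usr/local/nagios/etc/.passwd.conf nagiosadmin nagiosadmin
or, to be prompted for the password instead of passing it on the command line:
# htpasswd -c /usr/local/nagios/etc/.passwd.conf nagiosadmin
// The authentication file and its password ensure that only authorized users can see the monitoring data.
*** For a source-built Apache, run its own htpasswd:
# /usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/.passwd.conf nagiosadmin   // set the password for nagiosadmin
or
# /usr/local/apache/bin/htpasswd -bc /usr/local/nagios/etc/.passwd.conf nagiosadmin nagiosadmin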
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   // check the configuration for errors
# /etc/init.d/httpd restart
# /etc/init.d/nagios restart
# chkconfig httpd on
# chkconfig nagios on
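Since Nagios will not come back up if the configuration check fails, the check and the restart can be tied together. The following is only a sketch under the paths used in this guide: the function name nagios_safe_restart and the NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD variables are invented for this example and are not part of Nagios.

```shell
# Hypothetical helper: restart Nagios only when "nagios -v" accepts the config.
# The defaults are the paths used in this guide; override them through the
# (made-up) NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD environment variables.
nagios_safe_restart() {
    nagios_bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    restart_cmd="${RESTART_CMD:-/etc/init.d/nagios restart}"

    if "$nagios_bin" -v "$nagios_cfg" >/dev/null 2>&1; then
        # Left unquoted on purpose so "script argument" splits into words.
        $restart_cmd
    else
        echo "configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   nagios_safe_restart
```

When the check fails, running the nagios -v command by hand shows the actual error messages.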
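Test: open http://IP/nagios/ in a browser
# chmod o+w /usr/local/nagios/var/rw/nagios.cmd   // needed when changing notification settings from the web interface
# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg   // start Nagios by hand as a daemon
Install the plugins:
nagios-plugins is the official set of plugin programs for Nagios; every host check Nagios performs is carried out by running a plugin.
*** # cd /usr/local/nagios/libexec/   // directory where the plugins are installed
*** # ./check_icmp --help   // from that directory, show a plugin's help text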
# tar zxf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios-plugins-2.1.1/
# ./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios
# make && make install
# tar zxf nrpe-2.15.tar.gz -C /usr/local/src/
# cd /usr/local/src/nrpe-2.15/
#./configure --prefix=/usr/local/nagios/
#make all
#make install-plugin
#make install-daemon
#make install-daemon-config
--------
# vim /usr/local/nagios/etc/objects/commands.cfg   // add the commands used for checks on remote hosts (remote disk usage, for example)
Append at the end:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}   // $USER1$ must be defined in resource.cfg before it can be referenced
or, with the plugin path written out in full:
define command{
command_name check_nrpe
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
define command{
command_name check-tcp
command_line /usr/local/nagios/libexec/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}
define command{
command_name check-udp
command_line /usr/local/nagios/libexec/check_udp -H $HOSTADDRESS$ -p $ARG1$
}
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d   // start NRPE; -c: configuration file to read, -d: run in the background as a daemon
Start it at boot:
# echo '/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d'>>/etc/rc.local
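Running the echo command above more than once leaves duplicate lines in /etc/rc.local. A small helper can make the step safe to repeat; this is a sketch, and append_once is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: append a line to a file only if the exact line
# is not already present.
append_once() {
    line="$1"
    file="$2"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once '/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d' /etc/rc.local
```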
# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg   // start Nagios by hand as a daemon
*******************************************************************************
Client (monitored host)
# yum install gcc gcc-c++ openssl openssl-devel glibc glibc-common
# useradd -s /sbin/nologin nagios
Install the plugins:
# tar zxf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios-plugins-2.1.1/
# ./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios
# make && make install
# chown nagios:nagios /usr/local/nagios/ -R
# tar zxf nrpe-2.15.tar.gz -C /usr/local/src/
# cd /usr/local/src/nrpe-2.15/
# ./configure --prefix=/usr/local/nagios/
---------------------------------------
General Options:
-------------------------
NRPE port: 5666   // port used by the NRPE service
NRPE user: nagios
NRPE group: nagios
Nagios user: nagios
Nagios group: nagios
---------------------------------------
# make all
# make install-plugin
# make install-daemon
# make install-daemon-config
*** Optional: NRPE can be started through xinetd instead of as a standalone daemon
*** # yum install xinetd -y
*** # make install-xinetd
*** # vim /etc/xinetd.d/nrpe
only_from = 10.10.9.241 127.0.0.1   // add the IP address of the Nagios server
*** # /etc/init.d/xinetd start   // NRPE is then managed by xinetd
*** # chkconfig xinetd on
# vim /etc/services
Append at the end:
nrpe 5666/tcp #nrpe
# vim /usr/local/nagios/etc/nrpe.cfg
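Change the following settings to suit your needs:
allowed_hosts=127.0.0.1,10.10.9.241
// Comma-separated IP addresses that are allowed to query NRPE; add the monitoring server's IP here
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
// Number of logged-in users: warning at 5, critical at 10 (in practice the warning appears once top shows 6 users, because logins made on the server itself are counted too)
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
// CPU load for the 1-, 5- and 15-minute averages; the alert fires as soon as any one of them reaches its threshold
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
// Disk usage: warning when free space drops below 20%, critical below 10%
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
// Zombie processes: warning at 5, critical at 10
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
// Total number of processes: warning at 150, critical at 200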
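NRPE only answers hosts listed in allowed_hosts, so checking that line before testing from the monitoring server can save a debugging round trip. The helper below is a sketch: nrpe_allows is a hypothetical name, and it assumes allowed_hosts holds a plain comma-separated list of addresses (no network ranges or hostnames).

```shell
# Hypothetical helper: succeed if the given IP appears in the
# allowed_hosts line of an nrpe.cfg file.
nrpe_allows() {
    ip="$1"
    cfg="$2"
    grep '^allowed_hosts=' "$cfg" \
        | cut -d= -f2 \
        | tr ',' '\n' \
        | tr -d ' ' \
        | grep -qxF -- "$ip"
}

# Usage on the monitored host:
#   nrpe_allows 10.10.9.241 /usr/local/nagios/etc/nrpe.cfg && echo allowed
```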
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d   // start NRPE
# netstat -anpt |grep 5666   // check the port to confirm that NRPE is running
Start it at boot:
# echo '/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d' >> /etc/rc.local
*** After changing nrpe.cfg, always restart NRPE on the client:
*** # killall nrpe
*** # /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
*** Then restart Nagios on the server: # /etc/init.d/nagios restart
Local test (run on the monitored host): # /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
Remote test (run on the monitoring server): # /usr/local/nagios/libexec/check_nrpe -H 10.10.9.242
Both should print the NRPE version: NRPE v2.15
(To test a specific check, add -c after -H ip, followed by the name of a command defined in the monitored host's nrpe.cfg, for example:
# /usr/local/nagios/libexec/check_nrpe -H 10.10.9.242 -c check_load
OK - load average: 0.02, 0.01, 0.00|load1=0.020;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;)
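The names accepted by -c are the ones inside the square brackets of the command[...] lines in nrpe.cfg. To try them all in one pass, they can be pulled out of the file first. This is a sketch: list_nrpe_commands is a hypothetical helper, and it skips commented-out lines because it only matches lines that begin with command[.

```shell
# Hypothetical helper: print the name of every command[NAME]=... entry
# found in the given nrpe.cfg, one per line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Usage on the monitored host: run every defined check through NRPE.
#   for c in $(list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg); do
#       /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c "$c"
#   done
```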
*******************************************************************************
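Configuration on the monitoring server:
Add the hosts and services to monitor. For easier maintenance, keep them in files of your own, here host.cfg and services.cfg, and make sure the main configuration file loads the new files. If many machines are monitored, create two directories under /usr/local/nagios/etc/objects/, one for remote host definitions and one for remote service definitions, and name the files in them after each machine's IP address, for example 10.10.9.242.cfg and 10.10.9.242services.cfg. Keeping them sorted this way makes day-to-day work much easier.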
# vim /usr/local/nagios/etc/objects/host.cfg
define host{
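use linux-server   // host template to use; the default is fine, and this line is optional
host_name webserver   // any name, though it usually matches the monitored node's hostname
alias apache   // any alias; this line is optional
address 10.10.9.242   // IP address of the monitored node, that is, the remote host
max_check_attempts 5   // maximum number of check attempts
check_period 24x7   // time period for checks; 24x7 is defined in timeperiods.cfg
normal_check_interval 5   // interval between checks: 5 minutes
retry_check_interval 1   // interval between retries: 1 minute
contact_groups admins   // contact group; admins is defined in contacts.cfg
notification_period 24x7   // time period during which notifications are sent
notification_options d,u,r   // states that trigger a notification: d = down, u = unreachable, r = recovery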
}
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name 10.10.8.242
alias localhost
address 127.0.0.1
icon_image webcamera.png   // show a small icon for this host
statusmap_image web.gd2
2d_coords 400,300
3d_coords 400,300,100
}
# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/host.cfg   // add this line so that the main configuration file loads the new file
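When many hosts are added, a definition like the ones above can be generated instead of typed. The sketch below prints a minimal host definition; gen_host_cfg is a hypothetical helper, and the objects/hosts/ directory in the usage comment is an assumed layout, not something the installation creates. Settings left out here are expected to be inherited from the linux-server template.

```shell
# Hypothetical helper: print a minimal Nagios host definition.
#   $1 = host_name, $2 = IP address of the monitored host
gen_host_cfg() {
    name="$1"
    addr="$2"
    cat <<EOF
define host{
        use                     linux-server
        host_name               $name
        alias                   $name
        address                 $addr
        }
EOF
}

# Usage (paths below are an assumed layout):
#   gen_host_cfg webserver 10.10.9.242 > /usr/local/nagios/etc/objects/hosts/10.10.9.242.cfg
#   echo 'cfg_file=/usr/local/nagios/etc/objects/hosts/10.10.9.242.cfg' >> /usr/local/nagios/etc/nagios.cfg
```

Instead of one cfg_file line per host, nagios.cfg also accepts cfg_dir=<directory>, which loads every .cfg file found in that directory.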
# vim /usr/local/nagios/etc/objects/services.cfg
define service{
use local-service ; Name of service template to use
host_name webserver   // name of the monitored host; it must match the host_name defined above
service_description HTTP   // description of this check (in effect, its display name)
check_command check_nrpe!check_load
notifications_enabled 0
}
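*** check_nrpe!check_load passes check_load to check_nrpe as its argument; check_load itself is a command already defined in nrpe.cfg on the monitored host (checks on the local host do not need the check_nrpe! prefix):
*** command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
*** If the plugin you need already exists in /usr/local/nagios/libexec, just add a command for it to nrpe.cfg on the monitored host and a service for it to services.cfg on the monitoring server. If it does not exist, find a suitable script online, copy it into libexec, set its owner and group to nagios and its mode to 755, run it once by hand to confirm that it works, and then add the command to nrpe.cfg.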
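A script placed in libexec only has to follow the plugin convention: print one line of status text and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL or 3 for UNKNOWN. The sketch below shows that skeleton; check_threshold is a hypothetical function, and treating a value at or above the threshold as a breach is a choice made for this example (the stock plugins each document their own comparison rules).

```shell
# Hypothetical plugin skeleton: compare an integer against warning and
# critical thresholds, then report using the Nagios exit codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
check_threshold() {
    value="$1"
    warn="$2"
    crit="$3"

    # Reject missing or non-numeric arguments.
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_threshold VALUE WARN CRIT"
                return 3
                ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value=$value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value=$value"
        return 1
    fi
    echo "OK - value=$value"
    return 0
}

# Usage inside a plugin script: measure something, classify it, and exit
# with the resulting code, for example
#   check_threshold "$(who | wc -l)" 5 10; exit $?
```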
# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg   // add this line so that the main configuration file loads the new file
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   // validate the configuration
# /etc/init.d/nagios restart
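Troubleshooting
HTTP WARNING: HTTP/1.1 403 Forbidden - 5240 bytes in 0.070 second response time
The monitoring server is requesting the default page, /var/www/html/index.html, on the monitored host. With a source-built Apache the page may not be at that location, so creating the file is enough to clear the warning.
On the monitored host: # touch /var/www/html/index.html
Monitoring a website's home page
# vim commands.cfg
Add: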
define command{
command_name check_index
command_line $USER1$/check_http $ARG1$
}
# vim localhost.cfg
define service{
use local-service   // the default template is fine
host_name webserver   // must match the host_name of a host that is already defined
service_description HTTP
check_command check_index!-H 192.168.0.106 -u /index.php   // to monitor by domain name, use check_index!-H www.testhost.test -u /index.php
notifications_enabled 0
}
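Monitoring MySQL
Check whether check_mysql exists in the libexec directory. If it does not, run yum install mysql-devel and then rebuild nagios-plugins:
# ./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios
# make && make install
On the monitored host:
Create a dedicated database for Nagios: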
mysql> create database nagios;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on nagios.* to nagios@'localhost' identified by 'nagios';   // create the dedicated nagios user; granting only select instead of all is safer
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Removing a user
mysql> select user,password,host from mysql.user;   // list the existing users first
*** mysql> delete from mysql.user where user="nagiosuser" and host="%";   // delete the matching row, then run flush privileges
Test:
# /usr/local/nagios/libexec/check_mysql -H localhost -u nagios -d nagios -p nagios
Uptime: 44649 Threads: 10 Questions: 152675 Slow queries: 0 Opens: 292 Flush tables: 1 Open tables: 64 Queries per second avg: 3.419
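*** If the monitored host is a remote server, add the following line to its nrpe.cfg (# vim nrpe.cfg):
command[check_mysql]=/usr/local/nagios/libexec/check_mysql -H localhost -u nagios -d nagios -p nagios
*** Then restart NRPE.
*** On the monitoring server (define the host first if it is not defined yet), add the following to services.cfg (# vim services.cfg):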
define service{
use local-service ; Name of service template to use
host_name 10.10.8.242
service_description mysql
check_command check_nrpe!check_mysql
}
*** Restart Nagios.
# vim commands.cfg   // define the command
Append at the end:
#check mysql
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u nagiosuser -d nagios -p nagiospwd   // -u and -p take the MySQL user and password that were granted access
}
# vim localhost.cfg   // define the service
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Mysql
check_command check_mysql
notifications_enabled 0
}
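Exercise: monitor MySQL on the remote host (by checking its listening port)
Exercise: monitor the number of logged-in users on the remote host, warning (w) at 5 and critical (c) at 10 (nrpe.cfg contains an example)
Exercise: monitor whether the remote host is alive (model it on the check for the local host)
1. On the monitored host:
[root@node nrpe-2.12]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10   // just edit the existing line
2. On the monitoring server:
[root@nagios nrpe-2.12]# vim /usr/local/nagios/etc/objects/services.cfg
Add the following: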
define service{
use local-service
host_name node
service_description mysql
check_command check_tcp!3306
}
define service{
use local-service
host_name node
service_description login users
check_command check_nrpe!check_users
}
define service{
use local-service
host_name node
service_description ping
check_command check_ping!100.00,40%!200.00,80%   // round-trip time and packet loss: warning thresholds first, then critical
}
Email alerts:
# vim contacts.cfg
Add the following:
define contact{
contact_name test   // any name
use generic-contact
alias test   // any alias
email test@localhost   // email address
}
Change the following line:
members nagiosadmin,test   // add the contact name defined above
# /etc/init.d/nagios restart
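NRPE checks in detail
Monitored item                              | Threshold
--------------------------------------------+--------------------------------------------
Host resources                              |
Host alive: check_ping                      | -w 3000.0,80% -c 5000.0,100% -p 5 (warning when the round-trip time exceeds 3000 ms or packet loss reaches 80%; critical at 5000 ms or 100% loss; 5 packets are sent)
Logged-in users: check_users                | -w 5 -c 10 (w = warning, c = critical)
System load: check_load                     | -w 15,10,5 -c 30,25,20 (warning or critical when the 1-, 5- or 15-minute load exceeds the corresponding value)
Disk usage: check_disk                      | -w 20% -c 10% -p / (warning when free space on the root partition is down to 20% of its size, critical at 10%; -p names the partition, here the root partition)
Disk I/O (script): check_iostat             | -w 5 -c 10 (warning when iowait exceeds 5%, critical above 10%)
Zombie processes: check_zombie_procs        | -w 5 -c 10 -s Z (warning at 5 zombie processes, critical at 10)
Total processes: check_total_procs          | -w 150 -c 200 (warning at 150 processes, critical at 200)
Memory (script): check_mem                  | -w 90% -c 95% (warning when memory usage exceeds 90%, critical above 95%)
Swap usage: check_swap                      | -w 20% -c 10% (warning when free swap is down to 20% of its size, critical at 10%)
Application services                        |
Service port: check_tcp                     | -H localhost2 -p 80 (host and port number)
Page response time: check_http              | -H localhost2 -u http://localhost2/test.jsp -w 5 -c 10 (warning when the page takes more than 5 s, critical above 10 s)
IP connection count (script): check_ips     | -w 200 -c 250 (warning above 200 connections, critical above 250)
Traffic monitoring                          |
Server traffic: check_traffic               | -V 2c -C public -H localhost2 -I 2 -w 12,30 -c 15,35 -M -b (SNMP version, community, host, interface index, warning thresholds, critical thresholds)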
______________________________________________________
# vim command.cfg