Nagios Installation
Base environment
[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64
1. Prepare three servers
Management IP   Role     Notes
10.0.0.61       nagios   Nagios server
10.0.0.8        web01    Monitored client server
10.0.0.7        web02    Monitored client server
2. Configure the yum repository
[root@m01 ~]# ping www.baidu.com    # confirm the host can reach the Internet
PING www.a.shifen.com (61.135.169.121) 56(84) bytes of data.
64 bytes from 61.135.169.121: icmp_seq=1 ttl=128 time=3.99 ms
cd /etc/yum.repos.d/
/bin/mv CentOS-Base.repo CentOS-Base.repo.oldboy.ori
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos.repo
3. Work around Perl compilation problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C'>>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL
C
[root@m01 yum.repos.d]# cd ~
4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
Editing the configuration file makes the change permanent, but it only takes effect after a reboot.
[root@m01 ~]# getenforce
Disabled
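The sed command above only matches SELINUX=enforcing, so a host currently set to permissive would be left unchanged. A minimal sketch of a more general rewrite, assuming GNU sed; the function name disable_selinux_config is hypothetical and not part of the original notes:

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes): force SELINUX=disabled
# in the given config file, whatever mode is currently set.
# Assumes GNU sed (for -i), as shipped with CentOS 6.
disable_selinux_config() {
    cfg="$1"
    # Anchor on "SELINUX=" so the SELINUXTYPE= line is left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Example (on a real host, as root):
#   disable_selinux_config /etc/selinux/config
#   setenforce 0    # switch the running system to permissive until the reboot
```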
5. Keep the system time synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1
6. Install the packages required by the Nagios server
yum install gcc glibc glibc-common -y
yum install gd gd-devel -y
yum install mysql-server -y
yum install httpd php php-gd -y
[root@m01 ~]# rpm -qa mysql httpd php
httpd-2.2.15-47.el6.centos.4.x86_64
php-5.3.3-46.el6_7.1.x86_64
mysql-5.1.73-5.el6_7.1.x86_64
7. Create the users and groups required by the Nagios server
[root@m01 ~]# /usr/sbin/useradd nagios
[root@m01 ~]# /usr/sbin/useradd apache -M -s /sbin/nologin
useradd: user 'apache' already exists
[root@m01 ~]# /usr/sbin/groupadd nagcmd
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd nagios
[root@m01 ~]# /usr/sbin/usermod -a -G nagcmd apache
[root@m01 ~]# id -n -G nagios
nagios nagcmd
[root@m01 ~]# id -n -G apache
apache nagcmd
8. Upload the packages to a working directory, or download them by URL
mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz
====================================================
Installing the Nagios server
tar xf nagios-3.5.1.tar.gz
cd nagios
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
1. Install the Nagios web configuration and create a login user
make install-webconf
htpasswd -bc /usr/local/nagios/etc/htpasswd.users oldboy 123456
cat /usr/local/nagios/etc/htpasswd.users
/etc/init.d/httpd reload
2. Set the email address that receives monitoring alerts
cp /usr/local/nagios/etc/objects/contacts.cfg{,.ori}
sed -i 's#nagios@localhost#[email protected]#g' /usr/local/nagios/etc/objects/contacts.cfg
To send mail through a third-party mail provider's mailbox, append the following line to /etc/mail.rc:
[root@m01 tools]# tail -1 /etc/mail.rc
set [email protected] smtp=smtp.163.com smtp-auth-user=18516688992 smtp-auth-password=tian123 smtp-auth=login
3. Start Apache and enable it at boot
[root@m01 tools]# /etc/init.d/httpd start
Starting httpd:
[root@m01 tools]# /etc/init.d/httpd restart
Stopping httpd: [ OK ]
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 172.16.1.61 for ServerName
[ OK ]
[root@m01 tools]# chkconfig httpd on
[root@m01 tools]# netstat -lntup|grep httpd
tcp 0 0 :::80 :::* LISTEN 53291/httpd
Log in from a browser:
10.0.0.61/nagios
Enter the user name and password:
oldboy
123456
If the Nagios Core page is displayed, the installation is working.
4. Install the Nagios plugins package
Install the build dependencies
yum install perl-devel openssl-devel -y
Install the nagios-plugins package
wget https://nagios-plugins.org/download/nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# ls nagios-plugins-1.4.16.tar.gz
nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# tar xf nagios-plugins-1.4.16.tar.gz
[root@m01 tools]# cd nagios-plugins-1.4.16
[root@m01 nagios-plugins-1.4.16]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@m01 nagios-plugins-1.4.16]# make
[root@m01 nagios-plugins-1.4.16]# make install
5. Install NRPE
ls /usr/local/nagios/libexec/check_nrpe
[root@m01 nagios-plugins-1.4.16]# cd ..
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60
Check that the check_nrpe plugin is installed
[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
At this point the software installation on the Nagios server is complete.
6. Configure and start the Nagios service
Enable the Nagios service at boot
[root@m01 tools]# chkconfig nagios on
[root@m01 tools]# chkconfig --list nagios
nagios 0:off 1:off 2:on 3:on 4:on 5:on 6:off
A better approach: start Nagios from /etc/rc.local
[root@m01 tools]# echo "/etc/init.d/nagios start">>/etc/rc.local
[root@m01 tools]# tail -1 /etc/rc.local
/etc/init.d/nagios start
Check the configuration syntax
[root@m01 tools]# /etc/init.d/nagios checkconfig
Running configuration check... OK.
Start the Nagios service
[root@m01 tools]# /etc/init.d/nagios start
Starting nagios: done.
Check the Nagios server process and port
[root@m01 tools]# ps -ef |grep nagios|grep -v grep
nagios 15895 1 0 16:41 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[root@m01 tools]# netstat -lntup|grep nagios
===============================================
Installing the Nagios client
1. Base environment
[root@m01 yum.repos.d]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@m01 yum.repos.d]# uname -r
2.6.32-573.el6.x86_64
[root@m01 yum.repos.d]# uname -m
x86_64
2. Prepare two servers
Management IP   Role    Notes
10.0.0.8        web01   Monitored client server
10.0.0.7        web02   Monitored client server
3. Work around Perl compilation problems
[root@m01 yum.repos.d]# echo 'export LC_ALL=C'>>/etc/profile
[root@m01 yum.repos.d]# tail -1 /etc/profile
export LC_ALL=C
[root@m01 yum.repos.d]# source /etc/profile
[root@m01 yum.repos.d]# echo $LC_ALL
4. Disable the firewall and SELinux
[root@m01 ~]# /etc/init.d/iptables stop
[root@m01 ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@m01 ~]# chkconfig iptables off
[root@m01 ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@m01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
Editing the configuration file makes the change permanent, but it only takes effect after a reboot.
[root@m01 ~]# getenforce
Disabled
5. Keep the system time synchronized
[root@m01 ~]# crontab -l
#time sync by oldboy at 2010-2-1
*/5 * * * * /usr/sbin/ntpdate time.nist.gov >/dev/null 2>&1
=============================================
Installation
1. Install the base system packages
yum install gcc glibc glibc-common -y
yum install mysql-server -y
[root@m01 ~]# rpm -qa mysql
mysql-5.1.73-5.el6_7.1.x86_64
2. Upload the packages to a working directory, or download them by URL
mkdir -p /home/oldboy/tools/nagios
cd /home/oldboy/tools/nagios
rz
unzip -q oldboy_training_nagios_soft.zip
3. Add the nagios user
[root@web01 nagios]# useradd nagios -M -s /sbin/nologin
[root@web01 nagios]# id nagios
uid=508(nagios) gid=508(nagios) groups=508(nagios)
4. Install the nagios-plugins package
[root@web02 nagios]# yum install perl-devel perl-CPAN openssl-devel -y
[root@web02 nagios]# tar xf nagios-plugins-1.4.16.tar.gz
[root@web02 nagios]# cd nagios-plugins-1.4.16
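[root@web02 nagios-plugins-1.4.16]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[root@web02 nagios-plugins-1.4.16]# make
[root@web02 nagios-plugins-1.4.16]# make install
Check the number of installed plugins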
[root@web01 nagios]# ls /usr/local/nagios/libexec/|wc -l
61
5. Install NRPE
[root@m01 nagios-plugins-1.4.16]# cd ..
ls /usr/local/nagios/libexec/check_nrpe
tar xf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
The next two commands may report errors
make install-daemon
make install-daemon-config
[root@m01 nrpe-2.12]# ls /usr/local/nagios/libexec/|wc -l
60
Check that the check_nrpe plugin is installed
[root@m01 tools]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
6. Install the other required plugins (Perl modules)
[root@web01 nrpe-2.12]# cd ..
[root@web01 nagios]#
#---------- separator ----------
tar zxf Params-Validate-0.91.tar.gz
cd Params-Validate-0.91
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Class-Accessor-0.31.tar.gz
cd Class-Accessor-0.31
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Config-Tiny-2.12.tar.gz
cd Config-Tiny-2.12
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Math-Calc-Units-1.07.tar.gz
cd Math-Calc-Units-1.07
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Regexp-Common-2010010201.tar.gz
cd Regexp-Common-2010010201
perl Makefile.PL
make
make install
cd ..
#---------- separator ----------
tar zxf Nagios-Plugin-0.34.tar.gz
cd Nagios-Plugin-0.34
perl Makefile.PL
make
make install
cd ..
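#---------- separator ----------
#yum install sysstat -y
If any of the builds above fail, the Perl environment variable (LC_ALL=C) was not set beforehand.
7. Set up the memory and disk I/O monitoring plugin scripts
yum install dos2unix -y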
/bin/cp /home/oldboy/tools/nagios/check_memory.pl /usr/local/nagios/libexec/
/bin/cp /home/oldboy/tools/nagios/check_iostat /usr/local/nagios/libexec/
chmod 755 /usr/local/nagios/libexec/check_memory.pl
chmod 755 /usr/local/nagios/libexec/check_iostat
dos2unix /usr/local/nagios/libexec/check_memory.pl
dos2unix /usr/local/nagios/libexec/check_iostat
8. Configure the NRPE service on the Nagios client
cd /usr/local/nagios/etc/
[root@web02 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1
[root@web01 etc]# sed -i 's#allowed_hosts=127.0.0.1#allowed_hosts=127.0.0.1,10.0.0.61#g' nrpe.cfg
[root@web01 etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1,10.0.0.61
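The sed substitution above only works while the line reads exactly allowed_hosts=127.0.0.1. A minimal sketch of a version that appends to whatever list is already there and does nothing when the address is already listed; the function name add_allowed_host is hypothetical, and GNU sed is assumed:

```shell
#!/bin/sh
# Hypothetical helper: add an IP to the allowed_hosts line of an nrpe.cfg
# file, doing nothing if it is already listed. Assumes GNU sed (for -i).
add_allowed_host() {
    cfg="$1"
    ip="$2"
    # Escape the dots so they match literally in the regular expression.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Already present as a whole comma-separated entry? Leave the file alone.
    if grep -Eq "^allowed_hosts=(.*,)?${ip_re}(,.*)?\$" "$cfg"; then
        return 0
    fi
    # Otherwise append it to the existing list ("&" is the matched line).
    sed -i "s/^allowed_hosts=.*/&,${ip}/" "$cfg"
}
```

For example, add_allowed_host /usr/local/nagios/etc/nrpe.cfg 10.0.0.61 can be run repeatedly without producing duplicate entries. If the file has no allowed_hosts line at all, the sketch leaves it unchanged.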
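9. Then, with nrpe.cfg open in vim, press Shift+G in command mode to jump to the end of the file and make the following changes
Step 1: comment out lines 199-203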
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
#command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
Step 2: add the items to monitor below them:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10
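After editing, it helps to confirm which command names are actually active, since these are the names the server will request. A minimal sketch; list_nrpe_commands is a hypothetical helper name, not something shipped with NRPE:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every active command[...] entry
# in an nrpe.cfg file. Commented-out entries (leading #) are skipped
# because the pattern is anchored at the start of the line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

Running list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg on a client configured as above should list check_load, check_mem, check_disk, check_swap and check_iostat.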
10. Start the NRPE daemon on the Nagios client
[root@web02 etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Check that it started
[root@web02 etc]# netstat -lntup|grep nrpe
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 24505/nrpe
[root@web02 etc]# ps -ef |grep nrpe |grep -v grep
nagios 24505 1 0 19:56 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Restart tip (no restart is needed at this point)
#pkill nrpe
#/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
11. Enable start at boot
[root@web01 etc]# echo "#nagios nrpe process cmd by wangtian 2016-5-22">>/etc/rc.local
[root@web01 etc]# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local
Check
[root@web01 etc]# tail -2 /etc/rc.local
#nagios nrpe process cmd by wangtian 2016-5-22
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
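Appending with echo >> adds another copy of the line every time the step is repeated. A minimal sketch of an idempotent append; append_once is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup does not duplicate it.
append_once() {
    file="$1"
    line="$2"
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fqx -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Usage would look like append_once /etc/rc.local "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d".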
===============================================================================
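Nagios server-side monitoring
Edit the main configuration file (beginners can skip this; if you need it, add it yourself following page 582 of the book)
[root@m01 tools]# cp /usr/local/nagios/etc/nagios.cfg{,.ori}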
[root@m01 tools]# vim /usr/local/nagios/etc/nagios.cfg +34
Add the following host and service configuration files
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services/
Then comment out the following
# Definitions for monitoring the local (Linux) host
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
Generate hosts.cfg from the existing data
[root@m01 tools]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# head -51 localhost.cfg >hosts.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/hosts.cfg
Then create a new, empty services.cfg file
[root@m01 objects]# touch services.cfg
[root@m01 objects]# chown nagios.nagios /usr/local/nagios/etc/objects/services.cfg
Finally, create the directory for service configuration files
[root@m01 objects]# mkdir services
[root@m01 objects]# chown -R nagios.nagios /usr/local/nagios/etc/objects/services
Check
[root@m01 objects]# ls -lrt
total 60
-rw-rw-r-- 1 nagios nagios 10812 May 22 15:14 templates.cfg
-rw-rw-r-- 1 nagios nagios 7716 May 22 15:14 commands.cfg
-rw-rw-r-- 1 nagios nagios 3208 May 22 15:14 timeperiods.cfg
-rw-rw-r-- 1 nagios nagios 5403 May 22 15:14 localhost.cfg
-rw-rw-r-- 1 nagios nagios 4019 May 22 15:14 windows.cfg
-rw-rw-r-- 1 nagios nagios 3124 May 22 15:14 printer.cfg
-rw-rw-r-- 1 nagios nagios 3293 May 22 15:14 switch.cfg
-rw-r--r-- 1 root root 2166 May 22 15:26 contacts.cfg.ori
-rw-rw-r-- 1 nagios nagios 2166 May 22 15:28 contacts.cfg
-rw-r--r-- 1 nagios nagios 1870 May 22 20:36 hosts.cfg
-rw-r--r-- 1 nagios nagios 0 May 22 20:38 services.cfg
drwxr-xr-x 2 nagios nagios 4096 May 22 20:39 services
====================================================================
Configuring monitored items on the Nagios server
1. Define the Nagios client hosts to monitor
[root@m01 objects]# cd /usr/local/nagios/etc/objects/
[root@m01 objects]# cp hosts.cfg{,.ori}
[root@m01 objects]# egrep -v "#|^$" hosts.cfg.ori >hosts.cfg
[root@m01 objects]# vim hosts.cfg
Check
[root@m01 objects]# cat hosts.cfg
define host{
use linux-server
host_name web01
alias web01
address 10.0.0.8
}
define host{
use linux-server
host_name web02
alias web02
address 10.0.0.7
}
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
members web01,web02
}
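When more clients are added later, the host blocks follow the same pattern each time. A minimal sketch of a generator for one block; gen_host is a hypothetical helper, and web03 / 10.0.0.9 in the usage line are made-up example values, not hosts from this setup:

```shell
#!/bin/sh
# Hypothetical generator: print a Nagios host definition for one client.
# It uses the same linux-server template as the definitions above.
gen_host() {
    name="$1"
    addr="$2"
    printf 'define host{\n'
    printf '    use        linux-server\n'
    printf '    host_name  %s\n' "$name"
    printf '    alias      %s\n' "$name"
    printf '    address    %s\n' "$addr"
    printf '}\n'
}
```

For example, gen_host web03 10.0.0.9 >> hosts.cfg appends a new block; the new host name still has to be added to the members line of the hostgroup by hand.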
2. Configure services.cfg to define the resources and services to monitor
[root@m01 objects]# cp services.cfg{,.ori}
[root@m01 objects]# vim services.cfg
[root@m01 objects]# cat services.cfg
define service {
use generic-service
host_name web01,web02
service_description Disk Partition
check_command check_nrpe!check_disk
}
define service {
use generic-service
host_name web01,web02
service_description Swap Useage
check_command check_nrpe!check_swap
}
define service {
use generic-service
host_name web01,web02
service_description MEM Useage
check_command check_nrpe!check_mem
}
define service {
use generic-service
host_name web01,web02
service_description Current Load
check_command check_nrpe!check_load
}
define service {
use generic-service
host_name web01,web02
service_description Disk Iostat
check_command check_nrpe!check_iostat!5!11
}
define service {
use generic-service
host_name web01,web02
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
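Every name after check_nrpe! must match a command[...] entry in nrpe.cfg on the clients, otherwise the check fails at run time. A minimal sketch that extracts those names from a services file so they can be compared by eye; nrpe_checks_in_services is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list the remote command names that a services file
# requests through check_nrpe (the text between "check_nrpe!" and the
# next "!" or whitespace). Other check commands such as check_ping are ignored.
nrpe_checks_in_services() {
    sed -n 's/^[[:space:]]*check_command[[:space:]]\{1,\}check_nrpe!\([^![:space:]]*\).*/\1/p' "$1" | sort -u
}
```

For the services.cfg above, this should print check_disk, check_iostat, check_load, check_mem and check_swap, one per line.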
3. Debug the hosts.cfg and services.cfg configuration (add the check_nrpe command definition to commands.cfg)
[root@m01 objects]# cp commands.cfg{,.ori}
[root@m01 objects]# vim commands.cfg
[root@m01 objects]# tail -5 commands.cfg
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
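To see what this definition turns into at run time, the macro substitution can be imitated by hand. A minimal illustration, assuming $USER1$ is /usr/local/nagios/libexec (the plugin directory used throughout these notes); expand_check_nrpe is a hypothetical name and this is not how Nagios itself is implemented:

```shell
#!/bin/sh
# Illustration only: print the command line that results from
# "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$" for one host and one check.
# Assumption: USER1 points at /usr/local/nagios/libexec.
expand_check_nrpe() {
    hostaddress="$1"   # value of $HOSTADDRESS$ (the host's address field)
    arg1="$2"          # value of $ARG1$ (the text after the first "!")
    user1="/usr/local/nagios/libexec"
    printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$hostaddress" "$arg1"
}
```

Calling expand_check_nrpe 10.0.0.8 check_disk prints the same command that can be run by hand on the server to test a check. Because the definition only uses $ARG1$, extra arguments such as the !5!11 on the Disk Iostat service are not passed on; the thresholds come from nrpe.cfg on the client.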
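4. Check the syntax
/etc/init.d/nagios checkconfig
If the check reports OK, start the service:
/etc/init.d/nagios start
If Nagios is already running, run /etc/init.d/nagios reload instead.
Open <server IP>/nagios in a browser to see the results.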
=====================================================================================
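Configuring alerts (email alerts were already set up earlier; extend to other alert channels as needed)
Configuring alerts means editing the contacts.cfg file. All of the company's operations staff can be added to this file, and they can be split into groups if needed.
Steps to configure alerts:
(1) Add contacts and contact groups in contacts.cfg: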
define contact{
contact_name oldboy-pager
use generic-contact
alias Nagios users
pager 18901398229
}
(2) Add the notification commands in commands.cfg:
define command {
command_name notify-host-by-pager
command_line $USER1$/sms_send "$HOSTSTATE$ alert for $HOSTNAME$" $CONTACTPAGER$
}
define command {
command_name notify-service-by-pager
command_line $USER1$/sms_send "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
}
(3) Adjust the contact template to add the notification commands (the ones defined in commands.cfg):
define contact{
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email,notify-service-by-pager
host_notification_commands notify-host-by-email,notify-host-by-pager
register 0
}
(4) Add the alert contacts and contact groups in hosts.cfg and services.cfg, or add them in the templates:
contact_groups admins,group1,group2,user1
====================================================
Troubleshooting
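(1) Fetching a value from the client fails:
[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk
CHECK_NRPE: Error - Could not complete SSL handshake. # handshake failed
# To narrow this down, run the same check against the loopback address:
[root@client1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk
# If this returns a value, the address you are connecting from is missing from allowed_hosts; edit the allowed_hosts=127.0.0.1 line in nrpe.cfg to add it, then restart nrpe.
(2) Status is CRITICAL
# This means the connection failed: either the nrpe service is not running or the firewall is still on. Start by running the check by hand:
/usr/local/nagios/libexec/check_nrpe -H 10.0.0.2 -c check_disk
# Change the IP and command name as needed. The output points to the cause, because this command is exactly how Nagios fetches the monitored value.
(3) The command returns a value on the command line, but the web interface does not show it.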
define service {
use generic-service
host_name 02-client1,01-nagios
service_description Disk Partition
check_command check_nrpe!check_disk # this argument is almost certainly defined incorrectly
}
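(4) Unable to read output
# This happens when the plugin that fetches the value lacks execute permission, or when the plugin itself is faulty. Either way, the problem is in the plugin.
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 6% -c 3% # check_memory.pl is the plugin
[root@nagios libexec]# chmod +x check_memory.pl # run this; if it still fails, the plugin itself is at fault
Summary: when the web interface shows a problem, check the following:
(1) Nagios itself and its configuration files;
(2) On the server, run:
/usr/local/nagios/libexec/check_nrpe -H <monitored host address> -c <command name>
(3) On the client, run locally:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c <command name>
(4) Run the underlying command from nrpe.cfg directly:
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 8% -p / # run the part after the "="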
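The symptoms listed above can be matched mechanically against the text check_nrpe prints. A minimal sketch, where diagnose_nrpe_output is a hypothetical helper; the "Connection refused" pattern is an assumption about the wording of the connection-failure message and is not quoted in these notes:

```shell
#!/bin/sh
# Hypothetical helper: map a check_nrpe output line to the likely cause
# described in the troubleshooting notes. The "Connection refused"
# pattern is an assumed message wording, not taken from the notes.
diagnose_nrpe_output() {
    case "$1" in
        *"Could not complete SSL handshake"*)
            echo "server IP missing from allowed_hosts in nrpe.cfg" ;;
        *"Unable to read output"*)
            echo "plugin not executable or broken" ;;
        *"Connection refused"*)
            echo "nrpe not running or firewall blocking port 5666" ;;
        *)
            echo "no known pattern matched" ;;
    esac
}
```

A typical use would be diagnose_nrpe_output "$(/usr/local/nagios/libexec/check_nrpe -H 10.0.0.8 -c check_disk 2>&1)" on the Nagios server.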