Nagios 3.2.0 installation and configuration notes (with Fetion SMS alerts, MSSQL and MySQL monitoring)
Nagios official homepage
nagios http://www.nagios.org/
1. Download the software packages
nagios-3.2.0.tar.gz wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
nagios-plugins-1.4.14.tar.gz wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
nrpe-2.12.tar.gz wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
NSClient++-Win32-0.3.6.msi wget http://nchc.dl.sourceforge.net/project/nscplus/nscplus/NSClient%2B%2B%200.3.6/NSClient%2B%2B-0.3.6-Win32.msi
2. Test environment
Hostname        OS          IP              Role
nagios-server   AS5.0       192.168.0.216   Monitoring server (Nagios core)
192.168.0.19    Cent5.0     192.168.0.19    Monitored host
192.168.0.113   AS4.5       192.168.0.113   Monitored host
192.168.0.80    Windows2k3  192.168.0.80    Monitored host
192.168.0.229   Windows2k3  192.168.0.229   Monitored host
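3. Monitoring targets
nagios-server
    host is alive
    SSH is running
    disk usage
    system load
    website is working
192.168.0.19
    host is alive
    MySQL is alive
    SSH is running
    disk usage
    system load
192.168.0.113
    host is alive
    MySQL is alive
    SSH is running
    disk usage
    system load
192.168.0.80
    host is alive
    CPU usage
    memory usage
    C: drive
    E: drive
    F: drive
    Explorer is running
    NSClient++ is running
    system time is correct
    FTP is working
192.168.0.229
    host is alive
    CPU usage
    memory usage
    C: drive
    D: drive
    MS SQL 2000 is working
    NSClient++ is running
    system time is correct
    W3SVC is running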
4. Installing and configuring the supporting services
Installing Apache
#tar zxvf httpd-2.2.6.tar.gz
#cd httpd-2.2.6
#./configure \
--prefix=/usr/local/apache \
--enable-so \
--enable-ssl \
--with-ssl=/usr/local/ssl \
--enable-track-vars \
--enable-rewrite \
--with-zlib \
--enable-modules=all \
--enable-mods-shared=all \
--with-suexec-caller=daemon
#make
#make install
Installing the Fetion SMS robot
Download
wget http://www.it-adv.net/fetion/downng/fetion20090406003-linux.tar.gz
wget http://www.it-adv.net/fetion/downng/
# Extract the main program
tar zxvf fetion20090406003-linux.tar.gz
mv install /usr/local/fetion
mkdir /usr/local/fetion/lib
mv library_linux.tar.gz /usr/local/fetion/lib/
cd /usr/local/fetion/lib/
tar zxvf library_linux.tar.gz
# After extraction there should be the following 4 files
libACE.so.5.6.8
libACE_SSL.so.5.6.8
libcrypto.so.0.9.8
libssl.so.0.9.8
Copy all of them to /usr/lib:
cd /usr/local/fetion/lib/
cp libACE.so.5.6.8 libACE_SSL.so.5.6.8 libcrypto.so.0.9.8 libssl.so.0.9.8 /usr/lib/
Configure the shared library search path
#vi /etc/ld.so.conf
# Add the line
/usr/lib/
# Save and exit, then run
#ldconfig
Test that a message can be sent
/usr/local/fetion/fetion --mobile=15801****** --pwd=bai******** --to=13661****** --msg-utf8=ddd
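Create the recipient list /usr/local/fetion/phonelist.txt (read by the sending script below), one mobile number per line; the comment lines are skipped because they are not phone numbers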
# zhjczr1 mobile
13661******
# zhjczr2 mobile
13693******
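Write the sending script
vi /usr/local/fetion/sendsms.sh
#!/bin/bash
# Send the message given as $1 to every number in phonelist.txt via Fetion
# (bash is required for the (( )) and [[ ]] syntax used below)
fetionDir=/usr/local/fetion
cd $fetionDir
DIR=`pwd`
# Sender's mobile number and Fetion login password
user=15801******
pwd=bai********
for phone in `cat $DIR/phonelist.txt`
do
# Keep only entries that look like mobile numbers (1 followed by 10 digits)
echo "$phone" | sed '/^[ \t]*$/d' | sed 's/^[ \t]*//' | sed 's/[ \t]*$//' | grep '^1[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
if (($? == 0 ));then
if [[ -f $DIR/msg.txt ]];then
cat /dev/null > msg.txt
fi
phone=`echo "$phone" | sed 's/^[ \t]*//' | sed 's/[ \t]*$//'`
echo "sms $phone $1" >> $DIR/msg.txt
echo "quit" >> $DIR/msg.txt
# Quote $1 so that a message containing spaces is passed as one argument
$fetionDir/fetion --mobile=$user --pwd=$pwd --to=$phone --msg-utf8="$1"
else
continue
fi
done
Make the script executable, since Nagios calls it directly (the script also writes msg.txt in /usr/local/fetion, so that directory must be writable by the user Nagios runs as):
chmod +x /usr/local/fetion/sendsms.sh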
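If phonelist.txt is edited by hand, it helps to see which entries the script will actually text. The sketch below reproduces the loop's filter as a standalone POSIX sh function; the name valid_phones is hypothetical, and unlike the script's grep it also anchors the end of the pattern, so it accepts exactly 11 digits starting with 1 (an assumption about the number format).

```shell
# Print the entries of a phone list that look like mobile numbers:
# exactly 11 digits starting with 1. Comment words such as "#" or
# "zhjczr1" and blank lines are dropped, as in the sendsms.sh loop.
valid_phones() {
    # $1: path to the list, e.g. /usr/local/fetion/phonelist.txt
    # Turn every run of spaces/tabs into a line break, then filter.
    tr -s ' \t' '\n\n' < "$1" | grep -E '^1[0-9]{10}$'
}
```

For example, valid_phones /usr/local/fetion/phonelist.txt prints one accepted number per line, which is the set of recipients the script will try.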
5. Installing and configuring Nagios
/usr/sbin/useradd -m nagios
Go to the download directory
tar zxvf nagios-3.2.0.tar.gz
cd nagios-3.2.0
./configure --prefix=/usr/local/nagios
make all
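# make install installs the main program, the CGIs and the HTML files
make install
# make install-init installs the init script in /etc/rc.d/init.d
make install-init
# make install-config installs the sample configuration files into /usr/local/nagios/etc
make install-config
# make install-commandmode sets the permissions on the external command directory
make install-commandmode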
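Brief description of the Nagios directories:
bin           Nagios executables; the nagios binary is the main program
etc           Nagios configuration files
sbin          Nagios CGI files, i.e. the files needed to execute external commands
share         Nagios web pages
var           Nagios log files, PID file, etc.
var/archives  log archive directory
var/rw        holds the external command file
Configure Apache: append the following to /usr/local/apache/conf/httpd.conf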
# Apache comments must be on a line of their own, not after a directive
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
# Directory containing the CGI files
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
# Path to the authentication file
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
# Directory containing the Nagios web pages
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
# Path to the authentication file
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
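# Create the Apache authentication file
/usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/htpasswd test
New password: (enter the password)
Re-type new password: (enter the password again)
# Restart Apache: /usr/local/apache/bin/apachectl -k restart
When adding more users later, leave out -c: -c creates the password file, and if the file already exists it is overwritten and truncated. To add a new user, simply run:
/usr/local/apache/bin/htpasswd /usr/local/nagios/etc/htpasswd test2
Here we also create, in advance, the user that will later be used for SMS alerts:
/usr/local/apache/bin/htpasswd /usr/local/nagios/etc/htpasswd sendmsg
New password: (enter the password)
Re-type new password: (enter the password again)
Starting Nagios
# Start Nagios automatically at boot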
chkconfig --add nagios
chkconfig nagios on
Install the Nagios plugins
tar xzf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
./configure
make
make install
Install NRPE, which is used to monitor the Linux machines
tar xzvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
# On the Nagios server only the check_nrpe plugin needs to be installed
make install-plugin
Next, edit the main Nagios configuration file nagios.cfg
vi /usr/local/nagios/etc/nagios.cfg
# Add or modify the following settings (keep comments on their own lines; nagios.cfg does not support comments after a value)
# Command definitions
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
# Contacts
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
# Contact groups
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg
# Monitoring time periods
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
# Templates
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# Host groups
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
# Hosts
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# Services
cfg_file=/usr/local/nagios/etc/objects/services.cfg
# Uncomment the Windows configuration line
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
# Uncomment the servers directory line; remember to create the directory yourself
cfg_dir=/usr/local/nagios/etc/servers
# To get log output, add the following settings (create the directory yourself if needed)
log_file=/usr/local/nagios/var/nagios.log
debug_file=/usr/local/nagios/var/nagios.debug
# 32 = notification debugging information
debug_level=32
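Several of these entries point at files and directories that have to be created by hand, and Nagios should report an error in the pre-flight check when a cfg_file or cfg_dir target does not exist. A small sketch that lists such entries before running nagios -v; missing_cfg_entries is a hypothetical helper, and it assumes plain key=value lines without inline comments.

```shell
# Report cfg_file/cfg_dir entries in nagios.cfg whose targets do not exist.
# Assumes simple "key=value" lines with no inline comments.
missing_cfg_entries() {
    # $1: path to nagios.cfg
    grep -E '^(cfg_file|cfg_dir)=' "$1" | while IFS='=' read -r key path; do
        if [ "$key" = cfg_dir ]; then
            [ -d "$path" ] || echo "$key=$path"   # directory expected
        else
            [ -f "$path" ] || echo "$key=$path"   # regular file expected
        fi
    done
}
```

Run right after editing, missing_cfg_entries /usr/local/nagios/etc/nagios.cfg would still list contactgroups.cfg, hostgroups.cfg, hosts.cfg and services.cfg, which are only created in the following steps.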
# Edit the CGI configuration file cgi.cfg
vi /usr/local/nagios/etc/cgi.cfg
# Whether authentication is enabled: 1 = on, 0 = off
use_authentication=1
# Default user name
default_user_name=test
# Separate multiple users with commas
authorized_for_system_information=nagiosadmin,test
authorized_for_configuration_information=nagiosadmin,test
authorized_for_system_commands=test
authorized_for_all_services=nagiosadmin,test
authorized_for_all_hosts=nagiosadmin,test
authorized_for_all_service_commands=nagiosadmin,test
authorized_for_all_host_commands=nagiosadmin,test
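Define the monitoring time periods in timeperiods.cfg
vi /usr/local/nagios/etc/objects/timeperiods.cfg
# Separate the fields below with tabs rather than spaces
# The following definition is present by default
# It defines a time period named 24x7 that covers all 24 hours of every day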
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
Define the contacts in contacts.cfg
vi /usr/local/nagios/etc/objects/contacts.cfg
# Separate the fields below with tabs rather than spaces
# Add the following definitions
define contact{
contact_name test
alias Sys Admin ; Full name of user
contactgroups sagroup
service_notification_period 24x7 ; time period for service problem notifications; this is the period defined in timeperiods.cfg above
host_notification_period 24x7 ; time period for host problem notifications; also the period defined in timeperiods.cfg above
service_notification_options w,u,c,r ; notify the contact when a service is w (warning), u (unknown) or c (critical), or r (has recovered)
host_notification_options d,u,r ; notify the contact when a host is d (down) or u (unreachable), or r (has recovered)
service_notification_commands notify-service-by-email ; command for service notifications; notify-service-by-email is defined in commands.cfg and emails the contact
host_notification_commands notify-host-by-email ; host notifications are also sent by email
email [email protected] ; the contact's email address
}
define contact{
contact_name sendmsg
alias Sys Admin ; Full name of user
contactgroups sendmsggroup
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-service-by-sendmsg ; command for service notifications; notify-service-by-sendmsg is defined in commands.cfg and sends the contact an SMS
host_notification_commands notify-host-by-sendmsg ; host notifications are also sent by SMS
email [email protected]
}
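Group several contacts into a contact group: create contactgroups.cfg
vi /usr/local/nagios/etc/objects/contactgroups.cfg
# Separate the fields below with tabs rather than spaces
# To add more users to sagroup, append their names after members, separated by commas (,)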
define contactgroup{
contactgroup_name sagroup
alias Novell Administrators
members test
}
define contactgroup{
contactgroup_name sendmsggroup
alias Sendmsg Administrators
members sendmsg
}
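Define the monitored hosts: create hosts.cfg
vi /usr/local/nagios/etc/objects/hosts.cfg
# Separate the fields below with tabs rather than spaces; comments after a value start with ;
define host{
host_name nagios-server ; host name
alias wd-linux-216 ; alias
address 192.168.0.216 ; real IP address
check_command check-host-alive ; check-host-alive comes from commands.cfg and checks whether the host is alive
max_check_attempts 5 ; how many times the check is tried before a failure is treated as a confirmed problem
check_period 24x7 ; check period 24x7, also defined earlier in timeperiods.cfg
contact_groups sagroup ; contact group sagroup, defined above in contactgroups.cfg
notification_interval 10 ; re-notification interval: every 10 minutes (in units of interval_length, 60 seconds by default)
notification_period 24x7 ; notification period 24x7, also defined earlier in timeperiods.cfg
notification_options d,u,r ; when to notify; see the contacts.cfg section above for the meanings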
}
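Nagios counts notification_interval (and normal_check_interval / retry_check_interval) in time units of interval_length from nagios.cfg, which defaults to 60 seconds, so the value 10 above means 10 minutes. The conversion is a single multiplication; interval_to_seconds below is a hypothetical helper name.

```shell
# Convert a Nagios interval value (in time units) to seconds.
# interval_length defaults to 60 in nagios.cfg; pass another value if you changed it.
interval_to_seconds() {
    # $1: interval value, e.g. notification_interval 10
    # $2: interval_length in seconds (optional, default 60)
    echo $(( $1 * ${2:-60} ))
}
```

With the default interval_length, the service settings used below (normal_check_interval 3, retry_check_interval 2, notification_interval 10) give checks every 180 s, retries every 120 s and re-notifications every 600 s.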
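Just as contacts can be grouped into contact groups, several hosts can be grouped into a host group. Create hostgroups.cfg
vi /usr/local/nagios/etc/objects/hostgroups.cfg
# Separate the fields below with tabs rather than spaces
# Again, separate multiple host names with commas (,)
# The other hosts are added to the groups now; their configuration is covered later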
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
members nagios-server,192.168.0.19,192.168.0.113
}
define hostgroup{
hostgroup_name windows-servers
alias Windows Servers
members 192.168.0.80,192.168.0.229
}
Define the monitored services: create services.cfg
vi /usr/local/nagios/etc/objects/services.cfg
# Separate the fields below with tabs rather than spaces; comments after a value start with ;
define service{
host_name nagios-server ; monitored host, defined in hosts.cfg
service_description check-host-alive ; description (i.e. name) of this service; spaces are allowed. Here it checks whether the host is alive
check_command check-host-alive ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7 ; monitoring period, defined in timeperiods.cfg
notification_interval 10
notification_period 24x7 ; notification period, defined in timeperiods.cfg
notification_options w,u,c,r
contact_groups sagroup ; contact group, defined in contactgroups.cfg
}
define service{
host_name nagios-server
service_description check-ssh
check_command check_tcp!22 ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name nagios-server
service_description check_local_disk
check_command check_local_disk!10%!5%!/ ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name nagios-server
service_description check-load
check_command check_nrpe!check_load ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name nagios-server
service_description msg.baihe.com-Java
check_command check_http_uri!8080!http://lily_gsm.local/smssend/index.html ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
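contact_groups +sendmsggroup ; contact group from contactgroups.cfg. The leading + only adds to contact groups inherited from a template (via use); this service uses no template, so only sendmsggroup is notified. Write sagroup,sendmsggroup to notify both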
}
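Create the servers directory
mkdir /usr/local/nagios/etc/servers/
Create a configuration file for each monitored host
vi /usr/local/nagios/etc/servers/192.168.0.19_l.cfg
# Separate the fields below with tabs rather than spaces; comments after a value start with ;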
define host{
host_name 192.168.0.19
alias wd-linux-19
address 192.168.0.19
check_command check-host-alive ; command to use, defined in commands.cfg
max_check_attempts 5
check_period 24x7
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options d,u,r
}
define service{
host_name 192.168.0.19
service_description check-host-alive
check_command check-host-alive ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.19
service_description check-ssh ; name of the SSH port check, as shown in Nagios
check_command check_tcp!22 ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.19
service_description check_local_disk
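check_command check_local_disk!10%!5%!/ ; defined in commands.cfg. check_local_disk runs on the Nagios server, so this measures the server's own / rather than 192.168.0.19's; a remote disk check goes through NRPE (e.g. check_nrpe!check_hda1 from the sample nrpe.cfg)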
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.19
service_description check-load
check_command check_nrpe!check_load ; command to use, defined in commands.cfg
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.19
service_description MySQL
check_command check_mysql!slave1!123456!3306 ; defined in commands.cfg; slave1 is the MySQL user name, 123456 the password, 3306 the port, and ! separates the arguments
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
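contact_groups +sendmsggroup ; contact group from contactgroups.cfg. The leading + only adds to contact groups inherited from a template (via use); this service uses no template, so only sendmsggroup is notified. Write sagroup,sendmsggroup to notify both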
}
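# Note: host_name must refer to a host that has been defined. In this example 192.168.0.19_l.cfg first defines the host and then its services. Keeping one file per machine makes management easier: if many machines were all defined in hosts.cfg and services.cfg, those files would become hard to read. Configure the other monitored machines the same way: copy 192.168.0.19_l.cfg to a new file and change the host name and address, which makes adding new machines later much simpler.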
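The copy-and-edit step described in the note can be scripted. A minimal sketch, assuming the new host follows the same naming scheme as 192.168.0.19_l.cfg (IP used as host_name and address, file named <IP>_l.cfg, alias wd-linux-<last octet>); clone_host_cfg is a hypothetical helper, and services the new host does not have (e.g. MySQL) still have to be edited by hand.

```shell
# Create DIR/NEW_l.cfg from DIR/OLD_l.cfg, replacing the template host's IP
# (used as host_name and address) and the last octet in its alias.
clone_host_cfg() {
    # $1: directory, e.g. /usr/local/nagios/etc/servers
    # $2: template host IP, e.g. 192.168.0.19
    # $3: new host IP
    dir=$1 old=$2 new=$3
    if [ -e "$dir/${new}_l.cfg" ]; then
        echo "refusing to overwrite $dir/${new}_l.cfg" >&2
        return 1
    fi
    # Escape the dots so sed matches the IP literally
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed -e "s/$old_re/$new/g" \
        -e "s/wd-linux-${old##*.}\([[:space:]]*\)\$/wd-linux-${new##*.}\1/" \
        "$dir/${old}_l.cfg" > "$dir/${new}_l.cfg"
}
```

For example, clone_host_cfg /usr/local/nagios/etc/servers 192.168.0.19 192.168.0.55 would write 192.168.0.55_l.cfg; it refuses to overwrite an existing file so a hand-edited config is not lost.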
vi /usr/local/nagios/etc/servers/192.168.0.113_l.cfg
# Separate the fields below with tabs rather than spaces
define host{
host_name 192.168.0.113
alias wd-linux-113
address 192.168.0.113
check_command check-host-alive
max_check_attempts 5
check_period 24x7
contact_groups +sendmsggroup
notification_interval 10
notification_period 24x7
notification_options d,u,r
}
define service{
host_name 192.168.0.113
service_description check-host-alive
check_command check-host-alive
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.113
service_description check-ssh
check_command check_tcp!22
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.113
service_description check_local_disk
check_command check_local_disk!10%!5%!/
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.113
service_description search.baihe.com
check_command check_http_uri!8080!http://search.baihe.com/index.jsp
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups +sendmsggroup
}
vi /usr/local/nagios/etc/servers/192.168.0.80_w.cfg
# Separate the fields below with tabs rather than spaces
define host{
host_name 192.168.0.80
alias wd-windows-80
address 192.168.0.80
check_command check-host-alive
max_check_attempts 5
check_period 24x7
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options d,u,r
}
define service{
host_name 192.168.0.80
service_description check-host-alive
check_command check-host-alive
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description check-ftp
check_command check_ftp
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description Uptime
check_command check_nt!UPTIME
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description C_Drive_Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description E_Drive_Space
check_command check_nt!USEDDISKSPACE!-l e -w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description F_Drive_Space
check_command check_nt!USEDDISKSPACE!-l f -w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.80
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
vi /usr/local/nagios/etc/servers/192.168.0.229_w.cfg
# Separate the fields below with tabs rather than spaces
define host{
host_name 192.168.0.229
alias wd-windows-229
address 192.168.0.229
check_command check-host-alive
max_check_attempts 5
check_period 24x7
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options d,u,r
}
define service{
host_name 192.168.0.229
service_description check-host-alive
check_command check-host-alive
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description Uptime
check_command check_nt!UPTIME
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description C_Drive_Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description D_Drive_Space
check_command check_nt!USEDDISKSPACE!-l d -w 90 -c 95
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.0.229
service_description Ms Sql
check_command check_mssql!sa!sa!2000 ; monitoring SQL Server requires FreeTDS, whose installation is described below
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
Monitoring SQL Server requires FreeTDS
wget http://dag.wieers.com/rpm/packages/freetds/freetds-0.62.1-0.el5.rf.i386.rpm
rpm -ivh --nodeps freetds-0.62.1-0.el5.rf.i386.rpm
# Check that it is installed
rpm -qa |grep freetds
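Now configure the Nagios commands.cfg file
# Several commands used above, such as check_http_uri, do not exist by default and must be added by hand
vi /usr/local/nagios/etc/objects/commands.cfg
# Add the following
# Send SMS alerts through the Fetion robot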
define command {
command_name notify-host-by-sendmsg
command_line /usr/local/fetion/sendsms.sh "Host $HOSTSTATE$ alert for $HOSTNAME$($HOSTADDRESS$) on $TIME$."
}
# Send SMS alerts through the Fetion robot
define command {
command_name notify-service-by-sendmsg
command_line /usr/local/fetion/sendsms.sh "$TIME$:$SERVICEDESC$($HOSTADDRESS$) is $SERVICESTATE$."
}
# 'check_nrpe ' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
# 'check_mysql ' command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -P $ARG3$ $ARG4$
}
# 'check_mssql ' command definition
define command{
command_name check_mssql
command_line $USER1$/check_mssql $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
}
# 'check_http_uri' command definition
define command{
command_name check_http_uri
command_line $USER1$/check_http -I $HOSTADDRESS$ -p $ARG1$ -u $ARG2$ $ARG3$
}
Save and exit.
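Nagios substitutes the macros before running the command line, so sendsms.sh receives the finished text as its single argument $1. To preview or hand-test what a notification will look like, the two message formats can be rebuilt in sh; service_sms_text and host_sms_text are hypothetical helpers, and the sample values are made up.

```shell
# Service alert text in the format of notify-service-by-sendmsg:
# "$TIME$:$SERVICEDESC$($HOSTADDRESS$) is $SERVICESTATE$."
service_sms_text() {
    # $1: time  $2: service description  $3: host address  $4: service state
    printf '%s:%s(%s) is %s.\n' "$1" "$2" "$3" "$4"
}

# Host alert text in the format of notify-host-by-sendmsg:
# "Host $HOSTSTATE$ alert for $HOSTNAME$($HOSTADDRESS$) on $TIME$."
host_sms_text() {
    # $1: host state  $2: host name  $3: host address  $4: time
    printf 'Host %s alert for %s(%s) on %s.\n' "$1" "$2" "$3" "$4"
}
```

A hand test of the whole SMS path could then be /usr/local/fetion/sendsms.sh "$(service_sms_text 12:30:00 'CPU Load' 192.168.0.80 CRITICAL)"; note that this really sends a message to everyone in phonelist.txt.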
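In short, the commands in commands.cfg work by calling the check programs installed by nagios-plugins to check the state of each machine. These programs live in /usr/local/nagios/libexec/, which is what $USER1$ stands for (it is set in resource.cfg).
# Note: check_mssql is not included by default; download it or build it yourself
# If check_mysql is missing, install the MySQL client libraries and rebuild nagios-plugins; failing that, copying it from another machine also works (that is what I did).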
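In a service's check_command, the text before the first ! names the command and each following !-separated field becomes $ARG1$, $ARG2$, ... in that command's command_line. The sketch below performs this expansion in sh purely for illustration; expand_check_command and replace_all are hypothetical helpers, they only substitute $ARGn$ (up to 32, missing ones become empty) and $HOSTADDRESS$, leave $USER1$ untouched, and do not handle escaped \! characters, whereas Nagios itself handles many more macros.

```shell
# Replace every occurrence of literal string $2 in $1 with $3 (no regex).
replace_all() {
    text=$1 out=
    while :; do
        case $text in
            *"$2"*) out=$out${text%%"$2"*}$3
                    text=${text#*"$2"} ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$text"
}

# Expand a service's check_command (NAME!ARG1!ARG2...) against the
# command_line of NAME, filling $ARG1$..$ARG32$ and $HOSTADDRESS$.
expand_check_command() {
    # $1: check_command value, e.g. check_mysql!slave1!123456!3306
    # $2: command_line from commands.cfg
    # $3: host address
    cmdline=$2
    case $1 in
        *!*) args=${1#*!}! ;;   # arguments, plus a trailing ! as terminator
        *)   args= ;;
    esac
    n=1
    while [ "$n" -le 32 ]; do
        if [ -n "$args" ]; then
            val=${args%%!*}; args=${args#*!}   # next field up to "!"
        else
            val=                               # unused $ARGn$ becomes empty
        fi
        cmdline=$(replace_all "$cmdline" "\$ARG$n\$" "$val")
        n=$((n + 1))
    done
    replace_all "$cmdline" '$HOSTADDRESS$' "$3"
    echo
}
```

For the MySQL service of 192.168.0.19, expanding check_mysql!slave1!123456!3306 against the check_mysql command_line above yields $USER1$/check_mysql -H 192.168.0.19 -u slave1 -p 123456 -P 3306 followed by an empty $ARG4$.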
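That completes the installation and configuration of the Nagios server. Next, install the plugins on the monitored machines.
6. Installing plugins on the monitored machines
Linux monitored hosts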
useradd nagios
tar xzvf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
# nagios-plugins installs into /usr/local/nagios by default
./configure
make
make install
chown nagios.nagios /usr/local/nagios/
chown -R nagios.nagios /usr/local/nagios/libexec/
tar xzvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
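# Install the check_nrpe plugin (optional on a monitored host)
make install-plugin
# Install the NRPE daemon
make install-daemon
# Install the NRPE configuration file
make install-daemon-config
# Edit the NRPE configuration file to allow the Nagios server (192.168.0.216) to connect
vi /usr/local/nagios/etc/nrpe.cfg
# Separate multiple hosts with commas
allowed_hosts=127.0.0.1,192.168.0.216
# Start NRPE as a standalone daemon (it can also run under xinetd; see the official NRPE documentation for details)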
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# Start NRPE automatically at boot
echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.d/rc.local
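Running the echo ... >> /etc/rc.d/rc.local line more than once appends duplicate start commands. A sketch of an idempotent variant; append_once is a hypothetical helper that adds the line only when the file does not already contain exactly that line.

```shell
# Append LINE to FILE unless FILE already contains exactly that line.
append_once() {
    # $1: file, e.g. /etc/rc.d/rc.local   $2: line to add
    # grep -qxF: quiet, whole-line, fixed-string match; a missing file
    # makes grep fail, so the line is appended and the file created.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

Usage: append_once /etc/rc.d/rc.local "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" can be run any number of times and leaves a single entry.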
# Check that NRPE works
[root@wiki etc]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12
# If the NRPE version is returned, the installation is fine.
# Check the listening port
[root@wiki ~]# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 27387/nrpe
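Instead of reading the whole listing by eye, the port check can be scripted. A sketch that reads netstat -tunlp output on stdin and succeeds if a TCP socket is listening on the given port; is_listening is a hypothetical helper, and it assumes the column layout shown above (local address in column 4, state in column 6).

```shell
# Succeed if netstat -tunlp output on stdin shows a TCP socket in LISTEN
# state on the given port (local address is column 4, state is column 6).
is_listening() {
    # $1: port number, e.g. 5666
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # the part after the last ":" is the port
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}
```

Usage: netstat -tunlp | is_listening 5666 && echo "nrpe is listening".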
If a firewall is running, open port 5666:
iptables -I INPUT -i eth0 -p tcp -m tcp --dport 5666 -j ACCEPT
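Windows monitored hosts
Download NSClient++-Win32-0.3.6.msi and install it.
Open NSC.ini in the installation directory and edit it:
C:\Program Files\NSClient++\NSC.ini
In the [modules] section, uncomment (remove the leading ;) every DLL except CheckWMI.dll and RemoteConfiguration.dll.
In the [Settings] section you can set a connection password with password=PWD; for simplicity no password is set here. Set allowed_hosts=127.0.0.1/32,192.168.0.216 to the monitoring server addresses allowed to connect. Writing 192.168.0.0/24 would allow every machine in that subnet, and leaving the value empty allows all hosts. (The [NSClient] section has an allowed_hosts setting of its own; don't confuse the two.) Finally, don't forget to remove the leading comment character (;).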
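To make the allowed_hosts rule above concrete (a plain address admits one host, an address/N entry admits the whole subnet, an empty value admits everyone), here is a sketch of that matching logic in sh. It is not NSClient++'s own code: ip_allowed and ip_to_int are hypothetical helpers, they handle IPv4 only, treat an entry without /N as /32, and assume the shell uses 64-bit arithmetic.

```shell
# Convert dotted IPv4 a.b.c.d to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IP matches one of the comma-separated entries in LIST
# (a.b.c.d or a.b.c.d/N). An empty LIST allows every host.
ip_allowed() {
    # $1: client IP   $2: allowed_hosts value
    [ -z "$2" ] && return 0
    ip=$(ip_to_int "$1")
    old_ifs=$IFS; IFS=,
    set -- $2
    IFS=$old_ifs
    for entry in "$@"; do
        net=${entry%/*}
        case $entry in */*) bits=${entry#*/} ;; *) bits=32 ;; esac
        # Netmask with the top BITS bits set (/0 matches everything)
        mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
        net_int=$(ip_to_int "$net")
        if [ $(( ip & mask )) -eq $(( net_int & mask )) ]; then
            return 0
        fi
    done
    return 1
}
```

With the setting used here, ip_allowed 192.168.0.216 "127.0.0.1/32,192.168.0.216" succeeds, while any other address on the LAN is refused unless a /24 entry is added.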
Start NSClient++
cd C:\Program Files\NSClient++
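NSClient++ /start (you can also start or restart the NSClientpp service from the Services console)
If a firewall is running, open the corresponding port (the sample check_nt command connects on TCP port 12489).
Create the monitoring configuration files; Windows system information is checked with the check_nt command, which is already defined by default.
That completes the setup of the monitored machines.
7. Check all Nagios configuration files and start Nagios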
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.2.0
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2009
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contactgroups.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hostgroups.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hosts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/services.cfg'...
Processing object config file '/usr/local/nagios//etc/objects/windows.cfg'...
Processing object config directory '/usr/local/nagios/etc/servers'...
Processing object config file '/usr/local/nagios/etc/servers/192.168.0.113_l.cfg'...
Processing object config file '/usr/local/nagios/etc/servers/192.168.0.80_w.cfg'...
Processing object config file '/usr/local/nagios/etc/servers/192.168.0.19_l.cfg'...
Processing object config file '/usr/local/nagios/etc/servers/192.168.0.229_w.cfg'...
Read object config files okay...
Running pre-flight check on configuration data...
Checking services...
Checked 33 services.
Checking hosts...
Checked 5 hosts.
Checking host groups...
Checked 2 host groups.
Checking service groups...
Checked 0 service groups.
Checking contacts...
Checked 3 contacts.
Checking contact groups...
Checked 2 contact groups.
Checking service escalations...
Checked 0 service escalations.
Checking service dependencies...
Checked 0 service dependencies.
Checking host escalations...
Checked 0 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 31 commands.
Checking time periods...
Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
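If you see the message above, everything is OK; otherwise fix the reported problems one by one. Remember to use tabs as separators in the Nagios configuration files wherever possible. Now restart Nagios: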
service nagios restart
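The pre-flight check and the restart can be chained so that Nagios is only restarted when the check passes. A sketch, assuming nagios -v exits with a non-zero status when it reports errors; safe_restart is a hypothetical helper, and the binary and restart command are passed in as arguments so the logic can be tried with stand-ins such as true and false.

```shell
# Run the Nagios pre-flight check and run the restart command only if it passes.
safe_restart() {
    # $1: nagios binary  $2: main config file  remaining args: restart command
    bin=$1 cfg=$2
    shift 2
    if out=$("$bin" -v "$cfg" 2>&1); then
        "$@"
    else
        printf '%s\n' "$out" >&2   # show the pre-flight output on failure
        return 1
    fi
}
```

Usage: safe_restart /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg service nagios restart.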
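Open a browser and go to the following URL to access Nagios:
http://192.168.0.216/nagios/
User name: test
Password: the password set with htpasswd
You should now see the Nagios monitoring pages, and problems will trigger email and SMS alerts.
That is essentially the end of the configuration. Nagios is very powerful, and only a small part of its functionality is used here. If you are interested in Nagios, see the official documentation: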
http://support.nagios.com/knowledgebase/officialdocs