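Platform and components
Monitoring server: Red Hat Linux AS 5, nagios-3.0.5, nagios-plugins-1.4.11
Monitored host: Windows 2003, NSClient++ 0.3.3 (http://nsclient.org/nscp/downloads)
Brief introduction
Nagios is a free, open-source network monitoring tool that is both powerful and flexible. It can monitor the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and more. This document covers monitoring Windows hosts with Nagios, which can be done in three ways: SNMP, NSClient++, or NRPE.
Only the NSClient++ method is covered here, followed by a short look at how Nagios uses plugins and how to write your own plugin arguments.
1. Installing Nagios
1. Install the prerequisite packages
Nagios needs Apache, gcc, glibc, the GD library, and related packages in order to run.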
yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel
2. Create the user and group
/usr/sbin/useradd -m nagios
passwd nagios
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd daemon   # daemon is the account Apache runs as (an RPM-installed Apache typically runs as apache instead)
3. Install Nagios
tar -zxvf nagios-3.0.5.tar.gz
cd nagios-3.0.5
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib/ --with-gd-inc=/usr/include/
make all
make install
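make install-init          # installs the init script in /etc/rc.d/init.d
make install-config        # installs the sample configuration files in /usr/local/nagios/etc
make install-commandmode   # sets permissions on the external command directory
4. Configure Apache
Add the following lines inside the alias module section <IfModule alias_module> of the Apache configuration. (If Apache was installed from RPM, add them where the Alias directives are defined.)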
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
# SSLRequireSSL
Options None
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Create the Apache password file and restart Apache
/usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
service httpd restart   # restart Apache
5. Install nagios-plugins
nagios-plugins is the official plugin set for Nagios. Nagios does all of its host monitoring by running plugin programs.
tar zxvf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
make
make install
6. Other settings
chkconfig --add nagios   # start Nagios automatically at boot
chkconfig nagios on
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # verify the Nagios configuration
vi /etc/selinux/config   # disable SELinux (takes effect after a reboot)
SELINUX=disabled
service iptables stop   # stop the firewall, or instead open ports 80 and 5666
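Stopping the firewall altogether is the quick route. The sketch below covers the other option mentioned on that line, opening only the ports that are needed. It merely prints the iptables commands so they can be reviewed before being run as root. open_port_rules is a hypothetical helper name, and the rule placement (inserting at the top of the INPUT chain) is an assumption that may not suit every rule set.

```shell
#!/bin/sh
# Sketch only: print the iptables commands that would open the given TCP ports.
# Nothing is executed here; review the output, then run it as root.
# open_port_rules is a hypothetical helper, not part of Nagios or iptables.
open_port_rules() {
    for port in "$@"; do
        # -I INPUT inserts the rule at the top of the INPUT chain (assumption:
        # no earlier rule needs to take precedence).
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
    # Persist the rules across reboots using the Red Hat init script.
    echo "service iptables save"
}

# Example: the Apache port used by the Nagios web interface.
open_port_rules 80
```

Port 5666 is the NRPE port, so it only matters if NRPE is used as well; for the NSClient++ method the Nagios server connects out to port 12489 on the Windows host.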
7. Start and access Nagios
Start Nagios with either of the following commands:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
service nagios start
The Nagios web interface is now reachable at:
http://192.168.0.20/nagios/
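Nagios is now up and running, but so far it only monitors itself. Next, have it monitor a Windows server.
2. Monitoring a Windows server
Nagios can monitor Windows systems in three ways: SNMP, NSClient++, or NRPE. The latter two require an agent installed on the Windows host. This document only covers the NSClient++ method.
1. Windows setup
Unzip nsclient++0.3.3.zip to the C: drive, then open a command prompt and install the service. (The session below shows a 0.3.5 directory; use the directory name that matches your version.)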
C:\>cd "NSClient++-Win32-0.3.5"
C:\NSClient++-Win32-0.3.5>nsclient++ /install
l \NSClient++.cpp(193) Service installed!
Usage: -version, -about, -install, -uninstall, -start, -stop, -encrypt
Usage: [-noboot] <ModuleName> <commnd> [arguments]
C:\NSClient++-0.3.5-Win32>"NSClient++.exe" -install
l \NSClient++.cpp(193) Service installed!
C:\NSClient++-0.3.5-Win32>"NSClient++.exe" -start
Starting NSClientpp
C:\NSClient++-0.3.5-Win32>
Edit NSC.ini
[modules]   # uncomment every module in this section except CheckWMI.dll and RemoteConfiguration.dll
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
[Settings]
allowed_hosts=192.168.0.20/32   # the IP address of the Nagios server
[NSClient]
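port=12489   # just uncomment this line
1. In the [modules] section, uncomment every module except CheckWMI.dll and RemoteConfiguration.dll.
2. In the [Settings] section, the 'password' option sets a password that Nagios must supply when it connects. No password is used here.
3. In the [Settings] section, uncomment the 'allowed_hosts' option and add the IP address of the monitoring host, for example: allowed_hosts=127.0.0.1,192.168.1.0/24,222.73.231.21/32
Entries are separated by commas. If the value is left blank, any host may connect.
Make sure to edit the option in [Settings], because the [NSClient] section has an option of the same name.
4. In the [NSClient] section, make sure the 'port' option is not commented out and is set to '12489', the default NSClient listening port.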
nsclient++ /start
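Run netstat -an and check that port 12489 is listening.
The following check_command entries are the ones used later in the Nagios service definitions:
check_command check_nt!UPTIME                                   uptime of the Windows server
check_command check_nt!CPULOAD!-l 5,80,90                       CPU load of the Windows server
check_command check_nt!MEMUSE!-w 80 -c 90                       memory usage of the Windows server
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90           disk usage of drive C on the Windows server
check_command check_nt!SERVICESTATE!-d SHOWALL -l telnet        state of the telnet service
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe     state of the Explorer process
On the Windows server:
Start the NSClient service and confirm that the port is open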
C:\NSClient++-Win32-0.3.5>netstat -an | more
Active Connections
Proto Local Address Foreign Address State
TCP 0.0.0.0:445 0.0.0.0:0 LISTENING
TCP 0.0.0.0:5666 0.0.0.0:0 LISTENING
TCP 0.0.0.0:12489 0.0.0.0:0 LISTENING
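Seeing the port in LISTENING state on Windows does not yet prove that the Nagios server can reach it through any firewall in between. As a rough check from the Nagios side, the sketch below tries a plain TCP connection. port_open is a hypothetical helper; it relies on bash's /dev/tcp pseudo-device (so it needs bash rather than plain sh) and has no timeout, so a silently filtered port may make it wait for the system's connect timeout.

```shell
#!/bin/bash
# Sketch only: succeed if a TCP connection to host:port can be opened.
# port_open is a hypothetical helper, not part of Nagios or NSClient++.
port_open() {
    local host="$1" port="$2"
    # Opening the /dev/tcp pseudo-file makes bash attempt a TCP connection.
    # The subshell closes the descriptor again as soon as it exits.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Usage, with the addresses used in this document:
#   port_open 192.168.0.8 12489 && echo "NSClient++ port reachable"
```

Once the port answers, running ./check_nt -H 192.168.0.8 -p 12489 -v CLIENTVERSION from /usr/local/nagios/libexec should report the agent version, which confirms that allowed_hosts accepts the Nagios server.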
2. Nagios setup
Next, configure the Nagios server.
Nagios configuration is modular, so first enable the Windows object file in the main configuration file.
vi /usr/local/nagios/etc/nagios.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
# Definitions for monitoring a Windows machine
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
The windows.cfg line above is commented out by default; remove its leading # to enable it, then edit windows.cfg
vi /usr/local/nagios/etc/objects/windows.cfg
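define host{
use windows-server ; inherit monitoring parameters from the template
host_name winserver ; name of the monitored host
alias My Windows Server ; alias
address 192.168.0.8 ; address of the monitored Windows host
}
# Change host_name to winserver in every service definition below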
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90 ; monitor CPU load
}
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90 ; monitor memory usage
}
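Next, the uptime of the monitored host:
check_command check_nt!UPTIME
Then the CPU load. The definition below raises a WARNING when the average load over the last 5 minutes exceeds 80%, and a CRITICAL alert when it exceeds 90%:
check_command check_nt!CPULOAD!-l 5,80,90
Memory usage, with a WARNING at 80% and a CRITICAL alert at 90%:
check_command check_nt!MEMUSE!-w 80 -c 90
Disk usage on drive C, with a WARNING at 80% and a CRITICAL alert at 90%:
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
Service state. A CRITICAL alert is sent when the service is stopped:
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
Process state. A CRITICAL alert is sent when the process is not running:
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
Once the Windows object file is enabled and the monitored host and its checks are defined in windows.cfg, the Nagios server is fully configured. Restart Nagios and look at the results.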
service nagios start
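Verify the configuration, then restart:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (checks the configuration)
service nagios restart (restarts Nagios so the configuration takes effect)
And there it is: the Windows host is being monitored, and everything on it is running normally.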
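Restarting with a broken configuration leaves Nagios stopped, so it can help to chain the two steps. The following is a minimal sketch that restarts only when the verification succeeds. reload_nagios is a hypothetical wrapper, and NAGIOS_BIN, NAGIOS_CFG, and RESTART_CMD are illustrative override variables introduced here; their defaults are the paths used in this document.

```shell
#!/bin/sh
# Sketch only: restart Nagios if, and only if, the configuration check passes.
# reload_nagios and the three override variables are hypothetical names.
reload_nagios() {
    nagios_bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    restart_cmd="${RESTART_CMD:-service nagios restart}"

    # "nagios -v" exits with a non-zero status when the configuration has errors.
    if "$nagios_bin" -v "$nagios_cfg" >/dev/null; then
        # Left unquoted on purpose so the command string splits into words.
        $restart_cmd
    else
        echo "Configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   reload_nagios
```

When the check fails, run the nagios -v command by hand to see the full error report, since the wrapper discards that output.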
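3. More on Nagios monitoring
1) Nagios directory layout
bin           Nagios executables; the nagios file is the main program
etc           Nagios configuration files
sbin          Nagios CGI files, i.e. the files needed to run external commands
share         Nagios web pages
var           Nagios log files, the pid file, and similar files
var/archives  log archives
var/rw        external command file
libexec       Nagios plugins
2) Using Nagios plugins
The Windows monitoring above uses the check_nt plugin (all plugins live in /usr/local/nagios/libexec).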
[root@cxy ~]# ls /usr/local/nagios/libexec/
check_apt check_ftp check_mailq check_overcr check_tcp
check_breeze check_http check_mrtg check_ping check_time
check_by_ssh check_icmp check_mrtgtraf check_pop check_udp
check_clamd check_ide_smart check_nagios check_procs check_ups
check_cluster check_ifoperstatus check_nntp check_real check_users
check_dhcp check_ifstatus check_nntps check_rpc check_wave
check_dig check_imap check_nrpe check_sensors negate
check_disk check_ircd check_nt check_simap urlize
Many plugins are available, and each plugin's help output shows how to write your own checks.
For example, the help for check_nt:
[root@cxy libexec]# pwd
/usr/local/nagios/libexec
[root@cxy libexec]# ./check_nt -h
Usage:check_nt -H host -v variable [-p port] [-w warning] [-c critical][-l params] [-d SHOWALL] [-t timeout]
# Syntax for monitoring CPU
CPULOAD =
Average CPU load on last x minutes.
Request a -l parameter with the following syntax:
-l <minutes range>,<warning threshold>,<critical threshold>.
<minute range> should be less than 24*60.
Thresholds are percentage and up to 10 requests can be done in one shot.
ie: -l 60,90,95,120,90,95
# The complete form is
check_nt!CPULOAD!-l 5,80,90
This has check_nt query CPULOAD: an average load of 80% over 5 minutes is a WARNING, and 90% is CRITICAL.
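To see how a check_command value relates to the usage line printed by ./check_nt -h, it helps to expand it by hand: the parts separated by ! become the command name and its arguments. The sketch below performs that expansion. expand_check_nt is a hypothetical helper, and it assumes the check_nt command is defined as $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$, which is how the sample commands.cfg is believed to define it; check your own commands.cfg before relying on it.

```shell
#!/bin/sh
# Sketch only: turn a check_command value into the equivalent plugin command line.
# expand_check_nt is a hypothetical helper. Assumed command definition:
#   $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
expand_check_nt() {
    host="$1"
    check="$2"                  # e.g. 'check_nt!CPULOAD!-l 5,80,90'
    rest="${check#*!}"          # drop the command name  -> 'CPULOAD!-l 5,80,90'
    arg1="${rest%%!*}"          # first argument ($ARG1$) -> 'CPULOAD'
    case "$rest" in
        *!*) arg2="${rest#*!}" ;;   # second argument ($ARG2$) -> '-l 5,80,90'
        *)   arg2="" ;;             # e.g. check_nt!UPTIME has no second argument
    esac
    echo "/usr/local/nagios/libexec/check_nt -H $host -p 12489 -v $arg1${arg2:+ $arg2}"
}

# Prints the command line for the CPU check against the host from this document.
expand_check_nt 192.168.0.8 'check_nt!CPULOAD!-l 5,80,90'
```

Running the printed command by hand is a quick way to try out thresholds before putting them into windows.cfg.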
Monitoring disk usage
USEDDISKSPACE =
Size and percentage of disk use.
Request a -l parameter containing the drive letter only.
Warning and critical thresholds can be specified with -w and -c.
# To monitor drive C with a WARNING at 80% and CRITICAL at 90%
check_nt!USEDDISKSPACE!-l c -w 80 -c 90
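The same -w/-c threshold idea carries over to plugins you write yourself. A plugin is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows that convention for a percentage value. check_percent is a hypothetical helper, not one of the stock plugins, and treating a value equal to the threshold as a breach is a choice made here, not necessarily what check_nt does.

```shell
#!/bin/sh
# Sketch only: compare a percentage against warning and critical thresholds
# using the Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# check_percent is a hypothetical helper, not a stock Nagios plugin.
check_percent() {
    value="$1"; warn="$2"; crit="$3"

    # Anything that is not a plain non-negative integer is reported as UNKNOWN.
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - value '$value' is not an integer"; return 3 ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - usage is ${value}%"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - usage is ${value}%"; return 1
    fi
    echo "OK - usage is ${value}%"; return 0
}

# Example use inside a plugin script: usage of the root filesystem,
# checked against the same 80/90 thresholds as above.
#   used=$(df -P / | awk 'NR==2 { sub("%", "", $5); print $5 }')
#   check_percent "$used" 80 90; exit $?
```

A script built this way can be dropped into /usr/local/nagios/libexec and referenced from a command definition like any other plugin.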