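Nagios User Guide
1. Introduction to Nagios
1.1 What is Nagios?
Nagios is an application for monitoring systems and networks. It watches hosts and services against conditions you define and sends an alert both when a state turns bad and when it recovers.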
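1.2 Features of Nagios
Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
Monitoring of host resources (CPU load, disk usage, memory usage, etc.)
A simple plugin design that lets users easily write their own service checks
Parallelized service checks
The ability to define a network hierarchy using "parent" hosts, which lets Nagios detect and distinguish hosts that are down from hosts that are unreachable
Notification of contacts when service or host problems occur and when they are resolved (by email, SMS, or a user-defined method)
The ability to define event handlers that run when a host or service event occurs, to help locate and resolve problems
Automatic log file rotation
Support for redundant monitoring hosts
An optional web interface for viewing current network status, notification and problem history, log files, etc.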
1.3 What Nagios can do
Monitor Windows hosts
Monitor Linux/Unix hosts
Monitor NetWare servers
Monitor routers and switches
Monitor printers
Monitor publicly available services
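2. Installing Nagios
2.1 Requirements
Hardware: Nagios has no special hardware requirements; an ordinary server or host can run it.
System: the machine must run Linux (or another Unix) and have a C compiler.
Software: a web server (Apache),
Thomas Boutell's GD library and its development libraries (gd gd-devel glibc glibc-common)
the GCC compiler (gcc)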
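Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch of such a check. The function name missing_tools is hypothetical, and the command names in the usage comment (gcc, make, httpd) are assumptions about how the packages above are installed on your system.

```shell
#!/bin/sh
# missing_tools: print each named command that cannot be found on the PATH.
# Hypothetical helper, not part of Nagios.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command is not found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: report missing build prerequisites (command names are assumptions)
# missing_tools gcc make httpd
```

An empty result means every listed command was found. The check says nothing about library packages such as gd-devel, which still need to be verified with the package manager.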
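2.2 Software preparation
Nagios main program: nagios-cn-3.3.1.tar.tar.gz (the version used in this document)
Nagios plugins: nagios-plugins-1.4.15.tar.gz nrpe-2.12.tar.gz
English version download: http://www.nagios.org/download/
Chinese version download: http://sourceforge.net/projects/nagios-cn/
2.3 Installing Nagios
2.3.1 Nagios user and group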
#adduser nagios
#mkdir /usr/local/nagios
#chown nagios:nagios /usr/local/nagios
2.3.2 Compiling
#tar zxvf nagios.3.3.1.tar.gz
#cd nagios
#./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagios
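2.3.3 Installing
#make all
#make install                  Installs the main program, the CGIs, the HTML files and so on.
#make install-commandmode      Sets the permissions on the directory that holds the external command file.
#make install-config           Copies the sample configuration files into the Nagios install directory.
#make install-init             Installs the Nagios init script into init.d so that Nagios can start at boot.
2.3.4 Checks after compiling and installing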
# ls /usr/local/nagios
bin etc libexec sbin share var
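If the directories above are all present, Nagios has been installed successfully.
The six directories serve the following purposes:
bin      the Nagios executable; this directory contains only one file, nagios
etc      the Nagios configuration files; right after installation it holds only a few *.cfg-sample files
libexec  the Nagios plugins and script files
sbin     the Nagios CGI files, i.e. the files needed to run external commands from the web interface
share    the Nagios web page files
var      the Nagios log files, PID file and similar runtime files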
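The directory check can also be scripted. The sketch below assumes the install prefix used in this guide; the function name check_nagios_layout is hypothetical.

```shell
#!/bin/sh
# check_nagios_layout: verify that the six directories listed above exist
# under the given install prefix. Prints any that are missing and returns
# non-zero if at least one is absent. Hypothetical helper.
check_nagios_layout() {
    prefix=$1
    status=0
    for dir in bin etc libexec sbin share var; do
        if [ ! -d "$prefix/$dir" ]; then
            printf 'missing: %s\n' "$prefix/$dir"
            status=1
        fi
    done
    return "$status"
}

# Example:
# check_nagios_layout /usr/local/nagios && echo "layout looks complete"
```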
2.3.5 Adding Nagios to the Apache configuration
#vim /etc/httpd/conf/httpd.conf
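Append the following to the end of the configuration file. The ScriptAlias and Alias lines map the /nagios URLs onto the install directories; without them the /nagios/ address used later in this guide does not resolve.
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    Require valid-user
</Directory>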
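2.3.6 Creating the HTTP authentication file for the user nagiosadmin
#/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
2.3.7 Restarting Apache
#service httpd restart
2.3.8 Viewing the Nagios monitoring page
Open a browser and go to: http://127.0.0.1/nagios/
If the Nagios web interface appears, the preceding installation steps succeeded.
2.3.9 Configuring Nagios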
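Run the following to verify that the program can start with its configuration:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Going through the Nagios configuration files:
# vi /usr/local/nagios/etc/nagios.cfg
Uncomment the following lines: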
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg
//path of the contact group configuration file
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
//path of the contact configuration file
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
//path of the host group configuration file
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
//path of the host configuration file
cfg_file=/usr/local/nagios/etc/objects/services.cfg
//path of the service configuration file
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
//path of the time period configuration file
Change check_external_commands=0 to 1. This allows actions such as restarting Nagios or stopping host/service checks to be carried out from the web interface.
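The same change can be made without opening an editor. The following is a minimal sketch that assumes the directive is spelled check_external_commands, as in the stock nagios.cfg; the function name enable_external_commands is hypothetical. It writes the modified file to standard output so that the change can be reviewed before it replaces the original.

```shell
#!/bin/sh
# enable_external_commands: print the given nagios.cfg with
# check_external_commands set to 1. The input file is left untouched.
# Hypothetical helper.
enable_external_commands() {
    sed 's/^check_external_commands=.*/check_external_commands=1/' "$1"
}

# Example (paths follow this guide's install prefix):
# enable_external_commands /usr/local/nagios/etc/nagios.cfg > /tmp/nagios.cfg.new
# diff /usr/local/nagios/etc/nagios.cfg /tmp/nagios.cfg.new
```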
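The cgi.cfg file controls the CGI scripts:
#vi /usr/local/nagios/etc/cgi.cfg
Make sure that use_authentication=1.
The authorized_for_* entries define what each logged-in user is permitted to do.
Setting all of them to nagiosadmin is enough; you can also use a user name of your own that you created with htpasswd.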
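Setting every authorized_for_* entry by hand is repetitive, so a small filter can do it. This is a sketch only: the function name set_cgi_user is hypothetical, and it assumes a plain alphanumeric user name, since characters such as / or & would be interpreted by sed.

```shell
#!/bin/sh
# set_cgi_user: print a cgi.cfg-style file with the value of every
# authorized_for_* line replaced by the given user name. All other lines
# pass through unchanged. Hypothetical helper; assumes an alphanumeric
# user name.
set_cgi_user() {
    file=$1
    user=$2
    sed "s/^\(authorized_for_[a-z_]*\)=.*/\1=$user/" "$file"
}

# Example:
# set_cgi_user /usr/local/nagios/etc/cgi.cfg nagiosadmin > /tmp/cgi.cfg.new
```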
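The objects directory contains the following configuration files:
commands.cfg localhost.cfg printer.cfg templates.cfg windows.cfg
contacts.cfg switch.cfg timeperiods.cfg
commands.cfg     defines the check and notification commands
localhost.cfg    the default monitoring policy for the local host
contacts.cfg     defines the contacts and how alerts are sent to them
templates.cfg    defines the templates referenced by "use" in host and service definitions
timeperiods.cfg  defines the time periods used for checks and notifications
and so on.
Many of the configuration files already set up basic monitoring behavior by default; change them whenever you need to.
For a concrete configuration example, see /usr/local/nagios/etc/objects/localhost.cfg.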
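To illustrate what an entry in hosts.cfg might look like, the sketch below prints a minimal host object. The function name make_host_def is hypothetical, the host name and address in the usage comment are placeholders, and the linux-server template is assumed to be the one shipped in the sample templates.cfg.

```shell
#!/bin/sh
# make_host_def: print a minimal Nagios host object for the given host name
# and address. The "linux-server" template is assumed to come from the
# sample templates.cfg; adjust it if your templates differ.
make_host_def() {
    cat <<EOF
define host{
        use                     linux-server
        host_name               $1
        alias                   $1
        address                 $2
        }
EOF
}

# Example (host name and address are placeholders):
# make_host_def web01 10.1.1.20 >> /usr/local/nagios/etc/objects/hosts.cfg
```

After adding a definition, rerun the nagios -v check from this section before reloading Nagios.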
2.4 Using Nagios plugins
2.4.1 Installing the Nagios plugins
#tar zxvf nagios-plugins-1.4.15.tar.gz
#cd nagios-plugins-1.4.15
#./configure --prefix=/usr/local/nagios        nagios-plugins is installed under the Nagios home directory
#make
#make install
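#ls /usr/local/nagios/libexec        (Checks that the plugins were installed; on success this directory contains many executables.)
Follow-up checks:
Check the owner of the Nagios home directory again. It must be nagios, not root.
If the owner is wrong:
#chown -R nagios:nagios /usr/local/nagios
The nagios user does not need a login shell, so for better security:
#vi /etc/passwd
nagios:x:500:500::/home/nagios:/bin/bash
Change this to:
nagios:x:500:500::/home/nagios:/sbin/nologin
The nagios user can then no longer log in to a shell.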
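Editing /etc/passwd by hand is error-prone. On most Linux systems the same result is normally achieved with usermod -s /sbin/nologin nagios. The sketch below only previews the effect on passwd-format text; the function name set_login_shell is hypothetical.

```shell
#!/bin/sh
# set_login_shell: read passwd-format lines on stdin and print them with the
# login shell (7th colon-separated field) of the named user replaced.
# Hypothetical helper for previewing the change; it does not modify any file.
set_login_shell() {
    awk -F: -v OFS=: -v user="$1" -v shell="$2" \
        '$1 == user { $7 = shell } { print }'
}

# Example:
# set_login_shell nagios /sbin/nologin < /etc/passwd | grep '^nagios:'
```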
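2.4.2 Modifying the Nagios plugin configuration
1) Once the client software (NRPE) is installed, add the check_nrpe command definition on the server:
# vim /usr/local/nagios/etc/objects/commands.cfg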
#check nrpe
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
2) If check_nrpe is not present in /usr/local/nagios/libexec, install NRPE first and then copy check_nrpe from NRPE's libexec directory to /usr/local/nagios/libexec.
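Once the check_nrpe command exists, services can refer to it as check_nrpe!<remote command>, where the part after the exclamation mark becomes $ARG1$. The sketch below prints such a service object. The function name make_nrpe_service is hypothetical, generic-service is assumed to be the template from the sample templates.cfg, and check_load in the usage comment is assumed to be defined in the client's nrpe.cfg.

```shell
#!/bin/sh
# make_nrpe_service: print a service object that runs a remote NRPE command
# through the check_nrpe command definition.
# Arguments: host name, service description, NRPE command name.
# "generic-service" is assumed to come from the sample templates.cfg.
make_nrpe_service() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        check_command           check_nrpe!$3
        }
EOF
}

# Example (host name is a placeholder; check_load must exist in nrpe.cfg
# on the client):
# make_nrpe_service web01 "CPU Load" check_load
```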
3. Running Nagios
3.1 Checking the configuration files
To verify that the configuration files are correct, run the following command:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
….
Total Warnings: 0
Total Errors: 0
If there are errors, the error messages are shown in the output.
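The two summary lines can be checked automatically, for example before a reload. The following sketch reads the output of nagios -v on standard input; the function name preflight_ok is hypothetical.

```shell
#!/bin/sh
# preflight_ok: read "nagios -v" output on stdin and succeed only if both
# the "Total Warnings:" and "Total Errors:" lines are present and report 0.
# Hypothetical helper.
preflight_ok() {
    awk '
        /^Total Warnings:/ { warnings = $3; seen_w = 1 }
        /^Total Errors:/   { errors = $3;   seen_e = 1 }
        END { exit !(seen_w && seen_e && warnings == 0 && errors == 0) }
    '
}

# Example: reload only when the configuration is clean
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok \
#     && /etc/rc.d/init.d/nagios reload
```

Treating warnings as failures is a deliberately strict choice; relax the condition if warnings are acceptable in your setup.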
3.2 Starting and stopping Nagios
3.2.1 Starting Nagios
Start with the init script:
#/etc/rc.d/init.d/nagios start
Start manually:
#/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Restart (reload) Nagios:
#/etc/rc.d/init.d/nagios reload
Stop Nagios:
#/etc/rc.d/init.d/nagios stop        (or kill the process directly)
4. Client installation and setup
4.1 Installing the client
4.1.1 Adding the nagios user
#useradd nagios
4.1.2 Installing xinetd
#yum install xinetd
4.1.3 Installing the Nagios plugins
#tar -zxvf nagios-plugins-1.4.15.tar.gz
#cd nagios-plugins-1.4.15
#./configure --prefix=/usr/local/nagios
#make
#make install
4.1.4 Installing NRPE
#tar -zxvf nrpe-2.12.tar.gz
#cd nrpe-2.12
#./configure --prefix=/usr/local/nagios
#make all
#make install-plugin
#make install-daemon
#make install-daemon-config
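#make install-xinetd        Installs NRPE as an xinetd service
4.1.5 Fixing ownership
# chown -R nagios:nagios /usr/local/nagios
4.2 Modifying the configuration
1) Edit the NRPE xinetd configuration file and add the address of the monitoring server: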
#vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 10.1.1.14
Note: the addresses here must be separated by spaces.
2) Edit the services file and add the port:
#vi /etc/services
nrpe 5666/tcp #NRPE
3) Edit the configuration file nrpe.cfg:
#vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts= 127.0.0.1
Change it to: allowed_hosts= 127.0.0.1,10.1.1.14
4) Restart the xinetd service:
#service xinetd restart
5) Check that the service is listening:
# netstat -antp|grep 5666
tcp 0 0 :::5666 :::* LISTEN 16690/xinetd
6) Test the NRPE service:
#/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12
Note: if "Connection refused by host" appears, install OpenSSL: yum install openssl*
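The allowed_hosts change in step 3 can be scripted when many clients need the same monitoring server added. This is a minimal sketch: the function name add_allowed_host is hypothetical, and it does not check whether the address is already listed.

```shell
#!/bin/sh
# add_allowed_host: print an nrpe.cfg-style file with one more address
# appended to the comma-separated allowed_hosts list. Other lines pass
# through unchanged. Hypothetical helper; no duplicate detection.
add_allowed_host() {
    file=$1
    addr=$2
    sed "s/^\(allowed_hosts=.*[^ ]\) *\$/\1,$addr/" "$file"
}

# Example (address taken from this guide):
# add_allowed_host /usr/local/nagios/etc/nrpe.cfg 10.1.1.14 > /tmp/nrpe.cfg.new
```

Unlike only_from in the xinetd file, which is space-separated, allowed_hosts uses commas, which is why the address is appended with a comma here.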
Appendix:
The check_mem.pl script follows.
#! /usr/bin/perl -w
#
# check_mem v1.4 plugin for nagios
#
# uses the output of `free` to find the percentage of memory used
#
# Copyright Notice: GPL
#
# History:
#
# v1.4 Garrett Honeycutt
# + Fixed PerfData output to adhere to standards and show crit/warn values
#
# v1.3 Rouven Homann
# + Memory installed, used and free displayed in verbose mode
# + Bit Code Cleanup
#
# v1.2 Rouven Homann
# + Bug fixed where verbose output was required (nrpe2)
# + Bug fixed where performance data was not displayed at verbose output
# + FindBin Module used for the nagios plugin path of the utils.pm
#
# v1.1 Rouven Homann
# + Status Support (-c, -w)
# + Syntax Help Informations (-h)
# + Version Informations Output (-V)
# + Verbose Output (-v)
# + Better Error Code Output (as described in plugin guideline)
#
# v1.0 Garrett Honeycutt
# + Initial Release
#
use strict;
use FindBin;
use lib $FindBin::Bin;
use utils qw($TIMEOUT %ERRORS &print_revision &support);
use vars qw($PROGNAME);
use Getopt::Long;
use vars qw($opt_V $opt_h $verbose $opt_w $opt_c);
$PROGNAME = "check_mem";
sub print_help ();
sub print_usage ();
Getopt::Long::Configure('bundling');
GetOptions ("V"   => \$opt_V, "version"    => \$opt_V,
            "h"   => \$opt_h, "help"       => \$opt_h,
            "v"   => \$verbose, "verbose"  => \$verbose,
            "w=s" => \$opt_w, "warning=s"  => \$opt_w,
            "c=s" => \$opt_c, "critical=s" => \$opt_c);
if ($opt_V) {
    print_revision($PROGNAME,'$Revision: 1.4 $');
    exit $ERRORS{'UNKNOWN'};
}
if ($opt_h) {
    print_help();
    exit $ERRORS{'UNKNOWN'};
}
# Both thresholds are required; print_usage() exits when either is missing.
print_usage() unless (($opt_c) && ($opt_w));
# Keep only the first run of digits of each threshold. A list assignment is
# used because "my $x = ... if ..." has unspecified behavior in Perl.
my ($critical) = $opt_c =~ /([0-9]+)/;
my ($warning)  = $opt_w =~ /([0-9]+)/;
my $verbose = $verbose;
my ($mem_percent, $mem_total, $mem_used) = &sys_stats();
my $free_mem = $mem_total - $mem_used;
# Compare against the critical threshold first, then the warning threshold.
# The text after "|" is performance data: value;warn;crit.
if ($mem_percent>$critical) {
    if ($verbose) { print "CRITICAL: $mem_percent\% Used Memory - Total: $mem_total MB, used: $mem_used MB, free: $free_mem MB | MemUsed=$mem_percent\%;$warning;$critical\n";}
    else { print "CRITICAL: $mem_percent\% Used Memory | MemUsed=$mem_percent\%;$warning;$critical\n";};
    exit $ERRORS{'CRITICAL'};
} elsif ($mem_percent>$warning) {
    if ($verbose) { print "WARNING: $mem_percent\% Used Memory - Total: $mem_total MB, used: $mem_used MB, free: $free_mem MB | MemUsed=$mem_percent\%;$warning;$critical\n";}
    else { print "WARNING: $mem_percent\% Used Memory | MemUsed=$mem_percent\%;$warning;$critical\n";};
    exit $ERRORS{'WARNING'};
} else {
    if ($verbose) { print "OK: $mem_percent\% Used Memory - Total: $mem_total MB, used: $mem_used MB, free: $free_mem MB | MemUsed=$mem_percent\%;$warning;$critical\n"; }
    else { print "OK: $mem_percent\% Used Memory | MemUsed=$mem_percent\%;$warning;$critical\n";};
    exit $ERRORS{'OK'};
}
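sub sys_stats {
    my ($mem_total, $mem_used);
    # Total memory in MB: second column of the "Mem:" row of `free -mt`.
    chomp($mem_total = `free -mt | grep Mem | awk '{print \$2}'`);
    # Used memory excluding buffers and cache: third column of the
    # "-/+ buffers/cache:" row that older procps versions of `free` print.
    chomp($mem_used = `free -mt | grep cache | tail -1 | awk '{print \$3}'`);
    my $mem_percent = ($mem_used / $mem_total) * 100;
    return (sprintf("%.0f",$mem_percent),$mem_total,$mem_used);
}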
sub print_usage () {
    print "Usage: $PROGNAME [-w <warn>] [-c <crit>] [-v] [-h]\n";
    exit $ERRORS{'UNKNOWN'} unless ($opt_h);
}
sub print_help () {
    print_revision($PROGNAME,'$Revision: 1.4 $');
    print "Copyright (c) 2005 Garrett Honeycutt/Rouven Homann\n";
    print "\n";
    print_usage();
    print "\n";
    print "-w <warn> = Memory usage to activate a warning message.\n";
    print "-c <crit> = Memory usage to activate a critical message.\n";
    print "-v = Verbose Output.\n";
    print "-h = This screen.\n\n";
    support();
}
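The threshold logic of the script can be summarized in a few lines of shell, which may help when writing similar checks. This is an illustrative sketch, not a replacement for check_mem.pl: the function name mem_status is hypothetical and the percentage is passed in rather than read from free.

```shell
#!/bin/sh
# mem_status: mirror the threshold comparison of check_mem.pl.
# Arguments: used-memory percentage, warning threshold, critical threshold
# (all integers). Prints a status word and returns the matching Nagios
# plugin exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
mem_status() {
    if [ "$1" -gt "$3" ]; then
        echo CRITICAL
        return 2
    elif [ "$1" -gt "$2" ]; then
        echo WARNING
        return 1
    else
        echo OK
        return 0
    fi
}

# Example: 85% used with warning at 80 and critical at 90
# mem_status 85 80 90
```

On the client, the Perl script itself would typically be exposed to NRPE through a line such as command[check_mem]=/usr/local/nagios/libexec/check_mem.pl -w 80 -c 90 in nrpe.cfg; the path and thresholds shown here are placeholders.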