Host db2

Service     Status   Last Check           Duration         Attempt  Status Information
DISK        WARNING  10-31-2010 15:58:55  0d 2h 14m 23s    4/4      DISK WARNING - free space: / 45940 MB (96% inode=97%): /dev 8117 MB (99% inode=99%): /data 79778 MB (20% inode=98%): /usr/local 80396 MB (84% inode=99%):
Load        OK       10-31-2010 15:54:56  6d 1h 9m 22s     1/4      OK - load average: 2.62, 2.54, 3.00
MYSQL       OK       10-31-2010 15:57:56  0d 11h 46m 22s   1/4      OK - 128 client connection threads
MYSQL_FREE  OK       10-31-2010 15:54:55  6d 1h 9m 23s     1/4      OK: remain 5983232 KB
NET_BYTE    OK       10-31-2010 15:57:56  4d 6h 21m 22s    1/4      NETBYTES OK - network byte 48174982
NET_CONN    OK       10-31-2010 15:54:56  6d 1h 9m 22s     1/4      OK - socket 227
PING        OK       10-31-2010 15:55:46  0d 5h 13m 32s    1/3      PING OK - Packet loss = 0%, RTA = 4.84 ms
PROC        OK       10-31-2010 15:54:56  6d 1h 9m 22s     1/4      PROCS OK: 108 processes
SSH         OK       10-31-2010 15:54:56  6d 1h 9m 22s     1/4      SSH OK - OpenSSH_4.2 (protocol 2.0)
Swap        OK       10-31-2010 15:54:56  6d 1h 9m 22s     1/4      SWAP OK - 100% free (10244 MB out of 10244 MB)
db overload
Status Information: | OK - 429 client connection threads |
Performance Data: | threads_connected=429;300;400 |
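The performance data above has the form label=value;warn;crit. The status text reads OK even though 429 is above both thresholds in that string, so it can help to compare them by hand. A minimal sketch, assuming integer values and "higher is worse" thresholds; the function name classify_perfdata is made up for this note:

```shell
# Classify one perfdata item of the form label=value;warn;crit.
# Assumption: integer values, and the state gets worse as the value rises.
classify_perfdata() (
    item=$1
    label=${item%%=*}      # text before '='
    rest=${item#*=}        # value;warn;crit
    value=${rest%%;*}
    rest=${rest#*;}        # warn;crit
    warn=${rest%%;*}
    crit=${rest#*;}
    crit=${crit%%;*}       # ignore optional min/max fields after crit
    if [ "$value" -gt "$crit" ]; then
        state=CRITICAL
    elif [ "$value" -gt "$warn" ]; then
        state=WARNING
    else
        state=OK
    fi
    echo "$label $state"
)

classify_perfdata 'threads_connected=429;300;400'
```

With the values recorded above this prints "threads_connected CRITICAL", which suggests the plugin producing that line does not apply its own thresholds to the status.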
PROC CRITICAL 11-03-2010 15:47:56 0d 1h 11m 23s 4/4 PROCS CRITICAL: 229 processes
http://blog.chinaunix.net/uid-26719405-id-3409875.html Nagios installation and configuration notes
http://bbs.linuxtone.org/thread-2201-1-1.html Summary of using Nagios to monitor Linux/Windows with SMS alerts
http://bbs.linuxtone.org/thread-2268-1-1.html Installing PNP for Nagios (Cacti-like graphing) to view monitored hosts
http://bbs.linuxtone.org/thread-2269-1-1.html Nagios 3.x hands-on solutions, related threads
http://nagios-cn.sourceforge.net/nagios-cn/
http://bbs.linuxtone.org/thread-1281-1-1.html
http://www.labclub.com.cn/ELabClub/Article/ArticleShow.aspx?pguid=&uguid=ab6118b1-9c58-4849-a60a-508c195617b9&aguid=7040f425e9194db0af69e8c854ddb5b1
http://nagios-cn.sourceforge.net/nagios-cn/beginning.html#quickstart
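Nagios is a monitoring system that watches system status and network information. It can monitor specified local or remote hosts and services, and it sends notifications when something goes wrong.
================================Installation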
http://www.nagios.org/download
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz
======================
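1. Install the required libraries
If these packages are not already on the system, install them with yum
#yum -y install gcc glibc glibc-common gd gd-devel
In addition, newer versions of Nagios need PHP support, so it is best to install Apache+PHP before installing Nagios
2. Create the user and group that will run Nagios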
#/usr/sbin/useradd nagios
#passwd nagios
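nagios        (password entered at the prompt)
#/usr/sbin/groupadd www        (create the www group first if it does not exist yet)
#/usr/sbin/usermod -a -G www nagios
3. Install Nagios and the plugins
Install Nagios
#tar zxvf nagios-3.2.3.tar.gz
#cd nagios-3.2.3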
#./configure --prefix=/usr/local/webserver/nagios --with-command-group=www
#make all
#make install
#make install-init
#make install-config
#make install-commandmode
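The sample configuration files are installed under /usr/local/webserver/nagios/etc (the etc directory of the --prefix used above). They are enough to get Nagios running; only one small change is needed.
Use your preferred editor to edit /usr/local/webserver/nagios/etc/objects/contacts.cfg and change the email address in the nagiosadmin contact definition to your own address so that you receive the alerts.
vim /usr/local/webserver/nagios/etc/objects/contacts.cfg
Install the Nagios web configuration file into Apache's conf.d directory
make install-webconf
Create a nagiosadmin user for logging in to the Nagios web interface. Note the password you set; you will need it shortly.
htpasswd -c /usr/local/webserver/nagios/etc/htpasswd.users nagiosadmin
nagiosadmin        (password entered at the prompt)
Restart Apache for the settings to take effect.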
service httpd restart
/usr/local/webserver/apache2/bin/apachectl restart
-----------------Compile and install the Nagios plugins
Install Nagios-plugins
#cd ..
#tar zxvf nagios-plugins-1.4.15.tar.gz
#cd nagios-plugins-1.4.15
#./configure --prefix=/usr/local/webserver/nagios --with-nagios-user=nagios --with-nagios-group=nagios
#make
#make install
-------Start Nagios
Add Nagios to the service list so that it starts automatically at boot
chkconfig --add nagios
chkconfig nagios on
Verify the sample Nagios configuration files
/usr/local/webserver/nagios/bin/nagios -v /usr/local/webserver/nagios/etc/nagios.cfg
If no errors are reported, start the Nagios service
service nagios start
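Configure the web interface (require authentication for access to Nagios)
Create the authentication user
#/usr/local/webserver/apache2/bin/htpasswd -c /usr/local/webserver/nagios/etc/nagiospwd nagiosadmin
nagiosadmin        (password entered at the prompt)
Edit the Apache configuration file and append the following to the end of the file
#vi /usr/local/webserver/apache2/conf/httpd.conf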
ScriptAlias /nagios/cgi-bin /usr/local/webserver/nagios/sbin
<Directory "/usr/local/webserver/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios access"
AuthUserFile /usr/local/webserver/nagios/etc/nagiospwd
# A valid account is required to log in (Apache comments must be on their own line)
Require valid-user
</Directory>
Alias /nagios /usr/local/webserver/nagios/share
<Directory "/usr/local/webserver/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios access"
AuthUserFile /usr/local/webserver/nagios/etc/nagiospwd
# A valid account is required to log in
Require valid-user
</Directory>
service httpd restart
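5. Testing
After the configuration is done, restart Apache and open http://www.70cars.tk/nagios in a browser. If everything is working, a login dialog appears; after entering the username and password you will see the Nagios web interface.
============================Monitoring Linux hosts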
http://blog.sina.com.cn/s/blog_5dc960cd0100k2dj.html
http://q.sohu.com/forum/5/topic/45812746 Monitoring MySQL on the local host
http://bizchen.blog.51cto.com/1802248/340771 Monitoring MySQL hosts, nginx, CPU and NIC traffic with Nagios
http://www.nagios.org/download/ Get Nagios Addons
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
1. Install NRPE on the monitoring server
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --prefix=/usr/local/webserver/nagios
make all
make install-plugin        # installs the plugin; afterwards a check_nrpe file appears under /usr/local/webserver/nagios/libexec
Define the check_nrpe command
#vim /usr/local/webserver/nagios/etc/objects/commands.cfg        (append the following at the end of the file)
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
================================== Install Nagios-plugins and nrpe on the monitored server 147 (Linux/Unix)
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
useradd nagios
tar xzvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
# Nagios-plugins installs to /usr/local/nagios by default
./configure
make
make install
chown nagios:nagios /usr/local/nagios/
chown -R nagios:nagios /usr/local/nagios/libexec/
cd ..
tar xzvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
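# Install the nrpe plugin (check_nrpe); optional on the monitored host
make install-plugin
# Install the nrpe daemon
make install-daemon
# Install the nrpe configuration file
make install-daemon-config
# Edit the nrpe configuration file to allow the Nagios monitoring server (121.11.76.224) to connect
vim /usr/local/nagios/etc/nrpe.cfg
# Separate multiple hosts with commas
allowed_hosts=127.0.0.1,121.11.76.224
# Start nrpe as a standalone daemon. It can also be started through xinetd; see the official nrpe documentation for details.
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# Start nrpe automatically at boot
vim /etc/rc.d/rc.local
# Add the following line
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# Check that nrpe is installed correctly
[root@wiki etc]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12
# If the nrpe version is returned, the installation is fine.
# Check the listening port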
[root@wiki ~]# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 27387/nrpe
If a firewall is running, open port 5666:
iptables -I INPUT -i eth0 -p tcp -m tcp --dport 5666 -j ACCEPT
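If check_nrpe from the monitoring server is refused even with the port open, the allowed_hosts line is the first thing to look at. A minimal sketch that tests whether an address is listed there, assuming plain IP entries (no hostnames or network ranges); the function name nrpe_allows is made up for this note:

```shell
# Print "yes" if ADDR is listed in the allowed_hosts line of an nrpe.cfg.
# Assumption: exact string match on plain addresses only.
nrpe_allows() (
    cfg=$1
    addr=$2
    # Take the last uncommented allowed_hosts= line and strip the key.
    hosts=$(sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1)
    # Drop spaces and wrap in commas so every entry is delimited the same way.
    hosts=",$(printf '%s' "$hosts" | tr -d ' '),"
    case $hosts in
        *",$addr,"*) echo yes ;;
        *) echo no ;;
    esac
)

# Example call on the monitored host:
#   nrpe_allows /usr/local/nagios/etc/nrpe.cfg 121.11.76.224
```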
***********************************************************
Note: the commands used to monitor local resources must be defined in /usr/local/nagios/etc/nrpe.cfg.
The following commands are defined by default:
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
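The following commands are user-defined:
# Monitor swap usage: WARNING when less than 20% of swap is free, CRITICAL when less than 10% is free
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
# Monitor disk usage of the root partition
command[check_disk_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /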
***********************************************************
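For check_swap and check_disk the percentages given to -w and -c are minimum free amounts, so the state gets worse as the number falls. A minimal sketch of that comparison, assuming whole-number percentages; the function name classify_free is made up for this note:

```shell
# Classify a free-space percentage the way "-w 20% -c 10%" is read:
# WARNING below the warn value, CRITICAL below the crit value.
# Assumption: whole-number percentages without the % sign.
classify_free() (
    free=$1
    warn=$2
    crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo CRITICAL
    elif [ "$free" -lt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
)

classify_free 100 20 10   # fully free swap
classify_free 15 20 10    # below the warning threshold
classify_free 5 20 10     # below the critical threshold
```

The three calls print OK, WARNING and CRITICAL in that order.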
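------How Nagios monitors Linux machines
NRPE consists of two parts:
(1). The check_nrpe plugin, which runs on the monitoring host.
(2). The NRPE daemon, which runs on the remote Linux host (usually the monitored machine).
The whole monitoring process is as follows:
When Nagios needs to check a service or resource on a remote Linux host:
1). Nagios runs the check_nrpe plugin; the Nagios configuration tells it what to check.
2). The check_nrpe plugin connects to the remote NRPE daemon over SSL.
3). The NRPE daemon runs the corresponding Nagios plugin to check the local resource or service.
4). The NRPE daemon returns the result to the check_nrpe plugin, which passes it to Nagios for processing.
Note: the NRPE daemon needs the Nagios plugins installed on the remote monitored Linux host; without them the daemon cannot check anything.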
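What travels back through steps 3) and 4) is just one line of text plus an exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. A minimal sketch of a plugin-style check that follows this convention; check_count is a made-up name, not the real check_procs, and the process count is passed in rather than measured:

```shell
# Plugin-style check: print one status line and return the matching
# exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
# Hypothetical helper; the count is supplied by the caller.
check_count() {
    count=$1
    warn=$2
    crit=$3
    case $count in
        ''|*[!0-9]*) echo "PROCS UNKNOWN: bad input"; return 3 ;;
    esac
    if [ "$count" -gt "$crit" ]; then
        echo "PROCS CRITICAL: $count processes"; return 2
    elif [ "$count" -gt "$warn" ]; then
        echo "PROCS WARNING: $count processes"; return 1
    fi
    echo "PROCS OK: $count processes"; return 0
}

# 229 processes against the default check_total_procs thresholds (150/200)
check_count 229 150 200 || echo "exit code: $?"
```

Nagios decides the service state from the exit code alone; the text only ends up in the Status Information column.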
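------------Nagios configuration files
# Configuration file that controls CGI access
/usr/local/webserver/nagios/etc/cgi.cfg
# Main Nagios configuration file
/usr/local/webserver/nagios/etc/nagios.cfg
# resource.cfg defines macros that other files can reference, such as $USER1$
/usr/local/webserver/nagios/etc/resource.cfg
# objects is a directory that holds the Nagios object definitions
/usr/local/webserver/nagios/etc/objects/
# servers is a directory I created myself; Nagios can load every configuration file in a directory (this must be enabled in nagios.cfg)
/usr/local/webserver/nagios/etc/servers/
# Command definitions; the commands defined here can be referenced from other files
/usr/local/webserver/nagios/etc/objects/commands.cfg
# Contacts and contact groups
/usr/local/webserver/nagios/etc/objects/contacts.cfg
# Configuration for monitoring the local machine
/usr/local/webserver/nagios/etc/objects/localhost.cfg
# Sample configuration for monitoring a printer (not enabled by default)
/usr/local/webserver/nagios/etc/objects/printer.cfg
# Sample configuration for monitoring a router or switch (not enabled by default)
/usr/local/webserver/nagios/etc/objects/switch.cfg
# Template configuration; templates defined here can be referenced from other files
/usr/local/webserver/nagios/etc/objects/templates.cfg
# Monitoring time periods
/usr/local/webserver/nagios/etc/objects/timeperiods.cfg
# Sample configuration for monitoring Windows (not enabled by default)
/usr/local/webserver/nagios/etc/objects/windows.cfg
----------- Important: create /usr/local/webserver/nagios/etc/servers
./servers:
# Host group configuration file I created
hostgroup.cfg
# Configuration file I created for monitoring a remote Linux host
wiki-l-11.cfg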
-----
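How do the configuration files reference one another?
Nagios is mainly used to monitor various information about a host, including its local resources and the services it offers. In Nagios each of these is defined as an item (Nagios calls it a service; to distinguish it from the services the host provides, I use the word item here), and each monitored item is implemented by a command defined in commands.cfg.
To avoid defining the same settings repeatedly, Nagios provides a template configuration file (templates.cfg), where shared attributes are defined as templates that can be referenced many times.
Suppose we have an item that checks whether a machine's web service is working. What elements do we need? The three most important are: which machine to monitor, which command implements the check, and which contact to notify when something goes wrong.
First, define in commands.cfg the commands for checking remote services and resources, as well as the commands for sending mail. Most check commands are implemented by the plugins under /usr/local/webserver/nagios/libexec; for example, the ping check is check_ping.
Use the -h option to see how a plugin in that directory is used: /usr/local/webserver/nagios/libexec/check_ping -h
Then define contacts and contact groups in contacts.cfg and the monitoring time periods in timeperiods.cfg. Finally, reference these elements in the server monitoring configuration file to monitor the server.
=========================Excerpts from the configuration files, with explanations: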
vi /usr/local/webserver/nagios/etc/resource.cfg
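# Define the $USER1$ macro, which sets the plugin path
$USER1$=/usr/local/webserver/nagios/libexec
vi /usr/local/webserver/nagios/etc/objects/commands.cfg
# Define the check-host-alive command
define command{
command_name check-host-alive ; command name
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
# $USER1$ above comes from resource.cfg, and $HOSTADDRESS$ is filled in from the address of the host being checked. A macro must be defined before it can be referenced.
# User-defined check_nrpe command. It must be followed by one argument that tells the NRPE daemon on the remote server what to check; for example, check_swap checks the swap partition of the remote machine.
# 'check_nrpe' command definition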
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
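To see what actually gets executed for a service, it helps to expand the macros by hand. A minimal sketch of that substitution, assuming the values contain no |, & or backslash characters; expand_check is a made-up helper, and the libexec path assumes the --prefix used during installation:

```shell
# Expand a check_command such as "check_nrpe!check_swap" against a
# command_line template, filling $USER1$, $HOSTADDRESS$ and $ARG1$.
# Hypothetical helper; only the first argument after '!' is used.
expand_check() (
    template=$1      # command_line from commands.cfg
    check=$2         # check_command from the service definition
    address=$3       # address from the host definition
    user1=$4         # $USER1$ from resource.cfg
    arg1=
    case $check in
        *'!'*) arg1=${check#*!}; arg1=${arg1%%!*} ;;
    esac
    printf '%s\n' "$template" | sed \
        -e "s|\\\$USER1\\\$|$user1|g" \
        -e "s|\\\$HOSTADDRESS\\\$|$address|g" \
        -e "s|\\\$ARG1\\\$|$arg1|g"
)

libexec=/usr/local/webserver/nagios/libexec   # assumed plugin directory
expand_check '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
    'check_nrpe!check_swap' 66.90.103.147 "$libexec"
```

This prints /usr/local/webserver/nagios/libexec/check_nrpe -H 66.90.103.147 -c check_swap. Because the template only contains $ARG1$, anything after a second ! in check_command is never passed on.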
vi /usr/local/webserver/nagios/etc/objects/contacts.cfg
# Define a contact
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email [email protected] ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
# generic-contact above is defined in /usr/local/webserver/nagios/etc/objects/templates.cfg.
# Define a contact group
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin ; several contacts can be listed here, separated by commas
}
vi /usr/local/webserver/nagios/etc/objects/timeperiods.cfg
# Define the monitoring time period
define timeperiod{
timeperiod_name 24x7 ; monitor at all times (7x24 hours)
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
vi /usr/local/webserver/nagios/etc/objects/templates.cfg
# Define the generic-contact template. It is not a real contact; real contacts are defined in contacts.cfg
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
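service_notification_period 24x7
The time period during which notifications are sent when a service has a problem; this is the period defined above in timeperiods.cfg.
host_notification_period 24x7
The time period during which notifications are sent when a host has a problem; this is the period defined above in timeperiods.cfg.
service_notification_options w,u,c,r
Notify the contact in these four cases: w (warning), u (unknown), c (critical), or r (recovery from a problem state).
host_notification_options d,u,r
Notify the contact in these three cases: d (down), u (unreachable), or r (recovery from a problem state).
service_notification_commands notify-service-by-email
The command used for service notifications is notify-service-by-email, which is defined in commands.cfg and sends mail to the contact.
host_notification_commands notify-host-by-email
As above: host problems are also reported to the contact by mail.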
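The option letters are easy to mix up because u means unknown for a service but unreachable for a host. A small sketch that spells a list out, including the f (flapping) and s (scheduled downtime) letters used in the generic-contact template; describe_options is a made-up helper for this note:

```shell
# Translate notification option letters into words.
# kind is "service" or "host"; the letter u differs between the two.
describe_options() (
    kind=$1
    out=
    IFS=,                       # split the option list on commas
    for letter in $2; do
        case $kind:$letter in
            service:w) word=warning ;;
            service:u) word=unknown ;;
            service:c) word=critical ;;
            host:d)    word=down ;;
            host:u)    word=unreachable ;;
            *:r)       word=recovery ;;
            *:f)       word=flapping ;;
            *:s)       word=scheduled-downtime ;;
            *)         word="?$letter" ;;
        esac
        out="${out:+$out }$word"
    done
    echo "$out"
)

describe_options service w,u,c,r
describe_options host d,u,r
```

The two calls print "warning unknown critical recovery" and "down unreachable recovery".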
# Define the generic-host host template
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
# Define the Linux host template
define host{
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
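=========================Important
# In /usr/local/webserver/nagios/etc/nagios.cfg, enable loading of the configuration files in /usr/local/webserver/nagios/etc/servers/.
#cfg_dir=/usr/local/webserver/nagios/etc/servers        change this to
cfg_dir=/usr/local/webserver/nagios/etc/servers
# Monitoring file for the remote Linux host. To monitor more hosts, simply copy and adjust it.
# Keep in mind that the commands used in wiki-l-11.cfg are defined in commands.cfg, and the commands defined in commands.cfg call the plugins under /usr/local/webserver/nagios/libexec.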
vi /usr/local/webserver/nagios/etc/servers/wiki-l-11.cfg
define host{
use linux-server
host_name wiki
alias Docs
address 66.90.103.147
}
define service{
use generic-service
host_name wiki
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service
host_name wiki
service_description Root Partition
check_command check_nrpe!check_disk_root
}
define service{
use generic-service
host_name wiki
service_description Current Users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name wiki
service_description Current Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name wiki
service_description Swap Usage
check_command check_nrpe!check_swap
}
define service{
use generic-service
host_name wiki
service_description SSH
check_command check_ssh
notifications_enabled 0
}
define service{
use generic-service
host_name wiki
service_description HTTP
check_command check_http
notifications_enabled 0
}
vi /usr/local/webserver/nagios/etc/servers/hostgroup.cfg
# Define the host group (localhost.cfg has a similar hostgroup definition; I commented it out there, otherwise the two may conflict)
define hostgroup{
hostgroup_name linux-servers ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members localhost,wiki ; Comma separated list of hosts that belong to this group
}
#define hostgroup{
# hostgroup_name windows-servers ; The name of the hostgroup
# alias Windows Servers ; Long name of the group
# members print ; Comma separated list of hosts that belong to this group
#}
# After finishing the host monitoring configuration files, check them with the following command:
/usr/local/webserver/nagios/bin/nagios -v /usr/local/webserver/nagios/etc/nagios.cfg
-------------Problem: the check reports the following error
Processing object config file '/usr/local/webserver/nagios/etc/servers/hostgroup.cfg'...
Warning: Duplicate definition found for hostgroup 'linux-servers' (config file '/usr/local/webserver/nagios/etc/servers/hostgroup.cfg', starting on line 1)
Error: Could not add object property in file '/usr/local/webserver/nagios/etc/servers/hostgroup.cfg' on line 2.
Error processing object config files!
***> One or more problems was encountered while processing the config files...
Check your configuration file(s) to ensure that they contain valid
directives and data defintions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.
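=>>>>Solution:
vim /usr/local/webserver/nagios/etc/servers/hostgroup.cfg
A hostgroup named linux-servers is already defined elsewhere (localhost.cfg), so give this one a different name. Change to: hostgroup_name admins ;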
--------------------------------------
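Before renaming anything it can be useful to list which hostgroup names are defined more than once under the configuration directory. A minimal sketch using find and awk, assuming each define hostgroup block starts and ends on its own line; find_duplicate_hostgroups is a made-up helper for this note:

```shell
# Print every hostgroup_name that is defined more than once in the
# *.cfg files under a directory. Commented-out lines are ignored.
# Assumption: "define hostgroup{" and the closing "}" sit on their own lines.
find_duplicate_hostgroups() (
    find "$1" -type f -name '*.cfg' -exec cat {} + | awk '
        /^[ \t]*define[ \t]+hostgroup[ \t{]/ { inside = 1 }
        /^[ \t]*}/                            { inside = 0 }
        inside && $1 == "hostgroup_name"      { sub(/;.*/, ""); print $2 }
    ' | sort | uniq -d
)

# Example call on the monitoring server:
#   find_duplicate_hostgroups /usr/local/webserver/nagios/etc
```

Only names inside define hostgroup blocks are counted, so a service that is attached to a group through hostgroup_name does not show up as a duplicate.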
# Once the check passes, restart Nagios:
service nagios restart
------------
On the Linux host: /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
-----------------------------Adding a disk check
--------On the Linux host
vim /usr/local/nagios/etc/nrpe.cfg
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
# WARNING when less than 20% of the space is free, CRITICAL when less than 10% is free
---------On the Nagios server: the service can be defined along the following lines
vi /usr/local/webserver/nagios/etc/servers/wiki-l-11.cfg
define service{
use generic-service
host_name wiki
service_description DISK
check_command check_nrpe!check_disk ; thresholds are set in nrpe.cfg on the monitored host
}
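Problem: NRPE: Command 'check_disk' not defined
NRPE: Command 'check_swap' not defined
This message comes from the NRPE daemon on the monitored host: its nrpe.cfg has no command[check_disk] or command[check_swap] entry, or nrpe was not restarted after the entry was added.
Also confirm that the plugin itself exists. check_swap is normally under /usr/lib/nagios/plugins/ (package installs) or /usr/local/nagios/libexec (the source install used here):
locate check_swap
If the plugin cannot be found, reinstall the plugins.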
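The "not defined" message can be reproduced without the daemon by looking the command name up in nrpe.cfg directly. A minimal sketch of that lookup; nrpe_lookup is a made-up helper, and it only reads the file, it does not talk to nrpe:

```shell
# Look up command[NAME]=... in an nrpe.cfg and print its command line.
# When NAME is missing, print the error text shown above and fail.
nrpe_lookup() (
    cfg=$1
    name=$2
    line=$(grep "^command\\[$name]=" "$cfg" | tail -n 1)
    if [ -z "$line" ]; then
        echo "NRPE: Command '$name' not defined"
        exit 1                  # leaves only the function's subshell
    fi
    echo "${line#*=}"           # everything after the first '='
)

# Example call on the monitored host:
#   nrpe_lookup /usr/local/nagios/etc/nrpe.cfg check_swap
```

If the lookup finds the entry but check_nrpe still reports the error, the running daemon is most likely using an older copy of the configuration and needs a restart.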