Nagios Installation and Deployment, Detailed Error Analysis, and Cacti Integration (Ultra-Detailed Edition)
1. Document revision history
Date | Author | Version | Changes
2010.09.07 | Kevin | 1.0.0 | Created the document
2010.09.24 | Kevin | 1.0.1 | Added alert settings
2011.01.07 | Kevin | 1.0.2 | Added the daily health-check alert mechanism
2011.02.16 | Kevin | 1.0.3 | Updated the document and generated a PDF version
2011.02.22 | Kevin | 1.0.4 | Added new troubleshooting items
2011.03.10 | Kevin | 1.0.5 | Added Nagios alerts through the Fetion robot
2011.05.31 | Kevin | 1.0.8 | Updated troubleshooting
3. Installing Nagios
3.1. Installing prerequisite packages and adding users
#yum install httpd
#yum install gcc
#yum install glibc glibc-common
#yum install gd gd-devel
#yum install php
Nagios 3.2.0 and later need PHP installed, otherwise the Nagios web pages do not display correctly.
#/usr/sbin/useradd -m nagios
Add a dedicated user named nagios to run Nagios.
#passwd nagios
Set its password.
#/usr/sbin/groupadd nagcmd
Add the nagcmd group, which is used to submit external commands from the web interface.
#/usr/sbin/usermod -a -G nagcmd nagios
Add the nagios user to the nagcmd group.
#/usr/sbin/usermod -a -G nagcmd apache
Add the apache user to the nagcmd group.
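A quick way to confirm the group setup is to list each account's groups and look for nagcmd. The sketch below is only an illustration: the helper names group_listed, in_group, and report_nagcmd_members are hypothetical, and it assumes the standard `id -nG` command is available.

```shell
#!/usr/bin/env bash

# group_listed GROUP "LIST": succeed if GROUP appears in the
# space-separated LIST of group names.
group_listed() {
    local want=$1 g
    for g in $2; do
        [ "$g" = "$want" ] && return 0
    done
    return 1
}

# in_group USER GROUP: succeed if USER is a member of GROUP.
# id -nG prints the names of all groups the user belongs to.
in_group() {
    group_listed "$2" "$(id -nG "$1" 2>/dev/null)"
}

# report_nagcmd_members: one status line per account this guide adds to nagcmd.
report_nagcmd_members() {
    local user
    for user in nagios apache; do
        if in_group "$user" nagcmd; then
            echo "$user: in nagcmd"
        else
            echo "$user: NOT in nagcmd"
        fi
    done
}
```

Calling report_nagcmd_members after the usermod commands should report both accounts as members. A running Apache only picks up the new group after it is restarted.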
#mkdir ~/downloads
#cd ~/downloads
#wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-3.2.1.tar.gz
#wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz
#cd ~/downloads
#tar xzf nagios-3.2.1.tar.gz
#cd nagios-3.2.1
#./configure --with-command-group=nagcmd
#make all
#make install
#make install-init
#make install-config
#make install-commandmode
At this point Nagios is essentially installed, and the default configuration files are good enough to start it.
#vi /usr/local/nagios/etc/objects/contacts.cfg
Change the email address on the nagiosadmin entry to your own so that alert mail reaches your mailbox.
#make install-webconf
Install the Nagios web interface.
#htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Set the account and password used for HTTP authentication when logging in to the web interface.
#service httpd restart
Restart Apache.
#cd ~/downloads
#tar xzf nagios-plugins-1.4.15.tar.gz
#cd nagios-plugins-1.4.15
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
#make
#make install
Install the plugins. All plugin commands are installed into /usr/local/nagios/libexec.
#cd ~/downloads
#wget http://nagios.manubulon.com/nagios-snmp-plugins.1.1.1.tgz
#tar xzf nagios-snmp-plugins.1.1.1.tgz
#cd nagios_plugins
Plugins such as check_snmp_int.pl depend on modules from CPAN (the Comprehensive Perl Archive Network), a large repository of Perl modules and related files. The module needed here is Net::SNMP, which can be installed in two ways:
A) Install through CPAN
#perl -MCPAN -e shell
cpan> install Net::SNMP
B) Install manually
First download the following modules from the official site, www.cpan.org:
Crypt::DES
Digest::MD5
Digest::SHA1
Digest::HMAC
Net::SNMP
Then install each module in turn as follows, replacing <module> with the name of the module being installed:
#tar zxf <module>.tar.gz
#cd <module>
#perl Makefile.PL
#make test
#make install
Note: Net::SNMP must be installed last. This completes the manual installation of Net::SNMP.
#./install.sh
Run the nagios-snmp-plugins install script, which installs the plugin commands into /usr/local/nagios/libexec.
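Nagios is now essentially installed, but it cannot be started yet; the following setup is needed first.
#chkconfig --add nagios
Register nagios as a service.
#chkconfig nagios on
Start the service automatically at boot.
#/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Check that the Nagios configuration is valid. Run this command again after every configuration change made later in this document.
#service nagios start
Start Nagios.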
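Because the configuration check has to be repeated after every change, it can be convenient to tie it to the restart. The following is a minimal sketch, assuming the paths used in this guide; the function name restart_if_valid is hypothetical, and the two variables exist only so the paths can be overridden.

```shell
#!/usr/bin/env bash

# Paths from this guide; override them in the environment if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# restart_if_valid: run the pre-flight check (-v) and restart Nagios
# only when the check exits successfully.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        service nagios restart
    else
        echo "configuration check failed; Nagios left untouched" >&2
        return 1
    fi
}
```

When the check fails, run the -v command by hand to see the detailed error messages, since this sketch discards them.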
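Note that CentOS enables SELinux in enforcing mode by default, which causes an Internal Server Error when the Nagios web interface is opened.
#getenforce
Show the current mode; the output Enforcing means SELinux is in enforcing mode.
#setenforce 0
Switch SELinux to permissive mode.
This setting is lost on reboot. To make it persistent, edit /etc/sysconfig/selinux, change SELINUX=enforcing to SELINUX=permissive, and reboot. You can also set it to disabled to turn SELinux off entirely.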
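The persistent change can also be scripted instead of edited by hand. This is a sketch only: set_selinux_mode is a hypothetical helper, and it assumes GNU sed, whose --follow-symlinks option keeps a symlinked configuration file from being replaced by a regular file.

```shell
#!/usr/bin/env bash

# set_selinux_mode FILE MODE: rewrite the SELINUX= line of FILE in place.
set_selinux_mode() {
    local file=$1 mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Match only the SELINUX= setting, not SELINUXTYPE= or comment lines.
    sed -i --follow-symlinks "s/^SELINUX=.*/SELINUX=$mode/" "$file"
}
```

For example, `set_selinux_mode /etc/sysconfig/selinux permissive` followed by a reboot corresponds to the manual edit described above.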
Alternatively, leave the SELinux mode unchanged and relabel the Nagios directories instead:
#chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
#chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
4.2. Nagios configuration files
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
Includes a single configuration file; the same applies to the cfg_file lines below.
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg
cfg_dir=/usr/local/nagios/etc/services
Includes a configuration directory; every .cfg file in it is loaded. The same applies to the cfg_dir lines below.
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/commands
cfg_dir=/usr/local/nagios/etc/switches
cfg_dir=/usr/local/nagios/etc/routers
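To see which files a cfg_dir entry would pull in, it helps to list them before running the configuration check. Nagios loads files with a .cfg extension and also descends into subdirectories, so the sketch below does the same with find. The function name list_cfg_files is hypothetical.

```shell
#!/usr/bin/env bash

# list_cfg_files DIR: print every *.cfg file under DIR (including
# subdirectories), sorted, one path per line.
list_cfg_files() {
    find "$1" -type f -name '*.cfg' | sort
}
```

For example, `list_cfg_files /usr/local/nagios/etc/hosts` shows the host definition files that will be read; a file saved with another extension, such as .bak, is ignored.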
[root@localhost etc]# /usr/local/nagios/libexec/check_snmp_storage.pl -H 192.168.1.200 -C mypublic -2 -m "^Virtual Memory$" -w 70 -c 90
Virtual Memory: 21%used(531MB/2472MB) (<70%) : OK
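The -w and -c options set the warning and critical thresholds, and Nagios learns the result from the plugin's exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL. The sketch below illustrates that mapping for a usage percentage. The name classify_usage is hypothetical, and treating a value equal to the threshold as a breach is an assumption drawn from the "(<70%) : OK" output above, not from the plugin source.

```shell
#!/usr/bin/env bash

# classify_usage PERCENT WARN CRIT: print the state name and return the
# matching plugin exit code (0 OK, 1 WARNING, 2 CRITICAL).
classify_usage() {
    local used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}
```

With the values from the run above, `classify_usage 21 70 90` prints OK and returns 0.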
[root@localhost etc]# cat resource.cfg |grep -v '#'| sed /^$/d
Show the active (uncommented) settings in resource.cfg:
$USER1$=/usr/local/nagios/libexec
$USER7$=-C mypublic -2
define command{
        command_name    check_snmp_storage
        command_line    $USER1$/check_snmp_storage.pl -H $HOSTADDRESS$ $USER7$ $ARG1$ -w $ARG2$ -c $ARG3$
}
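It can be hard to picture how the $USERn$ values from resource.cfg and the !-separated arguments of a check_command end up in the final command line. The sketch below imitates the substitution for the macros used in this command only; it is not how Nagios implements macros, and the function name expand_check is hypothetical. It requires bash.

```shell
#!/usr/bin/env bash

# expand_check HOSTADDRESS USER1 USER7 COMMAND_LINE CHECK_COMMAND
# Prints COMMAND_LINE with the macros replaced by the given values.
expand_check() {
    local addr=$1 user1=$2 user7=$3 line=$4 check=$5
    local name arg1 arg2 arg3
    # check_command has the form "name!ARG1!ARG2!ARG3"; split it on "!".
    IFS='!' read -r name arg1 arg2 arg3 <<< "$check"
    line=${line//\$USER1\$/$user1}
    line=${line//\$USER7\$/$user7}
    line=${line//\$HOSTADDRESS\$/$addr}
    line=${line//\$ARG1\$/$arg1}
    line=${line//\$ARG2\$/$arg2}
    line=${line//\$ARG3\$/$arg3}
    printf '%s\n' "$line"
}
```

Fed the resource.cfg values above, the address 192.168.1.200, and a check_command of `check_snmp_storage!-m "^Virtual Memory$"!70!90`, the function reproduces the command that was run by hand earlier in this section.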
define host{
        use             windows-server          ; template to inherit from
        host_name       web83                   ; host name is web83
        alias           web server on 111.83    ; host alias
        address         192.168.1.200           ; host IP address
        hostgroups      linuxtoneweb            ; put the host in the linuxtoneweb group; separate multiple groups with commas
}
define hostgroup{
        hostgroup_name  linuxtoneweb
        alias           linuxtone web servers
}
define hostgroup{
        hostgroup_name  linuxtoneweb
        alias           linuxtone web servers
        members         web83                   ; group members; each must be a host_name defined in a host object, separated by commas
}
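define service {
        hostgroup_name          linuxtone,linuxtoneweb,database ; host groups to monitor
        name                    memory                          ; service name
        service_description     check memory                    ; service description
        check_period            24x7                            ; when checks run
        max_check_attempts      4                               ; maximum number of check attempts
        normal_check_interval   3                               ; interval between normal checks
        retry_check_interval    2                               ; interval between retry checks
        contact_groups          admins                          ; contact group to alert
        notification_interval   10                              ; interval between notifications
        notification_period     24x7                            ; when notifications are sent
        notification_options    w,u,c,r                         ; states that trigger a notification
        check_command           check_snmp_storage!-m "^Virtual Memory$"!70!90
}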
# Windows host definition template - This is NOT a real host, just a template!
define host{
        name                    windows-server  ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, Windows servers are monitored round the clock
        check_interval          5               ; Actively check the server every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each server 10 times (max)
        check_command           check-host-alive ; Default command to check if servers are "alive"
        notification_period     24x7            ; Send notification out at any time - day or night
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        hostgroups              windows-servers ; Host groups that Windows servers should be a member of
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
define contact{
        contact_name    nagiosadmin             ; Short name of user
        use             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias           Nagios Admin            ; Full name of user
        email           [email protected]       ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        ; address1 is a custom variable, used here for an SMS address that receives SMS alerts.
        ; Contacts can only use address1 through address6 for this purpose. It makes several
        ; alert channels possible (phone call, SMS, and so on): put the phone or mobile number
        ; here, then define a matching command in the notification command definitions.
        address1        [email protected]       ;
}
define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email,notify-service-by-sms   ; send service notifications via email and SMS
        host_notification_commands      notify-host-by-email,notify-host-by-sms         ; send host notifications via email and SMS
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
}
# 'notify-host-by-email' command definition
define command{
        command_name    notify-host-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
# 'notify-host-by-sms' command definition
define command{
        command_name    notify-host-by-sms
        command_line    php /usr/local/nagios/share/sms/smssendmsg.php $CONTACTADDRESS1$ "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n"
}
# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    php /usr/local/nagios/share/sms/smssendmsg.php $CONTACTADDRESS1$ "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"
}
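Before relying on a notification command, it is useful to preview the message body it produces. The sketch below renders the same text as notify-host-by-email, with positional arguments standing in for the Nagios macros and the output going to the terminal instead of /bin/mail. The function name render_host_alert and the sample values are hypothetical.

```shell
#!/usr/bin/env bash

# render_host_alert TYPE HOST STATE ADDRESS INFO DATETIME
# Builds the body used by notify-host-by-email; printf '%b' turns the
# literal \n sequences into real line breaks, as in the command definition.
render_host_alert() {
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n\nDate/Time: $6\n"
}
```

For instance, `render_host_alert PROBLEM web83 DOWN 192.168.1.200 "PING CRITICAL" "$(date)"` prints a nine-line message whose fourth line is `Host: web83`.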
5. Installing Cacti
Install rrdtool. It cannot be installed directly with yum, so add the Dag RPM Repository to let yum find it.
#wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
#rpm -Uvh rpmforge-release-0.3.6-1.el5.rf.i386.rpm
#yum install rrdtool
Install net-snmp
#yum -y install net-snmp net-snmp-utils net-snmp-libs php-mysql
Install Cacti
#cd ~/downloads
#wget http://www.cacti.net/downloads/cacti-0.8.7d.tar.gz
#tar zxvf cacti-0.8.7d.tar.gz
#mkdir -p /usr/local/wwwroot
#cp -rf cacti-0.8.7d /usr/local/wwwroot/cacti
Next, import the database. Basic MySQL security configuration is not covered here; see the MySQL section of the earlier postfix mail installation document.
#mysql -u root -p
mysql>create database cacti;
Create a database for Cacti.
mysql>use cacti;
mysql>source /usr/local/wwwroot/cacti/cacti.sql
Import the Cacti schema into MySQL.
mysql> grant all privileges on cacti.* to cacti@localhost identified by "cacti";
Query OK, 0 rows affected (0.03 sec)
Add a database account named cacti, with password cacti, for access to the cacti database.
mysql>flush privileges;
Reload the privilege tables.
Apache settings
#vi /etc/httpd/conf.d/cacti.conf
Edit the configuration file for the Cacti site so that it contains:
Alias /cacti "/usr/local/wwwroot/cacti"
<Directory "/usr/local/wwwroot/cacti">
Options FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Edit the Cacti configuration file
#vi /usr/local/wwwroot/cacti/include/config.php
/* load up old style plugins here */
$plugins = array();
$url_path = "/cacti/";
Update the database connection settings in this file: database host, user, and password.
#cd /usr/local/wwwroot/cacti
#chmod 777 -R rra log
Installation is now complete. Open http://ip/cacti/install and follow the prompts to set up Cacti (replace ip with your host's IP address).
Next, install the Cacti plugin manager.
#cd ~/downloads/
#wget http://mirror.cactiusers.org/downloads/plugins/cacti-plugin-0.8.7d-PA-v2.4.zip
#unzip cacti-plugin-0.8.7d-PA-v2.4.zip
#mysql -u root -p cacti<pa.sql
Import the SQL into the cacti database.
#cd files-0.8.7d/
#cp -rf * /usr/local/wwwroot/cacti
Copy the files into the Cacti directory.
6.1. Installing ndoutils
ndoutils has to be installed first so that Nagios data can be written to the MySQL database.
#yum -y install mysql-devel
Install the MySQL development package needed to compile ndoutils.
#wget http://nchc.dl.sourceforge.net/sourceforge/nagios/ndoutils-1.4b7.tar.gz
#tar zxvf ndoutils-1.4b7.tar.gz
#cd ndoutils-1.4b7
#./configure --prefix=/usr/local/nagios LDFLAGS=-L/usr/lib --with-mysql-inc=/usr/include/mysql --with-mysql-lib=/usr/lib/mysql --enable-mysql --disable-pgsql --with-ndo2db-user=nagios --with-ndo2db-group=nagios
#make
#make install
This command is optional.
#./db/installdb -ucacti -pcacti -h localhost -d cacti
#cp config/ndomod.cfg /usr/local/nagios/etc
Edit the main Nagios configuration file
#vi /usr/local/nagios/etc/nagios.cfg
Add the following:
check_external_commands=1
command_check_interval=-1
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
process_performance_data=1
End of the added content.
#cd src
#cp ndomod-3x.o ndo2db-3x log2ndo file2sock /usr/local/nagios/bin
#cd ..
#cp src/ndo2db-3x /usr/local/nagios/bin/ndo2db
#mv /usr/local/nagios/bin/ndomod-3x.o /usr/local/nagios/bin/ndomod.o
Newly added step.
#cp config/ndo2db.cfg /usr/local/nagios/etc
Edit the configuration files ndomod.cfg and ndo2db.cfg. My configuration files contain:
[root@localhost downloads]# cat /usr/local/nagios/etc/ndomod.cfg |grep -v '^#'|sed /^$/d
instance_name=default
output_type=tcpsocket
output=localhost
tcp_port=5668
output_buffer_items=5000
buffer_file=/usr/local/nagios/var/ndomod.tmp
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=15
data_processing_options=-1
config_output_options=2
[root@localhost downloads]# cat /usr/local/nagios/etc/ndo2db.cfg |grep -v '^#'|sed /^$/d
ndo2db_user=nagios
ndo2db_group=nagios
socket_type=tcp
socket_name=/usr/local/nagios/var/ndo.sock
tcp_port=5668
db_servertype=mysql
db_host=localhost
db_port=3306
db_name=cacti
db_user=cacti
db_pass=cacti
db_prefix=npc_
max_timedevents_age=1440
max_systemcommands_age=10080
max_servicechecks_age=10080
max_hostchecks_age=10080
max_eventhandlers_age=44640
debug_level=1
debug_verbosity=1
debug_file=/usr/local/nagios/var/ndo2db.debug
max_debug_file_size=1000000
#/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
Start ndo2db.
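ndomod sends data over TCP and ndo2db listens for it, so the tcp_port values in the two files must agree (both are 5668 in the listings above). The sketch below compares them before the daemons are started. The helper names cfg_value and ports_match are hypothetical.

```shell
#!/usr/bin/env bash

# cfg_value FILE KEY: print the value of the first uncommented KEY=value line.
cfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# ports_match NDOMOD_CFG NDO2DB_CFG: succeed if both files define the
# same, non-empty tcp_port.
ports_match() {
    local a b
    a=$(cfg_value "$1" tcp_port)
    b=$(cfg_value "$2" tcp_port)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A typical call would be `ports_match /usr/local/nagios/etc/ndomod.cfg /usr/local/nagios/etc/ndo2db.cfg || echo "tcp_port mismatch"`.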
#cd ~/downloads
#wget http://www.aurore.net/projects/php-json/php-json-ext-1.2.0.tar.bz2
#tar xvjf php-json-ext-1.2.0.tar.bz2
#cd php-json-ext-1.2.0
#phpize
Prepare the PHP build environment before compiling.
#./configure
#make
#make install
#vi /etc/php.d/json.ini
extension=php_json.so
#cp /usr/lib/php/modules/json.so /usr/lib/php/modules/php_json.so
This step is essential; without it the Apache log reports that php_json.so cannot be loaded.
#/usr/sbin/httpd -k graceful
Restart Apache.
To verify the result, use phpinfo to check that JSON support is enabled.
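#cd ~/downloads
#wget http://www.assembla.com/spaces/npc/documents/aUjAwmdW8r3BuPab7jnrAJ/download?filename=npc-2.0.0b.166.tar.gz
#wget http://dlwt.csdn.net/fd.php?i=659714146741849&s=796b68562511c6534bfc15d7b04711f4/npc-2.0.3.tar.gz
Alternative link that works.
#mv npc /usr/local/wwwroot/cacti/plugins/
Enable Cacti's plugin support: log in to Cacti as admin, edit the admin user's permissions under User Management in the console, tick Plugin Management, then install and enable NPC from the plugin management page.
In detail: under 'User Management' select 'admin', and in 'Realm Permissions' below tick 'Plugin Management'. A 'Plugin Management' link then appears on the right. Open it, install npc from the 'uninstalled' list, and enable npc in the 'installed' list. Back in the admin user's 'Realm Permissions' a 'use npc' entry now appears; tick it if it is not already selected.
Next, choose settings in the right-hand menu and open the npc tab.
Tick Remote Commands
Nagios Command File Path:
/usr/local/nagios/var/rw/nagios.cmd
<this file is created when Nagios starts; enter its actual location>
Nagios URL:
http://yourserver/nagios/
Save the settings.
The installation is now complete.
Start mysql, httpd, ndo, and nagios:
service mysqld start
service httpd start
/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
service nagios start
Open
http://yourserver/cacti/
Select the npc tab to see the hosts monitored by Nagios. The only drawback is that this page loads rather slowly for me, and the cause still needs to be found before it can be tuned. In my opinion the interface NPC provides looks good.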
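6.4. Test page
7. Nagios daily health-check SMS alerts
7.1. Writing the check script
7.2. Adding the crond schedule
7.3. Configuring Fetion robot alerts
7.3.1. Add the following to commands.cfg:
7.3.2. Add to contacts.cfg:
7.3.3. templates.cfg
7.3.4. Changing the size of the monitoring graphs on the display page: /usr/local/nagios/etc/pnp/config.php
8. Troubleshooting
8.1. Error when modifying a service from the web interface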
Could not open command file '/usr/local/nagios/var/rw/nagios.cmd' for update!
The permissions on the external command file and/or directory may be incorrect. Read the FAQs on how to setup proper permissions.
An error occurred while attempting to commit your command for processing.
# EXTERNAL COMMAND FILE
# This is the file that Nagios checks for external command requests.
# It is also where the command CGI will write commands that are submitted
# by users, so it must be writeable by the user that the web server
# is running as (usually 'nobody').  Permissions should be set at the
# directory level instead of on the file, as the file is deleted every
# time its contents are processed.
In short, the user Apache runs as must be able to write to this file, and the permission has to be granted on the directory because the file is deleted each time its contents are processed.
command_file=/usr/local/nagios/var/rw/nagios.cmd
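Since the permission that matters is on the directory, a quick test is to check whether the directory holding the command file is writable by the account in question. The sketch below checks this for whichever user runs it. The function name cmd_dir_writable is hypothetical.

```shell
#!/usr/bin/env bash

# cmd_dir_writable COMMAND_FILE: succeed if the directory that holds
# COMMAND_FILE exists and is writable by the user running this function.
cmd_dir_writable() {
    local dir
    dir=$(dirname "$1")
    [ -d "$dir" ] && [ -w "$dir" ]
}
```

To test on behalf of the web server, run the function as the apache user, for example from a shell started with `su -s /bin/bash apache`, and pass it /usr/local/nagios/var/rw/nagios.cmd.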
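8.2. Clicking the host or service options shows no results
8.3. With Nagios 3.2.0 and later, opening http://ip/nagios after installation shows the following error:
8.4. The PNP sun icon appears, and clicking it gives the following error:
8.5. After installing Nagios, the Status Map and Alert Histogram links do not open, reporting that statusmap.cgi and histogram.cgi cannot be found.
8.6. The Apache log shows the following error:
8.7. Compiling ndoutils-1.4b7 fails with the following error:
8.8. After installation, /usr/local/nagios/var/nagios.log shows the following error:
8.9. Sometimes after boot, the following error appears in the background:
8.10. On the NPC plugin page, host icons show as a red cross:
8.11. Clicking the sun icon gives the following error: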