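Setting up a Nagios monitoring service on Linux + Apache + MySQL + PHP
This is a fairly complete walkthrough of building a Nagios service, compiled from a number of documents.
Nagios is open-source monitoring software that runs on Linux/UNIX. Its configuration files are complex and tightly interrelated, which has earned it the online nickname "难够死" (a pun on the name, roughly "hard enough to kill you").
The early 2.x releases did not need PHP; the current 3.x release requires PHP support.
The installation procedure follows Tian Yi's (田逸) book 《互联网运营智慧》; see the book for more detail.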
http://baike.baidu.com/view/5017732.htm
#################################################################################################
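##############Prerequisites########################################################################
Nagios is built on a LAMP environment (Linux + Apache + MySQL + PHP), so get that environment running first
and then compile and install Nagios.
On the monitoring server the MySQL package only has to be uploaded, not installed; its purpose is to let the nagios-plugins build generate the check_mysql command.
The monitoring server needs: nagios-3.2.3.tar.gz
nagios-plugins-1.4.15.tar.gz
nrpe-2.12.tar.gz
The monitored host needs: nagios-plugins-1.4.15.tar.gz
nrpe-2.12.tar.gz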
##############################################################################################
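###############Nagios configuration files involved################################
nagios.cfg - main configuration file; it lists which object configuration files are loaded
cgi.cfg - controls which users may access the Nagios web interface
htpasswd.users - encrypted file holding the user names and passwords for the Nagios web interface
resource.cfg - resource file for user macros such as the plugin directory
nrpe.cfg - configuration of the monitored resources on the monitored host
commands.cfg - check command definitions
templates.cfg - object templates
timeperiods.cfg - time period definitions
contacts.cfg - contacts to notify when something fails
contactgroups.cfg - contact groups to notify when something fails
hosts.cfg - monitored hosts
hostgroups.cfg - groups of monitored hosts
services.cfg - services monitored on each host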
####################################################################
########################
###Installation steps###
########################
Preparation:
1. Switch the Yum repository to the 163 mirror
# cd /etc/yum.repos.d/
# mv CentOS-Base.repo{,.bak}
# wget http://mirrors.163.com/.help/CentOS-Base-163.repo
# yum makecache
2. Install the LAMP environment
Install Apache
# cd /usr/local/src
# wget http://mirrors.sohu.com/apache/httpd-2.2.21.tar.bz2
# yum -y install openssl openssl-devel
# tar jxvf httpd-2.2.21.tar.bz2
# cd httpd-2.2.21
# ./configure --prefix=/usr/local/apache2 --enable-so --enable-ssl --enable-rewrite --enable-deflate
# make && make install
# cd ..
Problem encountered: an outdated apr version makes the Apache make step fail. Fix: run yum -y install apr to upgrade apr.
Reference: http://hi.baidu.com/cell37/blog/item/53015130ad54c9a25edf0e02.html
MySQL is installed as well, so that checks which need SQL can be added later.
Install MySQL
# wget http://mirrors.sohu.com/mysql/MySQL-5.1/mysql-5.1.57.tar.gz
# groupadd mysql
# useradd mysql -g mysql -s /sbin/nologin
# tar zxvf mysql-5.1.57.tar.gz
# cd mysql-5.1.57
# yum -y install ncurses ncurses-devel
# ./configure --prefix=/usr/local/mysql --localstatedir=/var/lib/mysql --sysconfdir=/etc --enable-thread-safe-client --with-client-ldflags=-all-static --with-mysqld-ldflags=-all-static --with-unix-socket-path=/tmp/mysql.sock --enable-assembler --without-debug --with-charset=utf8 --with-extra-charsets=gbk --with-plugins=innobase
# make && make install
# cd /usr/local/mysql/share/mysql
# cp my-small.cnf /etc/my.cnf
# cp mysql.server /etc/init.d/mysql.server
# chmod 755 /etc/init.d/mysql.server
# vi /etc/init.d/mysql.server
Line 46: basedir=/usr/local/mysql
Line 47: datadir=/usr/local/mysql/var
# cd /usr/local/mysql/
# chown -R mysql.mysql .
# bin/mysql_install_db --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var
# chown -R root .
# chown -R mysql /usr/local/mysql/var
# chkconfig --add mysql.server
# chkconfig mysql.server on
# /etc/init.d/mysql.server start
The same script also accepts stop and restart.
# echo 'export PATH=$PATH:/usr/local/mysql/bin' >>/etc/profile
# source /etc/profile
Nagios 3 and later needs PHP support.
Install PHP
# wget http://mirrors.sohu.com/php/php-5.3.6.tar.bz2
Graphs have to be generated, so PHP's GD library support is needed first.
# wget ftp://217.146.241.3/pub/linux/lib/gd-2.0.33.tar.gz
# yum -y install libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel libXpm libXpm-devel
# tar zxvf gd-2.0.33.tar.gz
# cd gd-2.0.33
# ./configure --prefix=/usr/local/gd2 --with-png --with-freetype --with-jpeg
# make && make install
# cd ..
Compile PHP
# tar jxvf php-5.3.6.tar.bz2
# cd php-5.3.6
# ./configure --prefix=/usr/local/php5 --with-apxs2=/usr/local/apache2/bin/apxs --with-mysql=/usr/local/mysql --with-mysql-sock=/tmp/mysql.sock --enable-mbstring=cn --enable-force-cgi-redirect --enable-ftp --with-gd=/usr/local/gd2 --with-jpeg-dir --with-zlib --with-png-dir --with-freetype-dir --disable-debug --enable-inline-optimization --enable-sockets --enable-bcmath
# make && make install
Hook PHP into Apache
# vi /usr/local/apache2/conf/httpd.conf
Confirm that the line LoadModule php5_module modules/libphp5.so is present, then add: AddType application/x-httpd-php .php
Change:
Line 67: change User daemon to User nagios
Line 68: change Group daemon to Group nagios
Line 99: uncomment #ServerName www.example.com:80 and change it to ServerName 192.168.18.20:80
Line 168: change DirectoryIndex index.html to DirectoryIndex index.html index.php
Test with phpinfo to see whether it worked
<?php phpinfo(); ?>
Install Nagios
# wget http://cdnetworks-kr-2.dl.sourceforge.net/project/nagios/nagios-3.x/nagios-3.2.3/nagios-3.2.3.tar.gz
# wget http://cdnetworks-kr-2.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.15/nagios-plugins-1.4.15.tar.gz
# groupadd nagios
# useradd nagios -g nagios
# tar zxvf nagios-3.2.3.tar.gz
# cd nagios-3.2.3
# ./configure --prefix=/usr/local/nagios
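# make all //compile
# make install //install the main Nagios program, CGIs, HTML files and so on
# make install-init //install the Nagios init script
# make install-config //copy the sample configuration files into the Nagios configuration directory
# make install-commandmode //set permissions on the Nagios directories
# mkdir -p /etc/httpd/conf.d/
# make install-webconf //this command creates nagios.conf under /etc/httpd/conf.d/, which is why the directory is created first
# cat /etc/httpd/conf.d/nagios.conf >>/usr/local/apache2/conf/httpd.conf
# cd ..
Install the Nagios plugins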
# tar zxvf nagios-plugins-1.4.15.tar.gz
# cd nagios-plugins-1.4.15
# ./configure --prefix=/usr/local/nagios/
# make && make install
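When the installation finishes, a libexec plugin directory appears under /usr/local/nagios/; all Nagios plugins live in this directory.
# chown -R nagios.nagios /usr/local/nagios/
# cd ..
Install nrpe (monitoring server)
# wget http://cdnetworks-kr-1.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz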
# tar zxvf nrpe-2.12.tar.gz
# cd nrpe-2.12
# ./configure && make all
# make install-plugin //install the check_nrpe plugin
# make install-daemon
# make install-daemon-config
# make install-xinetd //install the xinetd script
Install nrpe (monitored server or host)
# wget http://cdnetworks-kr-1.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz
# tar zxvf nrpe-2.12.tar.gz
# cd nrpe-2.12
# ./configure && make all
# make install-plugin //install the check_nrpe plugin
# make install-daemon
# make install-daemon-config
# make install-xinetd //install the xinetd script
# yum -y install xinetd
# vi /etc/xinetd.d/nrpe
Change only_from = 127.0.0.1 to only_from = 127.0.0.1 192.168.28.175, adding the monitoring host (normally the IP of the Nagios server).
# vi /etc/services
Add two lines
nrpe 5666/tcp
nrpe 5666/udp
Start xinetd
# /etc/init.d/xinetd start
Then check that port 5666 is open.
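A quick way to see whether the port is open is to filter the output of netstat -tln. The helper below is only a sketch: the function name port_is_listening is made up for this example, and it assumes the usual net-tools layout in which the fourth column is the local address and the last column is the state.

```shell
# port_is_listening PORT  (hypothetical helper, not part of nrpe)
# Reads `netstat -tln` style output on stdin and succeeds when a line
# in LISTEN state has a local address (4th column) ending in ":PORT".
port_is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use on the monitored host (assumes netstat is installed):
#   netstat -tln | port_is_listening 5666 && echo "nrpe port is open"
```

Reading from stdin keeps the parsing separate from the netstat call, so the same function can be tried against saved output.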
# scp /usr/local/nagios/libexec/check_nrpe root@MONITORING_HOST_IP:/usr/local/nagios/libexec/
Add a web user. The -c option creates the file; leave it out when adding a second user.
# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios
Edit cgi.cfg
# vi /usr/local/nagios/etc/cgi.cfg
Find the lines below and append the nagios user; you can define your own users, separated by ","
:%s/\(nagiosadmin\)/\1,nagios/
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
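The same edit can be made without opening vi. The sketch below is one possible approach, not the method from the book: add_cgi_user is a made-up helper name, and it assumes every authorized_for_* line has the form key=user1,user2 with at least one user already listed.

```shell
# add_cgi_user USER  (hypothetical helper)
# Reads cgi.cfg content on stdin and appends ",USER" to every
# authorized_for_* line that does not already list USER.
add_cgi_user() {
    awk -v user="$1" '
        /^authorized_for_/ {
            split($0, kv, "=")
            n = split(kv[2], names, ",")
            present = 0
            for (i = 1; i <= n; i++) if (names[i] == user) present = 1
            if (!present) $0 = $0 "," user
        }
        { print }
    '
}

# Example (writes to a new file first so the original is kept on failure):
#   add_cgi_user nagios < /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new \
#       && mv /tmp/cgi.cfg.new /usr/local/nagios/etc/cgi.cfg
```

Because lines that already contain the user are left alone, running it twice does not add the name twice.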
Grant permissions on nagios.log:
# chmod 777 /usr/local/nagios/var/nagios.log
Restart Apache
# /usr/local/apache2/bin/apachectl restart
Restart Nagios
# service nagios restart
Add the hosts and services to monitor.
The installation produces the following files:
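localhost.cfg - default file for monitoring the local host and its services
contacts.cfg - default contacts file
printer.cfg - default file for monitoring printers
switch.cfg - default file for monitoring switches
templates.cfg - default templates file
windows.cfg - default file for monitoring Windows hosts
commands.cfg - command definitions
timeperiods.cfg - time period definitions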
To keep things manageable, each kind of object goes into its own file; contacts, for example, are defined in contacts.cfg. Now let's redefine the configuration files.
First edit the main configuration file nagios.cfg so that the files take effect.
# vi /usr/local/nagios/etc/nagios.cfg
Comment out the localhost.cfg line and add the files contactgroups.cfg, hosts.cfg, hostgroups.cfg, services.cfg and servicegroups.cfg. The // notes below are annotations only and must not be typed into the file.
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg //put # in front
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg //contacts
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg //contact groups
cfg_file=/usr/local/nagios/etc/objects/commands.cfg //commands
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg //hosts
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg //host groups
cfg_file=/usr/local/nagios/etc/objects/templates.cfg //templates
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg //monitoring time periods
cfg_file=/usr/local/nagios/etc/objects/services.cfg //services
cfg_file=/usr/local/nagios/etc/objects/servicegroups.cfg //service groups
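Nagios will not start if a cfg_file line names a file that does not exist yet, so it can help to list the missing files before going further. The following is a minimal sketch; missing_cfg_files is a made-up helper name, and it assumes each active entry is written as cfg_file=/path with no spaces in the path.

```shell
# missing_cfg_files NAGIOS_CFG  (hypothetical helper)
# Prints every path named by an uncommented cfg_file= line in NAGIOS_CFG
# that is not an existing file; succeeds only when nothing is missing.
missing_cfg_files() {
    sed -n 's/^cfg_file=\([^[:space:]]*\).*/\1/p' "$1" | (
        status=0
        while IFS= read -r path; do
            if [ ! -f "$path" ]; then
                echo "$path"
                status=1
            fi
        done
        exit "$status"
    )
}

# Example:
#   missing_cfg_files /usr/local/nagios/etc/nagios.cfg || echo "create the files listed above first"
```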
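Next, edit these configuration files.
Host definition file
# vi /usr/local/nagios/etc/objects/hosts.cfg
define host{
host_name Nagios-Server ; host name; it is referenced in hostgroups.cfg and services.cfg
alias Nagios Server ; alias
address 192.168.18.20 ; IP address of the host
check_command check-host-alive ; check command
check_interval 5 ; interval between checks
retry_interval 1 ; interval between retries after a failed check
max_check_attempts 5 ; maximum number of attempts
check_period 24x7 ; period during which checks run
process_perf_data 0
retain_nonstatus_information 0
contact_groups sagroup ; contact group
notification_interval 30 ; interval between notifications
notification_period 24x7 ; period during which notifications are sent
notification_options d,u,r ; notification options
# for hosts: d = down, u = unreachable, r = recovery
# for services: w = warning, u = unknown, c = critical, r = recovery
}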
define host{
host_name Nagios-Client
alias Nagios Client
address 192.168.18.40
check_command check-host-alive
check_interval 5
retry_interval 1
max_check_attempts 5
check_period 24x7
process_perf_data 0
retain_nonstatus_information 0
contact_groups sagroup
notification_interval 30
notification_period 24x7
notification_options d,u,r
}
Host group file
# vi /usr/local/nagios/etc/objects/hostgroups.cfg
define hostgroup {
hostgroup_name Nagios-Example ; host group name
alias Nagios Example ; host group alias
members Nagios-Server,Nagios-Client ; group members, separated by commas
}
Service definition file
# vi /usr/local/nagios/etc/objects/services.cfg
define service {
host_name Nagios-Server ; host name
service_description check-host-alive ; service description
check_period 24x7 ; period during which checks run
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check-host-alive ; command to call
}
define service {
host_name Nagios-Client
service_description check-host-alive
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check-host-alive
}
Service group definition file
# vi /usr/local/nagios/etc/objects/servicegroups.cfg
define servicegroup{
servicegroup_name Host-Alive
alias Host Alive
members Nagios-Server,check-host-alive,Nagios-Client,check-host-alive
}
Contact definition file
# vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name nagiosadmin
use generic-contact
alias System Administrator
email nagios@localhost
}
Contact group definition
# vi /usr/local/nagios/etc/objects/contactgroups.cfg
define contactgroup{
contactgroup_name sagroup
alias Nagios Administrators
members nagiosadmin
}
Edit the command definition file
# vi /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name check_nrpe ; command used for remote monitoring
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Set permissions on the configuration files.
# chmod -R 755 /usr/local/nagios/etc/objects/
# chown -R nagios.nagios /usr/local/nagios/etc/objects/
Check for errors
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
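Since a broken configuration stops Nagios from starting, it is safer to restart only after the check passes. The wrapper below is a sketch under assumptions: safe_restart and the three variables are names invented for this example, and the defaults simply repeat the paths and the service command used in this article.

```shell
# Hypothetical settings; override them in the environment if your paths differ.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-service nagios restart}

# safe_restart
# Runs the configuration check and restarts Nagios only when it passes.
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # Left unquoted on purpose so the command and its arguments split.
        $RESTART_CMD
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Example:
#   safe_restart
```

When the check fails, run the nagios -v command by hand to read the full error output.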
Edit the nrpe configuration file on the monitored host
# vi /usr/local/nagios/etc/nrpe.cfg
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_/]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
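The name inside command[...] is what the Nagios server passes after check_nrpe!, so the two must match exactly. A small sketch for listing the names a given nrpe.cfg defines follows; nrpe_command_names and nrpe_has_command are made-up helper names.

```shell
# nrpe_command_names NRPE_CFG  (hypothetical helper)
# Prints the name of every command[...] entry, one per line.
nrpe_command_names() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}

# nrpe_has_command NRPE_CFG NAME  (hypothetical helper)
# Succeeds when NRPE_CFG defines a command called exactly NAME.
nrpe_has_command() {
    nrpe_command_names "$1" | grep -Fxq -- "$2"
}

# Example:
#   nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_load || echo "check_load is not defined"
```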
Edit the service definition file on the Nagios server
# vi /usr/local/nagios/etc/objects/services.cfg
Add the services
define service {
host_name Nagios-Client
service_description check-users
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_users
}
define service {
host_name Nagios-Client
service_description check-load
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_load
}
define service {
host_name Nagios-Client
service_description check-/
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_/
}
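The service blocks above differ only in the description and the NRPE command name, so they can be generated instead of typed. This is a sketch, not part of Nagios: nrpe_service is a made-up helper, and for simplicity it uses the NRPE command name as the service description as well.

```shell
# nrpe_service HOST CHECK  (hypothetical helper)
# Prints one service definition that runs CHECK on HOST through check_nrpe,
# with the same intervals and contact group as the blocks above.
nrpe_service() {
    cat <<EOF
define service {
    host_name               $1
    service_description     $2
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          sagroup
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_nrpe!$2
}
EOF
}

# Example: append blocks for several checks on one host.
#   for check in check_users check_load check_swap; do
#       nrpe_service Nagios-Client "$check"
#   done >> /usr/local/nagios/etc/objects/services.cfg
```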
Add SMS and e-mail alerts (a 139 mailbox is used here)
Apply for a 139 mailbox: http://mail.10086.cn/
Download sendEmail
# wget http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.56.tar.gz
After unpacking, copy the sendEmail executable to /usr/local/bin/ and make it executable for nagios
# tar xvzf sendEmail-v1.56.tar.gz
# cd sendEmail-v1.56/
# cp sendEmail /usr/local/bin
# chmod +x /usr/local/bin/sendEmail
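Send a test message
# /usr/local/bin/sendEmail -f [email protected] -t 138xxxxxxx -s smtp.163.com -u "test" -m "hello" -xu old_hoodlum -xp password -l /var/log/sendEmail.log
-f sender address (from)
-t recipient address (to)
-s SMTP server to use
-u subject
-m message body
-xu SMTP login user name
-xp SMTP login password
Recommendation: avoid using the local sendmail for Nagios alerts. Register a free mailbox with a large provider and send through it; this saves a lot of trouble, such as alerts arriving late because of sendmail problems.
The mail commands defined in commands.cfg follow; adjust them to your own setup.
Add to commands.cfg: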
define command{
command_name notify-host-by-sendEmail
command_line /usr/bin/printf "%b" "***** Nagios-BJ *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/local/bin/sendEmail -f [email protected] -t $CONTACTEMAIL$ -s smtp.163.com -u "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" -xu old_hoodlum -xp xxxxxxxx -l /var/log/sendEmail.log
}
define command{
command_name notify-service-by-sendEmail
command_line /usr/bin/printf "%b" "***** Nagios-BJ *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/local/bin/sendEmail -f [email protected] -t $CONTACTEMAIL$ -s smtp.163.com -u "** $NOTIFICATIONTYPE$ Service Alert: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$ **" -xu old_hoodlum -xp xxxxxxxx -l /var/log/sendEmail.log
}
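Both commands rely on printf "%b", which expands backslash escapes such as \n inside its argument, so the single-line command_line becomes a multi-line message body. The sketch below reproduces that step outside Nagios; host_alert_body is a made-up function, and its positional parameters stand in for the macros that Nagios substitutes before the command runs.

```shell
# host_alert_body TYPE HOST STATE ADDRESS INFO  (hypothetical helper)
# Builds a message body the same way the command_line does: the \n
# sequences in the argument are turned into real newlines by %b.
host_alert_body() {
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n"
}

# Example: preview the text before wiring it to sendEmail.
#   host_alert_body PROBLEM Nagios-Client DOWN 192.168.18.40 "CRITICAL - Host Unreachable"
```

Note that %b also interprets backslashes that appear in the substituted values, such as plugin output.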
Edit contacts.cfg to add the contact's mailbox
define contact{
contact_name nagiosadmin
use generic-contact
alias System Administrator
email [email protected]
service_notification_commands notify-service-by-sendEmail
host_notification_commands notify-host-by-sendEmail
}
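Add the services you need to monitor (according to your own requirements)
Monitoring MySQL master-slave replication
On the monitored host
First create a SQL user for monitoring replication
# mysql -u root -p
mysql> grant replication client on *.* to 'nagios'@'%' identified by 'nagios';
mysql> flush privileges;
Main command for verification (normally run on the Nagios server)
# mysql -h MONITORED_HOST_IP -unagios -pnagios -e "show slave status\G"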
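Write the script used for remote monitoring by Nagios
Save it in /usr/local/nagios/libexec/
# vi check_mysql_slave
Contents:
#!/bin/bash
# Arrays are a bash feature, so the script needs bash rather than plain sh.
declare -a slave_is
# Collect the values of Slave_IO_Running and Slave_SQL_Running.
slave_is=($(/usr/local/mysql/bin/mysql -unagios -pnagios -e "show slave status\G"|grep Running |awk '{print $2}'))
if [ "${slave_is[0]}" = "Yes" -a "${slave_is[1]}" = "Yes" ]
then
echo "OK -slave is running"
exit 0
else
echo "Critical -slave is error"
exit 2
fi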
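The script above depends on grep Running returning exactly the two thread lines in a fixed order. A slightly stricter variant is sketched here as an alternative, not as the original author's method: slave_threads_ok is a made-up function that matches the two field names exactly and reads the status text from stdin.

```shell
# slave_threads_ok  (hypothetical helper)
# Reads `show slave status\G` output on stdin and succeeds only when
# both Slave_IO_Running and Slave_SQL_Running report "Yes".
slave_threads_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (same mysql call as in the script above):
#   if /usr/local/mysql/bin/mysql -unagios -pnagios -e "show slave status\G" | slave_threads_ok
#   then echo "OK -slave is running"; exit 0
#   else echo "Critical -slave is error"; exit 2
#   fi
```

Matching on the full field name means other lines that happen to contain the word Running cannot shift the result.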
Make it executable
# chmod +x /usr/local/nagios/libexec/check_mysql_slave
# chown nagios:nagios /usr/local/nagios/libexec/check_mysql_slave
Add the command to nrpe.cfg
command[check_mysql_slave]=/usr/local/nagios/libexec/check_mysql_slave
Test from the Nagios server
# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.222 -c check_mysql_slave
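When testing by hand, the exit code matters as much as the printed text, because Nagios derives the service state from it: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The helper below is a small sketch for making that visible; nagios_state_name is a made-up function, and it reports any other code as UNKNOWN.

```shell
# nagios_state_name CODE  (hypothetical helper)
# Translates a plugin exit code into the state name Nagios would show.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example (assumes check_nrpe is installed under /usr/local/nagios/libexec):
#   /usr/local/nagios/libexec/check_nrpe -H 192.168.0.222 -c check_mysql_slave
#   echo "state: $(nagios_state_name $?)"
```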
Add the monitored service to services.cfg on the Nagios server
define service {
host_name Nagios-db-slave230
service_description check_mysql_slave
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_mysql_slave
}
Monitoring memory
On the monitored host
1. Add the plugin check_mem.sh
#!/bin/bash
USAGE="`basename $0` [-w|--warning]<percent free> [-c|--critical]<percent free>"
THRESHOLD_USAGE="WARNING threshold must be greater than CRITICAL: `basename $0` $*"
calc=/tmp/memcalc
percent_free=/tmp/mempercent
critical=""
warning=""
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
# print usage
if [[ $# -lt 4 ]]
then
echo ""
echo "Wrong Syntax: `basename $0` $*"
echo ""
echo "Usage: $USAGE"
echo ""
exit $STATE_UNKNOWN
fi
# read input
while [[ $# -gt 0 ]]
do
case "$1" in
-w|--warning)
shift
warning=$1
;;
-c|--critical)
shift
critical=$1
;;
esac
shift
done
# verify input
if [[ $warning -eq $critical || $warning -lt $critical ]]
then
echo ""
echo "$THRESHOLD_USAGE"
echo ""
echo "Usage: $USAGE"
echo ""
exit $STATE_UNKNOWN
fi
# Total memory available
total=`free -m | head -2 |tail -1 |gawk '{print $2}'`
# Total memory used
used=`free -m | head -2 |tail -1 |gawk '{print $3}'`
# Calc total minus used
free=`free -m | head -2 |tail -1 |gawk '{print $2-$3}'`
# normal values
#echo "$total"MB total
#echo "$used"MB used
#echo "$free"MB free
# make it into % percent free = ((free mem / total mem) * 100)
echo "5" > $calc # decimal accuracy
echo "k" >> $calc # commit
echo "100" >> $calc # multiply
echo "$free" >> $calc # division integer
echo "$total" >> $calc # division integer
echo "/" >> $calc # division sign
echo "*" >> $calc # multiplication sign
echo "p" >> $calc # print
percent=`/usr/bin/dc $calc|/bin/sed 's/^\./0./'|/usr/bin/tr "." " "|/usr/bin/gawk {'print $1'}`
#percent1=`/usr/bin/dc $calc`
#echo "$percent1"
if [[ "$percent" -le $critical ]]
then
echo "CRITICAL - $free MB ($percent%) Free Memory"
exit 2
fi
if [[ "$percent" -le $warning ]]
then
echo "WARNING - $free MB ($percent%) Free Memory"
exit 1
fi
if [[ "$percent" -gt $warning ]]
then
echo "OK - $free MB ($percent%) Free Memory"
exit 0
fi
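The percentage and threshold logic can also be written with shell integer arithmetic, which avoids the temporary files under /tmp and the call to dc. The function below is a sketch of that idea, not a replacement taken from any plugin package: mem_state and its variable names are invented here, and it takes the total and used values as arguments instead of calling free.

```shell
# mem_state TOTAL_MB USED_MB WARN_PCT CRIT_PCT  (hypothetical helper)
# Prints a Nagios-style status line and returns the matching plugin
# exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
mem_state() {
    mem_total=$1 mem_used=$2 mem_warn=$3 mem_crit=$4
    if [ "$mem_total" -le 0 ]; then
        echo "UNKNOWN - total memory must be positive"
        return 3
    fi
    mem_free=$((mem_total - mem_used))
    # Integer percentage of free memory; the fractional part is dropped.
    mem_pct=$((mem_free * 100 / mem_total))
    if [ "$mem_pct" -le "$mem_crit" ]; then
        echo "CRITICAL - $mem_free MB ($mem_pct%) Free Memory"
        return 2
    elif [ "$mem_pct" -le "$mem_warn" ]; then
        echo "WARNING - $mem_free MB ($mem_pct%) Free Memory"
        return 1
    fi
    echo "OK - $mem_free MB ($mem_pct%) Free Memory"
    return 0
}

# Example: 1000 MB total, 600 MB used, warn at 10% free, critical at 5% free.
#   mem_state 1000 600 10 5
```

Keeping the arithmetic in a function that takes plain numbers makes the thresholds easy to try out without touching the real memory figures.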
2. Add the command
In nrpe.cfg:
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 10 -c 5
Add the service to services.cfg on the Nagios server
# vi services.cfg
define service {
host_name Nagios-Client
service_description check_mem
check_period 24x7
max_check_attempts 4
normal_check_interval 3
retry_check_interval 2
contact_groups sagroup
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_mem
}
Configurations for more services will be added later, so stay tuned. Thanks!
OK, that completes a simple Nagios monitoring setup. Now start your own Nagios journey!