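Nagios monitoring
Nagios is a monitoring system that tracks system health and network state. It can monitor specified local or remote hosts and services, and it sends notifications when something goes wrong. Nagios runs on Linux/Unix platforms and offers an optional browser-based web interface where administrators can review network status, system problems, and logs.
Environment: RHEL 6.5
1. Disable SELinux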
[root@server4 cacti]# vim /etc/sysconfig/selinux
--------------------
SELINUX=disabled
-------------------
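Disable SELinux before running the monitoring tests. The change in this file takes effect after a reboot; setenforce 0 switches to permissive mode immediately.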
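The same edit can be scripted. A minimal sketch, assuming the file contains a single SELINUX= line; set_selinux_mode is a hypothetical helper name, and the result is written back through the existing path so that a symlinked config file stays a symlink:

```shell
#!/bin/sh
# Hypothetical helper: set the SELINUX= line in a config file to the given mode.
set_selinux_mode() {
    file=$1 mode=$2
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    # cat writes through the original path instead of replacing the file.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example, running set_selinux_mode /etc/sysconfig/selinux disabled as root makes the change shown above.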
2. Download the Nagios source package and install it
lftp 172.25.254.53:/pub/nagios> get nagios-cn-3.2.3.tar.bz2
9638175 bytes transferred
[root@server3 ~]# tar jxf nagios-cn-3.2.3.tar.bz2
[root@server3 ~]# cd nagios-cn-3.2.3
[root@server3 nagios-cn-3.2.3]# ./configure
------------------------------------------------------------------
Boutell's GD library is required to compile the statusmap, trends
and histogram CGIs.
------------------------------------------------------------------
[root@server3 nagios-cn-3.2.3]# yum install -y gd-devel # resolve the dependency (if the package is not in the yum repository, download it from a third-party source)
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Setting up Install Process
No package gd-devel available.
Error: Nothing to do
The package is not in the repository, so download the RPM manually:
lftp 172.25.254.53:/pub/nagios> get gd-devel-2.0.35-11.el6.x86_64.rpm
[root@server3 ~]# yum install -y gd-devel-2.0.35-11.el6.x86_64.rpm
[root@server3 nagios-cn-3.2.3]# ./configure # rerun configure; add options for the features you need
---------------------------
Embedded Perl: no
---------------------------
[root@server3 nagios-cn-3.2.3]# ./configure --help|less # list the available options
------------------------
--enable-embedded-perl # searching for a keyword locates this option
------------------------
[root@server3 nagios-cn-3.2.3]# ./configure --enable-embedded-perl # fix the dependency reported in the error
-----------------------------------------
Can't locate ExtUtils/Embed.pm in @INC
-----------------------------------------
[root@server3 nagios-cn-3.2.3]# yum install -y perl-ExtUtils-Embed
[root@server3 nagios-cn-3.2.3]# ./configure --enable-embedded-perl
---------------------------------------------------
General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagios
--------------------------------------------------
The installation expects a nagios user, which does not exist on the system yet:
[root@server3 nagios-cn-3.2.3]# useradd -d /usr/local/nagios -M nagios # create the nagios user
[root@server3 nagios-cn-3.2.3]# make all
--------------------------------
Follow the instructions printed at the end of make all:
-------------------------------------------------------------------
without any arguments for a list of all possible options):
make install
- This installs the main program, CGIs, and HTML files
make install-init
- This installs the init script in /etc/rc.d/init.d
make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file
make install-config
- This installs *SAMPLE* config files in /usr/local/nagios/etc
You'll have to modify these sample files before you can
use Nagios. Read the HTML documentation for more info
on doing this. Pay particular attention to the docs on
object configuration files, as they determine what/how
things get monitored!
make install-webconf
- This installs the Apache config file for the Nagios
web interface
------------------------------------------------------------------
[root@server3 nagios-cn-3.2.3]# make install
[root@server3 nagios-cn-3.2.3]# make install-init # install the init startup script
[root@server3 nagios-cn-3.2.3]# make install-commandmode # set permissions on the external command directory
[root@server3 nagios-cn-3.2.3]# make install-config # install the sample configuration files
[root@server3 nagios-cn-3.2.3]# make install-webconf # install the Apache config file; requires Apache (yum install -y httpd)
[root@server3 nagios-cn-3.2.3]# cd /etc/httpd/conf.d/
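Create a login user for the web interface served by Apache. The user name is nagiosadmin, the name that /usr/local/nagios/etc/cgi.cfg grants access to. Note: use the -c option only when adding the first user; using it again recreates the file and wipes out all existing users. (htpasswd.users already contains a user here, so -c is omitted.)
[root@server3 conf.d]# htpasswd -m /usr/local/nagios/etc/htpasswd.users nagiosadmin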
New password:
Re-type new password:
Updating password for user nagiosadmin
[root@server3 libexec]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # check the configuration for syntax errors
[root@server3 libexec]# /etc/init.d/nagios start # if there are no errors, start the nagios service
[root@server3 libexec]# chkconfig nagios on # start nagios at boot
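3. Log in to Nagios (http://<Nagios server IP>/nagios/)
Apache must be running. The Nagios front end can then be opened in a browser (172.25.254.3/nagios); log in with the user name and password created with htpasswd. After logging in, many items show a pending or critical state. This is normal, because Nagios performs its checks through plugins, which are installed in the next step.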
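A plugin reports its result to Nagios through its exit status and one line of output. The sketch below illustrates that convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) with a hypothetical shell function, check_threshold, that is not part of Nagios or nagios-plugins; the thresholds are plain integers chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric value onto the Nagios plugin states.
# Return codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_threshold() {
    value=$1 warn=$2 crit=$3
    case $value in
        # Anything that is not a non-negative integer cannot be compared.
        ''|*[!0-9]*) echo "UNKNOWN - value is not a non-negative integer"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value=$value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value=$value"; return 1
    fi
    echo "OK - value=$value"; return 0
}
```

Calling check_threshold 3 5 10 prints "OK - value=3" and returns 0; a value of 7 would give WARNING and a value of 12 CRITICAL.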
4. Install the Nagios plugins
lftp 172.25.254.53:/pub/nagios> get nagios-plugins-2.0.3.tar.gz
2659772 bytes transferred
[root@server3 ~]# tar zxf nagios-plugins-2.0.3.tar.gz
[root@server3 ~]# cd nagios-plugins-2.0.3
[root@server3 nagios-plugins-2.0.3]# ./configure # add options as needed
------------------------------------------------------------------------
--enable-perl-modules: no
--with-cgiurl: /nagios/cgi-bin
--with-trusted-path: /bin:/sbin:/usr/bin:/usr/sbin
--enable-libtap: no
-----------------------------------------------------------------------
[root@server3 nagios-plugins-2.0.3]# ./configure --enable-perl-modules --enable-libtap
[root@server3 nagios-plugins-2.0.3]# make && make install # compile and install
[root@server3 nagios-plugins-2.0.3]# cd /usr/local/nagios/libexec/
[root@server3 libexec]# chown -R nagios.nagios * # change the owner and group of the plugin files to nagios
[root@server3 etc]# pwd
/usr/local/nagios/etc
[root@server3 etc]# vim nagios.cfg # edit the main configuration file
-----------------------------------------------------------
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# host definition file
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# service definition file
cfg_file=/usr/local/nagios/etc/objects/services.cfg
# this line must be commented out
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
-----------------------------------------------------------
[root@server3 objects]# pwd
/usr/local/nagios/etc/objects
[root@server3 objects]# cp localhost.cfg hosts.cfg
[root@server3 objects]# cp localhost.cfg services.cfg
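[root@server3 objects]# vim hosts.cfg # edit the host definition file, which defines the objects to monitor; the host_name set here is referenced by the other configuration files, so this file must be edited when configuring Nagios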
------------------------------------------------------------------------
define host{
use linux-server
host_name server3.example.com
alias Manager
# parents MainSwitch
address 172.25.254.3
icon_image server.gif
statusmap_image server.gd2
2d_coords 500,200
3d_coords 500,200,100
}
# Define an optional hostgroup for Linux machines (host group)
define hostgroup{
hostgroup_name linux-servers ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members * ; Comma separated list of hosts that belong to this group
}
------------------------------------------------------------------------
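[root@server3 objects]# vim services.cfg # delete or comment out every host definition in the file, then replace the default host name 田朝阳家用机 throughout with this host's name, using the vim command :%s/田朝阳家用机/server3.example.com/g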
-----------------------------------------------------------------------
define servicegroup{
servicegroup_name 系统负荷检查 ; system load checks
alias 负荷检查 ; load checks
members server3.example.com,进程总数,server3.example.com,登录用户数,server3.example.com,根分区,server3.example.com,交换空间利用率 ; total processes, logged-in users, root partition, swap usage
}
# Delete all host definitions and keep a single servicegroup; all remaining service definitions stay
------------------------------------------------------------------------
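[root@server3 etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # verify the configuration (if errors are reported, fix them as indicated; the nagios service can only be started once there are none)
[root@server3 etc]# /etc/init.d/nagios reload # reload the service
5. Log in to the Nagios web front end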
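In the web interface some service checks appear disabled and cannot be enabled. This is because the apache user does not belong to the nagios group; only members of that group may write to the external command file that the web interface uses to make such changes.
[root@server3 etc]# usermod -a -G nagios apache # append apache to the nagios group (-a keeps its existing supplementary groups)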
[root@server3 etc]# id apache
uid=48(apache) gid=48(apache) groups=48(apache),1001(nagios)
[root@server3 etc]# /etc/init.d/httpd restart
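The disabled service checks can now be enabled from the web interface.
6. Monitor a remote host
Start by monitoring MySQL on the remote host.
(1) Install the database on the remote host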
[root@server4 ~]# yum install mysql-server -y
[root@server4 ~]# /etc/init.d/mysqld start
[root@server4 ~]# mysql_secure_installation # initialize and secure the database
[root@server4 ~]# mysql -pwestos
mysql> create database nagios; # create the database
Query OK, 1 row affected (0.00 sec)
mysql> grant all on nagios.* to nagios@'172.25.254.3' identified by 'westos'; # grant privileges to the database user
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(2) On the Nagios server, check whether the remote database can be reached
[root@server3 libexec]# ./check_mysql -H 172.25.254.4 -n
MySQL OK - Version: 5.1.71 (protocol 10) # a version string means the check succeeded
[root@server3 libexec]# ./check_mysql -H 172.25.254.4 -u nagios -p westos -d nagios # output like the following means the check succeeded
Uptime: 764 Threads: 1 Questions: 34 Slow queries: 0 Opens: 16 Flush tables: 1 Open tables: 9 Queries per second avg: 0.44|Connections=14c;;; Open_files=18;;; Open_tables=9;;; Qcache_free_memory=0;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=0c;;; Qcache_queries_in_cache=0;;; Queries=34c;;; Questions=34c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=764c;;;
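The part of the plugin output after the | character is performance data, a space-separated list of label=value;warn;crit;min;max items. As a rough sketch, a hypothetical helper perfdata_value (not part of nagios-plugins) could pull a single value out of such a line:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one performance-data label.
# Plugin output has the form "text|label=value[unit];warn;crit;min;max ...".
perfdata_value() {
    output=$1 label=$2
    case $output in
        *'|'*) ;;
        *) return 1 ;;                # no performance data present
    esac
    perf=${output#*'|'}               # keep only the text after the first |
    for item in $perf; do             # items are separated by spaces
        case $item in
            "$label"=*)
                value=${item#*=}      # drop "label="
                value=${value%%;*}    # drop ";warn;crit;min;max"
                echo "$value"         # unit suffix, if any, is kept (e.g. 14c)
                return 0 ;;
        esac
    done
    return 1                          # label not found
}
```

With the output shown above, asking for Open_files yields 18 and asking for Connections yields 14c (the trailing c marks a counter).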
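(3) Once the checks succeed, configure the monitoring side (the Nagios server)
[root@server3 objects]# vim commands.cfg # edit the command definition file (an important Nagios configuration file: every command used in hosts.cfg or services.cfg must be defined here before it can be used)
-----------------------------------------------------------------------
# Append this definition after the existing command definitions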
# 'check_mysql' command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$
}
# -u user, -p password, -d database
------------------------------------------------------------------------
[root@server3 objects]# vim hosts.cfg # add the new host to the existing definitions
-------------------------------------------------------
define host{
use linux-server
host_name server4.example.com
alias Mysql
parents server3.example.com ; the upstream monitoring host
address 172.25.254.4
icon_image server.gif
statusmap_image server.gd2
2d_coords 400,300
3d_coords 400,300,100
}
---------------------------------------------------------
[root@server3 objects]# vim services.cfg # add a service for the monitored host to the existing services
-----------------------------------------------------------------------
# Append at the end of the file
###############################server4#################################
define service{
use local-service
host_name server4.example.com
service_description MYSQL
check_command check_mysql!nagios!westos!nagios
}
------------------------------------------------------------------------
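When this service is checked, Nagios splits the check_command value at each ! and substitutes the pieces into the command_line of the matching command definition. The following sketch imitates that substitution for $HOSTADDRESS$ and $ARGn$ only. expand_check_command is a hypothetical helper for illustration, not something Nagios ships; $USER1$ is left untouched because it is set in resource.cfg.

```shell
#!/bin/sh
# Hypothetical helper: expand $HOSTADDRESS$ and $ARGn$ in a command_line
# template from a check_command value such as "check_mysql!nagios!westos!nagios".
expand_check_command() {
    template=$1 check_command=$2 address=$3
    awk -v tpl="$template" -v cc="$check_command" -v addr="$address" '
        # Replace every literal occurrence of "from" in "s" with "to".
        function replace_all(s, from, to,    out, i) {
            out = ""
            while ((i = index(s, from)) > 0) {
                out = out substr(s, 1, i - 1) to
                s = substr(s, i + length(from))
            }
            return out s
        }
        BEGIN {
            n = split(cc, part, "!")          # part[1] is the command name
            line = replace_all(tpl, "$HOSTADDRESS$", addr)
            for (k = 2; k <= n; k++)          # part[2] fills $ARG1$, and so on
                line = replace_all(line, "$ARG" (k - 1) "$", part[k])
            print line
        }'
}
```

For the service above, the arguments nagios, westos, and nagios end up as the values of -u, -p, and -d, and the address 172.25.254.4 replaces $HOSTADDRESS$.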
[root@server3 objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # syntax check
[root@server3 objects]# /etc/init.d/nagios reload
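Then log in to the Nagios web interface in a browser to see the status of MySQL on the remote host.
7. Monitor the root partition, logged-in users, system load, and so on for a remote host, so that the monitoring server can check the same items on the remote host as it checks on itself
(1) Configure the monitored host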
lftp 172.25.254.53:/pub/nagios> get nagios-plugins-2.0.3.tar.gz
2659772 bytes transferred
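Create the nagios user, which the installation requires:
# The uid and gid are set to 1001 because the nagios user on the Nagios server has uid and gid 1001
[root@server4 ~]# groupadd -g 1001 nagios
[root@server4 ~]# useradd -u 1001 -g 1001 -d /usr/local/nagios -M nagios
a. Install the Nagios plugins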
[root@server4 ~]# tar zxf nagios-plugins-2.0.3.tar.gz
[root@server4 ~]# cd nagios-plugins-2.0.3
[root@server4 nagios-plugins-2.0.3]# ./configure # configure usually reports errors at this point; resolve them as reported (they are mostly missing dependencies)
[root@server4 nagios-plugins-2.0.3]# yum install -y gcc
[root@server4 nagios-plugins-2.0.3]# ./configure
--with-openssl: no
[root@server4 nagios-plugins-2.0.3]# yum install openssl-devel -y
[root@server4 nagios-plugins-2.0.3]# ./configure
[root@server4 nagios-plugins-2.0.3]# make # compile
[root@server4 nagios-plugins-2.0.3]# make install # install
[root@server4 nagios-plugins-2.0.3]# cd /usr/local/nagios/
[root@server4 nagios]# chown -R nagios.nagios *
b. Install NRPE
NRPE is an add-on for Nagios. It runs on the monitored server and reports that server's local state, such as CPU load, memory usage, and disk usage, to the Nagios monitoring platform.
lftp 172.25.254.53:/pub/nagios> get nrpe-2.15.tar.gz
419695 bytes transferred
[root@server4 ~]# tar zxf nrpe-2.15.tar.gz
[root@server4 ~]# cd nrpe-2.15
[root@server4 nrpe-2.15]# ./configure
[root@server4 nrpe-2.15]# make all
[root@server4 nrpe-2.15]# yum install -y xinetd
----------------------------------------------------------------------------------------------
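xinetd provides functionality similar to inetd plus tcp_wrapper, but is more powerful and more secure. Its features include:
* Support for TCP, UDP, and RPC services (RPC support is currently not very stable)
* Access control based on time of day
* Full logging of both successful and failed connection attempts
* Effective protection against DoS (Denial of Service) attacks
* A limit on the number of servers of the same type running at once
* A limit on the total number of servers started
* A limit on log file size
* Binding a service to a specific interface, so that a service can be restricted to a private network
* Acting as a proxy for other systems; combined with IP masquerading, this allows access to an internal private network
Its main weakness is unstable RPC support, which can be worked around by running portmap alongside xinetd.
xinetd has replaced inetd, adding access control, improved logging, and resource management, and became the standard Internet super-daemon in Red Hat 7 and Mandrake 7.2.
xinetd replaces inetd's one-line-per-service entries with an extended syntax enclosed in braces, and adds logging and access control. inetd can use Venema's tcp_wrappers software (tcpd) to control TCP connections, but it cannot control UDP connections that way, and it handles RPC (portmapper) services poorly. inetd can also limit the connection rate (by appending a number to wait or nowait, for example nowait.1), but it cannot limit the maximum number of instances, which leaves it open to process-table attacks (an effective denial of service). xinetd can prevent this kind of DoS.
xinetd logs all services to the file /var/adm/xinetd.log and uses the configuration file /etc/xinetd.conf.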
----------------------------------------------------------------------------------------------
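[root@server4 nrpe-2.15]# make install-plugin # install the check_nrpe plugin
[root@server4 nrpe-2.15]# make install-daemon # install the NRPE daemon
[root@server4 nrpe-2.15]# make install-daemon-config # install the daemon configuration file
[root@server4 nrpe-2.15]# make install-xinetd
/etc/services lists the service names and port numbers known to the system.
[root@server4 nrpe-2.15]# vim /etc/services # add the NRPE service port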
----------------------------
nrpe 5666/tcp
----------------------------
[root@server4 nrpe-2.15]# cd /etc/xinetd.d/
[root@server4 xinetd.d]# vim nrpe
----------------------------------------------------
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
# IP of the monitoring host (the Nagios server); change this entry
only_from = 172.25.254.3
}
----------------------------------------------------------------------
[root@server4 xinetd.d]# cd /usr/local/nagios/etc/
[root@server4 etc]# vim nrpe.cfg
---------------------------------------------------------------------------------
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
---------------------------------------------------------------------------------
[root@server4 etc]# /etc/init.d/xinetd start
(2) Configure the monitoring host
lftp 172.25.254.53:/pub/nagios> get nrpe-2.15.tar.gz
[root@server3 ~]# tar zxf nrpe-2.15.tar.gz
[root@server3 ~]# cd nrpe-2.15
[root@server3 nrpe-2.15]# ./configure
[root@server3 nrpe-2.15]# make all
[root@server3 nrpe-2.15]# make install-plugin
[root@server3 nrpe-2.15]# make install-daemon
[root@server3 nrpe-2.15]# make install-daemon-config
[root@server3 nrpe-2.15]# cd /usr/local/nagios/libexec/
[root@server3 libexec]# ll check_nrpe
-rwxrwxr-x 1 nagios nagios 76769 May 7 22:21 check_nrpe
[root@server3 libexec]# ./check_nrpe -H 172.25.254.4
NRPE v2.15
[root@server3 libexec]# ./check_nrpe -H 172.25.254.4 -c check_disk # -c names the command to run on the remote host, as defined in its nrpe.cfg
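On the monitored host, NRPE looks up the name given with -c among the command[...] entries of nrpe.cfg and runs the matching command line. The sketch below mimics only that lookup. nrpe_command_line is a hypothetical helper, not an NRPE tool, and it assumes each entry sits on one line in the form command[name]=command line.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that an nrpe.cfg-style file
# maps to a name, i.e. the right-hand side of a "command[name]=..." entry.
nrpe_command_line() {
    cfg=$1 name=$2
    awk -v key="command[$name]=" '
        # index() does a literal match, so the brackets need no escaping.
        index($0, key) == 1 { print substr($0, length(key) + 1); found = 1; exit }
        END { if (!found) exit 1 }    # non-zero status when the name is unknown
    ' "$cfg"
}
```

Run on the monitored host as nrpe_command_line /usr/local/nagios/etc/nrpe.cfg check_disk, this would print the check_disk command line configured earlier.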
[root@server3 objects]# pwd
/usr/local/nagios/etc/objects
[root@server3 objects]# vim commands.cfg # edit the command definition file
---------------------------------------------------------------------
# Append this definition after the existing command definitions
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
----------------------------------------------------------------------
[root@server3 objects]# vim services.cfg # add the items to monitor on the monitored host
---------------------------------------------------------------------
# Append the checks for the monitored host at the end of the file
define service{
use local-service ; Name of service template to use
host_name server4.example.com
service_description 根分区 ; root partition
check_command check_nrpe!check_disk
}
define service{
use local-service ; Name of service template to use
host_name server4.example.com
service_description 登录用户数 ; logged-in users
check_command check_nrpe!check_users
}
define service{
use local-service ; Name of service template to use
host_name server4.example.com
service_description 进程总数 ; total processes
check_command check_nrpe!check_total_procs
}
define service{
use local-service ; Name of service template to use
host_name server4.example.com
service_description 系统负荷 ; system load
check_command check_nrpe!check_load
}
----------------------------------------------------------------------
[root@server3 objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # verify the configuration
[root@server3 objects]# /etc/init.d/nagios reload
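8. Log in to Nagios to verify
The Nagios web interface now shows the root partition, logged-in users, system load, and total processes of the remote host.
[root@server3 objects]# mail -f /var/spool/mail/nagios # the default Nagios mailbox; Nagios sends the alerts it raises to this mailbox for the administrator to read
[root@server3 var]# tail -f nagios.log # follow the Nagios log (in /usr/local/nagios/var)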
9. WeChat alerts
(1) Download the private WeChat platform API
[root@server3 var]# yum install -y git
[root@server3 libexec]# pwd
/usr/local/nagios/libexec
[root@server3 libexec]# git clone https://github.com/lealife/WeiXin-Private-API