Nagios Monitoring Server: Installation and Configuration Guide

I: Required software
RHEL 5.6
Nagios
Nagios-plugins
NRPE
Download URLs:
http://down1.chinaunix.net/distfiles/nagios-3.2.3.tar.gz
http://down1.chinaunix.net/distfiles/nagios-plugins-1.4.10.tar.gz
http://nchc.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz
II: Preparation
1: Set up a local yum repository
       [root@loc ~]# vim /etc/yum.repos.d/rhel-source.repo
[Server]
name=Server
baseurl=file:///mnt/Server
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
2: Install the prerequisite packages
       [root@loc ~]# yum -y install httpd php gcc glibc glibc-common gd gd-devel openssl openssl-devel
III: Install Nagios
1: Unpack the source archives
       [root@loc ~]# tar xf nagios-3.2.3.tar.gz -C /usr/src/
       [root@loc ~]# tar xf nagios-plugins-1.4.10.tar.gz -C /usr/src/
       [root@loc ~]# tar xf nrpe-2.12.tar.gz -C /usr/src/
 
2: Create the user and groups
       [root@loc ~]# groupadd nagcmd
       [root@loc ~]# useradd -G nagcmd nagios
       [root@loc ~]# usermod -a -G nagcmd apache
 
3: Compile and install Nagios
       [root@loc ~]# cd /usr/src/nagios-3.2.3
       [root@loc nagios-3.2.3]# ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagcmd
       [root@loc nagios-3.2.3]# make all                    # compile
       [root@loc nagios-3.2.3]# make install                # install the compiled files
       [root@loc nagios-3.2.3]# make install-init           # install the init script
       [root@loc nagios-3.2.3]# make install-commandmode    # set permissions on the external command directory
       [root@loc nagios-3.2.3]# make install-config         # install the sample configuration files
       [root@loc nagios-3.2.3]# make install-webconf        # install the Apache configuration file for the web interface
 
4: Compile and install nagios-plugins
       [root@loc nagios-3.2.3]# cd ../nagios-plugins-1.4.10/
       [root@loc nagios-plugins-1.4.10]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios
       [root@loc nagios-plugins-1.4.10]# make && make install
      
5: Enable Nagios at boot
       [root@loc ~]# chkconfig --add nagios
       [root@loc ~]# chkconfig nagios on
 
6: Create the web interface user
       [root@loc ~]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
 
7: Check the configuration and start the services
       [root@loc ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # validate the Nagios configuration
Total Warnings: 0
Total Errors:   0
       If both totals are 0, the configuration is valid and the services can be started.
       [root@loc ~]# /etc/init.d/httpd start
       [root@loc ~]# /etc/init.d/nagios start
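
The "both totals are 0" check can also be scripted instead of read by eye. The following is a minimal sketch, not part of Nagios itself: `preflight_clean` is a hypothetical helper name, and it assumes the summary lines have the form "Total Warnings: N" and "Total Errors: N" as shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper: read the output of `nagios -v` on stdin and succeed
# only when the summary reports zero warnings and zero errors.
preflight_clean() {
    awk '
        /^[ \t]*Total Warnings:/ { warnings = $3; seen_w = 1 }
        /^[ \t]*Total Errors:/   { errors = $3;   seen_e = 1 }
        # Exit 0 only if both summary lines were seen and both counts are 0.
        END { exit !(seen_w && seen_e && warnings == 0 && errors == 0) }
    '
}

# Example use on the monitoring server (paths as used in this guide):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_clean \
#       && /etc/init.d/nagios start
```

Unlike relying on the exit status of `nagios -v` alone, this sketch also refuses to continue when there are warnings, which may or may not be what you want.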
 
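8: Log in to Nagios
       Open http://127.0.0.1/nagios in a browser and log in as the nagiosadmin user created in step 6. After logging in, the Nagios web interface is displayed.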


Note
       If SELinux is enabled, also run the following two commands:
       [root@loc ~]# chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
       [root@loc ~]# chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
 
Installation is now complete. Next, we configure the hosts to be monitored.
[Figure: monitoring architecture diagram]

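IV: Configure Nagios to monitor Linux hosts
1: Edit the main configuration file (only the changed parts are shown)
       [root@loc ~]# vim /usr/local/nagios/etc/nagios.cfg
# Definitions for monitoring the local (Linux) host
# Monitors services on the Nagios server itself; comment this line out if that is not needed.
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
# Add this line: names and addresses of the monitored hosts.
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# Add this line: host groups for the monitored hosts.
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
# Add this line: contact groups.
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg
# Uncomment this line: directory holding the service definitions to monitor.
cfg_dir=/usr/local/nagios/etc/servers
 
# Accept external commands, for example restarting Nagios from the web interface.
check_external_commands=1
# How often Nagios checks for external commands (the "s" suffix means seconds).
command_check_interval=10s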
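
Since several of these lines point at files and a directory that do not exist yet, it can help to list what is still missing before running the configuration check. This is a minimal sketch; `missing_cfg_paths` is a hypothetical helper name, and it only looks at uncommented `cfg_file=` and `cfg_dir=` lines.

```shell
#!/bin/sh
# Hypothetical helper: print every cfg_file= / cfg_dir= target in a Nagios
# main config file that does not exist on disk. Prints nothing when all
# referenced paths are present.
missing_cfg_paths() {
    main_cfg=$1
    # Keep only active (uncommented) cfg_file/cfg_dir lines, strip the key
    # and any trailing whitespace, then test each remaining path.
    awk '/^[ \t]*cfg_(file|dir)=/ {
             sub(/^[^=]*=/, "")
             sub(/[ \t\r]+$/, "")
             print
         }' "$main_cfg" |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Example use:
#   missing_cfg_paths /usr/local/nagios/etc/nagios.cfg
```

Any path it prints, such as the servers directory, still has to be created before the configuration check is run.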
 
[root@loc ~]# vim /usr/local/nagios/etc/cgi.cfg
authorized_for_system_information=nagiosadmin,test
authorized_for_configuration_information=nagiosadmin,test
authorized_for_system_commands=nagiosadmin,test
authorized_for_all_services=nagiosadmin,test
authorized_for_all_hosts=nagiosadmin,test
authorized_for_all_service_commands=nagiosadmin,test
authorized_for_all_host_commands=nagiosadmin,test
Several users can be listed on each line, separated by commas.
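
Adding a user to all seven lines by hand is easy to get wrong, so here is a minimal sketch that does it in one pass. `add_cgi_user` is a hypothetical helper name, and the file location /usr/local/nagios/etc/cgi.cfg is assumed from the install prefix used in this guide. The function writes the result to standard output and leaves the original file untouched.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with USER appended to every
# authorized_for_* list that does not already contain that user.
add_cgi_user() {
    user=$1
    cgi_cfg=$2
    awk -v user="$user" -F= '
        /^authorized_for_[a-z_]+=/ {
            # Split the current value on commas and look for the user.
            n = split($2, names, ",")
            present = 0
            for (i = 1; i <= n; i++) if (names[i] == user) present = 1
            if (!present) $0 = ($2 == "") ? $0 user : $0 "," user
        }
        { print }
    ' "$cgi_cfg"
}

# Example use:
#   add_cgi_user test /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new
```

The extra user also needs a password entry, which can be added with htpasswd without the -c option so that the existing password file is not overwritten.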
 
2: Object file configuration (create the files named in step 1 in the objects directory)
       Create the contact and contact group files
[root@loc objects]# vi contacts.cfg
define contact {
     contact_name                   admin                      ; contact name
     alias                          system administrator       ; alias
     service_notification_period    24x7                       ; time period for service notifications
     host_notification_period       24x7                       ; time period for host notifications
     service_notification_options   w,u,c,r                    ; notify when a service is w (warning), u (unknown), c (critical), or r (recovered)
     host_notification_options      d,u,r                      ; notify when a host is d (down), u (unreachable), or r (recovered)
     service_notification_commands  notify-service-by-email    ; command used to send service notifications
     host_notification_commands     notify-host-by-email       ; command used to send host notifications
     email                          [email protected]          ; contact's email address
     pager                          13800138000                ; mobile number for SMS alerts
     pager                          13810255206
}
 
[root@loc objects]# vi contactgroups.cfg
define contactgroup{
        contactgroup_name       sagroup                      ; group name
        alias                   system administrator group   ; group alias
        members                 admin                        ; contact names defined in contacts.cfg
        }
Create the monitored host and host group files
[root@loc objects]# vi hosts.cfg
       define host {
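       host_name                web                  ; name of the monitored host
       alias                    tomas                ; alias
       address                  192.168.2.6          ; address of the monitored host
       contact_groups           sagroup              ; contact group to notify
       check_command            check-host-alive     ; command used to check whether the host is up
       check_period             24x7                 ; time period during which checks run
       max_check_attempts       5                    ; check attempts before the host is reported as down
       notification_interval    5                    ; interval between repeat notifications
       notification_options     d,u,r                ; host states that trigger a notification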
       }
 
       define host {
       host_name             nagios-server               
       alias                      tomas1                        
       address                  192.168.2.7          
       contact_groups       sagroup                
       check_command     check-host-alive    
       check_period  24x7                           
       max_check_attempts     5                  
       notification_interval      5                  
       notification_options       d,u,r                    
       }
       define host {
       host_name             linux                    
       alias                      tomas2                        
       address                  192.168.2.8          
       contact_groups       sagroup                
       check_command     check-host-alive    
       check_period  24x7                           
       max_check_attempts     5                  
       notification_interval      5                  
       notification_options       d,u,r                    
       }
      
       [root@loc objects]# vi hostgroups.cfg
       define hostgroup{
        hostgroup_name  sa-servers
        alias           sa servers
        members         web,nagios-server,linux
        }
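
A typo in the members line is a common cause of pre-flight errors. The sketch below compares the two files before running the full check. `undefined_members` is a hypothetical helper name; it assumes one directive per line, as in the listings above, and a non-empty hosts.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print each hostgroup member that has no matching
# host_name definition in the hosts file.
undefined_members() {
    hosts_cfg=$1
    hostgroups_cfg=$2
    awk '
        # First file: remember every defined host_name.
        NR == FNR { if ($1 == "host_name") defined[$2] = 1; next }
        # Second file: split each members list on commas and check each name.
        $1 == "members" {
            list = $0
            sub(/^[ \t]*members[ \t]+/, "", list)
            sub(/[ \t]*;.*$/, "", list)      # drop an inline ; comment
            gsub(/[ \t]/, "", list)          # drop spaces around commas
            n = split(list, names, ",")
            for (i = 1; i <= n; i++)
                if (!(names[i] in defined)) print names[i]
        }
    ' "$hosts_cfg" "$hostgroups_cfg"
}

# Example use, from the objects directory:
#   undefined_members hosts.cfg hostgroups.cfg
```

The Nagios configuration check reports the same kind of mistake; this is only a quicker way to see which names are affected.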
 
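       Configure the services to monitor on each host
       /usr/local/nagios/etc/servers/web.cfg        # then create nagios-server.cfg and linux.cfg in this directory the same way
       define service{
        host_name               web                  ; must be a host defined in hosts.cfg
        service_description     check-host-alive     ; service name
        check_command           check-host-alive     ; a command defined in commands.cfg (for NRPE checks, the remote command is defined in nrpe.cfg)
        max_check_attempts      5                    ; maximum number of check attempts
        normal_check_interval   5                    ; interval between regular checks, in minutes
        retry_check_interval    2                    ; interval between retries after a failed check, in minutes
        check_period            24x7
        notification_interval   10                   ; after a problem is detected, resend the alert every 10 minutes
        notification_period     24x7
        notification_options    w,u,c,r              ; same option letters as in the contact definition
        contact_groups          sagroup              ; a group name defined in contactgroups.cfg
        }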
 
define service{
        host_name                 web
        service_description           check_tcp 80
        check_command           check_tcp!80
        check_period              24x7
        max_check_attempts        4
        normal_check_interval        3
        retry_check_interval          2
        contact_groups             sagroup
        notification_interval           10
        notification_period           24x7
        notification_options           w,u,c,r
        }
 
define service{
        host_name               web
        service_description           cpu load
        check_command           check_nrpe!check_load
        check_period              24x7
        max_check_attempts        4
        normal_check_interval        3
        retry_check_interval          2
        contact_groups             sagroup
        notification_interval           10
        notification_period           24x7
        notification_options           w,u,c,r
        }
 
define service{
        host_name               web
        service_description           total-procs
        check_command          check_nrpe!check_total_procs
        check_period              24x7
        max_check_attempts        4
        normal_check_interval        3
        retry_check_interval          2
        contact_groups             sagroup
        notification_interval           10
        notification_period           24x7
        notification_options           w,u,c,r
        }
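
Because nagios-server.cfg and linux.cfg repeat the same three service blocks with only the host name changed, they can be generated rather than typed. This is a minimal sketch: `service_def` and `host_services` are hypothetical helper names, and the intervals and the sagroup contact group are simply copied from the definitions above.

```shell
#!/bin/sh
# Hypothetical helper: emit one service definition.
# Arguments: host name, service description, check command.
service_def() {
    cat <<EOF
define service{
        host_name               $1
        service_description     $2
        check_command           $3
        check_period            24x7
        max_check_attempts      4
        normal_check_interval   3
        retry_check_interval    2
        contact_groups          sagroup
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        }

EOF
}

# Emit the TCP and NRPE services from this guide for one host.
host_services() {
    service_def "$1" "check_tcp 80" 'check_tcp!80'
    service_def "$1" "cpu load"     'check_nrpe!check_load'
    service_def "$1" "total-procs"  'check_nrpe!check_total_procs'
}

# Example use:
#   for h in nagios-server linux; do
#       host_services "$h" > "/usr/local/nagios/etc/servers/$h.cfg"
#   done
```

The check-host-alive service is left out of the sketch because it uses different intervals; it can be added to each generated file by hand.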
 
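The files for the monitored hosts are now complete. Validate the configuration, restart the services, and then look at the Nagios web interface.
       Validate the configuration files: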
       [root@loc objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.2.3
Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 10-03-2010
License: GPL
Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/localhost.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hosts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hostgroups.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contactgroups.cfg'...
Processing object config directory '/usr/local/nagios/etc/servers'...
Processing object config file '/usr/local/nagios/etc/servers/linux.cfg'...
Processing object config file '/usr/local/nagios/etc/servers/web.cfg'...
Processing object config file '/usr/local/nagios/etc/servers/nagios-server.cfg'....
     Read object config files okay...
Running pre-flight check on configuration data...
Checking services...
        Checked 12 services.
Checking hosts...
        Checked 2 hosts.
Checking host groups...
        Checked 2 host groups.
Checking service groups...
        Checked 0 service groups.
Checking contacts...
        Checked 3 contacts.
Checking contact groups...
        Checked 2 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 24 commands.
Checking time periods...
        Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
 
Total Warnings: 0           # the configuration has no warnings
Total Errors:   0           # the configuration has no errors
 
Restart the services:
       [root@loc objects]# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
 
[root@loc objects]# /etc/init.d/httpd restart
Log in to the web interface:
http://IP/nagios


The monitoring configuration made above is now clearly visible.
 
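Installing the remote monitoring plugin NRPE is not covered here. When monitoring through NRPE, remember to update the service definitions (services.cfg) and commands.cfg accordingly; detailed guides are widely available online.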
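
As an illustration of the commands.cfg change, the service entries above such as check_nrpe!check_load only resolve if a check_nrpe command is defined on the monitoring server. The sketch below uses the definition commonly given in the NRPE documentation. It assumes the check_nrpe plugin has been installed into the plugin directory that $USER1$ points to, and `nrpe_command_def` and `ensure_nrpe_command` are hypothetical helper names.

```shell
#!/bin/sh
# Print a check_nrpe command definition. The quoted heredoc keeps the
# $MACRO$ names literal so the shell does not try to expand them.
nrpe_command_def() {
    cat <<'EOF'
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
}

# Append the definition to FILE unless a check_nrpe command is already there.
ensure_nrpe_command() {
    grep -Eq '^[[:space:]]*command_name[[:space:]]+check_nrpe[[:space:]]*$' "$1" \
        || nrpe_command_def >> "$1"
}

# Example use:
#   ensure_nrpe_command /usr/local/nagios/etc/objects/commands.cfg
```

The name passed after the exclamation mark (check_load, check_total_procs) must match a command entry in nrpe.cfg on the monitored host.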
 
 
On each monitored server, only the following two packages need to be installed:
Nagios-plugins
Nrpe








































