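Our company recently needed to bring a monitoring system online and deploy it on many hosts, most of them with different environments and hardware. I wrote this installation guide so that our operations staff can deploy the monitoring by following it.

My system is Red Hat 5.4, with iptables and SELinux disabled.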

1. Install yum (if yum is already available on this machine, skip this step and go to step 2)

    [root@localhost yum.repos.d]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.i386.rpm
    [root@localhost yum.repos.d]# wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
    [root@localhost yum.repos.d]# rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.i386.rpm
    [root@localhost yum.repos.d]# rpm --import RPM-GPG-KEY.dag.txt
    [root@localhost yum.repos.d]# yum install yum-fastestmirror yum-presto
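Step 1 only applies when yum is missing, so it can help to test for it first. The following is a minimal sketch of that check; the function name have_cmd is made up for this example and is not part of yum or rpm.

```shell
#!/bin/sh
# have_cmd NAME: succeeds when NAME resolves to a command in the current PATH.
# (Hypothetical helper written for this example.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd yum; then
    echo "yum found, step 1 can be skipped"
else
    echo "yum not found, follow step 1"
fi
```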

2. Install Apache (skip this step if Apache is already installed; otherwise install it with yum)

    [root@localhost ~]# yum -y install httpd

Building Nagios also requires a few supporting packages:

    [root@localhost etc]# yum -y install gd gd-devel glibc glibc-common gcc

3. Configure Apache to support Nagios

1) Create the nagios user and the groups it needs

    [root@localhost ~]# useradd nagios
    [root@localhost etc]# /usr/sbin/groupadd nagcmd               # group used to submit external commands through the web interface
    [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd nagios   # add user nagios to group nagcmd
    [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd apache   # add user apache to group nagcmd
    [root@localhost etc]# /usr/sbin/usermod -a -G apache nagios   # add user nagios to group apache
    [root@localhost etc]# /usr/sbin/usermod -a -G nagios apache   # add user apache to group nagios
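To confirm that the four memberships took effect, something like the following could be run afterwards. It is a sketch only: list_has and user_in_group are names invented for this example, and the check relies on the standard `id -nG` output.

```shell
#!/bin/sh
# list_has "LIST" WORD: succeeds when WORD is one of the space-separated
# entries in LIST. (Hypothetical helper.)
list_has() {
    for entry in $1; do
        [ "$entry" = "$2" ] && return 0
    done
    return 1
}

# user_in_group USER GROUP: succeeds when `id -nG USER` lists GROUP.
# An unknown user produces an empty list, so the check fails.
user_in_group() {
    list_has "$(id -nG "$1" 2>/dev/null)" "$2"
}

# Report the four user:group pairs set up in this step.
for pair in nagios:nagcmd apache:nagcmd nagios:apache apache:nagios; do
    user=${pair%%:*}
    group=${pair##*:}
    if user_in_group "$user" "$group"; then
        echo "ok: $user is in $group"
    else
        echo "missing: $user is not in $group"
    fi
done
```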

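2) Change the user and group that Apache runs as. A source-built Apache runs as daemon by default (the httpd package installed with yum runs as apache). Changing it to nagios gives Apache permission to access the Nagios installation directory and run the CGI commands, for example shutting down Nagios or stopping alerts for a failing object from the browser. (This step can be skipped: I left Apache's user and group unchanged when I deployed Nagios and ran into no problems.)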

3) Add the Nagios directories (Nagios is installed under /usr/local/nagios) and protect them with HTTP user authentication. Append the following to the end of httpd.conf:

    ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>
    Alias /nagios /usr/local/nagios/share
    <Directory "/usr/local/nagios/share">
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>
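A closing tag that loses its leading "</" when pasted is an easy mistake in this block. As a rough guard, the sketch below counts opening and closing Directory lines; directory_tags_balanced is a hypothetical helper that only counts lines and does not parse Apache configuration.

```shell
#!/bin/sh
# directory_tags_balanced FILE: succeeds when FILE has as many
# "</Directory>" lines as "<Directory ...>" lines.
# (Hypothetical helper; a line count, not a real configuration parser.)
directory_tags_balanced() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    opens=$(grep -c '^[[:space:]]*<Directory[[:space:]>]' "$1" || true)
    closes=$(grep -c '^[[:space:]]*</Directory>' "$1" || true)
    [ "$opens" -eq "$closes" ]
}

# Example (path assumed to be the Red Hat default):
#   directory_tags_balanced /etc/httpd/conf/httpd.conf && echo "tags balanced"
```

Apache's own syntax check (apachectl configtest) remains the authoritative test; this only catches the unbalanced-tag case early.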

4. Install Nagios

    [root@localhost tmp]# tar zxvf nagios-3.3.1.tar.gz
    [root@localhost tmp]# cd nagios
    [root@localhost nagios]# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
    [root@localhost nagios]# make all
    [root@localhost nagios]# make install
    [root@localhost nagios]# make install-init
    [root@localhost nagios]# make install-config
    [root@localhost nagios]# make install-commandmode
    [root@localhost nagios]# make install-webconf
5. Install the Nagios plugins (nagios-plugins)

    [root@localhost nagios]# cd /tmp
    [root@localhost tmp]# tar zxvf nagios-plugins-1.4.15.tar.gz
    [root@localhost tmp]# cd nagios-plugins-1.4.15
    [root@localhost nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    [root@localhost nagios-plugins-1.4.15]# make
    [root@localhost nagios-plugins-1.4.15]# make install

6. Configure Nagios

    [root@localhost nagios-plugins-1.4.15]# cd /usr/local/
    [root@localhost local]# chown -R nagios:nagios nagios/
    [root@localhost local]# chown -R nagios:nagios nagios/*
    [root@localhost local]# cd nagios/etc/
    [root@localhost etc]# vim nagios.cfg

Edit nagios.cfg so that it loads the following object files:

    # host definitions
    cfg_file=/usr/local/nagios/etc/hosts.cfg
    # host group definitions
    cfg_file=/usr/local/nagios/etc/hostgroups.cfg
    # contact definitions
    cfg_file=/usr/local/nagios/etc/contacts.cfg
    # contact group definitions
    cfg_file=/usr/local/nagios/etc/contactgroups.cfg
    # service definitions
    cfg_file=/usr/local/nagios/etc/services.cfg
    # time period definitions
    cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
    # command definitions
    cfg_file=/usr/local/nagios/etc/objects/commands.cfg

Then edit cgi.cfg as follows (separate multiple users with commas):

    [root@localhost etc]# vim cgi.cfg
    authorized_for_system_information=nagios
    authorized_for_configuration_information=nagios
    authorized_for_system_commands=nagios
    authorized_for_all_services=nagios
    authorized_for_all_hosts=nagios
    authorized_for_all_service_commands=nagios
    authorized_for_all_host_commands=nagios

The user named here, nagios, can then shut down, restart, and otherwise control the Nagios service from the browser. Alternatively, make the same change with this command:

    [root@localhost etc]# sed -i 's/nagiosadmin/nagios/g' cgi.cfg
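Each of these directives needs an "=" between the name and the value, and a missing one is easy to overlook when editing by hand. A small sketch for spotting such lines follows; malformed_directives is a hypothetical helper, not a Nagios tool.

```shell
#!/bin/sh
# malformed_directives FILE: prints every line of FILE that is neither blank
# nor a comment and that contains no "=" sign. (Hypothetical helper.)
malformed_directives() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | grep -v '=' || true
}

# Example (path taken from the steps above):
#   malformed_directives /usr/local/nagios/etc/cgi.cfg
```

No output means every directive line at least has the name=value shape; it says nothing about whether the values themselves are valid.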
   
   
   
   
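(1) Configure the host file hosts.cfg

    define host{
        host_name                       web1                ; host name (check it with the hostname command)
        alias                           Nagios Server       ; host alias
        address                         192.168.10.223      ; IP address of the host
        check_command                   check-host-alive    ; check command; it must exist in the command definition file (it does by default)
        check_interval                  5                   ; interval between checks
        retry_interval                  1                   ; interval between retries after a failed check
        max_check_attempts              5                   ; maximum number of check attempts
        check_period                    24x7                ; time period during which checks run
        process_perf_data               0
        retain_nonstatus_information    0
        contact_groups                  admin               ; contact group that receives the e-mail alerts
        notification_interval           30                  ; interval between notifications
        notification_period             24x7                ; time period during which notifications are sent
        notification_options            d,u,r               ; states that trigger a notification
    }

For hosts, d means down, u means unreachable, and r means recovery. For services the options are w (warning), u (unknown), c (critical), and r (recovery, that is, whether to send a notification after recovery); in production, w,c,r is usually enough.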
(2) Configure the host group file hostgroups.cfg

    define hostgroup {
        hostgroup_name                  Nagios-Example      ; name of the host group
        alias                           Nagios Example      ; alias of the host group
        members                         web1                ; members; each must match a host_name in hosts.cfg or Nagios reports an error
    }
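Because every hostgroup member has to match a host_name from hosts.cfg, a cross-check before starting Nagios can save a round trip. The sketch below is one way to do it with awk; the three function names are invented for this example, and it assumes one directive per line with optional ";" comments.

```shell
#!/bin/sh
# defined_hosts FILE: prints each host_name value in FILE, one per line.
defined_hosts() {
    awk '{ sub(/;.*/, "") } $1 == "host_name" { print $2 }' "$1"
}

# group_members FILE: prints every entry of each "members" line in FILE,
# one per line, splitting on commas and spaces.
group_members() {
    awk '{ sub(/;.*/, "") } $1 == "members" { $1 = ""; print }' "$1" |
        tr -s ', ' '\n\n' | grep -v '^$' || true
}

# unknown_members HOSTS_FILE GROUPS_FILE: prints members that have no
# matching host_name. (Hypothetical helper, not a Nagios tool.)
unknown_members() {
    for m in $(group_members "$2"); do
        defined_hosts "$1" | grep -qxF -e "$m" || echo "$m"
    done
}

# Example:
#   unknown_members /usr/local/nagios/etc/hosts.cfg /usr/local/nagios/etc/hostgroups.cfg
```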
(3) Configure the contact file contacts.cfg

    define contact{
        contact_name                    nagiosadmin                 ; contact name
        alias                           Nagios Admin                ; contact alias
        service_notification_period     24x7                        ; service notifications at any time
        host_notification_period        24x7                        ; host notifications at any time
        service_notification_options    w,u,c,r                     ; service states to notify on
        host_notification_options       d,u,r                       ; host states to notify on
        service_notification_commands   notify-service-by-email     ; alert by e-mail
        host_notification_commands      notify-host-by-email        ; alert by e-mail
        email                           [email protected]            ; mailbox that receives the alerts
    }
(4) Configure the contact group file contactgroups.cfg

    define contactgroup{
        contactgroup_name               admin                       ; name of the contact group
        alias                           Nagios Administrators       ; alias of the contact group
        members                         nagiosadmin                 ; members; each must match a contact_name in contacts.cfg
    }
(5) Configure the service file services.cfg

    define service {
        host_name                web1                ; must match host_name in hosts.cfg
        service_description      check-host-alive    ; service description
        check_period             24x7                ; time period during which checks run
        max_check_attempts       4                   ; maximum number of check attempts
        normal_check_interval    3                   ; interval between checks
        retry_check_interval     2                   ; interval between rechecks
        contact_groups           admin               ; contact group notified on failure
        notification_interval    10                  ; interval between notifications
        notification_period      24x7                ; time period during which notifications are sent
        notification_options     w,u,c,r             ; states that trigger a notification
        check_command            check-host-alive    ; check command
    }
    define service {
        host_name                web1
        service_description      PING
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_ping!100.0,20%!500.0,60%
    }
    define service {
        host_name                web1
        service_description      Root Partition
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_local_disk!20%!10%!/
    }
    define service {
        host_name                web1
        service_description      Current Users
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_local_users!20!50
    }
    define service {
        host_name                web1
        service_description      Total Processes
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_local_procs!250!400!RSZDT
    }
    define service {
        host_name                web1
        service_description      Current Load
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    }
    define service {
        host_name                web1
        service_description      Swap Usage
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notification_options     w,u,c,r
        check_command            check_local_swap!20!10
    }
    define service {
        host_name                web1
        service_description      SSH
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notifications_enabled    0
        notification_options     w,u,c,r
        check_command            check_ssh
    }
    define service {
        host_name                web1
        service_description      HTTP
        check_period             24x7
        max_check_attempts       4
        normal_check_interval    3
        retry_check_interval     2
        contact_groups           admin
        notification_interval    10
        notification_period      24x7
        notifications_enabled    0
        notification_options     w,u,c,r
        check_command            check_http
    }
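The service definitions above differ only in host_name, service_description, and check_command, so when many hosts need the same checks they could be generated rather than typed. A minimal sketch follows; emit_service is a helper invented for this example, not something Nagios ships.

```shell
#!/bin/sh
# emit_service HOST DESCRIPTION CHECK_COMMAND
# Prints one service definition with the shared settings used in services.cfg.
# (Hypothetical helper; Nagios itself has no such generator.)
emit_service() {
    cat <<EOF
define service {
    host_name                $1
    service_description      $2
    check_period             24x7
    max_check_attempts       4
    normal_check_interval    3
    retry_check_interval     2
    contact_groups           admin
    notification_interval    10
    notification_period      24x7
    notification_options     w,u,c,r
    check_command            $3
}
EOF
}

# Print two of the definitions from this step; redirect to a file to keep them.
emit_service web1 "PING" 'check_ping!100.0,20%!500.0,60%'
emit_service web1 "Root Partition" 'check_local_disk!20%!10%!/'
```

Nagios also supports object templates through the use directive, which is the native way to share common settings; the generator here is just the simplest thing to read alongside the listing above.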
7. Install NRPE

    [root@localhost etc]# cd /tmp/
    [root@localhost tmp]# tar zxvf nrpe-2.12.tar.gz
    [root@localhost tmp]# cd nrpe-2.12
    [root@localhost nrpe-2.12]# ./configure --prefix=/usr/local/nrpe
    [root@localhost nrpe-2.12]# make
    [root@localhost nrpe-2.12]# make install

Copy the plugin files:

    [root@localhost nrpe-2.12]# cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
    [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
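The four copies from the Nagios plugin directory follow one pattern, so they could be written as a loop that also reports a plugin that failed to build. This is a sketch only; copy_plugins is a name made up for this example.

```shell
#!/bin/sh
# copy_plugins SRC_DIR DST_DIR NAME...: copies each named plugin from SRC_DIR
# to DST_DIR, keeping its mode bits, and reports any plugin that is missing.
# (Hypothetical helper.)
copy_plugins() {
    src=$1
    dst=$2
    shift 2
    for name in "$@"; do
        if [ -f "$src/$name" ]; then
            cp -p "$src/$name" "$dst/"
        else
            echo "missing plugin: $src/$name" >&2
        fi
    done
}

# Example, using the paths from this step:
#   copy_plugins /usr/local/nagios/libexec /usr/local/nrpe/libexec \
#       check_disk check_load check_ping check_procs
```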
Configure NRPE:

    [root@localhost nrpe-2.12]# mkdir /usr/local/nrpe/etc
    [root@localhost nrpe-2.12]# cp sample-config/nrpe.cfg /usr/local/nrpe/etc/

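Edit the nrpe.cfg configuration file. On the monitoring server it can be left unchanged; on a monitored client, change the following line:

allowed_hosts=127.0.0.1

Add the monitoring server's IP address to allowed_hosts.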
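allowed_hosts takes a comma-separated list, so adding the server means appending to the existing value. The sketch below does that without duplicating an entry; add_allowed_host is a hypothetical helper, the address 192.168.10.10 in the example is a placeholder for the real server IP, and the code assumes a single, non-empty allowed_hosts line.

```shell
#!/bin/sh
# add_allowed_host FILE IP: appends IP to the allowed_hosts line of FILE
# unless it is already listed. (Hypothetical helper; assumes exactly one
# allowed_hosts line that already holds at least one address.)
add_allowed_host() {
    file=$1
    ip=$2
    current=$(sed -n 's/^allowed_hosts=//p' "$file")
    case ",$current," in
        *",$ip,"*) return 0 ;;   # already present, nothing to do
    esac
    tmp=$(mktemp) || return 1
    # Write through a temp file, then copy back so the original file keeps
    # its owner and permissions.
    sed "s/^allowed_hosts=.*/&,$ip/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (placeholder address):
#   add_allowed_host /usr/local/nrpe/etc/nrpe.cfg 192.168.10.10
```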

   
   
   
   
Start nrpe, then confirm that the process is running and listening on port 5666:

    [root@localhost nrpe-2.12]# /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
    [root@localhost nrpe-2.12]# ps -ef|grep nrpe
    nagios 4465 1 0 21:02 ? 00:00:00 /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
    root 4467 12877 0 21:02 pts/2 00:00:00 grep nrpe
    [root@localhost nrpe-2.12]# lsof -i:5666
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    nrpe 4465 nagios 4u IPv4 81685 TCP *:5666 (LISTEN)

Set the owner and group of the Nagios and NRPE files:

    [root@localhost local]# chown -R nagios:nagios /usr/local/nagios/*
    [root@localhost local]# chown -R nagios:nagios /usr/local/nrpe/*

8. Start Nagios

First check the Nagios configuration for problems:

    [root@localhost etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    Nagios Core 3.3.1
    Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 07-25-2011
    License: GPL
    Website: http://www.nagios.org
    Reading configuration data...
    Read main config file okay...
    Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
    Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
    Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
    Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
    Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
    Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
    Processing object config file '/usr/local/nagios/etc/services.cfg'...
    Read object config files okay...
    Running pre-flight check on configuration data...
    Checking services...
    Checked 9 services.
    Checking hosts...
    Checked 1 hosts.
    Checking host groups...
    Checked 1 host groups.
    Checking service groups...
    Checked 0 service groups.
    Checking contacts...
    Checked 2 contacts.
    Checking contact groups...
    Checked 1 contact groups.
    Checking service escalations...
    Checked 0 service escalations.
    Checking service dependencies...
    Checked 0 service dependencies.
    Checking host escalations...
    Checked 0 host escalations.
    Checking host dependencies...
    Checked 0 host dependencies.
    Checking commands...
    Checked 24 commands.
    Checking time periods...
    Checked 5 time periods.
    Checking for circular paths between hosts...
    Checking for circular host and service dependencies...
    Checking global event handlers...
    Checking obsessive compulsive processor commands...
    Checking misc settings...
    Total Warnings: 0
    Total Errors: 0
    Things look okay - No serious problems were detected during the pre-flight check
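When this check is scripted rather than read by eye, the "Total Errors" line of the summary can be used to decide whether to go on to the start step. A minimal sketch, assuming the summary format shown above; preflight_errors and preflight_ok are names invented for this example.

```shell
#!/bin/sh
# preflight_errors: reads `nagios -v` output on stdin and prints the number
# that follows "Total Errors:". Prints nothing if that line is absent.
preflight_errors() {
    sed -n 's/^[[:space:]]*Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# preflight_ok: succeeds only when the summary reports exactly 0 errors.
# A missing summary line counts as a failure. (Hypothetical helper.)
preflight_ok() {
    [ "$(preflight_errors)" = "0" ]
}

# Example, using the paths from this guide:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok \
#       && echo "configuration looks clean"
```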
If there are no problems, start Nagios:

    [root@localhost etc]# chkconfig --add nagios    # register nagios as a service
    [root@localhost etc]# chkconfig nagios on       # start the service at boot
    [root@localhost etc]# service nagios start      # start nagios

Create the user for web authentication:

    [root@localhost etc]# htpasswd -c /usr/local/nagios/etc/htpasswd nagios
    New password:
    Re-type new password:
    Adding password for user nagios

Make nrpe start at boot:

    [root@localhost etc]# echo "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" >> /etc/rc.local
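Running that echo a second time adds a second copy of the line to rc.local. If the guide is likely to be replayed on the same machine, an append that first checks for the line avoids this. The following is a sketch; append_once is a hypothetical helper.

```shell
#!/bin/sh
# append_once LINE FILE: appends LINE to FILE only if FILE does not already
# contain an identical line. (Hypothetical helper.)
append_once() {
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example, with the command from this step:
#   append_once "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" /etc/rc.local
```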

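Start sendmail so that the alert e-mails can be delivered:

    [root@localhost etc]# service sendmail start

After that, stop the httpd service and you should receive an alert. If you run into a problem you cannot solve, feel free to contact me, or read my next article, "Why Nagios cannot send alert e-mails", at http://dl528888.blog.51cto.com/2382721/763079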