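Monitoring the nginx Service with Nagios

1 Install the NRPE client on the nginx server

nginx needs to be monitored: if it goes down and is not restored promptly, the web application behind it is affected. The nginx processes running on the web server look like this: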

  [root@lb-net-2 ~]# ps aux|grep nginx
  nobody 15294 0.0 0.0 22432 3464 ? S Jul03 0:05 nginx: worker process
  nobody 15295 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
  ......
  nobody 15316 0.0 0.0 22432 3468 ? S Jul03 0:05 nginx: worker process
  nobody 15317 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
  root 16260 0.0 0.0 20584 1684 ? Ss Jun18 0:00 nginx: master process /usr/local/nginx/sbin/nginx
  root 21211 0.0 0.0 103252 860 pts/1 S+ 17:50 0:00 grep nginx
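As a quick sanity check before wiring anything into Nagios, a listing like the one above can be reduced to a count of worker processes. The following is only a sketch: the function name count_nginx_workers is made up for illustration, and it reads ps output on standard input so that it can also be tried against saved output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios or nginx): counts nginx worker
# processes in `ps aux` output supplied on standard input.
count_nginx_workers() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function from returning failure when the count is zero.
    grep -c 'nginx: worker process' || true
}

# Usage on a live host:
#   ps aux | count_nginx_workers
```

Matching on "nginx: worker process" rather than on "nginx" avoids counting the grep command itself, which shows up in the listing above.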


1.1 Install the NRPE client from RPM packages

Download: http://download.csdn.net/detail/mchdba/7493875


  [root@localhost nagios]# ll
  total 768
  -rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm
  -rw-r--r-- 1 root root 32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm
  -rw-r--r-- 1 root root 18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm
  [root@localhost nagios]# rpm -ivh *.rpm --nodeps --force


1.2 Append the check commands and the monitoring server's IP address to the end of the configuration file

  [root@localhost nagios]# vim /etc/nagios/nrpe.cfg
  # add by tim on 2014-06-11
  command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
  command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
  command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
  command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
  #command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
  command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
  command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.3.29 -w 3000.0,80% -c 5000.0,100% -p 5
  allowed_hosts = 127.0.0.1,192.168.188.110, 10.xx.3.41

Check that a command works:

  [root@web-9 nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15
  USERS OK - 2 users currently logged in |users=2;8;15;0
  [root@web-9 nrpe-2.15]#

The output starts with USERS OK, so the command works.
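To see at a glance which checks the agent will accept, the command[...] entries can be pulled out of nrpe.cfg. This is a minimal sketch, assuming the file uses the command[name]=... form shown above; list_nrpe_commands is a hypothetical name, not an NRPE tool.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of every active command[...] entry
# found in nrpe.cfg-style text on standard input. Lines that start with "#"
# do not match the pattern and are therefore skipped.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p'
}

# Usage:
#   list_nrpe_commands < /etc/nagios/nrpe.cfg
```

The names it prints are the values that can be passed to check_nrpe with -c on the Nagios server.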

1.3 nrpe fails to start

  [root@web-9 ~]# service nrpe restart
  Shutting down nrpe: [FAILED]
  Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
                                                             [FAILED]
  [root@web-9 ~]#
  [root@db-m2-slave-1 nagios_client]# service nrpe start
  Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
                                                             [FAILED]
  [root@db-m2-slave-1 nagios_client]#

Create a symbolic link for the missing library (if libssl.so does not exist, link libssl.so.10 instead: ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6):

  [root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libssl.so /usr/lib64/libssl.so.6
  [root@db-m2-slave-1 nagios_client]#

Start the service again:

  [root@db-m2-slave-1 nagios_client]# service nrpe start
  Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
                                                             [FAILED]
  [root@web-10 ~]# ll /usr/lib64/libcrypto.so
  lrwxrwxrwx. 1 root root 18 Oct 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0
  [root@db-m2-slave-1 nagios_client]#

Create a second link (or, if libcrypto.so does not exist, link libcrypto.so.10 instead: ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6):

  [root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6
  [root@db-m2-slave-1 nagios_client]# service nrpe start
  Starting nrpe: [  OK  ]
  [root@db-m2-slave-1 nagios_client]#
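The loader reports missing libraries one at a time, which is why two start attempts were needed above. Filtering the output of ldd shows all of them at once. The sketch below is an illustration rather than part of the original procedure; missing_libs is a made-up name, and it reads ldd output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of every shared library that ldd
# reports as "not found". Expects `ldd <binary>` output on standard input.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /usr/sbin/nrpe | missing_libs
```

Linking a newer library under an older name only satisfies the loader. Whether the binary then behaves correctly depends on how compatible the two library versions are, so the links are best treated as a workaround.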

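1.4 Check that nrpe is running

Run the check from the Nagios server:

  [root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.3.xx
  NRPE v2.12
  [root@cache-2 ~]#

The reply NRPE v2.12 shows that the connection works and that the nrpe service on the client is ready for monitoring.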

 

2 The simple approach: monitoring with check_http

A check_http command in /etc/nagios/nrpe.cfg can report whether nginx is running:

(1)     Edit nrpe.cfg

  vim /etc/nagios/nrpe.cfg
  command[check_nginx_status]=/usr/lib/nagios/plugins/check_http -I localhost -p 80 -u /nginx_status -e 200 -w 3 -c 10

(2)     Restart the nrpe service

  [root@lb-net-2 ~]# service nrpe restart
  Shutting down nrpe: [  OK  ]
  Starting nrpe: [  OK  ]
  [root@lb-net-2 ~]#

(3)     Run the check from the Nagios server; it succeeds.

  [root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.1.22 -c check_nginx_status
  HTTP OK HTTP/1.1 200 OK - 254 bytes in 0.002 seconds |time=0.002031s;3.000000;10.000000;0.000000 size=254B;;;0
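A plugin result has two parts: the exit code, which Nagios maps to a state, and a text line whose portion after the "|" carries performance data. The helpers below sketch how to read both when testing a check by hand. The function names are assumptions made for this illustration; the code-to-state mapping is the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical helpers for reading plugin results by hand.

# Maps a plugin exit code to the Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Prints the performance-data part of a plugin output line, i.e. the text
# after the first "|". Prints an empty line when there is no "|".
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

# Usage (the check_nrpe path and host are the ones used in this article):
#   line=$(/usr/local/nagios/libexec/check_nrpe -H 10.xx.1.22 -c check_nginx_status)
#   nagios_state_name $?
#   plugin_perfdata "$line"
```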

(4)     Add the check_nginx_status service to services.cfg

  define service{
          host_name lb-net-2
          service_description check_nginx_status
          check_command check_nrpe!check_nginx_status
          max_check_attempts 5
          normal_check_interval 3
          retry_check_interval 2
          check_period 24x7
          notification_interval 10
          notification_period 24x7
          notification_options w,u,c,r
          contact_groups opsweb
          }

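(5)     Add the check_nginx_status command to command.cfg

  define command{
          command_name check_nginx_status
          command_line $USER1$/check_nginx_status -I $HOSTADDRESS$ -w $Warning$ -c $Cri$
          }

Note that the service in step (4) calls check_nrpe!check_nginx_status, so it is the check_nrpe command definition that Nagios actually uses for this check. $Warning$ and $Cri$ are not standard Nagios macros; arguments are normally passed as $ARG1$, $ARG2$ and so on.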

(6)     Reload Nagios

  [root@cache-2 objects]# service nagios reload
  Running configuration check...
  Reloading nagios configuration...
  done
  [root@cache-2 objects]#

(7)     The nginx check now appears in the Nagios web interface:

[Figure 1: the nginx service check in the Nagios web interface]

3 Monitoring nginx with a script

3.1 The debugging process in detail

  [root@lb-net-2 run]# find / -name nginx.pid
  /usr/local/nginx/logs/nginx.pid
  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
  expr: wrong number of arguments
  expr: syntax error
  (standard_in) 1: syntax error
  /usr/lib/nagios/plugins/check_nginxstatus: line 258: [: : integer expression expected
  /usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

Analysis: look at line 262 and change the logical operator "-a" to "&&".

  [root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
  expr: wrong number of arguments
  expr: syntax error
  (standard_in) 1: syntax error
  /usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
  /usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq

The status line now reads OK; edit the file again.

  [root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
  [root@lb-net-2 run]#
  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
  expr: wrong number of arguments
  expr: syntax error
  (standard_in) 1: syntax error
  /usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq

Changing [ ] to [[ ]] takes care of the remaining test error.

  [root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
  expr: wrong number of arguments
  expr: syntax error
  (standard_in) 1: syntax error
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
  [root@lb-net-2 run]#
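The "integer expression expected" messages come from [ ... -ge ... ] being handed an empty string. In bash, [[ ... -ge ... ]] evaluates an empty operand as 0, so switching brackets silences the message without supplying the missing value. An alternative is to validate the operands first. The following is a minimal sketch under that idea; is_uint and ge_threshold are hypothetical names and are not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when $1 is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Succeeds when both arguments are unsigned integers and $1 >= $2.
# Empty or non-numeric input yields "false" instead of an error from test.
ge_threshold() {
    is_uint "$1" && is_uint "$2" && [ "$1" -ge "$2" ]
}

# Usage, with variable names borrowed from the plugin:
#   if ge_threshold "$reqpsec" "$critical"; then echo "CRITICAL"; fi
```

With a guard like this, an empty reqpsec is treated as a missing measurement rather than as the number zero.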

Commenting out the line reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l` removes the "(standard_in) 1: syntax error" message:

  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
  expr: wrong number of arguments
  expr: syntax error
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
  [root@lb-net-2 run]#

Commenting out the line reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec` removes the "expr: wrong number of arguments" message. One error remains:

  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
  expr: syntax error
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

After commenting out reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l` once more, the "expr: syntax error" message is gone as well:

  [root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
  OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
  [root@lb-net-2 run]#
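The empty fields in the output above suggest that the arithmetic was running on empty inputs, which would also explain the expr and bc complaints. Instead of commenting lines out, the division can be guarded. This is a defensive sketch and not the plugin's own code: safe_ratio is a hypothetical name, and it uses awk where the plugin uses bc, so it prints 0.50 where the plugin would print .50.

```shell
#!/bin/sh
# Hypothetical helper: prints $1 / $2 with two decimals. Prints "0.00" when
# either input is empty or non-numeric, or when the divisor is zero.
safe_ratio() {
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*) echo "0.00"; return 0 ;;
        esac
    done
    if [ "$2" -eq 0 ]; then
        echo "0.00"
        return 0
    fi
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# Usage, with variable names borrowed from the plugin:
#   reqpcon=$(safe_ratio "$reqpsec" "$conpsec")
```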

All three fields, 'reqpsec'=, 'conpsec'= and 'conpreq'=, are still empty even though nginx is running. Where is the problem? It turned out that the nginx_status page was not enabled; the following has to be added to /usr/local/nginx/conf/nginx.conf:

  # add the pid directive
  pid logs/nginx.pid;
  #charset koi8-r;
          access_log logs/host.access.log main;
          location /nginx_status {
                  stub_status on;
                  access_log off;
                  deny all;
          }

After reloading nginx, a new nginx-status file was created, but it was empty:

  [root@lb-net-2 logs]# ll /tmp/nginx*
  -rw-r--r--. 1 root root 0 Jul 3 15:06 /tmp/nginx-status.1
  [root@lb-net-2 logs]#

Check the nginx error log:

  [root@lb-net-2 logs]# cd /usr/local/nginx/logs/
  [root@lb-net-2 logs]# tail -n 300 error.log
  ......
  2014/07/03 15:05:47 [error] 4285#0: *1851293 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
  2014/07/03 15:05:48 [error] 4285#0: *1851294 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
  2014/07/03 15:06:12 [error] 4282#0: *1851362 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
  2014/07/03 15:06:13 [error] 4282#0: *1851363 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
  2014/07/03 15:06:55 [error] 4285#0: *1851509 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
  2014/07/03 15:06:56 [error] 4285#0: *1851519 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

Check the options nginx was compiled with:

  [root@lb-net-2 logs]# /usr/local/nginx/sbin/nginx -V
  nginx version: nginx/1.4.2
  built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
  configure arguments: --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_realip_module


This confirms that the stub_status module is compiled in. Next, edit the configuration file, comment out deny all; and reload nginx.

  [root@lb-net-2 logs]# vim /usr/local/nginx/conf/nginx.conf
  #deny all;
  [root@lb-net-2 logs]# service nginx reload
  reload nginx
  [root@lb-net-2 logs]#
  [root@lb-net-2 logs]# ll /tmp/nginx*
  ls: cannot access /tmp/nginx*: No such file or directory
  [root@lb-net-2 logs]#

The status file /tmp/nginx-status.1 is still not created. My assumption was that the Nagios nginx script reads its data from nginx-status.1, so without that file it would have nothing to read.

A Google search for "nginx stub_status does not create nginx-status.1" turned up a comment saying that, once the configuration is right, it does not matter whether the status file exists. So I ran the script directly to see whether it works.

  [root@lb-net-2 logs]# ll /tmp/nginx*
  ls: cannot access /tmp/nginx*: No such file or directory
  [root@lb-net-2 logs]# /root/check_nginx2.sh -H localhost -P 80 -p /usr/local/nginx/logs/ -n nginx.pid -s nginx_status -w 15000 -c 20000
  OK - nginx is running. 1 requests per second, 2 connections per second (.50 requests per connection) | 'reqpsec'=1 'conpsec'=2 'conpreq'=.50 ]
  [root@lb-net-2 logs]#

The fields 'reqpsec'=1 'conpsec'=2 'conpreq'=.50 now carry values. Check again whether the file has been created:

  [root@lb-net-2 logs]# ll /tmp/nginx*
  ls: cannot access /tmp/nginx*: No such file or directory
  [root@lb-net-2 logs]#

Still no file, yet the check returns data, so there is no need to insist on an nginx-status.1 file under /tmp/. The script explains why:

  [root@lb-net-2 logs]# vim /usr/lib/nagios/plugins/check_nginxstatus
  180 get_status() {
  181 if [ "$secure" = 1 ]
  182 then
  183 wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
  184 out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
  185 sleep 1
  186 out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
  187 else
  188 wget_opts="-O- -q -t 3 -T 3"
  189 out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
  190 sleep 1
  191 out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
  192 fi
  193
  194 if [ -z "$out1" -o -z "$out2" ]
  195 then
  196 echo "UNKNOWN - Local copy/copies of $status_page is empty."
  197 exit $ST_UK
  198 fi
  199 }

The status data comes from fetching the page with `wget -O- -q -t 3 -T 3 --no-check-certificate http://10.xx.xx.xx:80/nginx_status`, not from a /tmp/nginx-status.1 file. Opening http://10.xx.xx.xx:80/nginx_status directly returns nginx's runtime data, as the figure below shows:
[Figure 2: output of the nginx_status page]
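The plugin derives its rates by fetching the status page twice, one second apart, and subtracting the counters. The sketch below reproduces that idea on two saved samples. The function names are made up for this illustration, and the field positions (9 for handled connections, 10 for requests, once the page is flattened onto one line) follow the awk calls in the plugin's get_vals function. It assumes both samples are complete stub_status pages.

```shell
#!/bin/sh
# Hypothetical helper: prints "<handled> <requests>" from stub_status text
# passed as $1. Newlines are turned into spaces so the counters end up in
# fields 9 and 10 of a single line.
status_counters() {
    printf '%s' "$1" | tr '\n' ' ' | awk '{ print $9, $10 }'
}

# Prints "<connections per second> <requests per second>" for two samples
# that were taken one second apart.
rates_between() {
    first=$(status_counters "$1")
    second=$(status_counters "$2")
    conpsec=$(( ${second% *} - ${first% *} ))
    reqpsec=$(( ${second#* } - ${first#* } ))
    echo "$conpsec $reqpsec"
}

# Usage, with the wget options used by the plugin:
#   s1=$(wget -O- -q -t 3 -T 3 http://localhost:80/nginx_status); sleep 1
#   s2=$(wget -O- -q -t 3 -T 3 http://localhost:80/nginx_status)
#   rates_between "$s1" "$s2"
```

If either fetch returns nothing, the subtraction fails, which is why the plugin checks for empty samples before doing any arithmetic.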

Running the check from the Nagios server fails:

  [root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status
  UNKNOWN - Local copy/copies of nginx_status is empty.
  [root@cache-2 ~]#

Searching the monitoring script for 'Local copy/copies of nginx_status is empty.' leads to line 197:

  195 if [ -z "$out1" -o -z "$out2" ]
  196 then
  197 echo "UNKNOWN - Local copy/copies of $status_page is empty."
  198 exit $ST_UK
  199 fi

The test [ -z "$out1" -o -z "$out2" ] evaluates to true, so the script exits at this point. Further debugging showed that when the script is invoked from the Nagios server, lines 190 to 192 run:

  out1=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
  sleep 1
  out2=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`

and leave both out1 and out2 empty. The test [ -z "$out1" -o -z "$out2" ] that follows therefore succeeds, the script prints "UNKNOWN - Local copy/copies of $status_page is empty." and exits.

 

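Note: the plugin calls wget to fetch nginx_status, and on this host wget could only be run by root. The nagios user therefore has to be able to sudo to root without a password, so that the check can be run as the nagios user with sudo /usr/lib/nagios/plugins/check_nginxstatus. On CentOS, sudo cannot be called from a process that has no terminal unless /etc/sudoers is changed: find #Defaults requiretty, uncomment it, and add one more line below it that exempts the nagios user from needing a login terminal:

Defaults    requiretty

Defaults:nagios    !requiretty

# Let nagios use sudo without a password (the rule can be restricted to specific commands, optionally with arguments).

nagios ALL=(ALL) NOPASSWD: ALL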

After this change the check returns data:

  [root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status
  OK - nginx is running. 1 requests per second, 1 connections per second (1.00 requests per connection) | 'reqpsec'=1 'conpsec'=1 'conpreq'=1.00 ]
  [root@cache-2 ~]#

3.2 The check_nginxstatus script


  #!/bin/bash
  # Nagios plugin: checks that nginx is running and reports request and
  # connection rates read from the stub_status page.
  # bash is required because of the [[ ]] tests near the end.

  PROGNAME=`basename $0`
  VERSION="Version 1.1,"
  AUTHOR="tim man"

  # Nagios plugin exit codes
  ST_OK=0
  ST_WR=1
  ST_CR=2
  ST_UK=3

  # Defaults; each can be overridden on the command line
  hostname="localhost"
  port=80
  path_pid=/var/run
  name_pid="nginx.pid"
  status_page="nginx_status"
  pid_check=1
  secure=0

  print_version() {
      echo "$VERSION $AUTHOR"
  }

  print_help() {
      print_version $PROGNAME $VERSION
      echo ""
      echo "$PROGNAME is a Nagios plugin to check whether nginx is running."
      echo "It also parses the nginx's status page to get requests and"
      echo "connections per second as well as requests per connection. You"
      echo "may have to alter your nginx configuration so that the plugin"
      echo "can access the server's status page."
      echo "The plugin is highly configurable for this reason. See below for"
      echo "available options."
      echo ""
      echo "$PROGNAME -H localhost -P 80 -p /var/run -n nginx.pid "
      echo " -s nginx_status [-w INT] [-c INT] [-S] [-N]"
      echo ""
      echo "Options:"
      echo " -H/--hostname)"
      echo " Defines the hostname. Default is: localhost"
      echo " -P/--port)"
      echo " Defines the port. Default is: 80"
      echo " -p/--path-pid)"
      echo " Path where nginx's pid file is being stored. You might need"
      echo " to alter this path according to your distribution. Default"
      echo " is: /var/run"
      echo " -n/--name-pid)"
      echo " Name of the pid file. Default is: nginx.pid"
      echo " -N/--no-pid-check)"
      echo " Turn this on, if you don't want to check for a pid file"
      echo " whether nginx is running, e.g. when you're checking a"
      echo " remote server. Default is: off"
      echo " -s/--status-page)"
      echo " Name of the server's status page defined in the location"
      echo " directive of your nginx configuration. Default is:"
      echo " nginx_status"
      echo " -S/--secure)"
      echo " In case your server is only reachable via SSL, use"
      echo " this switch to use HTTPS instead of HTTP. Default is: off"
      echo " -w/--warning)"
      echo " Sets a warning level for requests per second. Default is: off"
      echo " -c/--critical)"
      echo " Sets a critical level for requests per second. Default is:"
      echo " off"
      exit $ST_UK
  }

  # Parse command-line options
  while test -n "$1"; do
      case "$1" in
          --help|-h)
              print_help
              exit $ST_UK
              ;;
          --version|-v)
              print_version $PROGNAME $VERSION
              exit $ST_UK
              ;;
          --hostname|-H)
              hostname=$2
              shift
              ;;
          --port|-P)
              port=$2
              shift
              ;;
          --path-pid|-p)
              path_pid=$2
              shift
              ;;
          --name-pid|-n)
              name_pid=$2
              shift
              ;;
          --no-pid-check|-N)
              pid_check=0
              ;;
          --status-page|-s)
              status_page=$2
              shift
              ;;
          --secure|-S)
              secure=1
              ;;
          --warning|-w)
              warning=$2
              shift
              ;;
          --critical|-c)
              critical=$2
              shift
              ;;
          *)
              echo "Unknown argument: $1"
              print_help
              exit $ST_UK
              ;;
      esac
      shift
  done

  # Record which of the warning/critical thresholds were given
  get_wcdiff() {
      if [ ! -z "$warning" -a ! -z "$critical" ]
      then
          wclvls=1

          if [ ${warning} -ge ${critical} ]
          then
              wcdiff=1
          fi
      elif [ ! -z "$warning" -a -z "$critical" ]
      then
          wcdiff=2
      elif [ -z "$warning" -a ! -z "$critical" ]
      then
          wcdiff=3
      fi
  }

  # Reject inconsistent threshold combinations
  val_wcdiff() {
      if [ "$wcdiff" = 1 ]
      then
          echo "Please adjust your warning/critical thresholds. The warning must be lower than the critical level!"
          exit $ST_UK
      elif [ "$wcdiff" = 2 ]
      then
          echo "Please also set a critical value when you want to use warning/critical thresholds!"
          exit $ST_UK
      elif [ "$wcdiff" = 3 ]
      then
          echo "Please also set a warning value when you want to use warning/critical thresholds!"
          exit $ST_UK
      fi
  }

  # retval=0 if the pid file exists, 1 otherwise
  check_pid() {
      if [ -f "$path_pid/$name_pid" ]
      then
          retval=0
      else
          retval=1
      fi
  }

  # Fetch the status page twice, one second apart
  get_status() {
      if [ "$secure" = 1 ]
      then
          scheme="https"
          wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
      else
          scheme="http"
          wget_opts="-O- -q -t 3 -T 3"
      fi

      out1=`/usr/bin/wget ${wget_opts} ${scheme}://${hostname}:${port}/${status_page}`
      sleep 1
      out2=`/usr/bin/wget ${wget_opts} ${scheme}://${hostname}:${port}/${status_page}`

      if [ -z "$out1" -o -z "$out2" ]
      then
          echo "out1:$out1 out2:$out2, UNKNOWN - Local copy/copies of $status_page is empty."
          exit $ST_UK
      fi
  }

  # Compute per-second rates from the two samples
  get_vals() {
      tmp1_reqpsec=`echo ${out1}|awk '{print $10}'`
      tmp2_reqpsec=`echo ${out2}|awk '{print $10}'`
      reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`

      tmp1_conpsec=`echo ${out1}|awk '{print $9}'`
      tmp2_conpsec=`echo ${out2}|awk '{print $9}'`
      conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`

      reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`
      if [ "$reqpcon" = ".99" ]
      then
          reqpcon="1.00"
      fi
  }

  do_output() {
      output="nginx is running. $reqpsec requests per second, $conpsec connections per second ($reqpcon requests per connection)"
  }

  do_perfdata() {
      perfdata="'reqpsec'=$reqpsec 'conpsec'=$conpsec 'conpreq'=$reqpcon"
  }

  # Main program
  get_wcdiff
  val_wcdiff

  if [ ${pid_check} = 1 ]
  then
      check_pid
      if [ "$retval" = 1 ]
      then
          echo "There's no pid file for nginx. Is nginx running? Please also make sure whether your pid path and name is correct."
          exit $ST_CR
      fi
  fi

  get_status
  get_vals
  do_output
  do_perfdata

  if [[ -n "$warning" ]] && [[ -n "$critical" ]]
  then
      if [[ "$reqpsec" -ge "$warning" ]] && [[ "$reqpsec" -lt "$critical" ]]
      then
          echo "WARNING - ${output} | ${perfdata}"
          exit $ST_WR
      elif [ "$reqpsec" -ge "$critical" ]
      then
          echo "CRITICAL - ${output} | ${perfdata}"
          exit $ST_CR
      else
          # The trailing " ]" matches the sample output shown in section 3.1
          echo "OK - ${output} | ${perfdata} ]"
          exit $ST_OK
      fi
  else
      echo "OK - ${output} | ${perfdata}"
      exit $ST_OK
  fi
