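Preface: in the Nagios web interface, the MySQL service check reported the following error:
Warning: NRPE: Unable to read output
1, Check from the Nagios monitoring server
1.1, Call the check remotely with check_nrpe
Running check_nrpe on the Nagios server to check the MySQL status fails as follows: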
[root@mysqlvm2 ~]# /usr/lib/nagios/plugins/check_nrpe -H192.xx.180.xx -c check_mysql_status
NRPE: Unable to read output
You have new mail in /var/spool/mail/root
[root@mysqlvm2 ~]#
1.2, Try a different check
Running another check from the Nagios server, for example check_users, works normally:
[root@mysqlvm2 ~]# /usr/lib/nagios/plugins/check_nrpe -H192.xx.180.xx -c check_users
USERS OK - 2 users currently logged in |users=2;8;15;0
[root@mysqlvm2 ~]#
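This shows that the Nagios/NRPE pipeline itself works: other checks such as check_users return normally, and only check_mysql_status fails. The problem therefore has to be analysed on the MySQL server itself.
2, Check on the monitored MySQL server
2.1, Calling check_nrpe locally reports the same error: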
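The comparison in steps 1.1 and 1.2 can be repeated for any number of checks with a small loop. This is only a sketch: compare_nrpe_checks is a hypothetical helper name, and the check_nrpe path is the one used in this article (override CHECK_NRPE if yours differs).

```shell
# Path to check_nrpe as used in this article; override CHECK_NRPE if needed.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# compare_nrpe_checks is a hypothetical helper, not part of Nagios or NRPE.
# Usage: compare_nrpe_checks <host> <command_name>...
compare_nrpe_checks() {
    local host="$1" cmd out status
    shift
    for cmd in "$@"; do
        # Capture stdout and stderr so that no error text is lost.
        out=$("$CHECK_NRPE" -H "$host" -c "$cmd" 2>&1)
        status=$?
        printf '%s [exit %s]: %s\n' "$cmd" "$status" "$out"
    done
}
```

Calling compare_nrpe_checks 192.xx.180.xx check_users check_mysql_status prints one line per check, which makes it easy to see that only one command is failing while the NRPE connection itself works.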
[root@mysqldb ~]# /usr/lib/nagios/plugins/check_nrpe -Hlocalhost -c check_mysql_status
NRPE: Unable to read output
[root@mysqldb ~]#
Next, run the check_mysql_status command defined in /etc/nagios/nrpe.cfg on its own.
First find the check_mysql command line with cat and grep:
[root@mysqldb ~]# cat /etc/nagios/nrpe.cfg |grep check_mysql_status
command[check_mysql_status]=/usr/bin/sudo /usr/lib/nagios/plugins/check_mysql -unagios -P3306 -s /usr/local/mysql/mysql.sock -Hlocalhost --password='nagiosq@0512' -d test
Running it by hand (as root) works and prints the normal output:
[root@mysqldb ~]# /usr/bin/sudo /usr/lib/nagios/plugins/check_mysql -unagios -P3306 -s /usr/local/mysql/mysql.sock -Hlocalhost --password='nagiosq@0512' -d test
Uptime: 1122870 Threads: 108 Questions: 11559152 Slow queries: 1278 Opens: 3190 Flush tables: 1 Open tables: 395 Queries per second avg: 10.294|Connections=844188c;;; Open_files=49;;; Open_tables=395;;; Qcache_free_memory=209024;;; Qcache_hits=51724c;;; Qcache_inserts=73877c;;; Qcache_lowmem_prunes=5599c;;; Qcache_not_cached=2572345c;;; Qcache_queries_in_cache=1985;;; Queries=11559153c;;; Questions=10724833c;;; Table_locks_waited=0c;;; Threads_connected=107;;; Threads_running=2;;; Uptime=1122870c;;;
[root@mysqldb ~]#
This shows that the check_mysql plugin itself is fine when run as root.
2.2, Check the execute permission on check_mysql
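Copying the command line out of nrpe.cfg by hand is error-prone, so the lookup in step 2.1 can be wrapped in a small function. A minimal sketch, assuming each command sits on one line in the form command[name]=...; nrpe_command_line is a hypothetical helper name.

```shell
# Print the command line that nrpe.cfg defines for a given command name.
# nrpe_command_line is a hypothetical helper, not part of NRPE.
# Usage: nrpe_command_line <command_name> [config_file]
nrpe_command_line() {
    local name="$1"
    local cfg="${2:-/etc/nagios/nrpe.cfg}"
    # Match "command[<name>]=" at the start of a line and strip that prefix.
    sed -n "s/^command\[${name}]=//p" "$cfg"
}
```

The printed line is exactly what NRPE would execute, including the /usr/bin/sudo prefix. Passing it to su - nagios -c '...' would exercise it as the nagios user rather than as root, which is closer to what the NRPE daemon does.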
[root@mysqldb ~]# ll /usr/lib/nagios/plugins/check_mysql
-rwxrwxr-x. 1 root root 168272 7月 8 14:54 /usr/lib/nagios/plugins/check_mysql
[root@mysqldb ~]#
The trailing x shows that other users have execute permission. Switch to the nagios account with su and see whether it can run the plugin:
[root@mysqldb ~]# su - nagios
-bash-4.1$ /usr/lib/nagios/plugins/check_mysql -unagios -P3306 -s /usr/local/mysql/mysql.sock -Hlocalhost --password='nagiosq@0512' -d test
Uptime: 1124403 Threads: 106 Questions: 11586454 Slow queries: 1278 Opens: 3190 Flush tables: 1 Open tables: 395 Queries per second avg: 10.304|Connections=846235c;;; Open_files=49;;; Open_tables=395;;; Qcache_free_memory=211696;;; Qcache_hits=51786c;;; Qcache_inserts=73915c;;; Qcache_lowmem_prunes=5732c;;; Qcache_not_cached=2578541c;;; Qcache_queries_in_cache=1890;;; Queries=11586455c;;; Questions=10750088c;;; Table_locks_waited=0c;;; Threads_connected=105;;; Threads_running=2;;; Uptime=1124403c;;;
-bash-4.1$
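So the nagios account can also run check_mysql directly (without sudo); the plugin path and its execute permission are both fine.
2.3, Check the sudo configuration for the nagios user
NRPE runs every check_xxx plugin as the nagios account. My NRPE client, however, was installed as root, so all the check_xxx plugins are owned by root, and the check_mysql_status entry in nrpe.cfg calls check_mysql through /usr/bin/sudo. That sudo prefix is the one thing the manual tests above did not exercise as the nagios user: when the NRPE daemon runs it, sudo has no terminal and no way to ask for a password, so it most likely fails before check_mysql produces any output. Edit the sudo configuration accordingly: comment out Defaults requiretty and add the line nagios ALL=(ALL) NOPASSWD:/usr/lib/nagios/plugins/check_mysql: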
vim /etc/sudoers
# Do not require a terminal (tty) for sudo.
#Defaults requiretty
# Let the nagios user run check_mysql through sudo without a password.
nagios ALL=(ALL) NOPASSWD:/usr/lib/nagios/plugins/check_mysql
After the change, force-save and quit vim with :wq!, then rerun the local check_nrpe check; it is back to normal:
[root@mysqldb ~]# /usr/lib/nagios/plugins/check_nrpe -Hlocalhost -c check_mysql_status
Uptime: 1123659 Threads: 110 Questions: 11573270 Slow queries: 1278 Opens: 3190 Flush tables: 1 Open tables: 395 Queries per second avg: 10.299|Connections=845248c;;; Open_files=49;;; Open_tables=395;;; Qcache_free_memory=227704;;; Qcache_hits=51751c;;; Qcache_inserts=73892c;;; Qcache_lowmem_prunes=5656c;;; Qcache_not_cached=2575554c;;; Qcache_queries_in_cache=1943;;; Queries=11573271c;;; Questions=10737891c;;; Table_locks_waited=0c;;; Threads_connected=109;;; Threads_running=2;;; Uptime=1123659c;;;
Then run check_nrpe on the Nagios server again; it is also back to normal:
[root@mysqlvm2 ~]# /usr/lib/nagios/plugins/check_nrpe -H192.xx.180.xx -c check_mysql_status
Uptime: 1123673 Threads: 110 Questions: 11573464 Slow queries: 1278 Opens: 3190 Flush tables: 1 Open tables: 395 Queries per second avg: 10.299|Connections=845264c;;; Open_files=49;;; Open_tables=395;;; Qcache_free_memory=227704;;; Qcache_hits=51751c;;; Qcache_inserts=73892c;;; Qcache_lowmem_prunes=5656c;;; Qcache_not_cached=2575596c;;; Qcache_queries_in_cache=1943;;; Queries=11573465c;;; Questions=10738069c;;; Table_locks_waited=0c;;; Threads_connected=109;;; Threads_running=2;;; Uptime=1123673c;;;
[root@mysqlvm2 ~]#
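The two sudoers conditions changed in step 2.3 can be checked with plain text matching before NRPE is retested. This is a rough sketch, not a sudoers parser: it only recognises a plain "Defaults requiretty" line and a nagios rule written on a single line, and both function names are hypothetical.

```shell
# Return 0 if the file has an active (uncommented) "Defaults requiretty" line.
# Hypothetical helper; ignores per-user forms such as "Defaults:user ...".
sudoers_requires_tty() {
    local file="${1:-/etc/sudoers}"
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty([[:space:]]|$)' "$file"
}

# Return 0 if the file has a NOPASSWD rule for the nagios user that
# mentions the given plugin path. Hypothetical helper.
# Usage: sudoers_allows_plugin <plugin_path> [sudoers_file]
sudoers_allows_plugin() {
    local plugin="$1"
    local file="${2:-/etc/sudoers}"
    grep -E '^[[:space:]]*nagios[[:space:]].*NOPASSWD:' "$file" | grep -Fq "$plugin"
}
```

For the fix in this article, sudoers_requires_tty should fail and sudoers_allows_plugin /usr/lib/nagios/plugins/check_mysql should succeed. Editing with visudo instead of vim, or running visudo -c afterwards, additionally checks the file for syntax errors.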
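2.4, Back in the Nagios web interface, the MySQL service has returned to normal, as shown in the figure below:
3 Some other possible causes
The NRPE: Unable to read output error can have many causes. A Google search turned up the following other cases:
(1), The client configuration file /etc/nagios/nrpe.cfg does not list the Nagios server's IP address, for example allowed_hosts=127.0.0.1 with the server IP missing after 127.0.0.1 or entered incorrectly.
(2), Check that the plugin on the NRPE client is readable and executable by the nagios user; if nagios lacks permission, grant execute (x) permission.
(3), Check that the command paths in nrpe.cfg are correct. RPM installs and source installs use different paths: a source-built Nagios client puts the plugin at /usr/local/nagios/libexec/check_mysql, while the RPM package puts it at /usr/lib/nagios/plugins/check_mysql.
(4), The client configuration file defines the same command name twice, for example /etc/nagios/nrpe.cfg contains the following two check_zombie_procs definitions: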
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 10 -c 15 -s Z
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_iostat -w
This produces the NRPE: Unable to read output error because the two definitions conflict and NRPE cannot tell which one to run.
Source: http://blog.itpub.net/blog/post/id/1217246/
Reference: http://blog.csdn.net/kakane/article/details/9615795
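Causes (1) and (4) above are easy to check mechanically. The following is a minimal sketch with hypothetical function names; it assumes one command[...] definition per line and compares allowed_hosts entries by exact string match only, so CIDR ranges and hostnames are not handled.

```shell
# Print every command name that is defined more than once in nrpe.cfg.
# Hypothetical helper, not part of NRPE.
nrpe_duplicate_commands() {
    local cfg="${1:-/etc/nagios/nrpe.cfg}"
    # Extract the name between "command[" and "]=", then keep repeated names.
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$cfg" | sort | uniq -d
}

# Return 0 if the given IP appears literally in the allowed_hosts list.
# Usage: nrpe_host_allowed <ip> [config_file]
nrpe_host_allowed() {
    local ip="$1"
    local cfg="${2:-/etc/nagios/nrpe.cfg}"
    # Split the comma-separated list into lines, drop blanks, match exactly.
    sed -n 's/^allowed_hosts=//p' "$cfg" | tr ',' '\n' | tr -d ' \t' | grep -Fxq "$ip"
}
```

With the two check_zombie_procs lines shown above in the file, nrpe_duplicate_commands prints check_zombie_procs; an empty result means no command name is defined twice.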