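I've wanted to set this up for a long time, so that I no longer have to walk to the server room every day to check the hardware status lights on every server. This post is written live, as I configure things: only steps I have verified to work are recorded here, for future reference. Enough talk, let's get started.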
1. Download the check_dell_openmanage plugin from:
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=85&cf_id=24
2. Copy the plugin into the following directory:
/usr/local/nagios/libexec
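Since the plugin is later run directly as ./check_dell_openmanage.pl, it also needs the execute bit. A minimal bash sketch of this step, assuming the download was saved as check_dell_openmanage.pl (the attachment URL does not fix the file name) and the Nagios layout used in this post; install_plugin is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# install_plugin SRC [LIBEXEC]: copy a Nagios plugin into LIBEXEC and make it executable.
# LIBEXEC defaults to the directory used in this post.
install_plugin() {
    local src=$1
    local libexec=${2:-/usr/local/nagios/libexec}
    cp "$src" "$libexec/" || return 1
    # Nagios executes the plugin directly, so it needs the execute bit.
    chmod 755 "$libexec/$(basename "$src")"
}

# Usage (assumes the download was saved under this name):
# install_plugin ./check_dell_openmanage.pl
```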
3. First, take a look at the plugin's usage:
[root@pcnnagios libexec]# ./check_dell_openmanage.pl -h
SNMP Dell OpenManage Monitor for Nagios version 1.3
by Jason Ellison - infotek(at)gmail.com

Usage: ./check_dell_openmanage.pl [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd) [-P <port>] -T test|dellom|dellom_storage|blade|global|chassis|custom [-t <timeout>] [-V] [-u <unknown_default>]

-v, --verbose          print extra debugging information
-h, --help             print this help message
-H, --hostname=HOST    name or IP address of host to check
-C, --community=COMMUNITY NAME
                       community name for the host's SNMP agent (implies v1 protocol)
-2, --v2c              use SNMP v2 (instead of SNMP v1)
-P, --port=PORT        SNMPd port (Default 161)
-t, --timeout=INTEGER  timeout for SNMP in seconds (Default: 5)
-V, --version          prints version number
-u, --unknown_default=INT
                       If attribute is not found then report the output as this number (i.e. -u 0)
-T, --type=test|dellom|dellom_storage|blade|global|chassis|custom
                       This allows to use pre-defined system type
                       Currently support systems types are:
                       test (tries all OID's in verbose mode can be used to generate new system type)
                       dellom (Dell OpenManage general detailed)
                       dellom_storage (Dell OpenManage plus Storage Management detailed)
                       blade (some features are on the chassis not the blade)
                       global (only check the global health status)
                       chassis (only check the system chassis health status)
                       custom (intended for customization)
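With the options above, a quick sweep over several servers from the shell might look like the bash sketch below. It only uses the flags shown in the usage text (-C, -H, -T global) and prints each host's plugin exit code; the plugin path and the "public" community are assumptions taken from this post and can be overridden through the hypothetical PLUGIN and COMMUNITY variables:

```shell
#!/usr/bin/env bash
# Assumed defaults (from this post); override in the environment if yours differ.
PLUGIN=${PLUGIN:-/usr/local/nagios/libexec/check_dell_openmanage.pl}
COMMUNITY=${COMMUNITY:-public}

# check_all HOST...: run the "global" health check against each host
# and print "<host> <exit code>" (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
check_all() {
    local host rc
    for host in "$@"; do
        "$PLUGIN" -C "$COMMUNITY" -H "$host" -T global >/dev/null 2>&1
        rc=$?
        echo "$host $rc"
    done
}

# Usage:
# check_all 10.40.1.131 192.168.0.1
```

The global type keeps the sweep cheap: one status per host, with the detailed dellom or dellom_storage types reserved for the Nagios service checks.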
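4. Test the environment. (Note: according to the usage text above, -v only turns on verbose output and SNMP v2c is selected with -2, so the "2C" after -v is not an SNMP version switch; the commands in this post therefore run in verbose mode over SNMP v1.)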
./check_dell_openmanage.pl -v 2C -C public -H 192.168.0.1 -T test
It failed with this error: On the nagios server that will be running the plugin you must have the perl "Net::SNMP" module installed. The message also gives the installation method:
perl -MCPAN -e shell
cpan> install Net::SNMP
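This took a bit of effort: CPAN.pm was not installed on my system by default, but after some fiddling I got it installed. Below I only reproduce the installation steps that various experts posted online; they are not from my own environment:
References: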
http://blog.haohtml.com/archives/12708
http://www.cnblogs.com/mopmoq/archive/2009/04/06/1430210.html
[root@GM ~]# wget http://cpan.communilink.net/authors/id/A/AN/ANDK/CPAN-1.9600.tar.gz
[root@GM ~]# tar -zxvf CPAN-1.9600.tar.gz
[root@GM ~]# cd CPAN-1.9600
[root@GM CPAN-1.9600]# perl Makefile.PL
[root@GM CPAN-1.9600]# make
[root@GM CPAN-1.9600]# make install
[root@GM CPAN-1.9600]# perl -MCPAN -e shell
(n lines omitted here; I skimmed them and accepted all the defaults and automatic configuration)
cpan(1)> install Net::SNMP
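Because the CPAN route took some fiddling, here is a small bash sketch for checking whether Net::SNMP is already present and, if not, printing a likely install command. The distribution package names (perl-Net-SNMP, libnet-snmp-perl) are my assumption and should be verified for your release; the CPAN fallback is the non-interactive form of the cpan shell commands above. Both helper names are hypothetical:

```shell
#!/usr/bin/env bash
# has_perl_module MODULE: exit status 0 if Perl can load MODULE, non-zero otherwise.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# net_snmp_install_cmd: print a suggested command for installing Net::SNMP.
# Package names are assumptions to check against your distribution:
#   RHEL/CentOS: perl-Net-SNMP (possibly from EPEL); Debian/Ubuntu: libnet-snmp-perl.
net_snmp_install_cmd() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum install -y perl-Net-SNMP"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install -y libnet-snmp-perl"
    else
        # Skips the interactive cpan> shell (first-time CPAN setup may still ask questions).
        echo "perl -MCPAN -e 'install Net::SNMP'"
    fi
}

# Usage (as root):
# has_perl_module Net::SNMP || eval "$(net_snmp_install_cmd)"
```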
5. OK, with that installed, let's test again:
[root@pcnnagios libexec]# ./check_dell_openmanage.pl -v 2C -C public -H 10.40.1.131 -T test
TEST MODE:
Alarm at 5
The Net::SNMP library is available on your server
Trying all preconfigured Dell OID's against target...
StorageManagementGlobalSystemStatus (.1.3.6.1.4.1.674.10893.1.20.110.13.0) RESULT: 3(ok)
chassisManufacturerName (1.3.6.1.4.1.674.10892.1.300.10.1.8.1) RESULT: Dell Inc.
chassisModelName (1.3.6.1.4.1.674.10892.1.300.10.1.9.1) RESULT: PowerEdge 2950
chassisServiceTagName (1.3.6.1.4.1.674.10892.1.300.10.1.11.1) RESULT: 53DG52X
chassisSystemName (1.3.6.1.4.1.674.10892.1.300.10.1.15.1) RESULT: pcnexconn
operatingSystemOperatingSystemName (1.3.6.1.4.1.674.10892.1.400.10.1.6.1) RESULT: Microsoft Windows Server 2003, Enterprise Edition
operatingSystemOperatingSystemVersionName (1.3.6.1.4.1.674.10892.1.400.10.1.7.1) RESULT: Version 5.2 (Build 3790 : Service Pack 2) (x86)
systemStateACPowerCordStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.36.1) RESULT: NO RESPONSE
systemStateACPowerSwitchStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.46.1) RESULT: NO RESPONSE
systemStateAmperageStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.15.1) RESULT: NO RESPONSE
systemStateBatteryStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.52.1) RESULT: 3(ok)
systemStateChassisIntrusionStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.30.1) RESULT: 3(ok)
systemStateChassisStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.4.1) RESULT: 3(ok)
systemStateCoolingDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.21.1) RESULT: 3(ok)
systemStateCoolingUnitStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.44.1) RESULT: 3(ok)
systemStateEventLogStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.41.1) RESULT: 3(ok)
systemStateGlobalSystemStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.2.1) RESULT: 3(ok)
systemStateMemoryDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.27.1) RESULT: 3(ok)
systemStatePowerSupplyStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.9.1) RESULT: 3(ok)
systemStatePowerUnitStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.42.1) RESULT: 3(ok)
systemStateProcessorDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.50.1) RESULT: 3(ok)
systemStateTemperatureStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.24.1) RESULT: 3(ok)
systemStateVoltageStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.12.1) RESULT: 3(ok)
Please email the results to Jason Ellison - infotek(at)gmail.com
To add this system to check_dell_openmanage, use something like the following:
"pexxxx" => [
'StorageManagementGlobalSystemStatus',
'systemStateBatteryStatusCombined'
'systemStateChassisIntrusionStatusCombined'
'systemStateChassisStatus'
'systemStateCoolingDeviceStatusCombined'
'systemStateCoolingUnitStatusCombined'
'systemStateEventLogStatus'
'systemStateGlobalSystemStatus'
'systemStateMemoryDeviceStatusCombined'
'systemStatePowerSupplyStatusCombined',
'systemStatePowerUnitStatusCombined',
'systemStateProcessorDeviceStatusCombined',
'systemStateTemperatureStatusCombined',
'systemStateVoltageStatusCombined'
],
[root@pcnnagios libexec]#
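The numeric results above are Dell status values (3 = ok). To see how such values could translate into Nagios states, here is a bash sketch. The Dell codes follow the DellStatus convention of the OpenManage MIB (1 other, 2 unknown, 3 ok, 4 nonCritical, 5 critical, 6 nonRecoverable) and the Nagios codes are the standard plugin exit codes; the mapping itself and the helper names are my own illustration, not code taken from check_dell_openmanage.pl:

```shell
#!/usr/bin/env bash
# dell_to_nagios CODE: map a Dell status value to a Nagios exit code.
# Illustrative assumption only; the plugin's own mapping may differ.
dell_to_nagios() {
    case "$1" in
        3)   echo 0 ;;  # ok -> OK
        4)   echo 1 ;;  # nonCritical -> WARNING
        5|6) echo 2 ;;  # critical / nonRecoverable -> CRITICAL
        *)   echo 3 ;;  # other / unknown -> UNKNOWN
    esac
}

# summarize: read test-mode output on stdin and print "<attribute> <nagios code>"
# for every line of the form "name (oid) RESULT: N(text)".
# Text results (model name, OS) and "NO RESPONSE" lines are skipped.
summarize() {
    local name code
    sed -n 's/^\([A-Za-z]*\) (.*) RESULT: \([0-9][0-9]*\)(.*$/\1 \2/p' |
    while read -r name code; do
        echo "$name $(dell_to_nagios "$code")"
    done
}

# Usage:
# ./check_dell_openmanage.pl -C public -H 10.40.1.131 -T test | summarize
```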
6. Everything looks OK, so let's put this thing to work in Nagios by adding a command definition and a service definition:
# 'check_dell_openmanage' command definition
define command{
        command_name    check_dell_openmanage
        command_line    $USER1$/check_dell_openmanage.pl -v 2C -H $HOSTADDRESS$ -C public -T $ARG1$
        }
define service{
        use                     generic-service
        host_name               hostname
        service_description     Check Dell Hardware
        check_command           check_dell_openmanage!dellom_storage
        }
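To see exactly which command line Nagios will run for check_dell_openmanage!dellom_storage (handy for testing it by hand first), the macro substitution can be mimicked in bash. This is a simplified sketch that only handles $USER1$, $HOSTADDRESS$ and $ARGn$; it is not how Nagios itself is implemented, and expand_command is a hypothetical helper:

```shell
#!/usr/bin/env bash
# expand_command TEMPLATE USER1 HOSTADDRESS CHECK_COMMAND
# CHECK_COMMAND has the form "name!arg1!arg2..." as in the service definition.
expand_command() {
    local line=$1 user1=$2 hostaddr=$3 check=$4
    local -a parts
    local i
    # Split on "!"; parts[0] is the command name, parts[1..] fill $ARG1$, $ARG2$, ...
    IFS='!' read -r -a parts <<< "$check"
    line=${line//'$USER1$'/"$user1"}
    line=${line//'$HOSTADDRESS$'/"$hostaddr"}
    for ((i = 1; i < ${#parts[@]}; i++)); do
        line=${line//"\$ARG$i\$"/"${parts[i]}"}
    done
    printf '%s\n' "$line"
}

# Usage:
# expand_command '$USER1$/check_dell_openmanage.pl -v 2C -H $HOSTADDRESS$ -C public -T $ARG1$' \
#     /usr/local/nagios/libexec 10.40.1.131 'check_dell_openmanage!dellom_storage'
```

Running the printed line as the nagios user is a quick way to catch permission or SNMP problems before Nagios does.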
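7. Check the configuration, then reload Nagios (nagioscheck appears to be a local alias for the Nagios configuration check):
nagioscheck; service nagios reload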
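With a plain ";" the reload runs even if the check found errors. A bash sketch that reloads only when the pre-flight check passes, using Nagios's own "nagios -v <config>" verification; the binary and config paths assume the /usr/local/nagios source layout used in this post, and NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD are hypothetical override variables:

```shell
#!/usr/bin/env bash
# Assumed paths for a source install under /usr/local/nagios.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RELOAD_CMD=${RELOAD_CMD:-"service nagios reload"}

# check_and_reload: verify the configuration, reload only if it is valid.
check_and_reload() {
    # "nagios -v" exits non-zero when the configuration has errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        $RELOAD_CMD   # left unquoted on purpose so it splits into command + arguments
    else
        echo "Nagios configuration check failed; not reloading" >&2
        return 1
    fi
}

# Usage (as root):
# check_and_reload
```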
8. Everything works; let's look at the result:
Current Status: OK (HARD state)
Status Information: Alarm at 5
The Net::SNMP library is available on your server
SNMP responses...
RESULT: systemStateChassisStatus .1.3.6.1.4.1.674.10892.1.200.10.1.4.1 = 3(ok)
RESULT: systemStatePowerSupplyStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.9.1 = 3(ok)
RESULT: systemStateVoltageStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.12.1 = 3(ok)
RESULT: systemStateCoolingDeviceStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.21.1 = 3(ok)
RESULT: systemStateTemperatureStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.24.1 = 3(ok)
RESULT: systemStateMemoryDeviceStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.27.1 = 3(ok)
RESULT: systemStateChassisIntrusionStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.30.1 = 3(ok)
RESULT: systemStateEventLogStatus .1.3.6.1.4.1.674.10892.1.200.10.1.41.1 = 3(ok)
RESULT: StorageManagementGlobalSystemStatus .1.3.6.1.4.1.674.10893.1.20.110.13.0 = 3(ok)
Dell Status to Nagios Status mapping...
systemStateTemperatureStatusCombined: statuscode = OK
StorageManagementGlobalSystemStatus: statuscode = OK
systemStateEventLogStatus: statuscode = OK
systemStateMemoryDeviceStatusCombined: statuscode = OK
systemStatePowerSupplyStatusCombined: statuscode = OK
systemStateVoltageStatusCombined: statuscode = OK
systemStateCoolingDeviceStatusCombined: statuscode = OK
systemStateChassisIntrusionStatusCombined: statuscode = OK
systemStateChassisStatus: statuscode = OK
OK: EXIT CODE: 0 STATUS CODE: OK
Performance Data:
Current Attempt: 1/3
Last Check Time: 2013-06-22 23:04:23
Check Type: ACTIVE
Check Latency / Duration: 0.770 / 0.195 seconds
Next Scheduled Check: 2013-06-22 23:06:23
Last State Change: 2013-06-22 22:56:22
Last Notification: N/A (notification 0)
Is This Service Flapping? NO (0.00% state change)
In Scheduled Downtime? NO
Last Update: 2013-06-22 23:04:51
Active Checks: ENABLED
Passive Checks: ENABLED
Obsessing: ENABLED
Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED
That's a wrap. Time to wash up and get some sleep!