Monitoring WebLogic Services with Nagios
Reposted from: http://skymax.blog.51cto.com/365901/101603
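1. Preface

This article describes how to use Nagios to monitor the health of WebLogic services, chiefly the WebLogic Server itself and its JDBC connection pools. The stock Nagios plugins provide no WebLogic check, so we write our own script according to the Nagios Plugin API and extend the plugin set with the functionality we need. WebLogic runtime status has to be obtained through JMX. The article draws on the Nagios Plugin sections of the official Nagios 3 documentation and on the JMX and command-line sections of the official WebLogic documentation. The WebLogic version used is 8.14.

2. Overview of the Nagios Plugin API

A Nagios plugin, whether it is a script (shell, Perl, etc.) or a compiled C program, must do at least two things:

(1) Exit with a return code.
(2) Print at least one line of text to standard output (STDOUT).

The return codes are defined as follows: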
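Plugin Return Code | Service State | Host State |
0 | OK | UP |
1 | WARNING | UP or DOWN/UNREACHABLE* |
2 | CRITICAL | DOWN/UNREACHABLE |
3 | UNKNOWN | DOWN/UNREACHABLE |

* Return code 1 gives host state UP unless the use_aggressive_host_checking option is enabled, in which case it gives DOWN or UNREACHABLE.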
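The two requirements can be shown in a minimal shell sketch. The check itself is a hypothetical example: the function name check_value and the threshold logic are illustrations only. What comes from the plugin API is that the plugin prints one status line and returns 0, 1, 2 or 3.

```shell
#!/bin/sh
# Minimal sketch of the Nagios plugin contract: one status line on STDOUT
# plus a return code of 0/1/2/3. check_value is a hypothetical example check.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_value VALUE WARN CRIT: compare an integer against two thresholds
check_value() {
    case "$1" in
        ''|*[!0-9]*)
            echo "UNKNOWN - '$1' is not a non-negative integer"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (critical at $3)"
        return $STATE_CRITICAL
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (warning at $2)"
        return $STATE_WARNING
    fi
    echo "OK - value is $1"
    return $STATE_OK
}

# A real plugin script would end with something like:
#   check_value "$1" 80 90
#   exit $?
```

Nagios reads the state from the exit code. The text line is what appears in the web interface as the status information.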
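The output must be at least one line of text that mainly describes the status of the monitored application or service, for example:

DISK OK - free space: / 3326 MB (56%);

3. How to monitor WebLogic

We obtain WebLogic's runtime status from the command line by invoking the weblogic.Admin class. This class is quite powerful and can be used to manage and configure WebLogic. A few commonly used commands follow.

(1) Get the server runtime state: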

    $ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} get -pretty \
        -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=ServerRuntime"

(2) Get the JDBC connection pool runtime state:

    $ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} GET -pretty \
        -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=JDBCConnectionPoolRuntime"

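Replace the ${...} variables in these commands with the values for your environment:

Variable | Description |
${URL} | WebLogic URL, e.g. t3://192.168.1.2:7002 |
${USER_NAME} | User name |
${PASS_WORD} | Password |
${DOMAIN_NAME} | WebLogic domain name, e.g. mydomain |
${SERVER_NAME} | Server name |
${POOL_NAME} | JDBC connection pool name |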
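The -mbean argument in the two weblogic.Admin commands is a domain name followed by comma-separated key=value properties. The sketch below builds both object names from the variables in the table. The function names server_mbean and jdbc_pool_mbean are hypothetical helpers and are not part of WebLogic.

```shell
# Hypothetical helpers that build the -mbean object names used by the
# weblogic.Admin commands from ${DOMAIN_NAME}, ${SERVER_NAME}, ${POOL_NAME}.
server_mbean() {
    # $1 = domain name, $2 = server name
    printf '%s:Location=%s,Name=%s,Type=ServerRuntime\n' "$1" "$2" "$2"
}

jdbc_pool_mbean() {
    # $1 = domain name, $2 = server name, $3 = pool name
    printf '%s:Location=%s,Name=%s,ServerRuntime=%s,Type=JDBCConnectionPoolRuntime\n' \
        "$1" "$2" "$3" "$2"
}

# Example (requires a reachable WebLogic server, not run here):
#   java weblogic.Admin -url "$URL" -username "$USER_NAME" -password "$PASS_WORD" \
#       GET -pretty -mbean "$(server_mbean "$DOMAIN_NAME" "$SERVER_NAME")"
```

For the server MBean, the server name appears twice: once as Location and once as Name. For the pool MBean, it appears as Location and as ServerRuntime.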
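Before running these commands, set JAVA_HOME, add $JAVA_HOME/bin to PATH, and add weblogic81/server/lib/weblogic.jar from the WebLogic installation to CLASSPATH.

4. The shell script

With the monitoring method in place, we write a shell script that follows the Nagios Plugin API rules. The script, check_wls.sh, is as follows: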

    #!/bin/ksh
    # check_wls.sh --jdbcpool url username password domainname servername poolname
    # check_wls.sh --server url username password domainname servername
    #
    # Requires ksh: it relies on ksh running the last command of a pipeline
    # ("... | read N", "... | while read ...") in the current shell.

    PROGNAME=`basename $0`
    PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
    REVISION=`echo '$Revision: 1749 $' | sed -e 's/[^0-9.]//g'`

    # utils.sh (from the Nagios plugins) defines STATE_* and print_revision
    . $PROGPATH/utils.sh

    print_usage() {
        echo "Usage:"
        echo "  $PROGNAME --jdbcpool url username password domainname servername poolname"
        echo "  $PROGNAME --server url username password domainname servername"
        echo "  $PROGNAME --help"
        echo "  $PROGNAME --version"
    }

    print_help() {
        print_revision $PROGNAME $REVISION
        echo ""
        print_usage
        echo ""
        echo "Check Weblogic status"
        echo ""
        echo "--jdbcpool url username password domainname servername poolname"
        echo "   Check Weblogic JDBC Pool"
        echo "--server url username password domainname servername"
        echo "   Check Weblogic Server"
    }

    # weblogic.Admin needs a JDK and weblogic.jar on the CLASSPATH
    if [[ -z "$JAVA_HOME" ]]
    then
        echo "Please set JAVA_HOME!"
        exit $STATE_UNKNOWN
    fi

    if [[ -z "$CLASSPATH" ]]
    then
        echo "Please set CLASSPATH!"
        exit $STATE_UNKNOWN
    else
        echo $CLASSPATH | grep "weblogic.jar" | wc -l | read N
        if [[ "$N" = "0" ]]
        then
            echo "Please add weblogic.jar to CLASSPATH!"
            exit $STATE_UNKNOWN
        fi
    fi

    PATH=$JAVA_HOME/bin:$PATH
    export PATH

    JDBC_TYPE="JDBCConnectionPoolRuntime"
    SERVER_TYPE="ServerRuntime"

    cmd="$1"

    # Information options
    case "$cmd" in
    --help|-h)
        print_help
        exit $STATE_OK
        ;;
    --version|-V)
        print_revision $PROGNAME $REVISION
        exit $STATE_OK
        ;;
    esac

    case "$cmd" in
    --server)
        URL=${2}
        USER_NAME=${3}
        PASS_WORD=${4}
        DOMAIN_NAME=${5}
        SERVER_NAME=${6}
        SERVER_INFO="${DOMAIN_NAME}:${SERVER_NAME}"
        # 2>&1 so that error text also ends up in RE and in the plugin output
        RE=$(java weblogic.Admin -url "${URL}" -username "${USER_NAME}" -password "${PASS_WORD}" get -pretty \
            -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=${SERVER_TYPE}" 2>&1)
        # A successful -pretty listing contains lines starting with "-";
        # anything else is reported as an error
        printf '%s\n' "${RE}" | grep "^-" | wc -l | read N
        if [[ "$N" -lt "1" ]]
        then
            printf '%s\n' "${RE}" | awk '{ printf "%s ", $0 } END { print "" }' | read -r ERR_INFO
            echo "CRITICAL - ${ERR_INFO}"
            exit $STATE_CRITICAL
        fi

        HEALTH_STATE=""
        RUN_STATE=""
        # Pick HealthState and State out of the "Name: value" lines
        printf '%s\n' "${RE}" | while read NAME VALUE
        do
            case "${NAME}" in
            HealthState:) HEALTH_STATE=${VALUE} ;;
            State:) RUN_STATE=${VALUE} ;;
            esac
        done

        # Keep the full HealthState text for messages; reduce HEALTH_STATE
        # to the HEALTH_* code (first ","-field, text after ":")
        HEALTH_STATE_INFO=${HEALTH_STATE}
        echo "${HEALTH_STATE_INFO}" | awk -F, '{ print $1 }' | awk -F: '{ print $2 }' | read HEALTH_STATE

        if [[ "${RUN_STATE}" != "RUNNING" ]]
        then
            echo "CRITICAL - ${SERVER_INFO} State is ${RUN_STATE}"
            exit $STATE_CRITICAL
        fi

        # HEALTH_OK HEALTH_WARN HEALTH_CRITICAL HEALTH_FAILED
        case "${HEALTH_STATE}" in
        HEALTH_OK)
            ;;
        HEALTH_WARN)
            echo "WARNING - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
            exit $STATE_WARNING
            ;;
        HEALTH_CRITICAL|HEALTH_FAILED)
            echo "CRITICAL - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
            exit $STATE_CRITICAL
            ;;
        esac

        echo "OK - ${SERVER_INFO} State is ${RUN_STATE},HealthState is ${HEALTH_STATE_INFO}"
        exit $STATE_OK
        ;;
    --jdbcpool)
        URL=${2}
        USER_NAME=${3}
        PASS_WORD=${4}
        DOMAIN_NAME=${5}
        SERVER_NAME=${6}
        POOL_NAME=${7}
        POOL_INFO="${DOMAIN_NAME}:${SERVER_NAME}:${POOL_NAME}"
        RE=$(java weblogic.Admin -url "${URL}" -username "${USER_NAME}" -password "${PASS_WORD}" GET -pretty \
            -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=${JDBC_TYPE}" 2>&1)
        printf '%s\n' "${RE}" | grep "^-" | wc -l | read N
        if [[ "$N" -lt "1" ]]
        then
            printf '%s\n' "${RE}" | awk '{ printf "%s ", $0 } END { print "" }' | read -r ERR_INFO
            echo "CRITICAL - ${ERR_INFO}"
            exit $STATE_CRITICAL
        fi

        POOL_STATE=""
        WAIT_CNT=""
        RUN_STATE=""
        # Pick PoolState, WaitingForConnectionCurrentCount and State
        printf '%s\n' "${RE}" | while read NAME VALUE
        do
            case "${NAME}" in
            PoolState:) POOL_STATE=${VALUE} ;;
            WaitingForConnectionCurrentCount:) WAIT_CNT=${VALUE} ;;
            State:) RUN_STATE=${VALUE} ;;
            esac
        done

        if [[ "${POOL_STATE}" != "true" ]]
        then
            echo "CRITICAL - ${POOL_INFO} PoolState is ${POOL_STATE}"
            exit $STATE_CRITICAL
        fi
        if [[ "${RUN_STATE}" != "Running" ]]
        then
            echo "CRITICAL - ${POOL_INFO} State is ${RUN_STATE}"
            exit $STATE_CRITICAL
        fi
        if [[ "${WAIT_CNT}" -gt "0" ]]
        then
            echo "WARNING - ${POOL_INFO} WaitingForConnectionCurrentCount is ${WAIT_CNT}"
            exit $STATE_WARNING
        fi

        echo "OK - ${POOL_INFO} State is ${RUN_STATE},PoolState is ${POOL_STATE},WaitingForConnectionCurrentCount is ${WAIT_CNT}"
        exit $STATE_OK
        ;;
    *)
        print_usage
        exit $STATE_UNKNOWN
        ;;
    esac

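check_wls.sh depends on a ksh feature: ksh runs the last command of a pipeline in the current shell, so variables set by `... | while read` and `... | read N` are still set afterwards. Under bash (without `shopt -s lastpipe`) and most other POSIX shells, those commands run in a subshell and the variables come back empty. If the script has to run under another shell, one option is to extract each attribute with command substitution. The sketch below assumes the same layout that the script's parsing loop assumes: the -pretty output holds one `Name: value` pair per line. The names get_attr and health_code are hypothetical, and the sample output in the usage comments is only illustrative.

```shell
# Portable extraction of attributes from weblogic.Admin -pretty output,
# assuming one "Name: value" pair per line (the layout check_wls.sh parses).

# get_attr NAME OUTPUT: print the value from the first line whose first
# word is exactly "NAME:" (so State does not match "HealthState:").
get_attr() {
    printf '%s\n' "$2" | awk -v key="$1:" '
        $1 == key {
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")   # drop indent, the key and blanks
            print
            exit
        }'
}

# health_code VALUE: print the HEALTH_* code that follows "State:" in a
# HealthState value, or nothing if no such code is present.
health_code() {
    printf '%s\n' "$1" | sed -n 's/.*State:\(HEALTH_[A-Z]*\).*/\1/p'
}

# Usage inside the --server branch (RE holds the weblogic.Admin output):
#   RUN_STATE=$(get_attr State "$RE")
#   HEALTH_STATE_INFO=$(get_attr HealthState "$RE")
#   HEALTH_STATE=$(health_code "$HEALTH_STATE_INFO")
```

health_code matches the code by name rather than by field position. This is intended to keep it working even if the HealthState text carries extra comma-separated fields.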
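5. Configuring WebLogic monitoring

Upload check_wls.sh to the libexec directory of the Nagios installation (the script sources utils.sh from that same directory). Make it executable and create a symbolic link named check_wls:

    $ chmod +x ./check_wls.sh
    $ ln -s ./check_wls.sh ./check_wls
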
Add the corresponding command definitions to the NRPE configuration file. The WebLogic settings in this example are:
Variable | Value |
${URL} | t3://172.2.10.2:7001 |
${USER_NAME} | weblogic |
${PASS_WORD} | weblogic |
${DOMAIN_NAME} | mydomain |
${SERVER_NAME} | myserver |
${POOL_NAME} | mypool |
Edit nrpe.cfg and add the following:

    $ vi ./nrpe.cfg
    ...
    #check weblogic [check_wls]
    command[check_wls_server_myserver]=/usr/local/nagios/libexec/check_wls --server t3://172.2.10.2:7001 weblogic weblogic mydomain myserver
    command[check_wls_jdbcpool_mypool]=/usr/local/nagios/libexec/check_wls --jdbcpool t3://172.2.10.2:7001 weblogic weblogic mydomain myserver mypool

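When several servers or pools are monitored, the command[...] lines follow a fixed pattern and can be generated instead of typed. This sketch assumes the plugin path and the check_wls_server_<server> / check_wls_jdbcpool_<pool> naming used above. The function nrpe_wls_lines is a hypothetical helper.

```shell
# Plugin path as used in this article's nrpe.cfg (adjust to your install).
PLUGIN=/usr/local/nagios/libexec/check_wls

# nrpe_wls_lines URL USER PASS DOMAIN SERVER [POOL...]
# Print nrpe.cfg command definitions for one server and any of its pools.
nrpe_wls_lines() {
    [ $# -ge 5 ] || { echo "usage: nrpe_wls_lines url user pass domain server [pool...]" >&2; return 1; }
    url=$1 user=$2 pass=$3 domain=$4 server=$5
    shift 5
    printf 'command[check_wls_server_%s]=%s --server %s %s %s %s %s\n' \
        "$server" "$PLUGIN" "$url" "$user" "$pass" "$domain" "$server"
    for pool in "$@"; do
        printf 'command[check_wls_jdbcpool_%s]=%s --jdbcpool %s %s %s %s %s %s\n' \
            "$pool" "$PLUGIN" "$url" "$user" "$pass" "$domain" "$server" "$pool"
    done
}

# Example: append the definitions for this article's setup to nrpe.cfg
#   nrpe_wls_lines t3://172.2.10.2:7001 weblogic weblogic mydomain myserver mypool >> nrpe.cfg
```

Each additional pool name passed after the server name produces one more --jdbcpool line.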
Add the environment variables (CLASSPATH and JAVA_HOME) to the NRPE startup script:

    ...
    JAVA_HOME=/data/bea/bea/jdk142_05
    export JAVA_HOME
    CLASSPATH=/data/bea/bea/weblogic81/server/lib/weblogic.jar
    export CLASSPATH
    ...

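Plugins started by the NRPE daemon inherit the daemon's environment, which is why JAVA_HOME and CLASSPATH are exported in its startup script. The script checks CLASSPATH with grep, which would also accept an entry such as myweblogic.jar. The following POSIX-shell variant is a sketch that instead matches weblogic.jar as a whole CLASSPATH entry. The function name has_weblogic_jar is hypothetical.

```shell
# has_weblogic_jar CLASSPATH: succeed if one of the colon-separated entries
# is weblogic.jar itself or a path ending in /weblogic.jar.
has_weblogic_jar() {
    case ":$1:" in
        *:weblogic.jar:*|*/weblogic.jar:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use before calling weblogic.Admin:
#   if ! has_weblogic_jar "$CLASSPATH"; then
#       echo "UNKNOWN - weblogic.jar is not on CLASSPATH"
#       exit 3
#   fi
```

Wrapping the value in colons lets a single case pattern handle the first, last, and only entry in the same way.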
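On the monitoring host, edit the Nagios configuration and add the following host and service definitions. Object definitions like these normally go in an object configuration file that nagios.cfg includes through a cfg_file or cfg_dir directive. The check_nrpe command used below is assumed to be defined already, as in a standard NRPE setup.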

    $ vi ./nagios.cfg
    ...
    # Define the remote host that runs WebLogic and NRPE
    define host{
        use          linux-box        ; Name of host template to use
                                      ; This host definition will inherit all variables that are defined
                                      ; in (or inherited by) the linux-box host template definition.
        host_name    sol_172.2.10.2
        alias        sol_172.2.10.2
        address      172.2.10.2
        }

    # Run check_wls_server_myserver on the remote host via NRPE
    define service{
        use                  generic-service
        host_name            sol_172.2.10.2
        service_description  Weblogic Server myserver
        check_command        check_nrpe!check_wls_server_myserver
        }

    # Run check_wls_jdbcpool_mypool on the remote host via NRPE
    define service{
        use                  generic-service
        host_name            sol_172.2.10.2
        service_description  Weblogic JDBCPool mypool
        check_command        check_nrpe!check_wls_jdbcpool_mypool
        }

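Verify that the configuration is correct. Restart the Nagios service on the monitoring host and the NRPE service on the remote host, then check the monitoring status in the Nagios web interface using Internet Explorer.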
Figure 5.1
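Besides the web interface, each link in the chain can be checked from a shell. The sketch below assumes the /usr/local/nagios layout used above; the nagios binary and nagios.cfg locations in the comments are typical source-install defaults and should be adjusted. The helper state_name is hypothetical and only translates an exit code into the state names from the return-code table.

```shell
# state_name CODE: map a plugin exit code to its Nagios service state.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID ($1)" ;;
    esac
}

# Manual checks (host address and names follow this article's example):
#   On the WebLogic host, with JAVA_HOME and CLASSPATH exported:
#     /usr/local/nagios/libexec/check_wls --server t3://172.2.10.2:7001 weblogic weblogic mydomain myserver
#     state_name $?
#   On the monitoring host, through NRPE:
#     /usr/local/nagios/libexec/check_nrpe -H 172.2.10.2 -c check_wls_server_myserver
#     state_name $?
#   Check the Nagios configuration before restarting:
#     /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

If the direct call works but the NRPE call does not, the NRPE daemon's environment or its command definitions are the usual suspects.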
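This completes the configuration.

6. Conclusion

This article presented one way to monitor WebLogic with Nagios: a shell script written to the Nagios Plugin API rules performs the checks. It also briefly described the configuration process and provided the full shell source. Corrections and comments are welcome.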