Real-Time Monitoring of Java Logs with nagios + logstash (Part 1)

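Introduction

The nagios plugin check_logfiles can monitor logs, but neither its timeliness nor its monitoring results are satisfactory. This article therefore describes real-time log monitoring that combines nagios passive checks (via nsca) with logstash. This approach suits low log volumes; for high log volumes, logstash also needs to be paired with a tool such as redis or kafka.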

Requirement

nagios monitors the Java log in real time and sends an alert notification whenever an ERROR-level entry appears in the log.

IP            hostname        Components            Notes
192.168.1.1   nagios server   nsca+nagios           nagios server
192.168.1.2   nagios client   send_nsca+logstash    Java logs

Implementation

I. nagios server configuration

The nagios server has already been set up, so we continue to use the following host and service definitions:

define host{
    use                     linux-server
    host_name               nagios-client
    alias                   passive-2
    address                 192.168.1.2
}

define service{
    use                     passive_service
    host_name               nagios-client
    service_description     java service
    check_command           check_dummy!0
    notifications_enabled   1
}

II. nagios client configuration

1. Configure logstash

input {
  log4j {
    type => "log4j-java"
    port => 4560
  }
}

output {
  # For easier debugging, logstash can print events to the console or write them to a file
  stdout {
    # codec => "json"
    codec => "rubydebug"
  }
  # file {
  #   path => "/logs/out.log"
  # }
}

# Start logstash

/usr/local/logstash/bin/logstash agent -f /usr/local/logstash/etc/logstash.conf -l /usr/local/logstash/logs/stdout.log

2. Configure log4j output for Java

Java has several logging frameworks, and logstash supports log4j, so we need to switch the Java application's logging to log4j.

vim log4j.properties

# Add the logstash appender to the root logger
log4j.rootLogger=INFO, stdout, logstash

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

# logstash
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.Port=4560
log4j.appender.logstash.RemoteHost=192.168.1.2
log4j.appender.logstash.ReconnectionDelay=60000
log4j.appender.logstash.LocationInfo=true
log4j.appender.logstash.Threshold = INFO

# A custom output format can also be defined. Note that SocketAppender sends
# serialized LoggingEvent objects and does not use a layout, so this pattern
# does not change what logstash receives.
log4j.appender.logstash.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss}[%p] [%t] [%c] [%F:%L]-%m%n

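After the Java program is restarted, log4j keeps trying to connect to the configured logstash ip:port and starts sending log data once the connection is established.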
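If no events arrive, a quick first check is whether the logstash port accepts TCP connections from the Java host at all. The following is a minimal sketch of such a check; the function name can_connect is made up for this illustration and is not part of logstash or log4j.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the values used in this article, the call would be can_connect("192.168.1.2", 4560). A True result only shows that something is listening on the port; it does not prove that the listener is the logstash log4j input.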

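Note: INFO-level logging produces a large volume of entries. If the transfer from log4j to logstash is slow, it can raise I/O pressure on the server hosting the Java program or slow the program down, which in turn can cause the program to fail. It is advisable to send only ERROR-level logs, configured as follows:

log4j.appender.logstash.Threshold = ERROR

With this setting, stdout still prints logs at INFO level and above, while only logs at ERROR level and above are sent to logstash.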
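The interplay between the root logger level and the appender Threshold can be sketched as follows. This is only an illustrative model of the filtering rule, not log4j code; the function names and the destinations helper are assumptions made for this example, while the appender names stdout and logstash come from the properties file above.

```python
# log4j 1.x levels in ascending order of severity.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def passes(event_level: str, threshold: str) -> bool:
    """True if an event at event_level is at or above threshold."""
    return LEVELS.index(event_level) >= LEVELS.index(threshold)


def destinations(event_level, root_level="INFO", logstash_threshold="ERROR"):
    """List the appenders that would receive an event of the given level."""
    if not passes(event_level, root_level):
        return []  # dropped by the root logger before reaching any appender
    targets = ["stdout"]  # stdout has no Threshold of its own
    if passes(event_level, logstash_threshold):
        targets.append("logstash")
    return targets
```

Under this model a WARN event reaches only stdout, while ERROR and FATAL events reach both appenders. FATAL passing an ERROR threshold is worth keeping in mind for any downstream rule that matches on the exact string ERROR.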

3. Test

After a request is sent to the Java program, logstash prints the log events to the console:

{
    "message" => "jdbc:mysql://192.168.1.1::3306;characterEncoding=utf8",
    "@version" => "1",
    "@timestamp" => "2017-03-20T01:24:31.477Z",
    "timestamp" => 1489973070924,
    "path" => "com.atomikos.jdbc.AtomikosXAConnectionFactory",
    "priority" => "WARN",
    "logger_name" => "com.atomikos.jdbc.AtomikosXAConnectionFactory",
    "thread" => "Atomikos:3",
    "class" => "com.atomikos.logging.Slf4jLogger",
    "file" => "Slf4jLogger.java:12",
    "method" => "logWarning",
    "host" => "192.168.1.2:28337",
    "type" => "log4j-java"
}

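Judging from the structured event that logstash prints, we can drive nagios alerts from the "priority" field: when "priority" is INFO, the service is normal; when "priority" is ERROR, an alert notification is sent. To locate the problem quickly when an alert fires, we also need "@timestamp" and "thread", hence message_format => "%{@timestamp} %{thread}". The logstash configuration can therefore be written as follows: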

input {
  log4j {
    type => "log4j-java"
    port => 4560
  }
}

output {
  # stdout {
  #   codec => "json"
  #   codec => "rubydebug"
  # }
  # file {
  #   path => "/logs/out.log"
  # }

  # ERROR events report CRITICAL (2) to nagios
  if [priority] == "ERROR" {
    nagios_nsca {
      host => "192.168.1.1"
      port => "5667"
      message_format => "%{@timestamp} %{thread}"
      send_nsca_bin => "/usr/local/nagios/bin/send_nsca"
      send_nsca_config => "/usr/local/nagios/etc/send_nsca.cfg"
      nagios_host => "192.168.1.2"
      nagios_service => "java service"
      nagios_status => 2
    }
  # INFO events report OK (0). The original condition tested
  # [type] == "log4j-jetty", which never matches the "log4j-java" type set in the input.
  } else if [priority] == "INFO" {
    nagios_nsca {
      host => "192.168.1.1"
      port => "5667"
      message_format => "OK"
      send_nsca_bin => "/usr/local/nagios/bin/send_nsca"
      send_nsca_config => "/usr/local/nagios/etc/send_nsca.cfg"
      nagios_host => "192.168.1.2"
      nagios_service => "java service"
      nagios_status => 0
    }
  }
}

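Here, when "priority" is ERROR, nagios_status => 2 applies, so the service receives status 2 (CRITICAL) instead of the 0 configured through check_dummy!0. send_nsca passes this value to nsca on the nagios server, which then raises the alert.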
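To make the hand-off to send_nsca more concrete, here is a minimal sketch of the record that ends up being piped into it. It assumes send_nsca's default input format for a service check (host, service, return code and output separated by tabs, one record per line) and the paths used in this article. The helper names status_for, nsca_line, nsca_command and submit are invented for this illustration and are not part of the logstash plugin.

```python
import subprocess

# Nagios service states as used in passive check results.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def status_for(priority: str):
    """Map a log4j priority to a Nagios state, following the rule in the text.

    Returns None for priorities the rule does not act on.
    """
    if priority == "ERROR":
        return CRITICAL
    if priority == "INFO":
        return OK
    return None


def nsca_line(host: str, service: str, status: int, message: str,
              delim: str = "\t") -> str:
    """Build one passive service check result in send_nsca's input format."""
    # A newline would split the record, so flatten the message to one line.
    flat = " ".join(message.splitlines())
    return delim.join([host, service, str(status), flat]) + "\n"


def nsca_command(server: str, port: int = 5667,
                 binary: str = "/usr/local/nagios/bin/send_nsca",
                 config: str = "/usr/local/nagios/etc/send_nsca.cfg") -> list:
    """Build the send_nsca argument list (default tab delimiter, so no -d)."""
    return [binary, "-H", server, "-p", str(port), "-c", config]


def submit(line: str, server: str) -> int:
    """Pipe one result line to send_nsca and return its exit code."""
    proc = subprocess.run(nsca_command(server), input=line.encode(), check=False)
    return proc.returncode
```

For the sample event shown earlier, nsca_line("192.168.1.2", "java service", CRITICAL, "2017-03-20T01:24:31.477Z Atomikos:3") produces the record that submit(..., "192.168.1.1") would send. Calling submit requires send_nsca to be installed at the assumed path.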

4. Troubleshooting

Although the process above looks smooth, errors did occur during configuration. For example, the logstash log /usr/local/logstash/logs/stdout.log showed the following error:

{:timestamp=>"2017-03-17T09:21:00.097000+0800", :message=>"192.168.1.1~CheckDummy~2~ERROR", :error=>#>, :nagios_nsca_command=>"/usr/local/nagios/bin/send_nsca -H 192.168.1.1 -p 5667 -d ~ -c /usr/local/nagios/etc/send_nsca.cfg", :missed_event=>#<:event:0x1ad6acb6>, @cancelled=false......}

The error reported is "error=NameError: undefined local variable or method `message' for #>". After investigating, and based on https://github.com/logstash-plugins/logstash-output-nagios/issues/3, I made the following changes to the logstash plugin:

vim vendor/bundle/jruby/1.9/gems/logstash-output-nagios_nsca-2.0.2/lib/logstash/outputs/nagios_nsca.rb

Line 114: send_to_nagios(cmd)

Change to: send_to_nagios(cmd, message)

Line 131: def send_to_nagios(cmd)

Change to: def send_to_nagios(cmd, message)

After these changes, logstash was able to raise alerts through nagios correctly.
