ELFK + Kafka + Zabbix Log Monitoring Issues

I have deployed this stack before, but only on a single machine, and that deployment went smoothly; the details are documented here: https://blog.csdn.net/miss1181248983/article/details/93738593

This time I redeployed the stack to monitor logs on both internal and external machines at the same time (the IPs and domain names in this article have been altered). To collect from both networks we added Kafka, and a problem appeared where it meets filebeat: logs from the internal machines were collected fine, but a packet capture of the traffic between the external filebeat and the internal Kafka showed that Kafka's internal address was still being handed out. As a result, logs from the external machines could not be collected at all.

Suppose the internal network has no fixed public IP, but does have a fixed public domain name, lzx.linux.cn.

  • Modify the configuration on the Kafka machine:
# vim /usr/local/kafka/config/server.properties

listeners = PLAINTEXT://192.168.1.254:9092

advertised.listeners=PLAINTEXT://lzx.linux.cn:9092

advertised.listeners is the listener list exposed to external clients. If it is not set, listeners is used; if both are set, advertised.listeners takes precedence.

Kafka must advertise the internal network's public address or domain name; only then can the external filebeat communicate with the internal Kafka without problems.

In short: whenever internal and external machines use Kafka at the same time, set advertised.listeners, otherwise communication between Kafka and the external machines will break.
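Why advertised.listeners matters can be sketched in a few lines: a Kafka client only uses the bootstrap address for its first connection, then switches to whatever address the broker reports in its metadata response. The Python sketch below models that fallback rule; it is an illustration of the logic only, not a real Kafka client or API.

```python
# Sketch of how a Kafka client resolves broker addresses.
# The client dials the bootstrap address, but the metadata response
# contains whatever advertised.listeners was set to -- all further
# traffic goes to that address. (Illustration only, not the Kafka API.)

def broker_metadata(listeners, advertised_listeners=None):
    """Return the address a broker reports in metadata responses.
    If advertised.listeners is unset, Kafka falls back to listeners."""
    return advertised_listeners or listeners

# Without advertised.listeners: external clients are told the internal IP,
# which they cannot reach -- exactly what the packet capture showed.
assert broker_metadata("192.168.1.254:9092") == "192.168.1.254:9092"

# With advertised.listeners set to the public domain name,
# external filebeat instances are handed a reachable address.
assert broker_metadata("192.168.1.254:9092",
                       "lzx.linux.cn:9092") == "lzx.linux.cn:9092"
```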


Overall architecture:

logstash/kafka      192.168.1.254

filebeat            192.168.1.253

filebeat            192.168.1.241

filebeat            192.168.1.230

zabbix server       192.168.1.252

When deploying ELK + Zabbix, we can designate which host Zabbix alerts on and easily have error logs fire an alert, because logstash runs on every machine whose logs we collect; it is enough for [zabbix_host] in logstash to match the host name (hostname or IP) configured on the Zabbix server.

With ELFK + Kafka + Zabbix, however, logstash + Kafka run on a single machine, the other machines run only filebeat, and filebeat ships its data to Kafka. This is what kept Zabbix from ever firing an alert, even when [zabbix_host] in logstash matched the host name configured on the Zabbix server.

If logstash (kafka) and filebeat run on the same machine there is no error; if they are on different hosts, the errors below appear.

With name left unset in filebeat and [zabbix_host] in logstash set to the fixed IP (192.168.1.253), deliberately generate an error log:

# echo '11:47:16:08 TRSID[main] ERROR - eee this message reached es via kafka!!!eee' >> webservice.log

logstash then reports:

[2019-07-24T10:47:34,898][WARN ][logstash.outputs.zabbix  ] Zabbix server at 192.168.1.252 rejected all items sent. {:zabbix_host=>"192.168.1.253"}

With name configured in filebeat and [zabbix_host] in logstash set to %{[host][name]} (which here resolves to kafka), deliberately generating an error log makes logstash report:

[2019-07-24T11:12:09,353][WARN ][logstash.outputs.zabbix  ] Zabbix server at 192.168.1.252 rejected all items sent. {:zabbix_host=>"kafka"}

Checking the log on the Zabbix server:

16536:20190724:114734.686 cannot process trapper item "webservice_error": connection from "192.168.1.254" rejected, allowed hosts: "192.168.1.230,192.168.1.241,192.168.1.253"

As the log shows, in the ELFK + Kafka + Zabbix architecture a single logstash (192.168.1.254) collects data from multiple filebeats, so the connection to Zabbix now originates from 192.168.1.254 rather than 192.168.1.253. Whether [zabbix_host] is set to a hostname or an IP, the alert never fires.

To get alerts to fire, the item's "Allowed hosts" must therefore include not only 192.168.1.230, 192.168.1.241 and 192.168.1.253 but also 192.168.1.254, the address logstash actually sends from; otherwise alerts cannot be triggered.

For convenience you can also leave "Allowed hosts" empty, which allows every host; that works too. I simply filled in the allowed hosts out of habit while configuring the item, and that is exactly what kept my alerts from firing.
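Putting the two log messages together, the trapper's acceptance logic can be modeled roughly as follows. This is a simplified sketch of the observed behavior, not Zabbix's actual source code:

```python
# Sketch of the two checks a Zabbix trapper item applies, as observed
# in the logstash and zabbix_server logs above (illustration only).

def trapper_accepts(conn_ip, zabbix_host, allowed_ips, known_hosts):
    """A trapper value is accepted only if the sending IP passes the
    item's 'Allowed hosts' filter AND zabbix_host matches a host
    configured on the Zabbix server."""
    if allowed_ips and conn_ip not in allowed_ips:
        return False          # "connection from ... rejected"
    if zabbix_host not in known_hosts:
        return False          # "rejected all items sent"
    return True

allowed = {"192.168.1.230", "192.168.1.241", "192.168.1.253"}
hosts = {"192.168.1.230", "192.168.1.241", "192.168.1.253"}

# logstash sends from 192.168.1.254, which is missing from the list:
assert not trapper_accepts("192.168.1.254", "192.168.1.253", allowed, hosts)

# after adding 192.168.1.254 to Allowed hosts, the value is accepted:
assert trapper_accepts("192.168.1.254", "192.168.1.253",
                       allowed | {"192.168.1.254"}, hosts)

# an empty Allowed hosts list means any source IP is accepted:
assert trapper_accepts("192.168.1.254", "192.168.1.253", set(), hosts)
```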


Install zabbix-agent on the logstash (kafka) machine, i.e. 192.168.1.254:

# rpm -ivh https://mirrors.tuna.tsinghua.edu.cn/zabbix/zabbix/3.4/rhel/7/x86_64/zabbix-agent-3.4.15-1.el7.x86_64.rpm

Zabbix server configuration:

Template name: ELK logstash alert

Application: Log

Item:

Name    webservice log monitor

Type    Zabbix trapper

Key    webservice_error

Type of information    Text

Allowed hosts    192.168.1.230,192.168.1.241,192.168.1.253,192.168.1.254

Trigger:

Name    webservice log error alert

Problem expression    {ELK logstash alert:webservice_error.nodata(60)}=0

Recovery expression    {ELK logstash alert:webservice_error.nodata(60)}<>0
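The trigger works because nodata(60) evaluates to 1 when the item has received no value for 60 seconds and to 0 otherwise, so a freshly arrived error line raises the problem and a minute of silence recovers it. A rough Python model of that semantics (simplified; it ignores Zabbix's evaluation-period details):

```python
# Sketch of the nodata(60) semantics used in the trigger above:
# nodata(60) is 1 when no value arrived in the last 60 seconds and 0
# otherwise, so "nodata(60)=0" fires the problem whenever an error log
# line reaches the item, and the problem recovers after 60 quiet
# seconds. (Simplified model, not the Zabbix evaluator.)

def nodata(last_value_age, window=60):
    """Return 1 if no data arrived within `window` seconds, else 0.
    last_value_age=None means no value has ever been received."""
    return 0 if last_value_age is not None and last_value_age <= window else 1

# An error line arrived 5 seconds ago -> problem expression (=0) is true:
assert nodata(5) == 0
# No data for 120 seconds -> recovery expression (<>0) is true:
assert nodata(120) != 0
# Never having received data also counts as recovery:
assert nodata(None) != 0
```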

Below are example filebeat and logstash configuration files.

filebeat configuration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /software/webservice.log
  fields:
    logtopic: 253-webservice

- type: log
  enabled: true
  paths:
    - /software/ciphermachine.log
  fields:
    logtopic: 253-ciphermachine

processors:
  - drop_fields:
      fields: ["beat", "input", "source", "offset", "prospector"]
  
#name: 192.168.1.253                # if name is set here, configure [zabbix_host] in logstash as %{[host][name]}

#output.elasticsearch:              # all other outputs must stay commented out, or filebeat reports an error
  # Array of hosts to connect to.
#  hosts: ["localhost:9200"]

output.kafka:
  enabled: true
  hosts: ["192.168.1.254:9092"]
  topic: '%{[fields.logtopic]}'
  partition.round_robin:
    reachable_only: true
  worker: 2
  required_acks: 1
  compression: gzip
  max_message_bytes: 10000000
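The topic: '%{[fields.logtopic]}' line is what routes each input to its own Kafka topic: filebeat substitutes the placeholder per event with the custom field defined under fields. A minimal Python sketch of that substitution (the resolve_topic helper is hypothetical, for illustration only; filebeat's real format-string engine does more):

```python
# Sketch of how filebeat's '%{[fields.logtopic]}' topic setting routes
# events: the placeholder is replaced per event with the custom field
# added under `fields`, so each input lands in its own Kafka topic.
# (Hypothetical helper; a simplified stand-in for filebeat's engine.)
import re

def resolve_topic(template, event):
    """Replace %{[a.b]} placeholders with dotted lookups into the event."""
    def lookup(match):
        value = event
        for part in match.group(1).split("."):
            value = value[part]
        return value
    return re.sub(r"%\{\[([^\]]+)\]\}", lookup, template)

# The first input above carries logtopic 253-webservice:
event = {"fields": {"logtopic": "253-webservice"}, "message": "..."}
assert resolve_topic("%{[fields.logtopic]}", event) == "253-webservice"
```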

logstash configuration:

input {
    kafka {
        bootstrap_servers => "192.168.1.254:9092"
        group_id => "253webservice"
        client_id => "253webservice-1"
        topics => "253-webservice"
        auto_offset_reset => "latest"
        codec => "json"
        consumer_threads => 5
        decorate_events => false
        type => "253_webservice"
    }
}

filter {
    if [type] == "253_webservice" {
        ruby {
            code => "event.set('log_time',event.get('@timestamp').time.localtime  + 8*60*60)"
        }

        grok {
            match => [ "log_time","(?<thisdate>^\d{4}-\d{1,2}-\d{1,2})"  ]
        }

        multiline {
            pattern => "^\d{1,2}:\d{1,2}:\d{1,2}"
            negate => true
            what => "previous"
        }

        grok {
            match => [ "message", "%{TIME:thistime} %{NOTSPACE:thread-id}\[%{DATA:namer}\] %{LOGLEVEL:level}" ]
        }

        mutate {
            add_field => [ "[zabbix_key]", "webservice_error" ]
            add_field => [ "[zabbix_host]", "192.168.1.253" ]               # if name is set in filebeat, configure this as %{[host][name]}
        }

        ruby {
            code => "event.set('logtime',event.get('thisdate') + ' ' + event.get('thistime') )"
        }

        date {
            match => [ "logtime","yyyy-MM-dd HH:mm:ss,SSS",'ISO8601' ]
            target => "@timestamp"
        }

        ruby {
            code => "event.set('logtime',event.get('@timestamp').time.localtime  + 8*60*60)"
        }

        mutate {
            remove_field => "@version"
            remove_field => "host"
            remove_field => "path"
            remove_field => "_type"
            remove_field => "_score"
            remove_field => "_id"
            remove_field => "thread-id"
            remove_field => "log_time"
            remove_field => "thisdate"
            remove_field => "thistime"
            remove_field => "score"
            remove_field => "id"
            remove_field => "namer"
            remove_field => "beat"
            remove_field => "offset"
            remove_field => "prospector.type"
            remove_field => "source"
            remove_field => "fields.logtopic"
            remove_field => "input.type"
            remove_field => "log.file.path"
        }
    }
}

output {
    if [type] == "253_webservice" {
        elasticsearch {
            hosts => ["192.168.1.254:9200"]
            user => "elastic"
            password => "linux~2019"
            index => "253_webservice"
        }

        if [level]  =~ /(ERR|error|ERROR|Failed)/ {
            zabbix {
                zabbix_host => "[zabbix_host]"
                zabbix_key => "[zabbix_key]"
                zabbix_server_host => "192.168.1.252"
                zabbix_server_port => "10051"
                zabbix_value => "message"
            }
        }
    }
}
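The grok pattern in the filter above can be sanity-checked by replaying it as a plain regular expression against the test line from the echo command. The pattern below is only a rough Python approximation of %{TIME} %{NOTSPACE}\[%{DATA}\] %{LOGLEVEL}, not the exact grok library definitions:

```python
# Rough Python equivalent of the grok pattern used in the filter block,
# replayed against the deliberately generated error line. (Approximation
# of the grok patterns, for verification only.)
import re

GROK_APPROX = re.compile(
    r"^(?P<thistime>\d{1,2}:\d{1,2}:\d{1,2}(?::\d{1,2})?)\s+"  # %{TIME}
    r"(?P<thread_id>[^\s\[]+)"                                  # %{NOTSPACE}
    r"\[(?P<namer>[^\]]*)\]\s+"                                 # \[%{DATA}\]
    r"(?P<level>[A-Z]+)"                                        # %{LOGLEVEL}
)

line = "11:47:16:08 TRSID[main] ERROR - eee this message reached es via kafka!!!eee"
m = GROK_APPROX.match(line)
assert m and m.group("level") == "ERROR"      # drives the zabbix output condition
assert m.group("thread_id") == "TRSID" and m.group("namer") == "main"
```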
