Building an EFK Logging System

Reference: collection, processing, and querying of dapeng logs

Recently, following dapeng's log management setup, I decided to build an EFK stack of my own. This post records the pitfalls I hit along the way. Many thanks to Ever_00, 洋洋_3720 and the other contributors for their support and help.

Technology Choices

Back to the topic. We again chose fluent-bit + fluentd + kafka + elasticsearch as the logging stack. dapeng services already integrate a single-node fluent-bit that collects the log files of each docker container and forwards them to fluentd. fluentd acts as a relay that gathers all the logs and sends them to kafka for peak shaving; the smoothed stream is then forwarded by fluentd to elasticsearch for storage. We did not use Kibana this time; elasticsearch-head serves as the log-viewing UI instead.

Non-dapeng services need to modify their own Dockerfile to package the patched fluent-bit into the container; on service startup, run sh /opt/fluent-bit/fluent-bit.sh.

fluent-bit log collection configuration

fluent-bit-dapeng.conf

[SERVICE]
    Flush        5
    Daemon       On
    Log_Level    error
    Log_File     /fluent-bit/log/fluent-bit.log
    Parsers_File parse_dapeng.conf

[INPUT]
    Name tail
    Path /dapeng-container/logs/*.log
    Exclude_Path  /dapeng-container/logs/fluent*.log,/dapeng-container/logs/console.log,/dapeng-container/logs/gc*.log
    Tag  dapeng
    Multiline  on
    Buffer_Chunk_Size 2m
    buffer_max_size  30m
    Mem_Buf_Limit  32m
    DB.Sync  Normal
    db_count 400
    Parser_Firstline dapeng_multiline
    db  /fluent-bit/db/logs.db

[FILTER]
    Name record_modifier
    Match *
    Record hostname ${soa_container_ip}
    Record tag ${serviceName}

[OUTPUT]
    Name  Forward
    Match *
    Host  fluentd
    Port  24224
    HostStandby fluentdStandby
    PortStandby 24224

In a dapeng service, the serviceName, soa_container_ip, fluentd, and fluentdStandby settings are required for every service. Path and Exclude_Path control which logs are collected and which are filtered out, and can be overridden through environment variables:

fluentBitLogPath=/dapeng-container/logs/*.log
fluentBitLogPathExclude=/dapeng-container/logs/fluent*.log,/dapeng-container/logs/console.log,/dapeng-container/logs/gc*.log
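The effect of these two patterns can be illustrated with a small Python sketch using glob-style matching: a file is tailed only if it matches Path and none of the Exclude_Path patterns (the file names below are made up for illustration).

```python
from fnmatch import fnmatch

# Hypothetical log files under /dapeng-container/logs/
files = [
    "/dapeng-container/logs/payment-service.log",
    "/dapeng-container/logs/fluent-bit.log",
    "/dapeng-container/logs/console.log",
    "/dapeng-container/logs/gc.2019-01-01.log",
]

path = "/dapeng-container/logs/*.log"
exclude = [
    "/dapeng-container/logs/fluent*.log",
    "/dapeng-container/logs/console.log",
    "/dapeng-container/logs/gc*.log",
]

# Collected = matches Path, and matches no Exclude_Path pattern
collected = [f for f in files
             if fnmatch(f, path) and not any(fnmatch(f, e) for e in exclude)]
print(collected)  # ['/dapeng-container/logs/payment-service.log']
```

Only the service's own log file survives the filter; the fluent-bit, console, and gc logs are skipped.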

At the same time, mount the fluent-bit-dapeng.conf above to /opt/fluent-bit/etc/fluent-bit.conf:

    environment:
       - serviceName=payment
       - container_ip=${host_ip}
       - soa_container_port=${payment_port}
       - soa_container_ip=${host_ip}
       - host_ip=${host_ip}
       - soa_service_timeout=60000
       - JAVA_OPTS=-Dname=payment -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8  -Dio.netty.leakDetectionLevel=advanced
       - kafka_consumer_host=${kafka_host_ip}:9092
       - kafka_producer_host=${kafka_host_ip}:9092
     env_file:
       - .envs/application.env
       - .envs/common.env
     volumes:
       - "/data/logs/payment:/dapeng-container/logs"
       - "/data/var/fluent/order/:/fluent-bit/db/"
       - "./config/fluent-bit-dapeng.conf:/opt/fluent-bit/etc/fluent-bit.conf"
       - "/data/var/shm:/data/shm"
     ports:
       - "${payment_port}:${payment_port}"
     extra_hosts:
       - "fluentd:${fluentd_host}"
       - "fluentdStandby:${fluentdStandby_host}"
       - "db-master:${mysql_host_ip}"
       - "soa_zookeeper:${zookeeper_host_ip}"
       - "redis_host:${redis_host_ip}"

Inside the dapeng service container, parse_dapeng.conf looks like this:

[PARSER]
    Name        dapeng_multiline
    Format      regex
    Regex       (?<logtime>\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \d{1,3}) (?<threadPool>.*) (?<level>.*) \[(?<sessionTid>.*)\] - (?<message>.*)

The Regex is a regular expression that parses each log line into the fields we need, such as logtime and message.
The parsing expression can also be set through an environment variable:

fluentbitParserRegex=(?<logtime>^\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d{3}) (?<threadPool>[^ ]+|Check idle connection Thread) (?<level>[^ ]+) \[(?<sessionTid>\w*)\] - (?<message>.*)
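A quick way to sanity-check such an expression is to run it against a sample line in Python (note that Python spells named groups `(?P<name>...)` while fluent-bit's Onigmo engine uses `(?<name>...)`; the log line below is made up):

```python
import re

# Same structure as the fluent-bit parser: logtime, threadPool, level,
# [sessionTid], then the free-text message
pattern = re.compile(
    r"(?P<logtime>^\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d{3}) "
    r"(?P<threadPool>[^ ]+|Check idle connection Thread) "
    r"(?P<level>[^ ]+) "
    r"\[(?P<sessionTid>\w*)\] - (?P<message>.*)"
)

# Hypothetical dapeng-style log line
line = "05-10 18:00:00 123 dapeng-eventbus INFO [ac1f000a1234] - order created"
m = pattern.match(line)
print(m.groupdict())
# {'logtime': '05-10 18:00:00 123', ..., 'message': 'order created'}
```

Lines that fail to match end up unparsed, which is exactly the empty-field situation discussed in the todo section at the end.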
Note: although dapeng integrates fluent-bit, it is disabled by default and must be enabled via an environment variable:
fluent_bit_enable=true

The fluentd image

First, prepare the fluentd image. The fluentd Dockerfile:

FROM fluent/fluentd:v1.2
# add the elasticsearch and kafka plugins
RUN  fluent-gem install fluent-plugin-elasticsearch
RUN  fluent-gem install fluent-plugin-kafka
CMD exec fluentd -c /fluentd/etc/${FLUENTD_CONF} -p /fluentd/plugins $FLUENTD_OPT
  1. Build the image (run in the directory containing the Dockerfile; `.` is the build context)
    docker build -t docker.****.com:80/basic/fluentd:v1.2 .
  2. Push the image to the private docker registry
    docker push docker.****.com:80/basic/fluentd:v1.2
  3. Configure fluentd in the dc-all.yml file (dapeng's source-compose wraps docker-compose)
   fluentd:
    container_name: fluentd
    image: docker.****.com:80/basic/fluentd:v1.2
    restart: on-failure:3
    volumes:
      - /data/var/fluentd/log:/fluentd/log
      - /data/var/fluentd/etc:/fluentd/etc
    environment:
      - LANG=zh_CN.UTF-8
      - TZ=CST-8
    ports:
      - "24224:24224"
    labels:
      - project.source=
      - project.extra=public-image
      - project.depends=
      - project.owner=

fluentd's configuration files live under /data/var/fluentd/etc.
fluent.conf configures the fluentd forwarder.
In theory two fluentd instances should run, handling jobs 1 and 2 below separately; here we combine both into one service.

# 1. Collect logs and send them to kafka, topic "efk"
# 8 worker threads; monitor ports count up from 24225

<system>
  log_level error
  flush_thread_count 8
  workers 8
</system>

<source>
  @type forward
  port 24224
</source>

<source>
  @type monitor_agent
  port 24225
</source>

<match **>
  @type kafka_buffered
  brokers kafka_server_address:9092
  topic_key efk
  buffer_type file
  buffer_path /tmp/buffer
  flush_interval 5s
  default_topic efk
  output_data_type json
  compression_codec gzip
  max_send_retries 3
  required_acks -1
  discard_kafka_delivery_failed true
</match>

# End of 1: collect logs and send to kafka, topic "efk"

# 2. Consume log messages from kafka and send them to elasticsearch,
#    topic "efk", consumer group "efk-consumer"
#<system>
#  log_level error
#  flush_thread_count 2
#  workers 2
#</system>
#
#<source>
#  @type monitor_agent
#  port 24225
#</source>

<source>
  @type kafka_group
  brokers kafka_server_address:9092
  consumer_group efk-consumer
  topics efk
  format json
  start_from_beginning false
  max_wait_time 5
  max_bytes 1500000
</source>

<match efk>
  @type elasticsearch
  hosts elasticsearch_server_address:9200
  index_name dapeng_log_index
  type_name  dapeng_log
  #content_type application/x-ndjson
  buffer_type file
  buffer_path /tmp/buffer_file
  buffer_chunk_limit 10m
  buffer_queue_limit 512
  flush_mode interval
  flush_interval 5s
  request_timeout 5s
  flush_thread_count 2
  reload_on_failure true
  resurrect_after 30s
  reconnect_on_error true
  with_transporter_log true
  logstash_format true
  logstash_prefix dapeng_log_index
  template_name dapeng_log_index
  template_file  /fluentd/etc/template.json
  num_threads 2
  utc_index  false
</match>

# End of 2: consume kafka log messages into elasticsearch
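Because logstash_format is enabled, the consumer does not write to the static index_name; it writes to a daily index whose name is derived from logstash_prefix. A minimal sketch of that naming (with utc_index false, the local date is used):

```python
from datetime import datetime

def daily_index(prefix: str, when: datetime) -> str:
    # logstash_format appends a %Y.%m.%d date suffix to logstash_prefix
    return f"{prefix}-{when:%Y.%m.%d}"

print(daily_index("dapeng_log_index", datetime(2019, 5, 10)))
# dapeng_log_index-2019.05.10
```

This is also why the index template below matches the wildcard pattern dapeng_log_index-*.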

template.json defines the elasticsearch index-creation template:

{
  "template": "dapeng_log_index-*",
  "mappings": {
    "dapeng_log": {
      "properties": {
        "logtime": {
          "type": "date",
          "format": "MM-dd HH:mm:ss SSS"
        },
        "threadPool": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "level": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "tag": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "message": {
          "type": "keyword",
          "ignore_above": 2048,
          "norms": false,
          "index_options": "docs"
        },
        "hostname": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "sessionTid": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "log": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        }
      }
    }
  },
  "settings": {
    "index": {
      "max_result_window": "100000000",
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "codec": "best_compression",
      "translog": {
        "sync_interval": "60s",
        "durability": "async",
        "flush_threshold_size": "1024mb"
      },
      "merge":{
        "policy":{
          "max_merged_segment": "2gb"
        }
      },
      "refresh_interval": "10s"
    }
  },
  "warmers": {}
}

Preparing the elasticsearch image

  1. elasticsearch configuration in the dc-all.yml file
  elasticsearch:
    image: elasticsearch:6.7.1
    container_name: elasticsearch
    restart: on-failure:3
    environment:
      - LANG=zh_CN.UTF-8
      - TZ=CST-8
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - /data/var/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    labels:
      - project.source=
      - project.extra=public-image
      - project.depends=
      - project.owner=

elasticsearch.yml enables CORS so that elasticsearch-head can access elasticsearch:

cluster.name: "docker-cluster"
network.host: 0.0.0.0

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, X-User"

elasticsearch fails to start with the error:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

That is, the elasticsearch user's virtual memory areas limit is too low; it must be at least 262144.

Run sudo vi /etc/sysctl.conf and append vm.max_map_count=262144 as the last line, then run sudo sysctl -p to reload the configuration and restart elasticsearch.

Preparing the elasticsearch-head image

  1. First, clone the elasticsearch-head project into /data/workspace
    git clone git://github.com/mobz/elasticsearch-head.git
  2. Configure elasticsearch-head in the dc-all.yml file
  elasticsearch-head:
    image: mobz/elasticsearch-head:5
    container_name: elasticsearch-head
    restart: on-failure:3
    environment:
      - LANG=zh_CN.UTF-8
      - TZ=CST-8
    volumes:
      - /data/workspace/elasticsearch-head/Gruntfile.js:/usr/src/app/Gruntfile.js
      - /data/workspace/elasticsearch-head/_site/app.js:/usr/src/app/_site/app.js
    ports:
      - "9100:9100"
    labels:
      - project.source=
      - project.extra=public-image
      - project.depends=
      - project.owner=

Gruntfile.js needs the following change at line 97:

connect: {
        server: {
                options: {
                        hostname: '0.0.0.0',
                        port: 9100,
                        base: '.',
                        keepalive: true
                }
        }
}

app.js needs a change at line 4379: replace localhost with the elasticsearch cluster address

/** Replace localhost with the elasticsearch cluster address; in a Docker deployment this is usually the elasticsearch host address */
this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://elasticsearch_server_address:9200/";

Starting the services

Once all of the services above are up, visit http://elasticsearch-head_server_address:9100/ and you should see the page below (the cluster health is yellow because I did not set up replicas).

Of course, no logs show up at first because no index has been created yet. We can add a cron job that creates each day's index and cleans up old ones.
autoIndex4DapengLog.sh: keeps seven days of indices, leaves the most recent three days open, and creates the next day's index.

#!/bin/bash
#
# Close and delete old indices
#
# @date 2018-05-10 18:00:00
# @description Copyright (c) 2015, github.com/dapeng-soa All Rights Reserved.


date=`date -d "2 days ago" +%Y.%m.%d`
date1=`date -d "6 days ago" +%Y.%m.%d`
echo $date
echo $date1
# close the index
curl -H "Content-Type: application/json" -XPOST http://elasticsearch_server_address:9200/dapeng_log_index-$date/_close
# delete the index
curl -H "Content-Type: application/json" -XDELETE "http://elasticsearch_server_address:9200/dapeng_log_index-$date1"
# create the index
tomorrow=`date -d tomorrow +%Y.%m.%d`
# list of elasticsearch servers on which to create the index
ipList=(elasticsearch_server_address:9200)
for i in ${ipList[@]};do
curl -H "Content-Type: application/json" -XPUT http://$i/dapeng_log_index-$tomorrow -d'
{
  "mappings": {
    "_default_": {
            "_all": {
                "enabled": "false"
            }
        },
    "dapeng_log": {
      "properties": {
        "logtime": {
          "type": "date",
          "format": "MM-dd HH:mm:ss SSS"
        },
        "threadPool": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "level": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "tag": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "message": {
          "type": "keyword",
          "ignore_above": 2048,
          "norms": false,
          "index_options": "docs"
        },
        "hostname": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "sessionTid": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        },
        "log": {
          "type": "keyword",
          "norms": false,
          "index_options": "docs"
        }
      }
    }
  },
  "settings": {
    "index": {
      "max_result_window": "100000000",
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "codec": "best_compression",
      "translog": {
        "sync_interval": "60s",
        "durability": "async",
        "flush_threshold_size": "1024mb"
      },
      "merge":{
        "policy":{
          "max_merged_segment": "2gb"
        }
      },
      "refresh_interval": "10s"

    }
  },
  "warmers": {}
}'
response=`curl -H "Content-Type: application/json" -s "http://$i/_cat/indices?v" |grep open | grep dapeng_log_index-$tomorrow |wc -l`

echo -e "\n"

if [ "$response" == 1 ];then
    break
else
    continue
fi
done;
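The script's date arithmetic can be sketched in Python: relative to today, the two-day-old index is closed, the six-day-old index is deleted, and tomorrow's index is created.

```python
from datetime import date, timedelta

def rotation_plan(today: date) -> dict:
    # Mirrors the shell script: `date -d "2 days ago"`, `date -d "6 days ago"`,
    # and `date -d tomorrow`, formatted as dapeng_log_index-%Y.%m.%d
    def name(d: date) -> str:
        return f"dapeng_log_index-{d:%Y.%m.%d}"
    return {
        "close":  name(today - timedelta(days=2)),
        "delete": name(today - timedelta(days=6)),
        "create": name(today + timedelta(days=1)),
    }

plan = rotation_plan(date(2019, 5, 10))
print(plan)
# {'close': 'dapeng_log_index-2019.05.08',
#  'delete': 'dapeng_log_index-2019.05.04',
#  'create': 'dapeng_log_index-2019.05.11'}
```

So at any moment the cluster holds roughly seven daily indices, of which the newest three (plus tomorrow's pre-created one) are open for querying.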

Add the command below to crontab (crontab -e) so it runs daily at 23:00 and creates the next day's index:

0 23 * * *    (cd /data/workspace/elasticsearch-head/; sh autoIndex4DapengLog.sh) > /data/workspace/elasticsearch-head/autoIndex4DapengLog.log

The log data can now be browsed from the elasticsearch-head page.

To remove some of elasticsearch's built-in fields (such as _index, _id, _score) from the table display, modify elasticsearch-head/_site/app.js at line 2038:

_data_handler: function(store) {
        // drop unused fields from the result set, keeping only our custom fields
        var customFields = ["logtime", "hostname", "tag", "sessionTid", "threadPool", "level", "message", "log"];
        store.columns = customFields;
        //store.columns = store.columns.filter(i => customFields.indexOf(i) > -1);
        this.tools.text(store.summary);
        this.headers.empty().append(this._header_template(store.columns));
        this.body.empty().append(this._body_template(store.data, store.columns));
        this._reflow();
},

Note that the fields in customFields must match the fields defined when the index is created; some of them are produced by fluent-bit's parser.

todo

  1. As the screenshot shows, some documents have empty fields. Pasting the log field's value into https://regex101.com/ reveals that it does not match the parsing Regex above, so those fields got no value and the unparsed content remains in log; this content should be filtered out later.
  2. Build a real-time production fault alerting system on top of the existing logging system.
