Set up Fluent-bit log collection in 30 minutes, with Kafka and Spark log alerting

This was my first time touching Kafka and Spark. I did no theory reading beforehand, just dove straight in and hit a few pitfalls, but going through it again now the process is quick and simple.

To be clear up front, my environment is Ubuntu.

1. Install Kafka, 5 minutes

As usual, follow the official site and start the Kafka services exactly as the quickstart describes. This takes about 5 minutes.
https://kafka.apache.org/quickstart
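
For reference, the quickstart commands boil down to roughly the following. This is a sketch assuming a Kafka 2.x download with the bundled ZooKeeper; the topic name messages is the one the Fluent-bit output below writes to, and older releases use --zookeeper localhost:2181 instead of --bootstrap-server:

# start ZooKeeper and the Kafka broker, each in its own terminal
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# create the topic the logs will be written to
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic messages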

2. Install Spark, 5 minutes

Download the latest package from the official site: https://spark.apache.org/downloads.html

wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar xvf spark-2.4.3-bin-hadoop2.7.tgz

Start the Spark shell from the extracted directory:

cd spark-2.4.3-bin-hadoop2.7
./bin/spark-shell
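
To sanity-check the install before moving on, you can also just print the version:

./bin/spark-submit --version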

3. Install Fluent-bit and configure it to ship logs to both Kafka and ES, 5 minutes

Again, just follow the official docs; this part only takes about 3 minutes: https://fluentbit.io/documentation/0.13/installation/ubuntu.html
After installation, edit the td-agent-bit.conf file under /etc/td-agent-bit. At first my input was the forward plugin, with the logs Fluent-bit received being sent on to both Kafka and Elasticsearch:

[INPUT]
    Name forward
    Listen 0.0.0.0
    Port 24224

[OUTPUT]
    Name  kafka
    Match *
    Brokers localhost:9092
    Topics messages

[OUTPUT]
    Name es
    Match *
    Host localhost
    Port 9200
    Index fluentbit-gw
    Type docker

Testing later showed that with this setup the logs kept being re-collected from beginning to end over and over, which was not what I wanted: I only want the newest log entries, not the whole log again. So I changed the [INPUT] to the tail plugin to do the collection. The Path here is where Docker writes container logs on an Ubuntu system:

[INPUT]
    Name tail
    Path /var/lib/docker/containers/*/*-json.log
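
The tail input works as-is, but it leaves each Docker JSON line as one escaped string (you can see that in the Spark output further down). If you would rather have structured records, the tail plugin also accepts a Tag and a Parser; a sketch, assuming the stock /etc/td-agent-bit/parsers.conf (which ships with a docker parser) is loaded via Parsers_File in the [SERVICE] section:

[INPUT]
    Name tail
    Tag docker.*
    Path /var/lib/docker/containers/*/*-json.log
    Parser docker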

After the changes, restart the service:

sudo service td-agent-bit restart

and check the service status:

sudo service td-agent-bit status
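
Once logs start flowing (the example in the next step generates some), you can check straight from the Kafka installation directory that records are arriving on the topic, before involving Spark at all:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic messages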

4. Example: start a Docker service whose logs get collected by Fluent-bit

Here I started an Nginx Docker service to verify things. I also give the container a tag via the log options, which Spark will filter on later:

docker run --name nginx --log-opt tag="nginx-service" -p 8081:80 -d nginx
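
A few requests are enough to produce data. Requesting paths that do not exist yields the nginx [error] lines that the Spark job alerts on later; the /jiajia* paths below are simply the ones that appear in the sample output further down:

curl http://localhost:8081/
curl http://localhost:8081/jiajia1
curl http://localhost:8081/jiajia2
curl http://localhost:8081/jiajia3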

With that, the log messages are being sent to both Kafka and ES. To view the data through Kibana, I also ran Elasticsearch and Kibana in Docker:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch docker.elastic.co/elasticsearch/elasticsearch:6.7.1

docker run --link elasticsearch:elasticsearch -p 5601:5601 -e "elasticsearch.hosts=http://elasticsearch:9200" --name kibana docker.elastic.co/kibana/kibana:6.7.1
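
Before opening Kibana you can quickly confirm that the index configured in the Fluent-bit output exists (assuming Elasticsearch is reachable on localhost:9200):

curl "http://localhost:9200/_cat/indices?v"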

[Screenshot 1] Kibana shows the data has already arrived in ES.

5. Consume and monitor the Kafka log data with Spark in Java

There is a demo written by a developer overseas, which I modified for my own needs:
https://github.com/eugenp/tutorials/tree/master/apache-spark
Pull the code down. First, add the fastjson dependency to pom.xml, which we will need later:


        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.58</version>
        </dependency>

Then add your own class, LogMonitor, to monitor (i.e., print out) error log messages and hook up DingTalk to trigger alerts on them:

package com.baeldung.data.pipeline;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

import java.io.IOException;
import java.util.*;

/**
 * @Author You Jia
 * @Date 6/4/2019 4:53 PM
 */
public class LogMonitor {
    public static void main(String[] args) throws InterruptedException {
        Logger.getLogger("org")
                .setLevel(Level.OFF);
        Logger.getLogger("akka")
                .setLevel(Level.OFF);

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "139.24.217.54:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
//        kafkaParams.put("auto.offset.reset", "latest");  
//        kafkaParams.put("enable.auto.commit", false); //这里是个坑,如果设置为false,那么消息一直还在kafka里, so注释掉这里。

        Collection<String> topics = Arrays.asList("messages");

        SparkConf sparkConf = new SparkConf();
        sparkConf.setMaster("local[2]");
        sparkConf.setAppName("LogMonitor");

        JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, Durations.seconds(10));

        JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(streamingContext, LocationStrategies.PreferConsistent(), ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        JavaDStream<String> lines = messages.map(ConsumerRecord::value);
        lines.count().print();
        JavaDStream<String> errorLines = lines.filter(x -> x.toLowerCase().contains("nginx-service") && x.toLowerCase().contains("[error]"));
        errorLines.print();

        errorLines.foreachRDD(javaRdd -> {
            long count = javaRdd.count();
            System.out.println("number is " + count);
            // If there are at least 3 error messages within the 10-second batch, send an alert to DingTalk.
            if (count >= 3) {
                List<String> alertLines = javaRdd.collect();
                alertLines.forEach(alertLine -> dingtalk(alertLine));
            }
        });
        streamingContext.start();
        streamingContext.awaitTermination();
    }
    // Send the alert message to DingTalk.
    public static void dingtalk(String alertLines){
  
        //jiangbiao dingtalk
        String WEBHOOK_TOKEN = "https://oapi.dingtalk.com/robot/send?access_token=xxxxx";
        HttpClient httpclient = HttpClients.createDefault();

        String textMsg = "{ \"msgtype\": \"text\", \"text\": {\"content\": \"this is msg\"}}";
        JSONObject testMsgJson = JSON.parseObject(textMsg);
        testMsgJson.getJSONObject("text").put("content","Alert!!! " + "\r\n" + alertLines);
        textMsg = testMsgJson.toString();

        HttpPost httppost = new HttpPost(WEBHOOK_TOKEN);
        httppost.addHeader("Content-Type", "application/json; charset=utf-8");


        StringEntity se = new StringEntity(textMsg, "utf-8");
        httppost.setEntity(se);

        HttpResponse response = null;
        try {
            response = httpclient.execute(httppost);
            if (response.getStatusLine().getStatusCode()== HttpStatus.SC_OK){
                String result= EntityUtils.toString(response.getEntity(), "utf-8");
                System.out.println(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
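
To package the job, here is a minimal build sketch, assuming you cloned the demo into tutorials/ and left its Maven setup unchanged (the jar-with-dependencies name suggests the assembly plugin; if it is not bound to the package phase, run mvn clean package assembly:single instead):

cd tutorials/apache-spark
mvn clean package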

Once the jar is built, run it from the Spark installation's bin directory (spark-2.4.3-bin-hadoop2.7/bin):

./spark-submit --class com.baeldung.data.pipeline.LogMonitor --master local[2] /home/ubuntu/apache-spark-1.0-SNAPSHOT-jar-with-dependencies.jar

You will then see that log messages tagged nginx-service and containing [error] are captured and printed. In this example there are more than 3 error lines (4 in the output below), so the alert to DingTalk fires:

ubuntu@ubuntu:~/spark-2.4.3-bin-hadoop2.7/bin$ ./spark-submit --class com.baeldung.data.pipeline.LogMonitor --master local[2] /home/ubuntu/apache-spark-1.0-SNAPSHOT-jar-with-dependencies.jar
19/06/05 02:15:12 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 139.24.217.54 instead (on interface enp0s17)
19/06/05 02:15:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/06/05 02:15:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
-------------------------------------------
Time: 1560149520000 ms
-------------------------------------------

{"@timestamp":1560149498.468915, "log":"{\"log\":\"2019/06/10 06:51:38 [error] 6#6: *17 open() \\\"/usr/share/nginx/html/jiajia1\\\" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: \\\"GET /jiajia1 HTTP/1.1\\\", host: \\\"localhost:8081\\\"\\n\",\"stream\":\"stderr\",\"attrs\":{\"tag\":\"nginx-service\"},\"time\":\"2019-06-10T06:51:38.468823865Z\"}"}
{"@timestamp":1560149503.747338, "log":"{\"log\":\"2019/06/10 06:51:43 [error] 6#6: *17 open() \\\"/usr/share/nginx/html/jiajia2\\\" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: \\\"GET /jiajia2 HTTP/1.1\\\", host: \\\"localhost:8081\\\"\\n\",\"stream\":\"stderr\",\"attrs\":{\"tag\":\"nginx-service\"},\"time\":\"2019-06-10T06:51:43.747309379Z\"}"}
{"@timestamp":1560149506.673414, "log":"{\"log\":\"2019/06/10 06:51:46 [error] 6#6: *17 open() \\\"/usr/share/nginx/html/jiajia3\\\" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: \\\"GET /jiajia3 HTTP/1.1\\\", host: \\\"localhost:8081\\\"\\n\",\"stream\":\"stderr\",\"attrs\":{\"tag\":\"nginx-service\"},\"time\":\"2019-06-10T06:51:46.673326784Z\"}"}
{"@timestamp":1560149510.666708, "log":"{\"log\":\"2019/06/10 06:51:50 [error] 6#6: *17 open() \\\"/usr/share/nginx/html/jiajia4\\\" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: \\\"GET /jiajia4 HTTP/1.1\\\", host: \\\"localhost:8081\\\"\\n\",\"stream\":\"stderr\",\"attrs\":{\"tag\":\"nginx-service\"},\"time\":\"2019-06-10T06:51:50.666606272Z\"}"}

number is 4
{"errcode":0,"errmsg":"ok"}
{"errcode":0,"errmsg":"ok"}
{"errcode":0,"errmsg":"ok"}
{"errcode":0,"errmsg":"ok"}


[Screenshot 2] DingTalk receives the alert message.

This code is only a basic example.

That is basically the whole process.

I still need to go back and shore up the theory afterwards.
