Differences in data format after Filebeat and Flume write to Kafka

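Filebeat and Flume are both popular log collection tools. Flume is more powerful and more widely used, whereas Filebeat is more lightweight and is usually used together with ELK.
While connecting both tools to Kafka, I found that the data they write to Kafka has different formats. Here I use Spark Streaming to consume a few records from each and compare them: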

The Spark Streaming program:

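import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

def run(): Unit = {
  // Run locally with 10-second micro-batches
  val sparkConf = new SparkConf().setAppName("Test").setMaster("local[*]")
  val ssc = new StreamingContext(sparkConf, Seconds(10))

  val topics = Set("topic_test")
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "group_test",
    "auto.offset.reset" -> "earliest"
  )

  // Direct stream of ConsumerRecord[String, String] from the topic
  val dataStream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
  )

  // Print the value of one record from each batch
  dataStream.foreachRDD { rdd =>
    rdd.map(_.value()).take(1).foreach(println)
  }

  ssc.start()
  ssc.awaitTermination()
}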

The original log line is:
23145234534|3235346343245|2020-04-10 00:00:09|47|failed
Printed data collected into Kafka by Filebeat:
{"@timestamp":"2020-04-20T02:43:19.460Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.0.0","topic":"topic_test"},"ecs":{"version":"1.0.0"},"host":{"name":"node1"},"agent":{"id":"b95c56bc-2d0a-4a5a-ae4a-df51f4a2cde7","version":"7.0.0","type":"filebeat","ephemeral_id":"8709e24c-6c5a-4c5e-9116-52f372804d57","hostname":"node1"},"log":{"offset":45414270,"file":{"path":"/data/test.log"}},"message":"23145234534|3235346343245|2020-04-10 00:00:09|47|failed","input":{"type":"log"}}
Printed data collected into Kafka by Flume:
23145234534|3235346343245|2020-04-10 00:00:09|47|failed

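As you can see, the data Filebeat writes to Kafka is consumed as a JSON string in which only the message field holds the original log line, whereas the data Flume writes to Kafka is consumed as the original log line itself.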
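If a single consumer has to handle both formats, one option is to unwrap the message field whenever the value is a JSON object and pass everything else through unchanged. The following is only a minimal sketch of that idea: the names LogLineExtractor and extractMessage are made up for illustration, and it assumes jackson-databind is on the classpath (Spark normally ships with it).

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

object LogLineExtractor {
  // Assumption: jackson-databind is available on the classpath.
  private val mapper = new ObjectMapper()

  // Returns the "message" field when the value is a JSON object that has one
  // (the Filebeat case); otherwise returns the value unchanged (the Flume case).
  def extractMessage(value: String): String =
    Try(mapper.readTree(value)).toOption
      .filter(node => node != null && node.isObject && node.has("message"))
      .map(_.get("message").asText())
      .getOrElse(value)
}
```

In the Spark Streaming program, this would mean replacing rdd.map(_.value()) with rdd.map(r => LogLineExtractor.extractMessage(r.value())), so that records from either tool print as the original log line.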

For reference, here are the ConsumerRecord objects for both:
filebeat:
ConsumerRecord(topic = topic_test, partition = 0, offset = 199418359, CreateTime = 1587350599460, checksum = 2021080248, serialized key size = -1, serialized value size = 829, key = null, value = {"@timestamp":"2020-04-20T02:43:19.460Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.0.0","topic":"topic_test"},"ecs":{"version":"1.0.0"},"host":{"name":"node1"},"agent":{"id":"b95c56bc-2d0a-4a5a-ae4a-df51f4a2cde7","version":"7.0.0","type":"filebeat","ephemeral_id":"8709e24c-6c5a-4c5e-9116-52f372804d57","hostname":"node1"},"log":{"offset":45414270,"file":{"path":"/data/test.log"}},"message":"23145234534|3235346343245|2020-04-10 00:00:09|47|failed","input":{"type":"log"}})
flume:
ConsumerRecord(topic = topic_test, partition = 2, offset = 2056830, CreateTime = -1, serialized key size = -1, serialized value size = 66, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 23145234534|3235346343245|2020-04-10 00:00:09|47|failed)
