Filebeat and Flume are both popular log collection tools. Flume is more powerful and covers more use cases, while Filebeat is lighter-weight and is typically deployed as part of the ELK stack.
While using both tools to ship logs into Kafka, I noticed that the data they deliver to Kafka is formatted differently. To compare, I consumed a few records from each with Spark Streaming.
The Spark Streaming program:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

def run(): Unit = {
  val sparkConf = new SparkConf().setAppName("Test").setMaster("local[*]")
  val ssc = new StreamingContext(sparkConf, Seconds(10))
  val topics = Set("topic_test")
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "group_test",
    "auto.offset.reset" -> "earliest"
  )
  val dataStream = KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
  )
  // Print one record value per batch for comparison
  dataStream.foreachRDD(rdd => {
    rdd.map(_.value()).take(1).foreach(println)
  })
  ssc.start()
  ssc.awaitTermination()
}
The original log line:
23145234534|3235346343245|2020-04-10 00:00:09|47|failed
The data shipped to Kafka by Filebeat, as printed by the consumer:
{"@timestamp":"2020-04-20T02:43:19.460Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.0.0","topic":"topic_test"},"ecs":{"version":"1.0.0"},"host":{"name":"node1"},"agent":{"id":"b95c56bc-2d0a-4a5a-ae4a-df51f4a2cde7","version":"7.0.0","type":"filebeat","ephemeral_id":"8709e24c-6c5a-4c5e-9116-52f372804d57","hostname":"node1"},"log":{"offset":45414270,"file":{"path":"/data/test.log"}},"message":"23145234534|3235346343245|2020-04-10 00:00:09|47|failed","input":{"type":"log"}}
The data shipped to Kafka by Flume, as printed by the consumer:
23145234534|3235346343245|2020-04-10 00:00:09|47|failed
As you can see, what Filebeat ships to Kafka arrives as a JSON string, in which only the message field holds the original log line, whereas what Flume ships to Kafka arrives as the original log line itself.
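If a downstream job expects the raw line regardless of the collector, one option is to pull it out of Filebeat's JSON envelope before further processing. Below is a minimal sketch that extracts the message field with a plain regex so it stays dependency-free; a production job should use a real JSON library (e.g. Jackson or play-json), since a regex breaks on nested escaping. The sample value is an abbreviated version of the Filebeat record above:

```scala
object FilebeatMessage {
  // Filebeat wraps each log line in a JSON envelope; the raw line lives in "message".
  // The group allows escaped characters (\" etc.) inside the string value.
  private val MessagePattern = """"message"\s*:\s*"((?:[^"\\]|\\.)*)"""".r

  def extract(json: String): Option[String] =
    MessagePattern.findFirstMatchIn(json).map(_.group(1))

  def main(args: Array[String]): Unit = {
    // Abbreviated Filebeat record value from the example above
    val value =
      """{"@timestamp":"2020-04-20T02:43:19.460Z","message":"23145234534|3235346343245|2020-04-10 00:00:09|47|failed","input":{"type":"log"}}"""
    println(extract(value).getOrElse(""))
  }
}
```

In the Spark job, this slots in as `rdd.map(_.value()).flatMap(FilebeatMessage.extract)`, after which both Filebeat- and Flume-sourced streams yield the same raw lines.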
For reference, here are the ConsumerRecord objects from both:
filebeat:
ConsumerRecord(topic = topic_test, partition = 0, offset = 199418359, CreateTime = 1587350599460, checksum = 2021080248, serialized key size = -1, serialized value size = 829, key = null, value = {"@timestamp":"2020-04-20T02:43:19.460Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.0.0","topic":"topic_test"},"ecs":{"version":"1.0.0"},"host":{"name":"node1"},"agent":{"id":"b95c56bc-2d0a-4a5a-ae4a-df51f4a2cde7","version":"7.0.0","type":"filebeat","ephemeral_id":"8709e24c-6c5a-4c5e-9116-52f372804d57","hostname":"node1"},"log":{"offset":45414270,"file":{"path":"/data/test.log"}},"message":"23145234534|3235346343245|2020-04-10 00:00:09|47|failed","input":{"type":"log"}})
flume:
ConsumerRecord(topic = topic_test, partition = 2, offset = 2056830, CreateTime = -1, serialized key size = -1, serialized value size = 66, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 23145234534|3235346343245|2020-04-10 00:00:09|47|failed)
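Alternatively, the JSON envelope can be dropped at the source: Filebeat's Kafka output supports overriding the output codec so that only a formatted string is written. A hedged sketch of the relevant output.kafka fragment (verify the codec options against the docs for your Filebeat version; 7.0.0 is used in this post):

```yaml
output.kafka:
  hosts: ["localhost:9092"]
  topic: "topic_test"
  # Emit only the original log line instead of the full JSON envelope
  codec.format:
    string: '%{[message]}'
```

With this in place, Filebeat's records in Kafka look the same as Flume's, at the cost of losing the envelope metadata (host, file path, offset, etc.).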