Flume Operations - Tips

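Short and to the point: our data warehouse's streaming ETL has recently come to depend heavily on Flume for syncing data between heterogeneous systems, so I'm starting this post to record the pitfalls I've hit and how I climbed out of them.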

Tips:

  • Flume 1.6 and 1.7 are currently the most widely used versions; between the two, the bundled Kafka client was upgraded from 0.8.x to 0.9.x;
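  • The configuration keys also changed a lot between 1.6 and 1.7. Take the Kafka Sink as an example: if you use the wrong set of keys, no error is raised, and the only visible symptom is the channel filling up until it overflows: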
# 1.6
a1.sinks.k1.topic = mytopic
a1.sinks.k1.brokerList = localhost:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
# 1.7
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
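  • Even when the configuration file is wrong, there is no useful error message; whatever does get logged is vague, which makes troubleshooting harder;
  • For Kafka-to-Kafka pipelines on 1.6, if the source topic and the destination topic have different names, you have to set ignoreTopicInHeader = true on the Kafka Sink for the sync to work (to be honest, I never got this working and have shelved it for now).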
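
Since a wrong key fails silently, it can help to lint the properties file before starting the agent. Below is a minimal sketch in Python, assuming every sink in the file is a Kafka Sink and that keys follow the `<agent>.sinks.<sink>.<property>` layout used in the example above. The function name `find_legacy_kafka_sink_keys` and the `RENAMED_KEYS` table are my own, not part of Flume. The table only covers three of the renames from the example; I left `batchSize` out, since its exact 1.7 name is worth confirming in the user guide for your version.

```python
# 1.6-style Kafka Sink property names and the 1.7-style names from the example.
# This table is illustrative and incomplete; extend it from the user guide.
RENAMED_KEYS = {
    "topic": "kafka.topic",
    "brokerList": "kafka.bootstrap.servers",
    "requiredAcks": "kafka.producer.acks",
}


def find_legacy_kafka_sink_keys(properties_text):
    """Return (line_number, key, suggested_key) for each 1.6-style sink key.

    Assumes every sink in the file is a Kafka Sink (hypothetical helper).
    """
    findings = []
    for line_number, raw_line in enumerate(properties_text.splitlines(), start=1):
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not a key = value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        parts = key.split(".")
        # <agent>.sinks.<sink>.<property>: exactly four parts means the
        # property is not yet under the kafka.* namespace.
        if len(parts) == 4 and parts[1] == "sinks" and parts[3] in RENAMED_KEYS:
            suggestion = ".".join(parts[:3] + [RENAMED_KEYS[parts[3]]])
            findings.append((line_number, key, suggestion))
    return findings


if __name__ == "__main__":
    sample = "a1.sinks.k1.topic = mytopic\na1.sinks.k1.brokerList = localhost:9092\n"
    for line_number, key, suggestion in find_legacy_kafka_sink_keys(sample):
        print(f"line {line_number}: {key} looks like a 1.6 key; try {suggestion}")
```

Running this against a config before deploying is cheap, and it turns a silent misconfiguration into an explicit warning instead of a full channel.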

An optional property called ignoreTopicInHeader is added for Kafka Sink. Its default value is false, so it is compatible with Flume 1.6.0. If you want to ignore topic in header and write events to the topic you specified in properties file, you can set ignoreTopicInHeader to true.
Besides, three optional properties topicHeader, keyHeader, timestampHeader are added for Kafka Source. They are similar to fileHeader and basenameHeader for Spooling Directory Source. Their default values are true, so they are compatible with Flume 1.6.0. If you do not want to add headers storing topic, key or timestamp, you can set them to false. It is also helpful for performance of Kafka Source.
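
To make the quoted behaviour concrete, here is a toy model in Python of how the destination topic gets chosen. It is not Flume's actual code: the function `resolve_destination_topic` is hypothetical, and the header name `topic` is an assumption based on the description above. The point is that a Kafka Source stamps each event with the topic it came from, and the sink prefers that header over its own configured topic unless told to ignore it, which is why a Kafka-to-Kafka pipeline with different topic names ends up writing to the wrong topic.

```python
def resolve_destination_topic(event_headers, configured_topic,
                              ignore_topic_in_header=False):
    """Pick the topic a Kafka Sink would write to, per the quoted description.

    Hypothetical helper: a model of the described semantics, not Flume code.
    """
    # By default, a topic recorded in the event headers wins over the config.
    if not ignore_topic_in_header and "topic" in event_headers:
        return event_headers["topic"]
    return configured_topic


# An event read from "source-topic" by a Kafka Source that records the topic header.
event_headers = {"topic": "source-topic"}

# Default behaviour: the header overrides the sink's configured topic.
default_target = resolve_destination_topic(event_headers, "dest-topic")

# With the flag set, the sink's configured topic is used instead.
overridden_target = resolve_destination_topic(
    event_headers, "dest-topic", ignore_topic_in_header=True
)

print(default_target, overridden_target)
```

In this model the same effect can be reached from the source side: if the source never writes the topic header, the sink has nothing to override its configured topic with.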
