Flume Kafka Sink sends data to a Kafka topic unevenly — everything lands in a single partition

Kafka Sink uses the topic and key properties from the FlumeEvent headers to send events to Kafka. If topic exists in the headers, the event will be sent to that specific topic, overriding the topic configured for the Sink. If key exists in the headers, the key will be used by Kafka to partition the data between the topic partitions. Events with the same key will be sent to the same partition. If the key is null, events will be sent to random partitions.

The above is quoted from the official documentation. With no header set on the FlumeEvent, the key should default to null, so events should go to random partitions — yet the Flume Kafka Sink did not spread the data evenly across the Kafka partitions; everything was sent to the same partition.

Solution: use the UUID Interceptor from Flume's interceptors. It sets a header on each FlumeEvent to a random UUID string before the event is sent, so the data gets distributed evenly across the Kafka partitions.
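A minimal agent config sketch for this fix (agent/source/channel/sink names `a1`, `r1`, `c1`, `k1` and the topic name are placeholders; Kafka Sink property names vary between Flume versions). Note that UUIDInterceptor writes to the `id` header by default, so `headerName` must be set to `key` for the Kafka Sink to pick it up as the partition key:

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# UUID interceptor: stamps each event with a random UUID.
# Override the default header name ("id") so Kafka Sink sees it as "key".
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = key

# Kafka sink (placeholder topic; property names depend on Flume version)
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = my_topic
a1.sinks.k1.channel = c1
```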

    Note: the partitioning rule here is applied on the Flume sink side — the partition is computed roughly as the header key's hashCode % partitionNum — it is not Kafka's own partitioner doing this.

PS: Flume's official docs are honestly pretty bad つ﹏⊂

