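First, the version I am using is Kafka 0.8.2.
Recently, while using Kafka, I noticed that the servers' network cards were frequently saturated. Checking the traffic with iftop and dstat showed heavy traffic between the Kafka brokers. When a topic is configured with multiple replicas, replicas often drop out because they fail to sync, until the topic's in-sync replica set (ISR) has only one member left.
This problem is a real headache. First, it makes the machine's network I/O abnormal and affects other services. Second, it defeats the topic's replication.
After changing log4j.logger.kafka to TRACE in log4j.properties, I saw the following log: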
[2015-09-10 20:11:52,649] TRACE Processor id 0 selection time = 10121190 ns (kafka.network.Processor)
[2015-09-10 20:11:52,649] TRACE [KafkaApi-4] 1 bytes written to log test_queue-38 beginning at offset 679 and ending at offset 679 (kafka.server.KafkaApis)
[2015-09-10 20:11:52,649] TRACE Socket server received response to send, registering for write: Response(0,Request(0,sun.nio.ch.SelectionKeyImpl@14d964af,null,1441887112600,/192.168.201.238:46389),kafka.api.FetchResponseSend@7640d611,SendAction) (kafka.network.Processor)
[2015-09-10 20:11:52,649] DEBUG [KafkaApi-4] Produce to local log in 10 ms (kafka.server.KafkaApis)
[2015-09-10 20:11:52,649] TRACE Processor id 0 selection time = 36105 ns (kafka.network.Processor)
[2015-09-10 20:11:52,649] TRACE Bytes written as part of multisend call : 162Total bytes written so far : 162Expected bytes to write : 162 (kafka.api.TopicDataSend$$anon$1)
[2015-09-10 20:11:52,650] TRACE FileMessageSet /home/dfs/kafka-data/test_queue-38/00000000000000000000.log : bytes transferred : 289600 bytes requested for transfer : 855076 (kafka.log.FileMessageSet)
[2015-09-10 20:11:52,650] TRACE Bytes written as part of multisend call : 289672Total bytes written so far : 289672Expected bytes to write : 855148 (kafka.api.TopicDataSend$$anon$1)
[2015-09-10 20:11:52,650] TRACE Bytes written as part of multisend call : 289874Total bytes written so far : 289874Expected bytes to write : 855391 (kafka.api.FetchResponseSend$$anon$2)
[2015-09-10 20:11:52,650] TRACE 289886 bytes written to /192.168.201.238:46389 using key sun.nio.ch.SelectionKeyImpl@14d964af (kafka.network.Processor)
[2015-09-10 20:11:52,650] TRACE 1800 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,650] TRACE 5810 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,650] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,650] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,650] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE 5832 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE Did not finish writing, registering for write again on connection /192.168.201.238:46389 (kafka.network.Processor)
[2015-09-10 20:11:52,651] TRACE Socket server received response to send, registering for write: Response(0,Request(0,sun.nio.ch.SelectionKeyImpl@69107c05,null,1441887112638,/192.168.207.79:33435),kafka.network.BoundedByteBufferSend@c4a3158,SendAction) (kafka.network.Processor)
[2015-09-10 20:11:52,651] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE Processor id 0 selection time = 54822 ns (kafka.network.Processor)
[2015-09-10 20:11:52,651] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE 42 bytes written to /192.168.207.79:33435 using key sun.nio.ch.SelectionKeyImpl@69107c05 (kafka.network.Processor)
[2015-09-10 20:11:52,651] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,651] TRACE Finished writing, registering for read on connection /192.168.207.79:33435 (kafka.network.Processor)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE Processor id 0 selection time = 758901 ns (kafka.network.Processor)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,652] TRACE FileMessageSet /home/dfs/kafka-data/test_queue-38/00000000000000000000.log : bytes transferred : 143352 bytes requested for transfer : 565476 (kafka.log.FileMessageSet)
[2015-09-10 20:11:52,652] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,653] TRACE Bytes written as part of multisend call : 143352Total bytes written so far : 433024Expected bytes to write : 855148 (kafka.api.TopicDataSend$$anon$1)
[2015-09-10 20:11:52,653] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,653] TRACE Bytes written as part of multisend call : 143352Total bytes written so far : 433226Expected bytes to write : 855391 (kafka.api.FetchResponseSend$$anon$2)
[2015-09-10 20:11:52,653] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,653] TRACE 143352 bytes written to /192.168.201.238:46389 using key sun.nio.ch.SelectionKeyImpl@14d964af (kafka.network.Processor)
[2015-09-10 20:11:52,653] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,653] TRACE Did not finish writing, registering for write again on connection /192.168.201.238:46389 (kafka.network.Processor)
[2015-09-10 20:11:52,653] TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
[2015-09-10 20:11:52,653] TRACE Processor id 0 selection time = 23668 ns (kafka.network.Processor)
TRACE 8192 bytes read. (kafka.network.BoundedByteBufferReceive)
This receive log line was printed more than 6,000 times per second, which works out to roughly 50 MB of traffic per second.
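The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes every receive line reports the full 8192 bytes and rounds "more than 6,000" down to exactly 6,000 lines per second. The names below are illustrative, not part of Kafka.

```python
READ_SIZE_BYTES = 8192    # size reported by each "8192 bytes read." line
LINES_PER_SECOND = 6000   # assumed round figure for "more than 6,000"


def receive_rate(lines_per_second, read_size_bytes):
    """Return the implied receive rate in bytes per second."""
    return lines_per_second * read_size_bytes


rate = receive_rate(LINES_PER_SECOND, READ_SIZE_BYTES)
# Report the rate in decimal megabytes and in binary mebibytes.
print(f"{rate} bytes/s = {rate / 1e6:.1f} MB/s = {rate / 2**20:.1f} MiB/s")
# prints: 49152000 bytes/s = 49.2 MB/s = 46.9 MiB/s
```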
Another thing worth noting is this line:
TRACE Did not finish writing, registering for write again on connection /192.168.201.238:46389 (kafka.network.Processor)
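This line shows that the broker keeps sending data, fails to finish the write, and registers for write again and again.
I then took a quick look at the source code. If replica.fetch.max.bytes is not defined in Kafka's configuration file, the server gives it a default value of 1 MB. That default is fine for workloads with short messages, but our messages are fairly large and often exceed 1 MB. Although max.message.bytes=52428700 (about 50 MB) is set on the topic, the broker's replica.fetch.max.bytes is not updated automatically when a topic's max.message.bytes grows. The result is that as soon as a message is longer than 1 MB and replica.fetch.max.bytes is not defined in the configuration file, data synchronization between replicas fails.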
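One plausible way to picture this failure is the toy model below. It is not Kafka's actual fetcher code, and the function name and message sizes are made up for illustration. It assumes a follower fetches at most a fixed number of bytes per round and can only advance past messages that arrive whole, so a message larger than the limit blocks progress while bytes keep flowing.

```python
DEFAULT_REPLICA_FETCH_MAX_BYTES = 1024 * 1024  # the 1 MB default described above


def simulate_follower(message_sizes, fetch_max_bytes, max_rounds=5):
    """Toy model of a follower replica (hypothetical, not Kafka code).

    Each round fetches at most fetch_max_bytes starting at the current
    offset and advances only past messages that were received whole.
    Returns (messages replicated, total bytes transferred).
    """
    offset = 0              # index of the next message to replicate
    bytes_transferred = 0
    for _ in range(max_rounds):
        if offset >= len(message_sizes):
            break           # fully caught up
        budget = fetch_max_bytes
        advanced = 0
        for size in message_sizes[offset:]:
            if size > budget:
                # Only a fragment fits: it is sent but cannot be used.
                bytes_transferred += budget
                break
            budget -= size
            bytes_transferred += size
            advanced += 1
        offset += advanced
    return offset, bytes_transferred


sizes = [500_000, 2_000_000, 500_000]  # made-up message sizes in bytes
print(simulate_follower(sizes, DEFAULT_REPLICA_FETCH_MAX_BYTES))
print(simulate_follower(sizes, 4 * 1024 * 1024))
```

With the 1 MB limit the model never gets past the 2,000,000-byte message and transfers a full fetch every round. With a 4 MB limit it replicates all three messages in a single round.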
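The fix that comes to mind is to assign the largest max.message.bytes among all topics to replica.fetch.max.bytes. There is a similar discussion in a related issue on Apache's tracker: https://issues.apache.org/jira/browse/KAFKA-1756
The temporary workaround is to set replica.fetch.max.bytes to a fairly large value in the configuration file and make sure that every topic's max.message.bytes is smaller than it.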
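The check described above can be sketched in Python. This is a minimal illustration, assuming the per-topic max.message.bytes values have already been collected into a dictionary by hand. The helper names and the small_topic entry are hypothetical, and nothing here talks to a real cluster.

```python
DEFAULT_REPLICA_FETCH_MAX_BYTES = 1024 * 1024  # broker default of 1 MB


def topics_exceeding_fetch_limit(topic_max_message_bytes,
                                 replica_fetch_max_bytes=DEFAULT_REPLICA_FETCH_MAX_BYTES):
    """Return the topics whose max.message.bytes is above the fetch limit."""
    return sorted(
        topic
        for topic, limit in topic_max_message_bytes.items()
        if limit > replica_fetch_max_bytes
    )


def required_replica_fetch_max_bytes(topic_max_message_bytes):
    """Return the smallest fetch limit that covers every topic (never below the default)."""
    return max([DEFAULT_REPLICA_FETCH_MAX_BYTES] + list(topic_max_message_bytes.values()))


# test_queue uses the value from this post; small_topic is a made-up example.
topics = {"test_queue": 52428700, "small_topic": 1000000}

print(topics_exceeding_fetch_limit(topics))      # topics at risk with the default
print(required_replica_fetch_max_bytes(topics))  # candidate replica.fetch.max.bytes
```

The second value is what would go into the broker configuration file as replica.fetch.max.bytes. It has to be recomputed whenever a topic's max.message.bytes is raised.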