Kafka reports "Commit cannot be completed since the group has already rebalanced and assigned the partitions"

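Problem description:
In a message-processing program built on a recent Kafka version, the following warning appears repeatedly when the message volume is very high, and several consumers sharing the same group.id consume the same messages more than once.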

2018-10-12 19:49:34,903 WARN [DESKTOP-8S2E5H7 id2-1-C-1] Caller+0 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$4.onComplete(ConsumerCoordinator.java:649)
Auto-commit of offsets {xxxTopic-5=OffsetAndMetadata{offset=359, metadata=''}} failed for group My-Group-Name: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

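Solution:
Analysis:
1. The warning says message processing takes too long, so I optimized the processing code. The overall processing time per message dropped and the warning appeared less often, but it did not go away.
2. Following the hints in the warning, I set max.poll.records to 200 (the default is 500) and raised the session timeout (session.timeout.ms=60000; the default in the client versions of that period is 10000, i.e. 10 s). After checking the logs, the situation had improved, but the warning still appeared.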
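
To see why a smaller max.poll.records helps, here is a rough back-of-the-envelope sketch. It assumes records are processed one after another inside the poll loop and that max.poll.interval.ms is left at its default of 300000 ms (5 minutes); the class and method names are made up for illustration, and the per-record times are not measurements from this system.

```java
// Sketch: time budget per record before the gap between two poll() calls
// exceeds max.poll.interval.ms. Names and numbers are illustrative only.
final class PollBudget {

    /** Worst-case time (ms) to process one full batch sequentially. */
    static long batchTimeMs(int maxPollRecords, long perRecordMs) {
        return (long) maxPollRecords * perRecordMs;
    }

    /** Largest per-record time (ms) that keeps a full batch within the interval. */
    static long perRecordBudgetMs(long maxPollIntervalMs, int maxPollRecords) {
        return maxPollIntervalMs / maxPollRecords;
    }

    /** True if a full batch would take longer than max.poll.interval.ms. */
    static boolean exceedsInterval(int maxPollRecords, long perRecordMs, long maxPollIntervalMs) {
        return batchTimeMs(maxPollRecords, perRecordMs) > maxPollIntervalMs;
    }
}
```

With these assumptions, PollBudget.perRecordBudgetMs(300_000L, 500) gives 600 ms per record, while PollBudget.perRecordBudgetMs(300_000L, 200) gives 1500 ms. A handler that occasionally needs about one second per record would overrun the interval with batches of 500 but not with batches of 200.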

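As for the duplicate consumption: when a large number of messages is sent (group.id=abc), consumer1 takes too long to process them. The consumer is configured for auto-commit, and because processing does not finish within the default auto-commit interval, the auto-commit fails. Kafka therefore treats those messages as not yet consumed, and after the rebalance consumer2 (another consumer instance with the same group.id=abc) receives the same messages and processes them again. You can verify this by checking the lag of that group on the topic in Kafka.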
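
For reference, the lag mentioned above is the log end offset of a partition minus the group's committed offset; the kafka-consumer-groups.sh tool shipped with Kafka prints it in the LAG column of its --describe output. The sketch below only illustrates that arithmetic. The class name is made up, the maps stand in for values you would read from that tool, and treating a partition without a committed offset as offset 0 is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: per-partition lag = log end offset - committed offset.
// Keys are "topic-partition" strings such as "xxxTopic-5".
final class GroupLag {

    static Map<String, Long> lagPerPartition(Map<String, Long> logEndOffsets,
                                             Map<String, Long> committedOffsets) {
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> entry : logEndOffsets.entrySet()) {
            // Assumption: no committed offset yet is treated as offset 0.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            // Clamp at 0 so a stale end-offset reading never yields negative lag.
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return lag;
    }
}
```

If the committed offset of xxxTopic-5 stays at 359 (the value in the warning above) while the end offset keeps growing, the lag grows too, which is consistent with commits failing and messages being handed to another consumer.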

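Final solution: increase auto.commit.interval.ms. The default is 5000; after raising it to 7000, the warning has essentially disappeared under the same Kafka message volume.
Why change this parameter? As I understand it, the root cause of the warning is that message processing takes too long, so the offset commit cannot be completed within the configured auto-commit interval.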
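
Put together, the settings described in this post could look roughly like the sketch below. The property keys are the standard Kafka consumer configuration names; the class name and the bootstrap.servers address are placeholders, and the values are simply the ones that worked in my case, not recommended defaults.

```java
import java.util.Properties;

// Sketch: the consumer settings from this post collected in one place.
final class ConsumerSettings {

    static Properties tuned() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "My-Group-Name");           // group name from the log above
        props.setProperty("enable.auto.commit", "true");          // auto-commit, as used in this post
        props.setProperty("auto.commit.interval.ms", "7000");     // raised from the default 5000
        props.setProperty("max.poll.records", "200");             // reduced from the default 500
        props.setProperty("session.timeout.ms", "60000");         // raised session timeout
        return props;
    }
}
```

To build a real consumer from this, key.deserializer and value.deserializer also have to be set before passing the properties to KafkaConsumer.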

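Summary:
This is only how I resolved the problem when I ran into it. It is a personal workaround, not an officially documented solution, and is offered for reference only.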
