【kafka】Will data be consumed twice when a consumer offset commit fails?

Table of Contents

    • 1. Problem Description:
      • Question:
      • Initial thoughts
    • 2. Reproducing the Scenario:
      • Test environment:
      • Reproduction
    • 3. Analysis:
      • Logs
      • Broker-side source code
    • References

1. Problem Description:

Question:

If a Kafka consumer polls records from a partition but does not commit the offset in time, will another consumer thread that takes over that partition consume duplicate data?

Initial thoughts

What we already know:

  • At any given moment, a partition is consumed by only one consumer thread within a group
  • Both the Kafka broker and the client maintain an offset

Follow-up questions this raises:

  • How should we handle a consumer thread that fails?
  • How should we handle a consumer thread that drops out briefly because of network jitter and then quickly recovers?

2. Reproducing the Scenario:

Test environment:

  • Kafka broker version: 2.11-2.2.0
  • Kafka client version: 0.9.0.1

Producer:

        // Produce one keyed message per second; DemoCallBack is a Callback that handles the send result.
        while (true) {
            String messageStr = "Message_" + (++messageNo);
            long startTime = System.currentTimeMillis();
            producer.send(new ProducerRecord<Integer, String>(topic,
                    messageNo,
                    messageStr), new DemoCallBack(startTime, messageNo, messageStr));
            Thread.sleep(1000);
        }
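For reference, the producer used in the loop above might be constructed roughly as follows. The broker address, topic name, and serializer choices are assumptions for illustration, not part of the original post; the DemoCallBack class is left out here.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    // Assumed setup for the producer loop above; adjust bootstrap.servers and the topic name.
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer<Integer, String> producer = new KafkaProducer<>(props);
    String topic = "demo-topic";   // hypothetical topic name
    int messageNo = 0;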

Consumer:

        while (true) {
            ConsumerRecords<Integer, String> records = consumer.poll(1);
            for (ConsumerRecord<Integer, String> record : records) {
                System.out.println("Received message: (" + record.key() + ", " + record.value() + ") at offset " + record.offset());
            }
            System.out.println("-------------------");
            // Simulate slow processing: block before committing, so the gap between polls
            // can be stretched past the session timeout in the experiments below.
            Thread.sleep(5000);
            consumer.commitSync();
        }
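For reference, the consumer in this loop might be set up as follows. The broker address, topic name, and the explicit session.timeout.ms are assumptions added for illustration (the group id matches the DemoConsumer group seen in the logs below); auto-commit is disabled because the loop calls commitSync() itself, and the ~30s session timeout is the threshold discussed in the analysis.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Assumed setup for the consumer loop above (0.9-style string configuration).
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "DemoConsumer");
    props.put("enable.auto.commit", "false");   // offsets are committed manually via commitSync()
    props.put("session.timeout.ms", "30000");   // heartbeat expiration threshold (~30s)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("demo-topic"));   // hypothetical topic name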

Reproduction

Before committing the offset, hold off the next poll (the sketch below shows one way to vary the delay):

  • If the delay exceeds the timeout threshold, a new consumer takes over the partition and consumes duplicate data.
  • If the delay stays under the threshold, the original consumer recovers and no data is duplicated.
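A minimal way to switch between the two cases, assuming the ~30s session.timeout.ms from the setup above, is simply to change how long the loop blocks between poll() and commitSync(); the concrete durations here are illustrative.

    // Case 1: block longer than session.timeout.ms. With the 0.9 client, heartbeats are sent
    // from poll(), so a 40s gap lets the session expire: the broker evicts this member,
    // rebalances the group, and another consumer re-reads the uncommitted records.
    Thread.sleep(40000);
    consumer.commitSync();   // fails with CommitFailedException after the rebalance

    // Case 2: block for less than session.timeout.ms. The next poll() happens before the
    // session expires, the commit succeeds, and no duplicates are observed.
    Thread.sleep(5000);
    consumer.commitSync();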

3. Analysis:

Logs

At this point we need to look at the Kafka broker's logs:

[2020-03-30 16:02:22,761] INFO [GroupCoordinator 0]: Assignment received from leader for group DemoConsumer for generation 3 (kafka.coordinator.group.GroupCoordinator)
[2020-03-30 16:03:02,915] INFO [GroupCoordinator 0]: Member consumer-1-c4fa22f5-6917-4dc2-ab61-68b794761072 in group DemoConsumer has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2020-03-30 16:03:02,915] INFO [GroupCoordinator 0]: Preparing to rebalance group DemoConsumer in state PreparingRebalance with old generation 3 (__consumer_offsets-17) (reason: removing member consumer-1-c4fa22f5-6917-4dc2-ab61-68b794761072 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2020-03-30 16:03:02,916] INFO [GroupCoordinator 0]: Group DemoConsumer with generation 4 is now empty (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)

After the timeout (roughly 30s by default, governed by session.timeout.ms), the broker rebalances the consumer group and evicts the stalled consumer. The second consumer thread then takes over the partition and re-reads the records whose offsets were never committed, so it does consume duplicate data.

[2020-03-30 16:15:40,466] INFO [GroupCoordinator 0]: Assignment received from leader for group DemoConsumer for generation 7 (kafka.coordinator.group.GroupCoordinator)
[2020-03-30 16:16:10,467] INFO [GroupCoordinator 0]: Member consumer-1-12890630-8078-4c8b-aaeb-4249db5b1968 in group DemoConsumer has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)

If the evicted consumer thread later tries to commit offsets, the commit fails with an error:

Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
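If the evicted consumer should survive instead of crashing, the commit can be wrapped defensively; this is a sketch rather than anything from the original post. The next poll() re-joins the group, and the records whose commit was lost are simply delivered again (at-least-once semantics).

    import org.apache.kafka.clients.consumer.CommitFailedException;

    try {
        consumer.commitSync();
    } catch (CommitFailedException e) {
        // The group has already rebalanced and this member was evicted, so the commit is lost.
        // Don't retry the commit: the next poll() re-joins the group, and everything processed
        // since the last successful commit will be redelivered to whoever owns the partition.
        System.err.println("Offset commit failed after rebalance: " + e.getMessage());
    }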

Broker-side source code

TODO

References

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html

After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the consumer sends periodic heartbeats to the server. If the consumer crashes or is unable to send heartbeats for a duration of session.timeout.ms, then the consumer will be considered dead and its partitions will be reassigned.
It is also possible that the consumer could encounter a “livelock” situation where it is continuing to send heartbeats, but no progress is being made. To prevent the consumer from holding onto its partitions indefinitely in this case, we provide a liveness detection mechanism using the max.poll.interval.ms setting. Basically if you don’t call poll at least as frequently as the configured max interval, then the client will proactively leave the group so that another consumer can take over its partitions. When this happens, you may see an offset commit failure (as indicated by a CommitFailedException thrown from a call to commitSync()). This is a safety mechanism which guarantees that only active members of the group are able to commit offsets. So to stay in the group, you must continue to call poll.

The consumer provides two configuration settings to control the behavior of the poll loop:

  • max.poll.interval.ms: By increasing the interval between expected polls, you can give the consumer more time to handle a batch of records returned from poll(long). The drawback is that increasing this value may delay a group rebalance since the consumer will only join the rebalance inside the call to poll. You can use this setting to bound the time to finish a rebalance, but you risk slower progress if the consumer cannot actually call poll often enough.
  • max.poll.records: Use this setting to limit the total records returned from a single call to poll. This can make it easier to predict the maximum that must be handled within each poll interval. By tuning this value, you may be able to reduce the poll interval, which will reduce the impact of group rebalancing.

For use cases where message processing time varies unpredictably, neither of these options may be sufficient. The recommended way to handle these cases is to move message processing to another thread, which allows the consumer to continue calling poll while the processor is still working. Some care must be taken to ensure that committed offsets do not get ahead of the actual position. Typically, you must disable automatic commits and manually commit processed offsets for records only after the thread has finished handling them (depending on the delivery semantics you need). Note also that you will need to pause the partition so that no new records are received from poll until after thread has finished handling those previously returned.

Note: max.poll.interval.ms and max.poll.records do not take effect on the 0.9 client; they were introduced in later client versions.
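As a rough sketch of the javadoc's recommendation (hand records to a worker thread, pause the partitions so poll() keeps the consumer alive without fetching more, and commit only once the work is done): it assumes a 0.10+ client where pause()/resume() accept a collection of partitions, reuses the consumer from the earlier snippet, and is not taken from the original post.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    ExecutorService worker = Executors.newSingleThreadExecutor();
    Future<?> inFlight = null;

    while (true) {
        // Keep polling so heartbeats / the max.poll.interval.ms deadline stay satisfied.
        ConsumerRecords<Integer, String> records = consumer.poll(100);

        if (!records.isEmpty()) {
            consumer.pause(consumer.assignment());   // stop fetching while the batch is processed
            inFlight = worker.submit(() -> {
                for (ConsumerRecord<Integer, String> record : records) {
                    // Simulate slow per-record processing.
                    System.out.println("Processing offset " + record.offset());
                }
            });
        }

        if (inFlight != null && inFlight.isDone()) {
            consumer.commitSync();                    // commit only after processing finished
            consumer.resume(consumer.assignment());   // start fetching the next batch
            inFlight = null;
        }
    }

Pausing instead of simply sleeping keeps the consumer in the group, which is exactly what the sleep-based reproduction above breaks.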
