Troubleshooting a Kafka Consumer That Receives No Data

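Three Kafka brokers had just been installed and started on the cluster. I packaged the code, uploaded it to the cluster, and ran it, but it never consumed any data.

Debugging locally in IDEA showed that the program never left the following method: it kept blocking there and looping forever.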

    /**
     * Block until the coordinator for this group is known and is ready to receive requests.
     * In other words, wait until we have a connection to the server-side GroupCoordinator.
     */
    public void ensureCoordinatorReady() {
        while (coordinatorUnknown()) { // the GroupCoordinator is still unknown
            RequestFuture<Void> future = sendGroupCoordinatorRequest(); // send the lookup request
            client.poll(future); // block until the asynchronous call completes
            if (future.failed()) {
                if (future.isRetriable())
                    client.awaitMetadataUpdate();
                else
                    throw future.exception();
            } else if (coordinator != null && client.connectionFailed(coordinator)) {
                // we found the coordinator, but the connection has failed, so mark
                // it dead and backoff before retrying discovery
                coordinatorDead();
                time.sleep(retryBackoffMs); // wait a while, then retry
            }
        }
    }

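Roughly, the flow is:

  • The consumer asks the cluster which broker is the coordinator for its group
  • Each consumer in the group then sends a join request to the coordinator, and one of them is chosen as the group leader
  • In the end exactly one consumer is the leader and the others are followers
  • The leader computes the partition assignment and sends it to the coordinator
  • The followers fetch the partition assignment from the coordinator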

The problem was in the first step: the consumer could not connect to the server-side GroupCoordinator, so the program stayed in the waiting state.
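To make the first step more concrete: as far as I know, the broker maps each group to one partition of __consumer_offsets by hashing the group id, and the leader of that partition acts as the group's coordinator. The sketch below reproduces that mapping under this assumption. The class name, the method names, and the group id `my-group` are hypothetical; 50 is the default partition count of the offsets topic.

```java
public class CoordinatorPartition {

    // Non-negative absolute value: Integer.MIN_VALUE has no positive
    // counterpart, so it is mapped to 0 instead of staying negative.
    public static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // Index of the __consumer_offsets partition assumed to own the group.
    public static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
        return abs(groupId.hashCode()) % offsetsTopicPartitionCount;
    }

    public static void main(String[] args) {
        // "my-group" is a placeholder; pass your real group.id as the first argument.
        String groupId = args.length > 0 ? args[0] : "my-group";
        int partition = partitionFor(groupId, 50);
        System.out.println(groupId + " -> __consumer_offsets partition " + partition);
    }
}
```

If the leader of the resulting partition is not a live broker, coordinator discovery cannot succeed, which would match the endless loop in ensureCoordinatorReady().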

I then looked at the __consumer_offsets topic: all 50 partitions were on the broker with id 152.

bin/kafka-topics.sh --describe --zookeeper localhost:2182 --topic __consumer_offsets
Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets    Partition: 0    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 1    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 2    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 3    Leader: 152   
......

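However, the cluster has no broker with id 152. Kafka nodes had been added to and removed from this cluster in the past, so my preliminary conclusion was that 152 was an old Kafka node: it had been removed and new nodes added later, but the data in ZooKeeper was never updated.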
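This kind of check can be scripted. The following is a minimal sketch, not part of any Kafka tooling: it scans the text printed by kafka-topics.sh --describe and reports leader ids that are missing from a set of live broker ids. The class and method names are made up for illustration, and the live ids 420, 421 and 422 are the ones that show up later in this post; on a real cluster you would supply the ids of the brokers that are actually registered.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaleLeaderCheck {

    // Matches the "Leader: <id>" field of each partition line.
    private static final Pattern LEADER = Pattern.compile("Leader:\\s*(-?\\d+)");

    // Returns every leader id found in the describe output that is not a live broker.
    public static Set<Integer> staleLeaders(String describeOutput, Set<Integer> liveBrokerIds) {
        Set<Integer> stale = new TreeSet<>();
        Matcher m = LEADER.matcher(describeOutput);
        while (m.find()) {
            int leader = Integer.parseInt(m.group(1));
            if (!liveBrokerIds.contains(leader)) {
                stale.add(leader);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Two partition lines copied from the describe output above.
        String output = String.join("\n",
            "Topic: __consumer_offsets    Partition: 0    Leader: 152    Replicas: 152   Isr:152",
            "Topic: __consumer_offsets    Partition: 1    Leader: 152    Replicas: 152   Isr:152");
        Set<Integer> live = new TreeSet<>(Arrays.asList(420, 421, 422));
        System.out.println("Stale leaders: " + staleLeaders(output, live)); // prints "Stale leaders: [152]"
    }
}
```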

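So I shut down the brokers, opened the ZooKeeper client, deleted the __consumer_offsets node under brokers/topics, and then restarted the brokers. Note that at this point __consumer_offsets has not yet been recreated in ZooKeeper; it is only created once a consumer is started.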

Checking __consumer_offsets again afterwards, the partitions are now evenly distributed across the three brokers:

 bin/kafka-topics.sh --zookeeper localhost:2182 --describe --topic __consumer_offsets
Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:3    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets    Partition: 0    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 1    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 2    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 3    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 4    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 5    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 6    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 7    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 8    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 9    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 10    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 11    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 12    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 13    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 14    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 15    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 16    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 17    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 18    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 19    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 20    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 21    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 22    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 23    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 24    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 25    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 26    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 27    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 28    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 29    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 30    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 31    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 32    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 33    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 34    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 35    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 36    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 37    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 38    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 39    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 40    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 41    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 42    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 43    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 44    Leader: 422    Replicas: 422,420,421    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 45    Leader: 420    Replicas: 420,422,421    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 46    Leader: 421    Replicas: 421,420,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 47    Leader: 422    Replicas: 422,421,420    Isr: 422,420,421
    Topic: __consumer_offsets    Partition: 48    Leader: 420    Replicas: 420,421,422    Isr: 420,422,421
    Topic: __consumer_offsets    Partition: 49    Leader: 421    Replicas: 421,422,420    Isr: 422,420,421
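To confirm "evenly distributed" without reading all 50 lines by eye, one could count the leaders per broker. This is a rough sketch with hypothetical class and method names; the sample input is generated to follow the same round-robin leader pattern as the output above instead of being pasted in full.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeaderBalance {

    private static final Pattern LEADER = Pattern.compile("Leader:\\s*(\\d+)");

    // Counts how many partitions each broker leads, keyed by broker id.
    public static Map<Integer, Integer> leadersPerBroker(String describeOutput) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Matcher m = LEADER.matcher(describeOutput);
        while (m.find()) {
            counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
        }
        return counts;
    }

    // Treats the layout as balanced when no broker leads more than one
    // partition above any other broker that leads at least one partition.
    public static boolean isBalanced(Map<Integer, Integer> counts) {
        if (counts.isEmpty()) {
            return false;
        }
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        return max - min <= 1;
    }

    public static void main(String[] args) {
        // Rebuild the Leader column of the output above: 420, 421, 422, 420, ...
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < 50; p++) {
            sb.append("Topic: __consumer_offsets    Partition: ").append(p)
              .append("    Leader: ").append(420 + p % 3).append('\n');
        }
        Map<Integer, Integer> counts = leadersPerBroker(sb.toString());
        System.out.println(counts);             // prints {420=17, 421=17, 422=16}
        System.out.println(isBalanced(counts)); // prints true
    }
}
```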

After restarting the program at this point, it consumed data normally. Problem solved.

References:

  • https://stackoverflow.com/questions/42362911/kafka-high-level-consumer-error-code-15
