Kafka cluster failure: unable to elect a leader for topic partitions

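1. Environment

Ubuntu

4 physical machines

4 Kafka brokers

3 ZooKeeper nodes

All nodes run in Docker containers

The setup is a Fabric 1.4 network with Kafka-consensus orderer nodes that I inherited, pitfalls included.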

2. Error logs

Kafka error log:

[2021-11-03 02:59:56,910] INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=0] Retrying leaderEpoch request for partition byfn-sys-channel-0 as the leader reported an error: UNKNOWN_SERVER_ERROR (kafka.server.ReplicaFetcherThread)
[2021-11-03 02:59:57,910] WARN [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=0] Error when sending leader epoch request for Map(byfn-sys-channel-0 -> -1) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 5f2f56fcc720:9092 (id: 4 rack: null) failed.
	at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:68)
	at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:91)
	at kafka.server.ReplicaFetcherThread.fetchEpochsFromLeader(ReplicaFetcherThread.scala:316)
	at kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:130)
	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:102)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)

The Fabric orderer node logs an error as well. It keeps insisting that Kafka is still in the middle of a leader election and that the partition has no leader:

2021-11-03 03:25:00.881 UTC [orderer.consensus.kafka] HealthCheck -> WARN 010 [channel byfn-sys-channel] Cannot post CONNECT message = kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.

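3. Troubleshooting

This assumes the ZooKeeper ensemble is already up and working. The case where ZooKeeper itself cannot elect a leader is covered in the previous post:

https://blog.csdn.net/langchao7946/article/details/121080561

There is no shortcut here, folks: read the logs and grind through them.

Start with Kafka's UNKNOWN_SERVER_ERROR and "Connection to 5f2f56fcc720:9092 (id: 4 rack: null) failed."

The broker cannot be reached. And what on earth is that hostname in front of the port? I had definitely configured the hosts file.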
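A quick way to confirm the suspicion is to pull the endpoint out of the log line and ask the local resolver about it. The following is only a minimal sketch in Python: the regular expression is written against the single log line quoted above, and the function names are made up for illustration.

```python
import re
import socket

# Written against the "Connection to <host>:<port> (id: <n> rack: ...) failed"
# line from the broker log quoted above; other log formats are not covered.
_CONN_FAILED = re.compile(r"Connection to (\S+):(\d+) \(id: (\d+) rack: \S+\) failed")


def extract_failed_endpoint(log_line):
    """Return (host, port, broker_id) from a connection-failure line, or None."""
    match = _CONN_FAILED.search(log_line)
    if match is None:
        return None
    host, port, broker_id = match.groups()
    return host, int(port), int(broker_id)


def resolves(host):
    """Return True if the machine running this script can resolve the host name."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


line = "java.io.IOException: Connection to 5f2f56fcc720:9092 (id: 4 rack: null) failed."
print(extract_failed_endpoint(line))
```

Calling resolves("5f2f56fcc720") on each machine that hosts a broker then shows whether the name in the log is known there at all, independent of Kafka.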

1. Enter the ZooKeeper container (run docker ps first to find the container ID) and start the ZooKeeper client


// Session log
root@zdya-desktop:~# docker exec -it 9a61ec4d76b6 bash

root@zookeeper1:/zookeeper-3.4.14# cd bin/

root@zookeeper1:/zookeeper-3.4.14/bin# ll

total 52
drwxr-xr-x 2 2002 2002 4096 Mar  6  2019 ./
drwxr-xr-x 1 root root 4096 Mar  3  2020 ../
-rwxr-xr-x 1 2002 2002  232 Mar  6  2019 README.txt*
-rwxr-xr-x 1 2002 2002 1937 Mar  6  2019 zkCleanup.sh*
-rwxr-xr-x 1 2002 2002 1056 Mar  6  2019 zkCli.cmd*
-rwxr-xr-x 1 2002 2002 1534 Mar  6  2019 zkCli.sh*
-rwxr-xr-x 1 2002 2002 1759 Mar  6  2019 zkEnv.cmd*
-rwxr-xr-x 1 2002 2002 2919 Mar  6  2019 zkEnv.sh*
-rwxr-xr-x 1 2002 2002 1089 Mar  6  2019 zkServer.cmd*
-rwxr-xr-x 1 2002 2002 6773 Mar  6  2019 zkServer.sh*
-rwxr-xr-x 1 2002 2002  996 Mar  6  2019 zkTxnLogToolkit.cmd*
-rwxr-xr-x 1 2002 2002 1385 Mar  6  2019 zkTxnLogToolkit.sh*

root@zookeeper1:/zookeeper-3.4.14/bin# zkCli.sh

2. Check the broker registrations

[zk: localhost:2181(CONNECTED) 3] ls /brokers/ids 
[1, 2, 3, 4]


[zk: localhost:2181(CONNECTED) 5] get /brokers/ids/1
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://415ff22fe50c:9092"],"jmx_port":-1,"host":"415ff22fe50c","timestamp":"1635753493847","port":9092,"version":4}
cZxid = 0x10000001d
ctime = Mon Nov 01 07:58:12 GMT 2021
mZxid = 0x10000001d
mtime = Mon Nov 01 07:58:12 GMT 2021
pZxid = 0x10000001d
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2015e1ba60b0000
dataLength = 194
numChildren = 0

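Whoa: the registered host is a baffling 415ff22fe50c, the same kind of value as the 5f2f56fcc720 from the log (I just picked one broker at random). Both look like Docker container IDs, which Docker uses as a container's default hostname, and that would explain why the other machines cannot resolve them despite my hosts entries.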
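Checking four brokers by eye is manageable, but the same check can be sketched in a few lines of Python. This is a rough heuristic rather than an official tool: it assumes the registration JSON has the shape shown above and treats any host made of exactly 12 lowercase hex digits as a probable container ID. The function names are hypothetical.

```python
import json
import re
from urllib.parse import urlsplit

# Heuristic (an assumption): a default Docker container hostname is the
# 12-character short container ID, i.e. 12 lowercase hex digits.
_CONTAINER_ID = re.compile(r"[0-9a-f]{12}")


def endpoint_hosts(registration_json):
    """Return the host part of every endpoint in a /brokers/ids/<n> value."""
    data = json.loads(registration_json)
    return [urlsplit(endpoint).hostname for endpoint in data.get("endpoints", [])]


def suspicious_hosts(registration_json):
    """Return the endpoint hosts that look like Docker container IDs."""
    return [
        host
        for host in endpoint_hosts(registration_json)
        if host and _CONTAINER_ID.fullmatch(host)
    ]


# The value returned by "get /brokers/ids/1" above, first line only.
sample = (
    '{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},'
    '"endpoints":["PLAINTEXT://415ff22fe50c:9092"],"jmx_port":-1,'
    '"host":"415ff22fe50c","timestamp":"1635753493847","port":9092,"version":4}'
)
print(suspicious_hosts(sample))
```

An empty list for every broker ID means no registration matches the container-ID pattern. It does not prove that the names are resolvable.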

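3. Reconfigure Kafka

This environment is full of traps. The five-machine setup at my company runs fine without any of this configuration.

KAFKA_LISTENERS is the address and port the Kafka broker binds to and listens on. Docker's default network mode is bridge, where communication problems are easy to run into, so I recommend setting the host to 0.0.0.0, which listens on every network interface the broker has. If you set a single internal IP such as 172.16.1.222 instead, the broker only receives traffic sent to that address.

KAFKA_ADVERTISED_LISTENERS is the address Kafka advertises. Concretely, the broker registers this address in ZooKeeper as-is, and that is the address the other brokers use to reach it.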

      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka3.example.com:9092
      - KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092
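The split between the two settings (where to bind versus what to announce) can be turned into a small sanity check before restarting a broker. The sketch below is an illustration under assumptions, not something Kafka or Fabric provides: check_listeners is a hypothetical helper, and the container-ID pattern is the same 12-hex-digit heuristic as before.

```python
import re
from urllib.parse import urlsplit

# Same heuristic as before: 12 lowercase hex digits look like a container ID.
_CONTAINER_ID = re.compile(r"[0-9a-f]{12}")


def check_listeners(listeners, advertised_listeners):
    """Return warnings for one broker's listener settings (empty list = none)."""
    warnings = []
    # The advertised address is what peers connect to, so it has to be
    # something they can resolve and reach.
    for entry in advertised_listeners.split(","):
        host = urlsplit(entry.strip()).hostname
        if host is None or host == "0.0.0.0":
            warnings.append(f"{entry}: advertised host must be a name or IP peers can reach")
        elif _CONTAINER_ID.fullmatch(host):
            warnings.append(f"{entry}: advertised host looks like a container ID")
    # The bind address only decides which local interfaces accept connections.
    for entry in listeners.split(","):
        host = urlsplit(entry.strip()).hostname
        if host not in (None, "0.0.0.0"):
            warnings.append(f"{entry}: broker only listens on {host}")
    return warnings


print(check_listeners("PLAINTEXT://0.0.0.0:9092", "PLAINTEXT://kafka3.example.com:9092"))
```

For the two values configured above the list comes back empty. Passing the old registration, PLAINTEXT://415ff22fe50c:9092, as the advertised address produces a warning.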

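docker-compose -f docker-compose-kafka3.yaml up -d

docker logs -f --tail 333 <container ID>

After the change, recreate and start the container with the commands above and check the logs. Nothing wrong there.

Now take another look at ZooKeeper: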

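Nice, it changed: the registered host is now the Kafka hostname. Creating a channel and sending messages through Kafka no longer produces errors, and the partition leader has been elected.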
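To double-check the fix from another machine, a plain TCP connection attempt against each advertised endpoint is enough to catch the "Connection to ... failed" case from the log. The sketch below only tests that the port accepts connections and does not speak the Kafka protocol. The function names are hypothetical, and the only endpoint taken from this post is kafka3.example.com:9092.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


def report(endpoints):
    """Print one line per (host, port) pair with the connection result."""
    for host, port in endpoints:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

Running report([("kafka3.example.com", 9092)]) on each of the other machines, with the remaining brokers' advertised names added to the list, gives a quick view of which endpoints are still unreachable.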
