kafka问题解决:org.apache.kafka.common.errors.TimeoutException

记录使用kafka遇到的问题:
- 1.Caused by java.nio.channels.UnresolvedAddressException null
- 2.org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time

两个报错是同一个问题。

问题重现

java 生产者向kafka集群生产消息,报错如下:

2018-06-03 00:10:02.071  INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser     : Kafka version : 0.10.2.0
2018-06-03 00:10:02.071  INFO 80 --- [nio-8080-exec-1] o.a.kafka.common.utils.AppInfoParser     : Kafka commitId : 576d93a8dc0cf421
2018-06-03 00:10:32.253 ERROR 80 --- [ad | producer-1] o.s.k.support.LoggingProducerListener    : Exception thrown when sending a message with key='test1' and payload='hello122' to topic t2:

org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for t2-0: 30042 ms has passed since batch creation plus linger time

明显连接kafka cluster超时,但是实际上并不知道具体是什么原因。

排除找原因

  • 确定kafka集群已经启动,包括zookeeper、kafka集群。可以通过命令
ps -ef | grep java

个人排查,kafka集群即zookeeper已经成功启动

  • 确定生产者程序一方的kafka配置是否有误。
    spring boot的配置如下
spring.kafka.bootstrap-servers=39.108.61.252:9092
spring.kafka.consumer.group-id=springboot-group1
spring.kafka.consumer.auto-offset-reset=earliest

自然也没有问题

  • 确定kafka集群所在机器防火墙或者说安全组是否已经开放相关端口。
    若是在windows上,打开telnet工具可以查看是否端口开放被监听

cmd进入命令行

telnet 39.108.61.252 9092

执行后成功连接,说明问题在程序所在的一方。


  • 打开debug日志级别看错误详细信息

spring boot 的设置方式是:
logging.level.root=debug
然后重启应用
发现后台在不停的刷错误
如下:

2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Initialize connection to node 0 for sending metadata request
2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Initiating connection to node 0 at izwz9c79fdwp9sb65vpyk3z:9092.
2018-06-03 00:22:37.703 DEBUG 5972 --- [       t1-0-C-1] org.apache.kafka.clients.NetworkClient   : Error connecting to node 0 at izwz9c79fdwp9sb65vpyk3z:9092:

java.io.IOException: Can't resolve address: izwz9c79fdwp9sb65vpyk3z:9092
    at org.apache.kafka.common.network.Selector.connect(Selector.java:182) ~[kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:629) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:57) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:768) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:684) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:347) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:203) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:138) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:216) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:275) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030) [kafka-clients-0.10.2.0.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) [kafka-clients-0.10.2.0.jar:na]
    at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:558) [spring-kafka-1.2.2.RELEASE.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
Caused by: java.nio.channels.UnresolvedAddressException: null
    at sun.nio.ch.Net.checkAddress(Net.java:101) ~[na:1.8.0_144]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622) ~[na:1.8.0_144]
    at org.apache.kafka.common.network.Selector.connect(Selector.java:179) ~[kafka-clients-0.10.2.0.jar:na]
    ... 17 common frames omitted

可知建立socket时不能解析kafka所在服务器地址
查看日志可知,解析的地址是izwz9c79fdwp9sb65vpyk3z
这个地址是远程服务器的实例名称(阿里云服务器)。自己配置的明明是ip,程序内部却去获取他的别名,那如果生产者所在机器上没有配置这个ip的别名,就不能解析到对应的ip,所以连接失败报错。

解决

windows则去添加一条host映射

C:\Windows\System32\drivers\etc\hosts

39.108.61.252 izwz9c79fdwp9sb65vpyk3z
127.0.0.1 localhost

linux则

vi /etc/hosts 修改方式一致

此时重启生产者应用
日志如下部分,成功启动,后台会一直在心跳检测连接和更新offset,所以debug日志一直在刷,此时就可以把日志级别修改为info

2018-06-03 00:29:46.543 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Group springboot-group1 committed offset 10 for partition t2-0
2018-06-03 00:29:46.543 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Completed auto-commit of offsets {t2-0=OffsetAndMetadata{offset=10, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.563 DEBUG 12772 --- [       t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Group springboot-group1 committed offset 0 for partition t1-0
2018-06-03 00:29:46.563 DEBUG 12772 --- [       t1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator  : Completed auto-commit of offsets {t1-0=OffsetAndMetadata{offset=0, metadata=''}} for group springboot-group1
2018-06-03 00:29:46.672 DEBUG 12772 --- [       t2-0-C-1] o.a.k.c.consumer.internals.Fetcher       : Sending fetch for partitions [t2-0] to broker izwz9c79fdwp9sb65vpyk3z:9092 (id: 0 rack: null)
2018-06-03 00:29:46.867 DEBUG 12772 --- [       t1-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records
2018-06-03 00:29:46.872 DEBUG 12772 --- [       t2-0-C-1] essageListenerContainer$ListenerConsumer : Received: 0 records

验证

进行生产数据,成功执行!

你可能感兴趣的:(Kafka)