Problems encountered while using Storm 1.0.2

1. Error at startup

2015-12-03 12:28:53.338 b.s.m.n.Client [ERROR] connection attempt 10 to Netty-Client-host1.grid.myco.com/10.1.2.3:6710 failed: java.net.ConnectException: Connection refused: host1.grid.myco.com/10.1.2.3:6710

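At first I did not notice this error, because the topology worked normally once it was deployed. I only came across the ERROR-level entries in the log by chance.
Analysis:
1. It does not look like a fatal error. Based on the message, I checked whether the port in question was reachable.
2. Since the official site states that IPv6 is not supported, I disabled IPv6 on the machines, but the problem persisted.
3. It may be a startup-order issue: for a spout with a parallelism of 4, the logs of 3 workers contained the error and the log of the remaining one did not. That would fit workers trying to connect to peers whose ports were not yet listening.
4. The same issue has already been raised at http://stackoverflow.com/questions/36612557/aws-workers-cant-communicate-due-to-netty-client-hostname-resolution/39104515#39104515 .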
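
The port check from step 1 can be scripted rather than done by hand. The following is only a minimal sketch: the class name PortCheck and the 2-second timeout are assumptions made for illustration, and the default host and port are simply the ones from the log line above. It uses nothing beyond the JDK and is not part of Storm.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Storm): checks whether a TCP port accepts connections.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, connect timeout and unresolvable host names all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the log line above; override them on the command line.
        String host = args.length > 0 ? args[0] : "host1.grid.myco.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6710;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

Running it from each worker host against the other workers' ports shortly after the topology starts, and again a minute later, would show whether the refused connections are only a transient startup effect.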

2. Error after submitting the topology

Consumer has failed with exception: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance

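This error showed up in the Storm UI, yet messages were still being consumed. The times shown for some of the spout's consumer processes were indeed off, which indicates that a rebalance had taken place.
Analysis:
1. The error message is clear: a rebalance happened while the offset was being committed. The conditions that trigger a rebalance are described clearly at http://zqhxuyuan.github.io/2016/10/27/Kafka-Definitive-Guide-cn-04/ , but why would a rebalance happen for no apparent reason?
2. The most likely cause is a communication timeout between the consumer and one of the Kafka brokers, so the timeout needs to be increased.
3. http://stackoverflow.com/questions/35658171/kafka-commitfailedexception-consumer-exception mentions this error. http://blog.csdn.net/weitry/article/details/53009134 is also worth reading, but note that the max.poll.records parameter (1.x) is version dependent.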

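Solutions:

Increase the consumer timeout. The timeout itself is session.timeout.ms; heartbeat.interval.ms only controls how often heartbeats are sent and must be lower than session.timeout.ms. session.timeout.ms in turn must lie within the broker's [group.min.session.timeout.ms, group.max.session.timeout.ms] range.
Reduce the message processing time; this depends on the downstream processing.
Reduce the amount consumed per poll. max.partition.fetch.bytes limits the size and max.poll.records (1.x) limits the count: max.partition.fetch.bytes caps how much data is fetched from one partition at a time, and max.poll.records caps how many messages a single poll returns.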
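
The settings above can be collected and sanity-checked in one place. This is a hedged sketch only: the class ConsumerTuning, its methods and every numeric value are assumptions for illustration, and the broker limits passed to isValid must be read from your own broker configuration. The property keys are the Kafka consumer keys named in the text; the code uses only java.util.Properties, so how the result reaches the spout's consumer is left open.

```java
import java.util.Properties;

// Hypothetical helper: gathers the consumer settings discussed above. All numbers are illustrative.
final class ConsumerTuning {

    // Builds a Properties object using the Kafka consumer config keys named in the text.
    static Properties build(int sessionTimeoutMs, int heartbeatIntervalMs,
                            int maxPartitionFetchBytes, int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        props.setProperty("max.partition.fetch.bytes", Integer.toString(maxPartitionFetchBytes));
        // max.poll.records is version dependent; confirm that your client supports it.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    // Checks the two constraints from the text: heartbeat below the session timeout,
    // and the session timeout inside the broker's allowed range.
    static boolean isValid(Properties props, int brokerMinSessionMs, int brokerMaxSessionMs) {
        int session = Integer.parseInt(props.getProperty("session.timeout.ms"));
        int heartbeat = Integer.parseInt(props.getProperty("heartbeat.interval.ms"));
        return heartbeat < session
                && session >= brokerMinSessionMs
                && session <= brokerMaxSessionMs;
    }

    public static void main(String[] args) {
        // Example values only: 30 s session timeout, 10 s heartbeat, 1 MiB per partition, 100 records.
        Properties props = build(30000, 10000, 1024 * 1024, 100);
        // 6000 and 30000 stand in for group.min/max.session.timeout.ms on the broker (assumed).
        System.out.println("valid: " + isValid(props, 6000, 30000));
        props.list(System.out);
    }
}
```

Validating before deployment is useful because a session.timeout.ms outside the broker's range is rejected only when the consumer tries to join the group, which is harder to trace from the Storm UI.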
