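Today a girl I met online asked me to help her with a problem. She trusted me enough to hand over her server account, so I spent an evening working on it.
First, the configuration file (zoo.cfg):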
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.0=47.94.204.115:2888:3888
server.1=47.94.192.253:2888:3888
server.2=47.94.199.37:2888:3888
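Each server.N line has the form server.<myid>=<host>:<quorumPort>:<electionPort>. Followers connect to the leader on the first port (2888 here), and all servers exchange leader-election messages on the second (3888 here). The sketch below splits the three lines above into those parts. It is a minimal illustration, not ZooKeeper's own parser: the class ServerLine and its methods are hypothetical, and it only handles the three-field form used in this post (newer ZooKeeper versions also accept extra fields such as a role or client port).

```java
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: splits "server.N=host:quorumPort:electionPort".
class ServerLine {
    final int myid;
    final String host;
    final int quorumPort;   // followers connect to the leader on this port (2888 above)
    final int electionPort; // servers listen here for leader-election traffic (3888 above)

    ServerLine(int myid, String host, int quorumPort, int electionPort) {
        this.myid = myid;
        this.host = host;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
    }

    static ServerLine parse(String line) {
        String[] kv = line.split("=", 2);
        if (kv.length != 2 || !kv[0].startsWith("server.")) {
            throw new IllegalArgumentException("not a server.N line: " + line);
        }
        int myid = Integer.parseInt(kv[0].substring("server.".length()));
        String[] parts = kv[1].split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host:port:port in " + line);
        }
        return new ServerLine(myid, parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "server.0=47.94.204.115:2888:3888",
                "server.1=47.94.192.253:2888:3888",
                "server.2=47.94.199.37:2888:3888");
        for (String l : lines) {
            ServerLine s = parse(l);
            System.out.println(s.myid + " -> " + s.host
                    + " quorum=" + s.quorumPort + " election=" + s.electionPort);
        }
    }
}
```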
Then the startup log was full of exceptions like this one:
2017-07-05 23:40:14,814 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
at java.lang.Thread.run(Thread.java:745)
After that, the cluster simply would not start.
Now for the fix, which took a few twists and turns.
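First I logged into 47.94.192.253 and ran netstat -nalp | grep java, which showed the following ports:
Port 2181 is ZooKeeper's client port, so PID 32143 (which owns it) is the ZooKeeper server. Besides 2181, that process was listening on port 37271, which appears nowhere in the ZooKeeper configuration; the configuration specifies 2888 and 3888. Normally every server listens on 3888 for leader-election traffic (the elected leader additionally listens on 2888 for its followers). Why this was happening was unclear, and each time I restarted the process that second port number changed. So the symptom was found: the server process was not listening on the configured port 3888 but on a random port, so the other servers could not reach it, which is why we saw this exception.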
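Besides running netstat on the node itself, the same check can be made from another node, which is essentially what the election sender does in QuorumCnxManager.connectOne (visible in the stack trace): open a TCP connection to host:3888. This is a minimal sketch; the class name PortProbe and the 2-second timeout are illustrative choices. While the problem was present, probing 47.94.192.253:3888 should return false, matching the "Connection refused" in the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe: does anything accept TCP connections on host:port?
class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // TCP handshake completed, so some process is listening
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "47.94.192.253";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 3888;
        System.out.println(host + ":" + port + " listening? " + isListening(host, port, 2000));
    }
}
```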
To fix it, I had to find out why the process was listening on a random port. I reopened the log file and found this exception near the top:
2017-07-05 23:40:14,695 [myid:] - INFO [main:QuorumPeerConfig@134] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
2017-07-05 23:40:14,713 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.192.253 to address: /47.94.192.253
2017-07-05 23:40:14,713 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.204.115 to address: /47.94.204.115
2017-07-05 23:40:14,714 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.199.37 to address: /47.94.199.37
2017-07-05 23:40:14,714 [myid:] - INFO [main:QuorumPeerConfig@396] - Defaulting to majority quorums
2017-07-05 23:40:14,721 [myid:0] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-07-05 23:40:14,725 [myid:0] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-07-05 23:40:14,725 [myid:0] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-07-05 23:40:14,741 [myid:0] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2017-07-05 23:40:14,751 [myid:0] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-05 23:40:14,776 [myid:0] - INFO [main:QuorumPeer@1134] - minSessionTimeout set to -1
2017-07-05 23:40:14,776 [myid:0] - INFO [main:QuorumPeer@1145] - maxSessionTimeout set to -1
2017-07-05 23:40:14,777 [myid:0] - INFO [main:QuorumPeer@1419] - QuorumPeer communication is not secured!
2017-07-05 23:40:14,778 [myid:0] - INFO [main:QuorumPeer@1448] - quorum.cnxn.threads.size set to 20
2017-07-05 23:40:14,793 [myid:0] - INFO [ListenerThread:QuorumCnxManager$Listener@739] - My election bind port: /47.94.204.115:3888
2017-07-05 23:40:14,794 [myid:0] - ERROR [/47.94.204.115:3888:QuorumCnxManager$Listener@763] - Exception while listening
java.net.BindException: Cannot assign requested address (Bind failed)
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at java.net.ServerSocket.bind(ServerSocket.java:329)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:742)
2017-07-05 23:40:14,807 [myid:0] - INFO [QuorumPeer[myid=0]/0.0.0.0:2181:QuorumPeer@865] - LOOKING
2017-07-05 23:40:14,808 [myid:0] - INFO [QuorumPeer[myid=0]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id = 0, proposed zxid=0x2
2017-07-05 23:40:14,810 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@600] - Notification: 1 (message format version), 0 (n.leader), 0x2 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2017-07-05 23:40:14,814 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
at java.lang.Thread.run(Thread.java:745)
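Near the top there is a BindException. This exception usually has one of two common causes:
1. The port is already in use.
2. The IP address does not belong to any network interface on this machine.
I had just checked that port 3888 was not in use, so it had to be the second cause.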
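The two causes can be told apart with a few lines of Java. The sketch below (the class BindCheck and its method tryBind are hypothetical) attempts a bind and returns the exception message. Binding a port that another socket already holds fails with "Address already in use", while binding an address that belongs to no local interface fails with "Cannot assign requested address", the message in the log above. The exact wording varies by JDK and operating system; these are the Linux messages.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: try to bind ip:port and report why it failed, if it did.
class BindCheck {
    /** Returns null if the bind succeeded, otherwise the exception message. */
    static String tryBind(String ip, int port) {
        try (ServerSocket ss = new ServerSocket()) {
            ss.bind(new InetSocketAddress(InetAddress.getByName(ip), port));
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Cause 1: the port is already held by another listening socket.
        try (ServerSocket holder = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            System.out.println("port in use: " + tryBind("127.0.0.1", holder.getLocalPort()));
        }
        // Cause 2: the IP is not assigned to any local interface (e.g. a NATed public IP).
        System.out.println("foreign IP:  " + tryBind("47.94.204.115", 3888));
    }
}
```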
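Running ifconfig gave the following:
Sure enough, it was the second cause: no interface on the machine has this address. Some readers may ask why, then, SSH to this IP works. The reason is simple: this is a cloud server. The public IP is not configured inside the virtual machine; it lives on the provider's network gateway, which forwards the traffic to the VM's private address. So the VM itself has no interface carrying the public IP.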
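What ifconfig shows can also be checked from Java with java.net.NetworkInterface. This minimal sketch (the class LocalAddressCheck is hypothetical) lists every address on the machine's interfaces and reports whether a given IP is among them. On a cloud VM whose public IP is forwarded by the gateway, the public address should come back as not local, which is exactly the condition that makes the bind fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

// Hypothetical helper: a tiny ifconfig plus an "is this IP assigned to me?" check.
class LocalAddressCheck {
    static boolean isLocal(String ip) {
        try {
            // getByInetAddress returns null when no interface carries this address.
            return NetworkInterface.getByInetAddress(InetAddress.getByName(ip)) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress a : Collections.list(nif.getInetAddresses())) {
                System.out.println(nif.getName() + "\t" + a.getHostAddress());
            }
        }
        String ip = args.length > 0 ? args[0] : "47.94.204.115";
        System.out.println(ip + " is local? " + isLocal(ip));
    }
}
```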
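Now the real cause was known. The fix is to make the server process listen on 0.0.0.0, that is, on all interfaces.
How? I looked through the official documentation without finding a setting for this, so I pulled in the ZooKeeper source code and went to line 742 of QuorumCnxManager.java.
Just before that line there is a listenOnAllIPs flag; if it is true, the problem goes away. So I traced it upward and found where it is set in QuorumPeerConfig.java: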
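The idea behind that branch can be paraphrased as follows. This is a simplified sketch, not the ZooKeeper source; the class ElectionListenerSketch and the method chooseBindAddress are hypothetical. When the flag is true, only the election port is kept and the listener binds the wildcard address (0.0.0.0); otherwise it binds the exact IP and port from the server.N line, which is what fails on a VM that does not own its public IP.

```java
import java.net.InetSocketAddress;

// Simplified paraphrase of the election listener's bind-address choice (not ZooKeeper code).
class ElectionListenerSketch {
    static InetSocketAddress chooseBindAddress(InetSocketAddress electionAddr, boolean listenOnAllIPs) {
        if (listenOnAllIPs) {
            // Keep only the port: an InetSocketAddress built from a port alone uses the wildcard address.
            return new InetSocketAddress(electionAddr.getPort());
        }
        // Bind exactly the configured IP, which fails if no local interface owns it.
        return electionAddr;
    }

    public static void main(String[] args) {
        InetSocketAddress configured = new InetSocketAddress("47.94.204.115", 3888);
        System.out.println("listenOnAllIPs=false -> " + chooseBindAddress(configured, false));
        System.out.println("listenOnAllIPs=true  -> " + chooseBindAddress(configured, true));
    }
}
```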
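That made it clear: the configuration file accepts a quorumListenOnAllIPs parameter, and it needs to be set to true:
quorumListenOnAllIPs=true
After adding this setting to every node, the servers listened on port 3888 and the problem was solved.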
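For reference, this is the configuration from the start of the post with the extra line added, together with a minimal sketch of reading the flag via java.util.Properties and Boolean.parseBoolean. I assume QuorumPeerConfig treats the value in roughly this way, but this is an illustration, not ZooKeeper's parser, and the class name ZooCfgFlag is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration: the fixed zoo.cfg and how the flag could be read.
class ZooCfgFlag {
    static final String ZOO_CFG = String.join("\n",
            "tickTime=2000",
            "initLimit=10",
            "syncLimit=5",
            "dataDir=/usr/local/zookeeper/data",
            "clientPort=2181",
            "server.0=47.94.204.115:2888:3888",
            "server.1=47.94.192.253:2888:3888",
            "server.2=47.94.199.37:2888:3888",
            "# Listen for peer connections on all interfaces, not only the configured IP",
            "quorumListenOnAllIPs=true");

    static boolean listenOnAllIPs(String cfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfgText)); // lines starting with '#' are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A missing key or any value other than "true" (case-insensitive) yields false.
        return Boolean.parseBoolean(p.getProperty("quorumListenOnAllIPs", "false").trim());
    }

    public static void main(String[] args) {
        System.out.println("quorumListenOnAllIPs = " + listenOnAllIPs(ZOO_CFG));
    }
}
```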