Storm排错调优之SessionTimeout

                                                                              Storm排错调优之SessionTimeout

 

在Storm的日志和zk的日志中均有如下连接超时信息:

Unable to read additional data from client sessionid 0x364f4b88098081e, likely client has closed socket

zk:

2018-08-15 22:44:36,552 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x264f4b839520805, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2018-08-15 22:44:36,552 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /xxx.xxx.xxx.xxx:40611 which had sessionid 0x264f4b839520805
2018-08-15 22:44:36,552 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x364f4b88098081e, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2018-08-15 22:44:36,552 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /xxx.xxx.xxx.xxx:57330 which had sessionid 0x364f4b88098081e
2018-08-15 22:44:36,552 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x264f4b839520805, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2018-08-15 22:44:36,552 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /xxx.xxx.xxx.83:40611 which had sessionid 0x264f4b839520805
2018-08-15 22:44:36,552 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x364f4b88098081e, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2018-08-15 22:44:36,552 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /xxx.xxx.xxx.83:57330 which had sessionid 0x364f4b88098081e

zk端和客户端失去session连接,session号为0x364f4b88098081e   连接机器为xxx.xxx.xxx.83

相应的storm的日志中,也会输出同样的unable信息,Unable to read additional data from server xxxx ...

由此可以看到storm 和 zk之间的session超时。

那如何设置超时时间?  设置超时时间为多大?

首先看zk端的配置:

#设置一个时间单元为8000
tickTime=8000
initLimit=20
syncLimit=10

设置时间单元为8000 ,如果不设置zk的另外两个参数:

minSessionTimeout

maxSessionTimeout

不设置这两个参数,默认的session超时时间为2*  tickTime ~ 20 * tickTime ,也就是16000ms --- 160000ms

此时如果客户端设置的session超时时间不在这个范围内(<16000ms   或者  >160000ms),则会重置为最大或者最小超时时间。

查看storm-core-1.1.0.jar下面的org.apache.storm.shade.org.apache.zookeeper.ClientCnxn类

void onConnected(int _negotiatedSessionTimeout, long _sessionId, byte[] _sessionPasswd, boolean isRO) throws IOException {
            ClientCnxn.this.negotiatedSessionTimeout = _negotiatedSessionTimeout;
            if(ClientCnxn.this.negotiatedSessionTimeout <= 0) {
                ClientCnxn.this.state = States.CLOSED;
                ClientCnxn.this.eventThread.queueEvent(new WatchedEvent(EventType.None, KeeperState.Expired, (String)null));
                ClientCnxn.this.eventThread.queueEventOfDeath();
                throw new ClientCnxn.SessionExpiredException("Unable to reconnect to ZooKeeper service, session 0x" + Long.toHexString(ClientCnxn.this.sessionId) + " has expired");
            } else {
                if(!ClientCnxn.this.readOnly && isRO) {
                    ClientCnxn.LOG.error("Read/write client got connected to read-only server");
                }

                ClientCnxn.this.readTimeout = ClientCnxn.this.negotiatedSessionTimeout * 2 / 3;
                ClientCnxn.this.connectTimeout = ClientCnxn.this.negotiatedSessionTimeout / ClientCnxn.this.hostProvider.size();
                ClientCnxn.this.hostProvider.onConnected();
                ClientCnxn.this.sessionId = _sessionId;
                ClientCnxn.this.sessionPasswd = _sessionPasswd;
                ClientCnxn.this.state = isRO?States.CONNECTEDREADONLY:States.CONNECTED;
                ClientCnxn.this.seenRwServerBefore |= !isRO;
                ClientCnxn.LOG.info("Session establishment complete on server " + this.clientCnxnSocket.getRemoteSocketAddress() + ", sessionid = 0x" + Long.toHexString(ClientCnxn.this.sessionId) + ", negotiated timeout = " + ClientCnxn.this.negotiatedSessionTimeout + (isRO?" (READ-ONLY mode)":""));
                KeeperState eventState = isRO?KeeperState.ConnectedReadOnly:KeeperState.SyncConnected;
                ClientCnxn.this.eventThread.queueEvent(new WatchedEvent(EventType.None, eventState, (String)null));
            }
        }
public int getSessionTimeout() {
        return this.negotiatedSessionTimeout;
    }

此时已经断开连接negotiatedSessionTimeout<= 0 所以就会抛出Unable to reconnect to ZooKeeper service, session 0x 的异常 

类中run方法:

        public void run() {
            this.clientCnxnSocket.introduce(this, ClientCnxn.this.sessionId);
            this.clientCnxnSocket.updateNow();
            this.clientCnxnSocket.updateLastSendAndHeard();
            long lastPingRwServer = System.currentTimeMillis();
            boolean var3 = true;

            while(ClientCnxn.this.state.isAlive()) {
                try {
                    if(!this.clientCnxnSocket.isConnected()) {
                        if(!this.isFirstConnect) {
                            try {
                                Thread.sleep((long)this.r.nextInt(1000));
                            } catch (InterruptedException var11) {
                                ClientCnxn.LOG.warn("Unexpected exception", var11);
                            }
                        }

                        if(ClientCnxn.this.closing || !ClientCnxn.this.state.isAlive()) {
                            break;
                        }

                        this.startConnect();
                        this.clientCnxnSocket.updateLastSendAndHeard();
                    }

                    int to;
                    if(ClientCnxn.this.state.isConnected()) {
                        if(ClientCnxn.this.zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if(ClientCnxn.this.zooKeeperSaslClient.getSaslState() == SaslState.INITIAL) {
                                try {
                                    ClientCnxn.this.zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException var10) {
                                    ClientCnxn.LOG.error("SASL authentication with Zookeeper Quorum member failed: " + var10);
                                    ClientCnxn.this.state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                }
                            }

                            KeeperState authState = ClientCnxn.this.zooKeeperSaslClient.getKeeperState();
                            if(authState != null) {
                                if(authState == KeeperState.AuthFailed) {
                                    ClientCnxn.this.state = States.AUTH_FAILED;
                                    sendAuthEvent = true;
                                } else if(authState == KeeperState.SaslAuthenticated) {
                                    sendAuthEvent = true;
                                }
                            }

                            if(sendAuthEvent) {
                                ClientCnxn.this.eventThread.queueEvent(new WatchedEvent(EventType.None, authState, (String)null));
                            }
                        }

                        to = ClientCnxn.this.readTimeout - this.clientCnxnSocket.getIdleRecv();
                    } else {
                        to = ClientCnxn.this.connectTimeout - this.clientCnxnSocket.getIdleRecv();
                    }

                    if(to <= 0) {
                        throw new ClientCnxn.SessionTimeoutException("Client session timed out, have not heard from server in " + this.clientCnxnSocket.getIdleRecv() + "ms" + " for sessionid 0x" + Long.toHexString(ClientCnxn.this.sessionId));
                    }

                    if(ClientCnxn.this.state.isConnected()) {
                        int timeToNextPing = ClientCnxn.this.readTimeout / 2 - this.clientCnxnSocket.getIdleSend() - (this.clientCnxnSocket.getIdleSend() > 1000?1000:0);
                        if(timeToNextPing > 0 && this.clientCnxnSocket.getIdleSend() <= 10000) {
                            if(timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        } else {
                            this.sendPing();
                            this.clientCnxnSocket.updateLastSend();
                        }
                    }

                    if(ClientCnxn.this.state == States.CONNECTEDREADONLY) {
                        long now = System.currentTimeMillis();
                        int idlePingRwServer = (int)(now - lastPingRwServer);
                        if(idlePingRwServer >= this.pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            this.pingRwTimeout = Math.min(2 * this.pingRwTimeout, '\uea60');
                            this.pingRwServer();
                        }

                        to = Math.min(to, this.pingRwTimeout - idlePingRwServer);
                    }

                    this.clientCnxnSocket.doTransport(to, ClientCnxn.this.pendingQueue, ClientCnxn.this.outgoingQueue, ClientCnxn.this);
                } catch (Throwable var12) {
                    if(ClientCnxn.this.closing) {
                        if(ClientCnxn.LOG.isDebugEnabled()) {
                            ClientCnxn.LOG.debug("An exception was thrown while closing send thread for session 0x" + Long.toHexString(ClientCnxn.this.getSessionId()) + " : " + var12.getMessage());
                        }
                        break;
                    }

                    if(var12 instanceof ClientCnxn.SessionExpiredException) {
                        ClientCnxn.LOG.info(var12.getMessage() + ", closing socket connection");
                    } else if(var12 instanceof ClientCnxn.SessionTimeoutException) {
                        ClientCnxn.LOG.info(var12.getMessage() + ", closing socket connection and attempting reconnect");
                    } else if(var12 instanceof ClientCnxn.EndOfStreamException) {
                        ClientCnxn.LOG.info(var12.getMessage() + ", closing socket connection and attempting reconnect");
                    } else if(var12 instanceof ClientCnxn.RWServerFoundException) {
                        ClientCnxn.LOG.info(var12.getMessage());
                    } else {
                        ClientCnxn.LOG.warn("Session 0x" + Long.toHexString(ClientCnxn.this.getSessionId()) + " for server " + this.clientCnxnSocket.getRemoteSocketAddress() + ", unexpected error" + ", closing socket connection and attempting reconnect", var12);
                    }

                    this.cleanup();
                    if(ClientCnxn.this.state.isAlive()) {
                        ClientCnxn.this.eventThread.queueEvent(new WatchedEvent(EventType.None, KeeperState.Disconnected, (String)null));
                    }

                    this.clientCnxnSocket.updateNow();
                    this.clientCnxnSocket.updateLastSendAndHeard();
                }
            }

            this.cleanup();
            this.clientCnxnSocket.close();
            if(ClientCnxn.this.state.isAlive()) {
                ClientCnxn.this.eventThread.queueEvent(new WatchedEvent(EventType.None, KeeperState.Disconnected, (String)null));
            }

            ZooTrace.logTraceMessage(ClientCnxn.LOG, ZooTrace.getTextTraceLevel(), "SendThread exitedloop.");
        }

在to<0的情况下就会输出 Client session timed out, have not heard from server in .....

由此:

storm端的超时时间设置为120000ms 

storm.zookeeper.session.timeout: 120000

storm.zookeeper.connection.timeout: 90000

kafka端的超时时间设置为120000ms

zookeeper.session.timeout.ms=120000
zookeeper.connection.timeout.ms=60000
源码中:to = ClientCnxn.this.readTimeout - this.clientCnxnSocket.getIdleRecv();

其中 readTimeout = sessionTimeout * 2 / 3      //也就是我们设置的120000 * 2 / 3  = 80000ms

其中getIdleRecv  //也就是当前时间减去上一次心跳时间。

如果to小于0 就会抛出异常Client session timed out, have not heard from server in .....

org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocket

int getIdleRecv() {
        return (int)(this.now - this.lastHeard);
}

 

生效之后会输出:

               Established session 0x164f4b8390b0b88 with negotiated timeout 120000 for client /xxx.xxx.xxxx.83:44569

 

你可能感兴趣的:(Storm,zookeeper)