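As more applications were deployed on the production cluster, the load on ZooKeeper kept growing, and the ResourceManager eventually hung abnormally.
Latency problem
The first symptom was the warning "fsync-ing the write ahead log in SyncThread:3 took 1606ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide".
This warning means that syncing the transaction log to disk is slow. The usual cause is poor disk performance, often made worse by keeping the log and data directories on the same disk. Splitting them into two directories on separate disks (dataLogDir in zoo.cfg sets the transaction log location independently of dataDir) and adjusting a few parameters may help somewhat: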
tickTime=4000
# The number of ticks that the initial
# synchronization phase can take
initLimit=20
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10
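Because initLimit and syncLimit are counted in ticks, it can help to convert the values above into wall-clock time. The following is a minimal sketch, not part of ZooKeeper; the class and method names are made up for illustration. The session timeout bounds it prints assume minSessionTimeout and maxSessionTimeout are left at their defaults (2 and 20 ticks respectively).

```java
// Illustrative helper (hypothetical names): converts tick-based ZooKeeper
// settings into milliseconds.
final class ZkTimingSketch {
    /** One tick lasts tickTimeMs milliseconds; initLimit and syncLimit are given in ticks. */
    static long ticksToMillis(int tickTimeMs, int ticks) {
        return (long) tickTimeMs * ticks;
    }

    public static void main(String[] args) {
        int tickTime = 4000;
        int initLimit = 20;
        int syncLimit = 10;

        // Time a follower may take to connect to and sync with the leader.
        System.out.println("initLimit window (ms): " + ticksToMillis(tickTime, initLimit));
        // Time allowed between a request and its acknowledgement.
        System.out.println("syncLimit window (ms): " + ticksToMillis(tickTime, syncLimit));
        // Assumes minSessionTimeout/maxSessionTimeout are not set explicitly.
        System.out.println("default min session timeout (ms): " + ticksToMillis(tickTime, 2));
        System.out.println("default max session timeout (ms): " + ticksToMillis(tickTime, 20));
    }
}
```

With these settings the two windows come to 80 seconds and 40 seconds, so raising tickTime lengthens every tick-based timeout at once.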
CancelledKeyException
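The second problem is ZooKeeper's CancelledKeyException. This bug has not been fixed in the ZooKeeper shipped with CDH 5.11 or earlier (5.11 was the latest release when this post was written).
When the ZooKeeper ensemble is under heavy load, the bug can cause other services that depend on ZooKeeper (YARN, Storm, Kafka, and so on) to lose their connections and hang, so the patch below needs to be applied.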
https://issues.apache.org/jira/browse/ZOOKEEPER-1237
diff -uwp zookeeper-3.4.5/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java.ZK1237 zookeeper-3.4.5/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java
--- zookeeper-3.4.5/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java.ZK1237 2012-09-30 10:53:32.000000000 -0700
+++ zookeeper-3.4.5/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java 2013-08-07 13:20:19.227152865 -0700
@@ -150,7 +150,8 @@ public class NIOServerCnxn extends Serve
                 // We check if write interest here because if it is NOT set,
                 // nothing is queued, so we can try to send the buffer right
                 // away without waking up the selector
-                if ((sk.interestOps() & SelectionKey.OP_WRITE) == 0) {
+                if (sk.isValid() &&
+                    (sk.interestOps() & SelectionKey.OP_WRITE) == 0) {
                     try {
                         sock.write(bb);
                     } catch (IOException e) {
@@ -214,14 +215,18 @@ public class NIOServerCnxn extends Serve
                 return;
             }
-            if (k.isReadable()) {
+            if (k.isValid() && k.isReadable()) {
                 int rc = sock.read(incomingBuffer);
                 if (rc < 0) {
-                    throw new EndOfStreamException(
+                    if (LOG.isDebugEnabled()) {
+                        LOG.debug(
                             "Unable to read additional data from client sessionid 0x"
                             + Long.toHexString(sessionId)
                             + ", likely client has closed socket");
                 }
+                    close();
+                    return;
+                }
                 if (incomingBuffer.remaining() == 0) {
                     boolean isPayload;
                     if (incomingBuffer == lenBuffer) { // start of next request
@@ -242,7 +247,7 @@ public class NIOServerCnxn extends Serve
                     }
                 }
             }
-            if (k.isWritable()) {
+            if (k.isValid() && k.isWritable()) {
                 // ZooLog.logTraceMessage(LOG,
                 // ZooLog.CLIENT_DATA_PACKET_TRACE_MASK
                 // "outgoingBuffers.size() = " +
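The core of the patch is the added isValid() check in front of interestOps(). The standalone sketch below reproduces that behaviour with plain java.nio outside of ZooKeeper. The class and method names are hypothetical, and a Pipe stands in for the client socket purely to obtain a registered SelectionKey; it is an illustration of the guard, not the ZooKeeper code itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class: shows why calling interestOps() on a cancelled
// key fails and how an isValid() guard avoids the exception.
final class CancelledKeyGuardSketch {
    /** Same shape as the pre-patch condition: throws if the key was cancelled. */
    static boolean unguardedHasNoWriteInterest(SelectionKey sk) {
        return (sk.interestOps() & SelectionKey.OP_WRITE) == 0;
    }

    /** Same shape as the patched condition: isValid() short-circuits first. */
    static boolean guardedHasNoWriteInterest(SelectionKey sk) {
        return sk.isValid() && (sk.interestOps() & SelectionKey.OP_WRITE) == 0;
    }

    /** Registers a channel with a selector, then cancels the key and cleans up. */
    static SelectionKey cancelledKey() {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
            key.cancel(); // stands in for the connection being closed by another thread
            pipe.sink().close();
            pipe.source().close();
            selector.close();
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SelectionKey key = cancelledKey();
        System.out.println("guarded check returned: " + guardedHasNoWriteInterest(key));
        try {
            unguardedHasNoWriteInterest(key);
            System.out.println("unguarded check returned normally");
        } catch (CancelledKeyException e) {
            System.out.println("unguarded check threw CancelledKeyException");
        }
    }
}
```

The guarded form simply evaluates to false for a cancelled key, so the caller falls through to its other branch instead of letting the exception escape the processing thread.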
Logs
ZooKeeper log:
2017-06-15 03:09:03,098 [myid:3] - WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 1506ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-06-15 03:09:05,098 [myid:3] - WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 1606ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-06-15 03:09:05,744 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x35c9bd7493c22c0 type:create cxid:0x2 zxid:0x4f003006ef txntype:-1 reqpath:n/a Error Path:/hive_zookeeper_namespace/zb_ods Error:KeeperErrorCode = NodeExists for /hive_zookeeper_namespace/zb_ods
2017-06-15 03:09:09,952 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25c9bd75af70091, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-06-15 03:09:12,076 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /133.0.91.42:46579 which had sessionid 0x25c9bd75af70091
2017-06-15 03:09:12,076 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35c9bd7493c22ac, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-06-15 03:09:12,076 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /133.0.91.41:53351 which had sessionid 0x35c9bd7493c22ac
2017-06-15 03:09:12,076 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35c9bd7493c2268, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2017-06-15 03:09:12,075 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x25c9bd75af70091 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
2017-06-15 03:09:12,083 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x35c9bd7493c2268 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
2017-06-15 03:09:12,083 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@574] - Got user-level KeeperException when processing sessionid:0x35c9bd7493c2268 type:multi cxid:0x7e06 zxid:0x4f003006f2 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
2017-06-15 03:09:12,087 [myid:3] - WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 6988ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-06-15 03:09:14,136 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /133.0.91.42:46580 which had sessionid 0x35c9bd7493c2268
2017-06-15 03:09:14,136 [myid:3] - ERROR [CommitProcessor:3:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
    at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:634)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2017-06-15 03:09:14,137 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /133.0.91.42:47513
2017-06-15 03:09:14,341 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x35c9bd7493c2268 at /133.0.91.42:47513
2017-06-15 03:09:14,341 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x35c9bd7493c2268 with negotiated timeout 10000 for client /133.0.91.42:47513
2017-06-15 03:09:14,342 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /133.0.91.42:47513
2017-06-15 03:09:14,342 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /133.0.91.42:47513
2017-06-15 03:09:14,342 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x35c9bd7493c2268 due to java.io.IOException: Len error 1673753
2017-06-15 03:09:14,342 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /133.0.91.42:47513 which had sessionid 0x35c9bd7493c2268
2017-06-15 03:09:14,341 [myid:3] - ERROR [CommitProcessor:3:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
    at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:634)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2017-06-15 03:09:14,433 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x35c9bd7493c22c0 type:create cxid:0x5 zxid:0x4f003006fd txntype:-1 reqpath:n/a Error
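To see how bad the disk stalls are over a longer period, the fsync warnings in a log like the one above can be pulled out with a small scanner. This is a rough sketch with hypothetical class and method names; the regular expression is based only on the message format seen in this log and may need adjusting for other ZooKeeper versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts fsync durations (in ms) from ZooKeeper log lines.
final class FsyncWarningScan {
    // Matches e.g. "fsync-ing the write ahead log in SyncThread:3 took 1606ms".
    private static final Pattern FSYNC =
            Pattern.compile("fsync-ing the write ahead log in \\S+ took (\\d+)ms");

    /** Returns every fsync duration found, in order of appearance. */
    static List<Long> fsyncMillis(Iterable<String> lines) {
        List<Long> durations = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FSYNC.matcher(line);
            // find() in a loop also copes with several entries on one physical line.
            while (m.find()) {
                durations.add(Long.parseLong(m.group(1)));
            }
        }
        return durations;
    }

    public static void main(String[] args) {
        // Sample lines shortened from the log excerpt above.
        List<String> sample = Arrays.asList(
                "WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 1506ms which will adversely effect operation latency.",
                "WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 1606ms which will adversely effect operation latency.",
                "WARN [SyncThread:3:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:3 took 6988ms which will adversely effect operation latency.");
        List<Long> durations = fsyncMillis(sample);
        System.out.println("fsync warnings: " + durations.size());
        System.out.println("worst fsync (ms): " + Collections.max(durations));
    }
}
```

In this excerpt the worst stall is 6988 ms, which is a large share of the 10000 ms negotiated session timeout shown in the same log.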
ResourceManager log:
2017-06-15 03:09:11,943 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2017-06-15 03:09:11,946 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-06-15 03:09:11,947 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a036265681203726d32
2017-06-15 03:09:11,947 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
2017-06-15 03:09:11,947 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /yarn-leader-election/beh/ActiveBreadCrumb to indicate that the local node is the most recent active...
2017-06-15 03:09:11,953 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x35c9bd7493c2268, likely server has closed socket, closing socket connection and attempting reconnect
2017-06-15 03:09:12,051 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate...
2017-06-15 03:09:12,361 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had02.hadoop/133.0.91.42:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:12,361 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had02.hadoop/133.0.91.42:2181, initiating session
2017-06-15 03:09:12,387 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had02.hadoop/133.0.91.42:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:12,392 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had02.hadoop/133.0.91.42:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:12,534 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate...
2017-06-15 03:09:12,678 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/beh/core/hadoop/etc/hadoop/yarn-site.xml
2017-06-15 03:09:12,681 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2017-06-15 03:09:12,681 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in active state
2017-06-15 03:09:12,681 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshQueues TARGET=AdminService RESULT=SUCCESS
2017-06-15 03:09:12,681 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/beh/core/hadoop/etc/hadoop/yarn-site.xml
2017-06-15 03:09:12,683 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2017-06-15 03:09:12,683 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to
2017-06-15 03:09:12,683 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2017-06-15 03:09:12,683 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshNodes TARGET=AdminService RESULT=SUCCESS
2017-06-15 03:09:12,683 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/beh/core/hadoop/etc/hadoop/core-site.xml
2017-06-15 03:09:12,685 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshSuperUserGroupsConfiguration TARGET=AdminService RESULT=SUCCESS
2017-06-15 03:09:12,685 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/opt/beh/core/hadoop/etc/hadoop/core-site.xml
2017-06-15 03:09:12,685 INFO org.apache.hadoop.security.Groups: clearing userToGroupsMap cache
2017-06-15 03:09:12,685 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS
2017-06-15 03:09:12,685 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS
2017-06-15 03:09:12,876 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had03.hadoop/133.0.91.43:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:12,877 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had03.hadoop/133.0.91.43:2181, initiating session
2017-06-15 03:09:13,535 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1497466013139_0394_000001 (auth:SIMPLE)
2017-06-15 03:09:13,539 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1497466013139_0394_000001
2017-06-15 03:09:13,539 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop IP=133.0.91.43 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1497466013139_0394 APPATTEMPTID=appattempt_1497466013139_0394_000001
2017-06-15 03:09:14,920 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had03.hadoop/133.0.91.43:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:14,927 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had03.hadoop/133.0.91.43:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:15,028 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Waiting for zookeeper to be connected, retry no. + 2
2017-06-15 03:09:15,077 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had01.hadoop/133.0.91.41:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:15,078 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had01.hadoop/133.0.91.41:2181, initiating session
2017-06-15 03:09:15,100 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had01.hadoop/133.0.91.41:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:15,105 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had01.hadoop/133.0.91.41:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:15,588 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 397
2017-06-15 03:09:15,772 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 398
2017-06-15 03:09:16,091 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had02.hadoop/133.0.91.42:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:16,092 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had02.hadoop/133.0.91.42:2181, initiating session
2017-06-15 03:09:16,118 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had02.hadoop/133.0.91.42:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:16,124 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had02.hadoop/133.0.91.42:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:16,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had03.hadoop/133.0.91.43:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:16,315 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had03.hadoop/133.0.91.43:2181, initiating session
2017-06-15 03:09:16,339 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had03.hadoop/133.0.91.43:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:16,344 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had03.hadoop/133.0.91.43:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:17,176 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had01.hadoop/133.0.91.41:2181. Will not attempt to authenticate using SASL (unknown error)
2017-06-15 03:09:17,177 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had01.hadoop/133.0.91.41:2181, initiating session
2017-06-15 03:09:17,201 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server had01.hadoop/133.0.91.41:2181, sessionid = 0x35c9bd7493c2268, negotiated timeout = 10000
2017-06-15 03:09:17,206 WARN org.apache.zookeeper.ClientCnxn: Session 0x35c9bd7493c2268 for server had01.hadoop/133.0.91.41:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
2017-06-15 03:09:17,307 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Waiting for zookeeper to be connected, retry no. + 3
2017-06-15 03:09:17,329 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had02.hadoop/133.0.91.42:2181. Will not attempt to authenticate using SASL (unknown error)