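HBase has been unstable. After analyzing the logs and summarizing what I found, there appear to be two problems so far: one is the issue described in my previous post, and the other is a ZooKeeper problem.
My exception output was: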
2010-10-28 00:36:49,573 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x9d2be33dbe860005 to sun.nio.ch.SelectionKeyImpl@6655bb93
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-10-28 00:36:49,602 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
java.net.SocketException: Transport endpoint is not connected
at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:658)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2010-10-28 00:36:49,602 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
java.net.SocketException: Transport endpoint is not connected
at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:669)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:386)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
2010-10-28 00:36:49,622 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:549)
at java.lang.Thread.run(Thread.java:636)
2010-10-28 00:36:49,703 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2010-10-28 00:36:50,505 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /192.168.5.151:2181
2010-10-28 00:36:50,505 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.5.156:49407 remote=/192.168.5.151:2181]
2010-10-28 00:36:50,506 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-10-28 00:36:50,507 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x9d2be33dbe860005 to sun.nio.ch.SelectionKeyImpl@335819e4
java.io.IOException: Session Expired
at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-10-28 00:36:50,507 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null
2010-10-28 00:36:50,507 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
After that, the region server exits!
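When going through a long region server log, it can help to pull out only the ZooKeeper state changes and see how quickly the session went from Disconnected to Expired. The following is a minimal sketch, not part of HBase: the function name `zk_state_events` and the regular expression are my own assumptions, based only on the log line format shown above.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2010-10-28 00:36:49,703 INFO ...HRegionServer: Got ZooKeeper event, state: Disconnected, ...
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Got ZooKeeper event, state: (\w+)"
)

def zk_state_events(lines):
    """Return a list of (timestamp, state) for each ZooKeeper state-change line."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((stamp, match.group(2)))
    return events

# Two lines copied from the log above; in practice you would pass an open log file.
sample = [
    "2010-10-28 00:36:49,703 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "Got ZooKeeper event, state: Disconnected, type: None, path: null",
    "2010-10-28 00:36:50,507 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "Got ZooKeeper event, state: Expired, type: None, path: null",
]

events = zk_state_events(sample)
for stamp, state in events:
    print(stamp.isoformat(), state)
```

Lines that do not match the pattern (stack frames, other log messages) are simply skipped, so the whole log file can be fed in unchanged.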
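By the way, here is a useful resource: http://wiki.apache.org/hadoop/Hbase/Troubleshooting
It covers some of the problems you may run into. There is also a thread that collects such issues: http://bbs.hadoopor.com/thread-71-1-1.html
Now for the fix:
Increase the ZooKeeper session timeout. The default (zookeeper.session.timeout) is 60 seconds; see http://hbase.apache.org/docs/r0.20.6/hbase-conf.html
It works together with another setting (hbase.zookeeper.property.tickTime): by default the ZooKeeper server only grants session timeouts between 2 and 20 times tickTime, so the two values need to be raised together.
My settings are as follows: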
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>9000</value>
</property>
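To see how the two settings interact, here is a small sketch of the arithmetic. It assumes the ZooKeeper server's default bounds, where the granted session timeout is kept between 2 and 20 times tickTime; the function name `negotiated_session_timeout` is hypothetical, and whether your ZooKeeper version and configuration use exactly these bounds is worth checking against its documentation.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms, min_ticks=2, max_ticks=20):
    """Clamp a requested session timeout to [min_ticks, max_ticks] * tickTime.

    The 2x and 20x factors are the assumed server defaults.
    """
    lower = min_ticks * tick_time_ms
    upper = max_ticks * tick_time_ms
    return max(lower, min(requested_ms, upper))

# (zookeeper.session.timeout, hbase.zookeeper.property.tickTime) pairs from this post
for requested, tick in [(90000, 9000), (1200000, 6000)]:
    granted = negotiated_session_timeout(requested, tick)
    print(f"requested={requested} ms, tickTime={tick} ms -> granted={granted} ms")
```

Under these assumptions, the 90000/9000 pair fits inside the allowed range (18000 to 180000 ms) and is granted as requested, while a request of 1200000 ms with a tickTime of 6000 ms would be capped at 120000 ms.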
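For a while afterwards the problem did not recur, though I am not sure whether it is fixed for good.
Also, take the time to read and understand the points listed here: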
<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
If this is happening during an upload which only happens once (like initially loading all your data into HBase), consider importing into HFiles directly.
HBase ships with some GC tuning; for more information see Performance Tuning.