HBase的一个RegionServer莫名其妙的挂了,看了下日志,还是不明白。
-----------------------------------------------------------------------------------------------------
5:18:27.795 PM WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
Node /hbase/rs/0.0.0.0,60020,1411723106922 already deleted, retry=false
5:18:27.795 PM WARN org.apache.hadoop.hbase.regionserver.HRegionServer
Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/0.0.0.0,60020,1411723106922
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1265)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1254)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1212)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:942)
at java.lang.Thread.run(Thread.java:744)
5:18:27.816 PM INFO org.apache.zookeeper.ZooKeeper
Session: 0x148b13fd9000018 closed
5:18:27.816 PM INFO org.apache.zookeeper.ClientCnxn
EventThread shut down
5:18:27.816 PM INFO org.apache.hadoop.hbase.regionserver.HRegionServer
stopping server null; zookeeper connection closed.
5:18:27.816 PM INFO org.apache.hadoop.hbase.regionserver.HRegionServer
regionserver60020 exiting
5:18:27.816 PM ERROR org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2320)
-----------------------------------------------------------------------------------------------------
正头大,日志往前翻,有了。
-----------------------------------------------------------------------------------------------------
5:18:27.634 PM FATAL org.apache.hadoop.hbase.regionserver.HRegionServer
Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master. Time difference of 39663ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1935)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:781)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master. Time difference of 39663ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1933)
... 2 more
5:18:27.650 PM FATAL org.apache.hadoop.hbase.regionserver.HRegionServer
ABORTING region server 0.0.0.0,60020,1411723106922: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master. Time difference of 39663ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master. Time difference of 39663ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1935)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:781)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master. Time difference of 39663ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1933)
... 2 more
5:18:27.652 PM FATAL org.apache.hadoop.hbase.regionserver.HRegionServer
RegionServer abort: loaded coprocessors are: []
-----------------------------------------------------------------------------------------------------
很明显,是时间同步上有问题,查了一下RegionServer和Master两台机的时间,果然不一致。
猜测集群机子虽然是对同一台外部服务器进行对时,但是公众的国外对时服务器的原因。
对策就是Master机子对外对时,RegionServer跟Master进行对时。