HBase fully distributed mode, problem 2: cluster clocks out of sync

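Symptom: after starting the HBase cluster with start-hbase.sh, the web UI at masternode:16010 is reachable, but within a few seconds the HRegionServer processes on the worker nodes exit on their own, as shown below: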

Taking one worker node as an example, right after startup:

# jps
14343 DataNode
7789 HQuorumPeer
78790 HRegionServer
...

A few seconds later:

# jps
14343 DataNode
7789 HQuorumPeer
...

Check that node's hbase-<user>-regionserver-<hostname>.log (here, hbase-root-regionserver-hadoop.lsd3.com.log), which shows:

2017-03-13 08:39:32,547 FATAL [regionserver/hadoop.lsd3.com/192.168.56.13:16020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop.lsd3.com,16020,1489408762185 has been rejected; Reported time is too far out of sync with master.  Time difference of 3729977ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:745)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2298)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:906)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop.lsd3.com,16020,1489408762185 has been rejected; Reported time is too far out of sync with master.  Time difference of 3729977ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:745)

    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1267)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2296)
    ... 2 more


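This one is easy to diagnose: the first line of the error already says the clock is out of sync. Running date on each node and comparing the result with the master confirmed it: the clocks were off by several hours.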
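The manual date comparison can be scripted. Below is a minimal sketch, assuming it runs on the master, that passwordless ssh to the workers is set up, and that one-second resolution is good enough. The function names and the host names in the usage comment are hypothetical; the 30000 ms threshold is the limit reported in the log above.

```shell
#!/bin/sh
# Maximum skew the master accepted in the log above (30000 ms).
MAX_SKEW_MS=30000

# Print the absolute difference between two timestamps given in milliseconds.
skew_ms() {
    local diff
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Succeed if the two timestamps (ms) are no more than MAX_SKEW_MS apart.
within_skew() {
    [ "$(skew_ms "$1" "$2")" -le "$MAX_SKEW_MS" ]
}

# Compare each node's clock with the local clock.
# Host names are passed as arguments.
check_nodes() {
    local host remote_s local_s remote_ms local_ms
    for host in "$@"; do
        remote_s=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        local_s=$(date +%s)
        remote_ms=$(( remote_s * 1000 ))
        local_ms=$(( local_s * 1000 ))
        if within_skew "$local_ms" "$remote_ms"; then
            echo "$host: OK ($(skew_ms "$local_ms" "$remote_ms") ms)"
        else
            echo "$host: OUT OF SYNC ($(skew_ms "$local_ms" "$remote_ms") ms)"
        fi
    done
}

# Example (hypothetical host names):
#   check_nodes hadoop.lsd2.com hadoop.lsd3.com
```

The ssh round trip adds some error to each measurement, which should be small next to a 30-second threshold.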

Solution:

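Set the same time on every node. In my case the dates matched and only the hour differed, so date -s xx:xx:xx was enough. Because the nodes are set one at a time, they will still end up a few seconds apart, but that is fine: as the log shows, the cluster tolerates some clock skew (30000 ms here).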
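Rather than typing the time by hand on each node, the master's clock can be pushed out over ssh. This is only a sketch, assuming GNU date on the workers (which accepts an @epoch value for -s), root access over passwordless ssh, and that it runs on the master. The function name push_time and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Push this machine's current time to each listed node over ssh.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
push_time() {
    local host now
    for host in "$@"; do
        # Read the clock right before each call so every node gets a fresh value.
        # An epoch value avoids any dependence on each node's time zone.
        now=$(date +%s)
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host date -s @$now"
        else
            ssh "$host" "date -s @$now"
        fi
    done
}

# Example (hypothetical host names):
#   DRY_RUN=1 push_time hadoop.lsd2.com hadoop.lsd3.com   # preview
#   push_time hadoop.lsd2.com hadoop.lsd3.com             # apply
```

Previewing with DRY_RUN=1 first is a cheap safeguard, since setting the clock needs root and takes effect immediately.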

You can also use NTP to keep the clocks synchronized; that is not covered in detail here.
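For reference, a one-off NTP sync might look like the sketch below. It assumes the ntpdate and hwclock tools are installed and that it runs as root on each node. The server pool.ntp.org is a placeholder, and the run and sync_clock helpers are hypothetical. Systems that run chrony or a permanent ntpd are set up differently, and this does not cover them.

```shell
#!/bin/sh
# NTP server to query; a placeholder, replace with a server reachable from the cluster.
NTP_SERVER=${NTP_SERVER:-pool.ntp.org}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the system clock from NTP once, then copy it to the hardware clock
# so the correction survives a reboot.
sync_clock() {
    run ntpdate "$NTP_SERVER" && run hwclock --systohc
}

# Example:
#   DRY_RUN=1 sync_clock   # preview the commands
#   sync_clock             # apply (as root)
```

A one-off sync only fixes the current offset. Clocks drift apart again over time, so a long-running cluster is usually better served by a permanently running time service.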


