Today, when I started HBase, HMaster inexplicably failed to come up. Checking the log turned up this error:
2018-09-06 23:05:49,385 FATAL [master:linux201:60000] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1046)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:925)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
at java.lang.Thread.run(Thread.java:748)
2018-09-06 23:05:49,386 INFO [master:linux201:60000] master.HMaster: Aborting
2018-09-06 23:05:49,387 DEBUG [master:linux201:60000] master.HMaster: Stopping service threads
2018-09-06 23:05:49,387 INFO [linux201,60000,1536246333803-BalancerChore] balancer.BalancerChore: linux201,60000,1536246333803-BalancerChore exiting
2018-09-06 23:05:49,387 INFO [master:linux201:60000] ipc.RpcServer: Stopping server on 60000
2018-09-06 23:05:49,387 INFO [linux201,60000,1536246333803-ClusterStatusChore] balancer.ClusterStatusChore: linux201,60000,1536246333803-ClusterStatusChore exiting
2018-09-06 23:05:49,387 INFO [CatalogJanitor-linux201:60000] master.CatalogJanitor: CatalogJanitor-linux201:60000 exiting
2018-09-06 23:05:49,389 INFO [master:linux201:60000.archivedHFileCleaner] cleaner.HFileCleaner: master:linux201:60000.archivedHFileCleaner exiting
2018-09-06 23:05:49,388 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2018-09-06 23:05:49,389 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2018-09-06 23:05:49,388 INFO [master:linux201:60000] master.HMaster: Stopping infoServer
2018-09-06 23:05:49,387 INFO [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2018-09-06 23:05:49,389 INFO [master:linux201:60000.oldLogCleaner] cleaner.LogCleaner: master:linux201:60000.oldLogCleaner exiting
2018-09-06 23:05:49,392 INFO [master:linux201:60000.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x165af57ee71000b, quorum=linux202:2181,linux201:2181,linux203:2181, baseZNode=/hbase
2018-09-06 23:05:49,405 INFO [master:linux201:60000.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x165af57ee71000b closed
2018-09-06 23:05:49,408 INFO [master:linux201:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2018-09-06 23:05:49,408 INFO [master:linux201:60000] mortbay.log: Stopped [email protected]:60010
2018-09-06 23:05:49,420 DEBUG [master:linux201:60000] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@23a6deb
2018-09-06 23:05:49,421 INFO [master:linux201:60000] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x165af57ee71000a
2018-09-06 23:05:49,430 INFO [master:linux201:60000] zookeeper.ZooKeeper: Session: 0x165af57ee71000a closed
2018-09-06 23:05:49,431 INFO [master:linux201:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
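In a long master log, the FATAL entry and the exception under it are the lines that matter. The following is a minimal Python sketch for pulling them out, assuming the log4j-style layout shown above. The function name `find_fatal_errors` and the regular expression are illustrative and not part of HBase.

```python
import re

# Matches entries such as:
# 2018-09-06 23:05:49,385 FATAL [master:linux201:60000] master.HMaster: Unhandled exception. ...
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\] (?P<msg>.*)$"
)


def find_fatal_errors(lines):
    """Return a (timestamp, message, exception) tuple for each FATAL entry.

    `exception` is the line right after the FATAL entry when that line is
    neither another log entry nor a stack frame ("at ..."); otherwise None.
    """
    lines = list(lines)
    results = []
    for i, line in enumerate(lines):
        match = LOG_LINE.match(line.strip())
        if not match or match.group("level") != "FATAL":
            continue
        exception = None
        if i + 1 < len(lines):
            nxt = lines[i + 1].strip()
            if nxt and not LOG_LINE.match(nxt) and not nxt.startswith("at "):
                exception = nxt
        results.append((match.group("ts"), match.group("msg"), exception))
    return results
```

Applied to the log above, this would report the single FATAL entry together with the `org.apache.hadoop.hbase.TableExistsException: hbase:namespace` line that follows it.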
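So I worked through the problem step by step. When HBase fails to start, these are the usual causes:
1. ZooKeeper problems. ZK itself may have failed to start cleanly, so check that the ZK processes on all three nodes are running normally.
2. Clock synchronization between the HBase nodes. If the nodes' clocks are out of sync, the regionserver cannot start and throws a ClockOutOfSyncException. The fix is to synchronize the clocks across the HBase cluster (search online for how to set that up). You also need to add the hbase.master.maxclockskew property to HBase's hbase-site.xml and give it a larger value, in milliseconds: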
<property>
  <name>hbase.master.maxclockskew</name>
  <value>180000</value>
  <description>Time difference of regionserver from master</description>
</property>
3. Problems with the Hadoop cluster itself, so check that it is healthy as well.
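For cause 2, the comparison behind ClockOutOfSyncException can be sketched in a few lines of Python. This is a simplified illustration rather than HBase's actual code: the function name `check_clock_skew` is hypothetical, and the 30000 ms default is an assumption about the stock value of hbase.master.maxclockskew that you should verify against your HBase version.

```python
# Assumed stock default of hbase.master.maxclockskew, in milliseconds.
DEFAULT_MAX_CLOCK_SKEW_MS = 30_000


def check_clock_skew(master_time_ms, regionserver_time_ms,
                     max_skew_ms=DEFAULT_MAX_CLOCK_SKEW_MS):
    """Return (skew_ms, accepted) for a regionserver reporting to the master.

    The regionserver is accepted only when the absolute difference between
    the two clocks does not exceed max_skew_ms.
    """
    skew_ms = abs(master_time_ms - regionserver_time_ms)
    return skew_ms, skew_ms <= max_skew_ms


# A regionserver whose clock is 60 s off is rejected under the assumed
# default, but accepted once the limit is raised to 180000 ms as above.
print(check_clock_skew(1_000_000, 1_060_000))
print(check_clock_skew(1_000_000, 1_060_000, max_skew_ms=180_000))
```

Raising the limit only hides the drift, so synchronizing the clocks remains the real fix.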
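The problem I ran into was an odd one, though. None of the causes above applied: the clusters were healthy and the clocks were synchronized, yet restarting any number of times made no difference. After searching Baidu for a while, I traced it to conflicting state in the znodes that ZooKeeper manages for HBase, so I went to ZK's bin directory to inspect the znodes: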
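Run the command: sh zkCli.sh
List the contents of the root: ls /
Delete the hbase node found there (the baseZNode=/hbase that appears in the log above), then restart HBase. After that, the master started normally.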
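The post does not show the exact delete command, so here is a hedged Python sketch that only builds the zkCli commands to type, without contacting ZooKeeper. It assumes the znode is /hbase (as in the log's baseZNode) and that recursive deletion is `rmr` on ZooKeeper 3.4.x and `deleteall` from 3.5 onward. The helper name `zk_cleanup_commands` is hypothetical. Stop HBase before deleting the znode.

```python
def zk_cleanup_commands(znode="/hbase", zk_version=(3, 4)):
    """Return the zkCli commands, in order, to inspect and remove `znode`.

    zk_version is a (major, minor) tuple. `rmr` is used before 3.5 and
    `deleteall` from 3.5 onward (an assumption; check your zkCli's help).
    """
    if not znode.startswith("/") or znode == "/":
        raise ValueError("znode must be an absolute path other than the root")
    parent = znode.rsplit("/", 1)[0] or "/"
    delete_cmd = "deleteall" if zk_version >= (3, 5) else "rmr"
    # List the parent before and after, so the removal can be confirmed.
    return [f"ls {parent}", f"{delete_cmd} {znode}", f"ls {parent}"]


# Commands to enter at the zkCli prompt after running: sh zkCli.sh
for command in zk_cleanup_commands():
    print(command)
```

HBase creates the /hbase znode again on the next start, which is why a restart is enough once the old node is gone.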
That's all for this post. If you spot any mistakes, corrections are welcome. Good night.