In a previous article we set up a highly available Hadoop cluster, and HBase uses that cluster's HDFS. But our hbase.rootdir setting still points at a single hard-coded machine:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://node1:9000/hbase</value>
</property>
If node1 goes down at this point and node2 becomes the active NameNode, the HBase cluster is unusable: starting HBase fails with org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby:
2019-07-22 11:11:24,431 ERROR [master/node1:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master node1,16000,1563765070980: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1802)
// some log lines omitted
2019-07-22 11:11:24,431 INFO [master/node1:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'node1,16000,1563765070980' *****
2019-07-22 11:11:24,432 INFO [master/node1:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/node1:16000:becomeActiveMaster
2019-07-22 11:11:27,078 INFO [master/node1:16000] ipc.NettyRpcServer: Stopping server on /192.168.229.128:16000
2019-07-22 11:11:27,097 WARN [master/node1:16000] regionserver.HRegionServer: Initialize abort timeout task failed
java.lang.IllegalAccessException: Class org.apache.hadoop.hbase.regionserver.HRegionServer can not access a member of class org.apache.hadoop.hbase.regionserver.HRegionServer$SystemExitWhenAbortTimeout with modifiers "private"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1044)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:598)
at java.lang.Thread.run(Thread.java:745)
2019-07-22 11:11:27,097 INFO [master/node1:16000] regionserver.HRegionServer: Stopping infoServer
2019-07-22 11:11:27,128 INFO [master/node1:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@168cd36b{/,null,UNAVAILABLE}{file:/data/program/hbase-2.1.5/hbase-webapps/master}
2019-07-22 11:11:27,143 INFO [master/node1:16000] server.AbstractConnector: Stopped ServerConnector@319c3a25{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2019-07-22 11:11:27,147 INFO [master/node1:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@3b2f4a93{/static,file:///data/program/hbase-2.1.5/hbase-webapps/static/,UNAVAILABLE}
2019-07-22 11:11:27,148 INFO [master/node1:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@3ffb3598{/logs,file:///data/program/hbase-2.1.5/logs/,UNAVAILABLE}
2019-07-22 11:11:27,151 INFO [master/node1:16000] regionserver.HRegionServer: aborting server node1,16000,1563765070980
2019-07-22 11:11:27,169 INFO [master/node1:16000] regionserver.HRegionServer: stopping server node1,16000,1563765070980; all regions closed.
2019-07-22 11:11:27,170 INFO [master/node1:16000] hbase.ChoreService: Chore service for: master/node1:16000 had [] on shutdown
2019-07-22 11:11:27,175 WARN [master/node1:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2019-07-22 11:11:27,193 INFO [master/node1:16000] zookeeper.ZooKeeper: Session: 0x16c178f38b80009 closed
2019-07-22 11:11:27,194 INFO [master/node1:16000] regionserver.HRegionServer: Exiting; stopping=node1,16000,1563765070980; zookeeper connection closed.
2019-07-22 11:11:27,195 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3117)
The fix is to point hbase.rootdir at the HA HDFS nameservice address rather than a hard-coded machine. Modify hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://ns1/hbase</value>
</property>
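As a quick sanity check after editing, the effective value can be read back out of the file. The helper below is a hypothetical sketch (it assumes the name/value pair sits on adjacent lines, as in the fragment above, and the install path in the usage comment is inferred from the log output):

```shell
# Print the hbase.rootdir value from a given hbase-site.xml.
# Assumes <name> and <value> are on adjacent lines, as written above.
rootdir_of() {
  grep -A1 '<name>hbase.rootdir</name>' "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Typical use (path is an assumption based on the logs above):
#   rootdir_of /data/program/hbase-2.1.5/conf/hbase-site.xml
# which should print hdfs://ns1/hbase, not a host:port URL.
```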
Note: ns1 here comes from the dfs.nameservices setting in hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- other settings omitted -->
</configuration>
At the same time, copy Hadoop's hdfs-site.xml and core-site.xml into HBase's conf directory; otherwise HBase cannot resolve the nameservice name (ns1 here) and startup fails with an error complaining that the nameservice host is unknown. Then restart the HBase cluster.
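The copy-and-restart steps above can be sketched as follows. The HADOOP_HOME/HBASE_HOME paths in the usage comments are assumptions based on the install layout visible in the logs; adjust them for your environment:

```shell
# Copy the HA client configs from a Hadoop conf dir into an HBase conf dir,
# so HBase can resolve the ns1 nameservice.
sync_ha_conf() {
  local src=$1 dst=$2 f
  for f in hdfs-site.xml core-site.xml; do
    cp "$src/$f" "$dst/$f" || return 1
  done
}

# Typical use (run on every HBase node, then restart the cluster):
#   sync_ha_conf "$HADOOP_HOME/etc/hadoop" "$HBASE_HOME/conf"
#   "$HBASE_HOME/bin/stop-hbase.sh" && "$HBASE_HOME/bin/start-hbase.sh"
```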
Many posts online say that the java.lang.RuntimeException: HMaster Aborted error calls for cleaning up ZooKeeper: after removing and re-adding HBase (for example in CDH), stale HBase state is left in zk, and deleting the /hbase znode fixes it.
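When zk cleanup really is warranted, it can be done with HBase's bundled ZooKeeper client. This is a sketch, and it wipes HBase's coordination data, so only run it once you have confirmed stale zk state is the actual cause; $HBASE_HOME is an assumed environment variable for your install path:

```shell
# Recursively delete HBase's znode, then restart HBase.
# "rmr" is the ZooKeeper 3.4 CLI syntax bundled with HBase 2.1.x;
# on ZooKeeper 3.5+ the equivalent command is "deleteall".
"$HBASE_HOME/bin/hbase" zkcli rmr /hbase
```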
But not every occurrence of this error is caused by zk. Read more of the log and track the failure down to its root cause.