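While setting up an HBase cluster, the HMaster process disappeared on its own shortly after I ran the start command. The exception log is as follows: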
2020-06-09 16:45:04,436 ERROR [master/master:16000:becomeActiveMaster] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1092)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:424)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:586)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1522)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:937)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2114)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:579)
at java.lang.Thread.run(Thread.java:748)
2020-06-09 16:45:04,450 ERROR [master/master:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master master,16000,1591692251649: Unhandled exception. Starting shutdown. *****
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1092)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:424)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:586)
at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1522)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:937)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2114)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:579)
at java.lang.Thread.run(Thread.java:748)
2020-06-09 16:45:04,451 INFO [master/master:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'master,16000,1591692251649' *****
2020-06-09 16:45:04,451 INFO [master/master:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/master:16000:becomeActiveMaster
2020-06-09 16:45:04,716 INFO [master/master:16000] ipc.NettyRpcServer: Stopping server on /192.168.43.110:16000
2020-06-09 16:45:04,739 INFO [master/master:16000.splitLogManager..Chore.1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor was stopped
2020-06-09 16:45:04,795 INFO [master/master:16000] regionserver.HRegionServer: Stopping infoServer
2020-06-09 16:45:04,867 INFO [master/master:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@396639b{/,null,UNAVAILABLE}{file:/opt/hbase-2.2.5/hbase-webapps/master}
2020-06-09 16:45:04,912 INFO [master/master:16000] server.AbstractConnector: Stopped ServerConnector@19650aa6{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2020-06-09 16:45:04,914 INFO [master/master:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@55a609dd{/static,file:///opt/hbase-2.2.5/hbase-webapps/static/,UNAVAILABLE}
2020-06-09 16:45:04,916 INFO [master/master:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@7acfb656{/logs,file:///opt/hbase-2.2.5/logs/,UNAVAILABLE}
2020-06-09 16:45:04,927 INFO [master/master:16000] regionserver.HRegionServer: aborting server master,16000,1591692251649
2020-06-09 16:45:04,928 INFO [master/master:16000] regionserver.HRegionServer: stopping server master,16000,1591692251649; all regions closed.
2020-06-09 16:45:04,928 INFO [master/master:16000] hbase.ChoreService: Chore service for: master/master:16000 had [] on shutdown
2020-06-09 16:45:04,951 INFO [ReadOnlyZKClient-master:2181,slave1:2181,slave2:2181@0x2e2d0a9d] zookeeper.ZooKeeper: Session: 0x300000d4e5a0002 closed
2020-06-09 16:45:04,958 INFO [ReadOnlyZKClient-master:2181,slave1:2181,slave2:2181@0x2e2d0a9d-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x300000d4e5a0002
2020-06-09 16:45:04,971 WARN [master/master:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2020-06-09 16:45:04,980 INFO [master/master:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store, isAbort=true
2020-06-09 16:45:04,982 INFO [master/master:16000] hbase.ChoreService: Chore service for: master/master:16000.splitLogManager. had [] on shutdown
2020-06-09 16:45:05,040 INFO [master/master:16000] zookeeper.ZooKeeper: Session: 0x300000d4e5a0001 closed
2020-06-09 16:45:05,040 INFO [master/master:16000] regionserver.HRegionServer: Exiting; stopping=master,16000,1591692251649; zookeeper connection closed.
2020-06-09 16:45:05,041 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x300000d4e5a0001
2020-06-09 16:45:05,042 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2945)
Solution:
One option is to add the following property to the hbase-site.xml configuration file:
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync).
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and
inconsistent system state in the event of process and/or node failures. If
HBase is complaining of an inability to use hsync or hflush it's most
likely not a false positive.
</description>
</property>
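Before restarting the Master, it can help to confirm that the edited file actually declares the property. Below is a minimal sketch that uses only the JDK's DOM parser. The class name SiteConfigReader, the readProperty helper, and the default path /opt/hbase-2.2.5/conf/hbase-site.xml are all assumptions for illustration. HBase itself loads this file through Hadoop's Configuration class, not through code like this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads one <property> value out of hbase-site.xml text.
public class SiteConfigReader {

    /** Returns the trimmed <value> of the named property, or null if it is not declared. */
    public static String readProperty(String siteXml, String propertyName) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as XML: " + e.getMessage(), e);
        }
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if (propertyName.equals(textOf(property, "name"))) {
                return textOf(property, "value");
            }
        }
        return null; // not declared: HBase falls back to its built-in default
    }

    // Trimmed text of the first descendant element with the given tag, or null.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml;
        if (args.length == 1) {
            // e.g. /opt/hbase-2.2.5/conf/hbase-site.xml (assumed location)
            xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        } else {
            xml = "<configuration><property>"
                    + "<name>hbase.unsafe.stream.capability.enforce</name>"
                    + "<value>false</value>"
                    + "</property></configuration>";
        }
        System.out.println(readProperty(xml, "hbase.unsafe.stream.capability.enforce"));
    }
}
```

If the sketch prints null, the property was not picked up. Check that it sits inside the <configuration> element of the file the Master actually reads.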
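hbase.unsafe.stream.capability.enforce: set it to false when HBase runs on the local filesystem, and to true when it runs on HDFS. However, the official HBase reference guide states that, starting with 2.0.0, HBase uses asyncfs as its default WAL provider.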
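The rule of thumb above keys off the scheme of hbase.rootdir: file:// means LocalFileSystem, and hdfs:// means HDFS. Here is a minimal sketch of that rule as a function. The class EnforceAdvisor and the sample URI hdfs://master:9000/hbase (hostname taken from the log, port assumed) are hypothetical. Any scheme other than file is treated as "keep the check on", which is a conservative choice made for this sketch, not something HBase does automatically.

```java
import java.net.URI;

// Hypothetical helper encoding the rule of thumb: only a file:// rootdir disables the check.
public class EnforceAdvisor {

    /** Suggested value for hbase.unsafe.stream.capability.enforce given an hbase.rootdir URI. */
    public static boolean recommendedEnforce(String rootDir) {
        String scheme = URI.create(rootDir).getScheme();
        // file:// (LocalFileSystem) cannot provide hflush/hsync, so the check would always fail.
        // hdfs:// or anything else (including no scheme): keep the safe setting.
        return !"file".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        String[] rootDirs = {"file:///tmp/hbase", "hdfs://master:9000/hbase"};
        for (String rootDir : rootDirs) {
            System.out.println(rootDir + " -> hbase.unsafe.stream.capability.enforce="
                    + recommendedEnforce(rootDir));
        }
    }
}
```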
137.1.3. Master fails to become active due to lack of hsync for filesystem
HBase’s internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common’s filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can’t operate safely.
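The abort described above comes from a capability check on the WAL output stream. The top frame of the stack trace is WALProcedureStore.rollWriter. The sketch below is a simplified model of that idea, not HBase's code. CapableStream is a hypothetical stand-in for Hadoop's StreamCapabilities interface (whose hasCapability(String) method reports features such as "hsync"), and the enforce flag plays the role of hbase.unsafe.stream.capability.enforce.

```java
// Simplified model (not HBase source) of the durability check that aborts the Master.
public class HsyncCheckModel {

    /** Hypothetical stand-in for Hadoop's StreamCapabilities#hasCapability. */
    interface CapableStream {
        boolean hasCapability(String capability);
    }

    /** With enforcement on, a stream that cannot hsync is rejected; with it off, anything passes. */
    static void checkDurability(CapableStream stream, boolean enforce) {
        if (enforce && !stream.hasCapability("hsync")) {
            throw new IllegalStateException("stream lacks hsync capability (model of HBase's check)");
        }
    }

    public static void main(String[] args) {
        CapableStream localLike = capability -> false; // advertises no durability support
        CapableStream hdfsLike = capability -> capability.equals("hflush") || capability.equals("hsync");

        checkDurability(hdfsLike, true);   // passes: hsync available
        checkDurability(localLike, false); // passes: check disabled (enforce=false)
        try {
            checkDurability(localLike, true);
        } catch (IllegalStateException e) {
            System.out.println("Master would abort: " + e.getMessage());
        }
    }
}
```

The model also shows what setting the property to false buys you and what it costs. The Master starts, but nothing guarantees the WAL actually reaches durable storage.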
asyncfs: The default. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This AsyncFSWAL provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WAL edits are written concurrently (“fan-out”) style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should be better. See Apache HBase Improvements and Practices at Xiaomi at slide 14 onward for more detail on implementation.
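"Latencies should be better" can be made concrete with a back-of-the-envelope model. It assumes each network link has a fixed one-way delay and ignores bandwidth, acks, and disk time. In a chained pipeline the delays add up, while with fan-out the client waits only for the slowest replica. The class name and the per-link numbers are made up for illustration.

```java
// Back-of-the-envelope model (assumed, simplified): pipeline vs fan-out WAL write latency.
public class WalLatencyModel {

    // Chained pipeline: client -> DN1 -> DN2 -> DN3; each link's delay adds up in sequence.
    static double pipelineMillis(double[] linkMillis) {
        double total = 0.0;
        for (double d : linkMillis) {
            total += d;
        }
        return total;
    }

    // Fan-out: the client writes to every replica in parallel and waits for the slowest one.
    static double fanOutMillis(double[] linkMillis) {
        double slowest = 0.0;
        for (double d : linkMillis) {
            slowest = Math.max(slowest, d);
        }
        return slowest;
    }

    public static void main(String[] args) {
        double[] linkMillis = {2.0, 3.0, 1.5}; // hypothetical per-link delays in milliseconds
        System.out.println("pipeline: " + pipelineMillis(linkMillis) + " ms");
        System.out.println("fan-out:  " + fanOutMillis(linkMillis) + " ms");
    }
}
```

The trade-off hidden by this model is that fan-out sends every replica's bytes over the client's own link. The latency gain is therefore not free.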
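My test environment runs hbase-2.2.5. So even though this is a cluster deployment, I simply set hbase.unsafe.stream.capability.enforce to false and restarted the HBase Master, after which it came up normally. Alternatively, using an HBase version earlier than 2.0.0 also avoids this error.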