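Recently, after a Hadoop failure in a customer environment, the Trafodion metadata table TRAFODION."_MD_".OBJECTS got stuck in the RIT (region in transition) state during recovery and did not recover for a long time. Running hbcheck as the Trafodion user produced the following output: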
HBase is available!
HBase version: 1.1.2.2.4.3.0-227
HMaster: namenode-2.esg.local,16000,1568772195966
Number of RegionServers available:4
RegionServer #1: datanode-3.esg.local,16020,1568772200461
RegionServer #2: datanode-4.esg.local,16020,1568772200097
RegionServer #3: datanode-2.esg.local,16020,1568772200709
RegionServer #4: datanode-1.esg.local,16020,1568772201234
Number of Dead RegionServers:0
Number of regions: 31014
Number of regions in transition: 1
RegionInTransition: TRAF_RSRVD_1:TRAFODION._MD_.OBJECTS,,1529534107727.e5b72fdd54857972797d8c6583964d0a. state=OPENING, ts=Tue Sep 17 22:12:38 EDT 2019 (420s ago), server=null
Average load: 7753.5
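The line RegionInTransition: TRAF_RSRVD_1:TRAFODION._MD_.OBJECTS,,1529534107727.e5b72fdd54857972797d8c6583964d0a. state=OPENING, ts=Tue Sep 17 22:12:38 EDT 2019 (420s ago), server=null shows that a region of TRAFODION._MD_.OBJECTS is in RIT: it has been in the OPENING state for 420 seconds and no RegionServer is assigned to it (server=null).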
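When a cluster has tens of thousands of regions, it can help to pull the RIT entries out of the hbcheck output programmatically. The following is a minimal sketch, not part of hbcheck itself: the function name parse_rit and the regular expression are assumptions, derived only from the single RegionInTransition line shown above, so other hbcheck versions may need a different pattern.

```python
import re

# Assumed line layout, taken from the one sample line in the output above:
# RegionInTransition: <region name> state=<STATE>, ts=<date> (<N>s ago), server=<server>
RIT_PATTERN = re.compile(
    r"^RegionInTransition:\s+(?P<region>\S+)\s+state=(?P<state>\w+),"
    r".*\((?P<age>\d+)s ago\),\s+server=(?P<server>\S+)\s*$"
)


def parse_rit(hbcheck_output):
    """Return one dict per RegionInTransition line found in the hbcheck text."""
    entries = []
    for line in hbcheck_output.splitlines():
        match = RIT_PATTERN.match(line.strip())
        if not match:
            continue
        region = match.group("region")
        entries.append({
            # Everything before the first comma is the namespace-qualified table name.
            "table": region.split(",", 1)[0],
            # The region name ends with ".<encoded name>."; keep only the hex hash.
            "encoded_name": region.rstrip(".").rsplit(".", 1)[-1],
            "state": match.group("state"),
            "age_seconds": int(match.group("age")),
            "server": match.group("server"),
        })
    return entries
```

The encoded_name field is the value that the manual assign in Step 3 and the HDFS path in Step 4 are based on.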
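After some trial and error the problem was resolved. The steps that were tried are shared here for reference only.
Step 1
The first suspicion was that HBase's automatic region split was responsible, so an attempt was made to turn splitting off: the split policy was changed from org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy to org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy, and hbase.hregion.max.filesize was raised from its default of 10 GB to 100 GB.
HBase was restarted, but the problem persisted.
Step 2
The contents of the HBase WAL directory were moved aside into a backup directory: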
hadoop fs -mkdir /hbase-wal-bak20190918
hadoop fs -mv /apps/hbase/data/WALs/* /hbase-wal-bak20190918/
After these commands completed, HBase was restarted, but the problem persisted.
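The two commands in Step 2 follow a simple pattern: a backup directory named after the date, then a move of everything under the WAL directory. A minimal sketch of that pattern is shown below. The function name wal_backup_commands and the backup_prefix parameter are hypothetical; the paths are the ones used above. The sketch only builds and prints the commands and does not execute anything.

```python
from datetime import date


def wal_backup_commands(wal_dir, backup_date, backup_prefix="/hbase-wal-bak"):
    """Build the two `hadoop fs` commands from Step 2 as argument lists."""
    # Backup directory name is the prefix followed by the date as YYYYMMDD.
    backup_dir = f"{backup_prefix}{backup_date:%Y%m%d}"
    return [
        ["hadoop", "fs", "-mkdir", backup_dir],
        # The trailing /* is kept literally; `hadoop fs` expands the glob on HDFS.
        ["hadoop", "fs", "-mv", f"{wal_dir.rstrip('/')}/*", f"{backup_dir}/"],
    ]


if __name__ == "__main__":
    for cmd in wal_backup_commands("/apps/hbase/data/WALs", date(2019, 9, 18)):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before running anything against HDFS, which matters for an operation that moves write-ahead logs.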
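Step 3
The table's region was assigned manually from the HBase shell:
hbase(main):002:0> assign 'e5b72fdd54857972797d8c6583964d0a'
The problem persisted after the assign.
Step 4
The recovered.edits directory of the table's region was deleted: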
hadoop fs -rm -r /apps/hbase/data/data/TRAF_RSRVD_1/TRAFODION._MD_.OBJECTS/e5b72fdd54857972797d8c6583964d0a/recovered.edits
After HBase was restarted, the problem disappeared.
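The path deleted in Step 4 is built from the HBase root directory, the namespace, the table name, and the encoded region name. The sketch below shows that mapping, assuming the same layout as the path used above (root directory /apps/hbase/data in this environment). The function name recovered_edits_path is hypothetical, and the function only returns a string; it does not touch HDFS.

```python
import posixpath


def recovered_edits_path(root_dir, full_table_name, encoded_region):
    """Return the HDFS recovered.edits directory for one region."""
    # "NAMESPACE:TABLE" splits into namespace and table name.
    namespace, sep, table = full_table_name.partition(":")
    if not sep:
        # Tables without a namespace prefix belong to HBase's "default" namespace.
        namespace, table = "default", full_table_name
    return posixpath.join(
        root_dir, "data", namespace, table, encoded_region, "recovered.edits"
    )


if __name__ == "__main__":
    print(recovered_edits_path(
        "/apps/hbase/data",
        "TRAF_RSRVD_1:TRAFODION._MD_.OBJECTS",
        "e5b72fdd54857972797d8c6583964d0a",
    ))
```

Listing the resulting directory with hadoop fs -ls before deleting it is a cheap way to confirm that the path points at the intended region.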
With the steps above completed, hbcheck was run again and produced the following output:
HBase is available!
HBase version: 1.1.2.2.4.3.0-227
HMaster: namenode-1.esg.local,16000,1568791177953
Number of RegionServers available:4
RegionServer #1: datanode-1.esg.local,16020,1568791183192
RegionServer #2: datanode-2.esg.local,16020,1568791182670
RegionServer #3: datanode-4.esg.local,16020,1568791182598
RegionServer #4: datanode-3.esg.local,16020,1568791182581
Number of Dead RegionServers:0
Number of regions: 31015
Number of regions in transition: 0
Average load: 7753.75