Both HMasters of the HBase cluster are in Unknown state, and the health status of each is Concerning. All RegionServers are healthy, and restarting the HMasters or the whole HBase service brings no improvement.
The HMaster log shows the following FATAL error:
2017-11-13 08:18:25,907 | FATAL | hdtdtest2:21300.activeMasterManager | Failed to become active master | org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1621)
java.lang.IllegalArgumentException: Table qualifier must not be empty
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:179)
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:149)
at org.apache.hadoop.hbase.TableName.&lt;init&gt;(TableName.java)
at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:357)
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:417)
at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:1045)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
at org.apache.hadoop.hbase.HTableDescriptor.parseFrom(HTableDescriptor.java:1562)
at org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:526)
at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:511)
at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:487)
at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:172)
at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:209)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:638)
at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1617)
at java.lang.Thread.run(Thread.java:745)
2017-11-13 08:18:25,910 | FATAL | hdtdtest2:21300.activeMasterManager | Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.JMXListener] | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1992)
2017-11-13 08:18:25,910 | FATAL | hdtdtest2:21300.activeMasterManager | Unhandled exception. Starting shutdown. | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1995)
java.lang.IllegalArgumentException: Table qualifier must not be empty
... (stack trace identical to the first occurrence above)
Adjusted the parameter hbase.master.preload.tabledescriptors to false and restarted the HBase service.
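If this setting is managed through hbase-site.xml rather than a cluster manager UI, the fragment would look roughly like this (the property name is taken from the step above; everything else in the file is assumed):

```xml
<!-- Skip preloading all table descriptors during active-master
     initialization, so that one unreadable descriptor does not
     abort HMaster startup. -->
<property>
  <name>hbase.master.preload.tabledescriptors</name>
  <value>false</value>
</property>
```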
After changing the setting and restarting, the two HMasters' states flip between (Unknown/Unknown) and (Unknown/Active). Even while one HMaster is Active, its native web UI cannot be opened, and commands such as list in the hbase shell still fail with "HMaster is initializing".
The underlying problem is that a table descriptor in HBase's metadata is corrupt. Initially the HMaster failed while preloading table descriptors during startup; after preloading was disabled with the parameter above, it now fails while reading a table descriptor during the balancer phase of startup.
Set hbase.master.loadbalancer.class to org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer.
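Applied via hbase-site.xml (assuming that is how this cluster is configured), the balancer override would be roughly:

```xml
<!-- Replace the default balancer with the simple round-robin
     SimpleLoadBalancer. -->
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer</value>
</property>
```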
After adjusting the balancer setting and restarting the HBase service, both HMasters still end up in Unknown state.
hadoop fs -ls /hbase/data/*/*/.tabledesc
Are any of these .tableinfo files zero bytes in size?
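A quick way to answer that is to filter the `hadoop fs -ls` output on the size column (column 5). A minimal sketch, assuming the standard /hbase/data/&lt;namespace&gt;/&lt;table&gt;/.tabledesc/ layout shown above (the helper name is ours):

```shell
# Column 5 of `hadoop fs -ls` output is the file length in bytes;
# print the path (column 8) of every zero-length file in the listing.
list_zero_tableinfo() {
  awk '$5 == 0 { print $8 }'
}

# Usage (on the cluster):
#   hadoop fs -ls /hbase/data/*/*/.tabledesc | list_zero_tableinfo
```

A zero-length .tableinfo file would make HTableDescriptor.parseFrom fail in the same way, so this check rules that cause in or out cheaply.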
Patched and redeployed hbase-server-1.0.2.jar to add diagnostics. The modified method is FSTableDescriptors.readTableDescriptor:
Before:
private static HTableDescriptor readTableDescriptor(FileSystem fs, FileStatus status,
    boolean rewritePb) throws IOException {
  int len = Ints.checkedCast(status.getLen());
  byte[] content = new byte[len];
  FSDataInputStream fsDataInputStream = fs.open(status.getPath());
  try {
    fsDataInputStream.readFully(content);
  } finally {
    fsDataInputStream.close();
  }
  HTableDescriptor htd = null;
  try {
    htd = HTableDescriptor.parseFrom(content);
  } catch (DeserializationException e) {
    // we have old HTableDescriptor here
    try {
      HTableDescriptor ohtd = HTableDescriptor.parseFrom(content);
      LOG.warn("Found old table descriptor, converting to new format for table " +
          ohtd.getTableName());
      htd = new HTableDescriptor(ohtd);
      if (rewritePb) rewriteTableDescriptor(fs, status, htd);
    } catch (DeserializationException e1) {
      throw new IOException("content=" + Bytes.toShort(content), e1);
    }
  }
  if (rewritePb && !ProtobufUtil.isPBMagicPrefix(content)) {
    // Convert the file over to be pb before leaving here.
    rewriteTableDescriptor(fs, status, htd);
  }
  return htd;
}
After:
private static HTableDescriptor readTableDescriptor(FileSystem fs, FileStatus status,
    boolean rewritePb) throws IOException {
  int len = Ints.checkedCast(status.getLen());
  byte[] content = new byte[len];
  FSDataInputStream fsDataInputStream = fs.open(status.getPath());
  try {
    fsDataInputStream.readFully(content);
  } finally {
    fsDataInputStream.close();
  }
  HTableDescriptor htd = null;
  try {
    htd = HTableDescriptor.parseFrom(content);
  } catch (Throwable t) {
    // Catch everything (parseFrom can also throw unchecked exceptions such
    // as IllegalArgumentException) and log the exact file before rethrowing,
    // so the offending .tableinfo file shows up in the HMaster log.
    System.out.println("path is " + status.getPath());
    LOG.error("path is " + status.getPath());
    System.out.println("content=" + Bytes.toShort(content));
    LOG.error("content=" + Bytes.toShort(content));
    System.out.println("exception is " + t);
    LOG.error("exception is ", t);
    throw new IOException("content=" + Bytes.toShort(content), t);
  }
  if (rewritePb && !ProtobufUtil.isPBMagicPrefix(content)) {
    // Convert the file over to be pb before leaving here.
    rewriteTableDescriptor(fs, status, htd);
  }
  return htd;
}
The log then printed the specific file that fails to parse:
2017-11-17 14:49:26,440 | ERROR | hdtdtest3:21300.activeMasterManager | path is hdfs://hacluster/hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz | org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:538)
For comparison, an HDFS listing of the failing file alongside a few healthy .tableinfo files (note that only the failing one carries a .gz suffix):
-rw-r--r--+ 3 hbase_bk_user hadoop 495 2017-08-08 21:47 /hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz
-rw-rwxr--+ 3 hbase_bk_user hadoop 943 2017-06-06 09:49 /hbase/data/default/CUSTOMER/.tabledesc/.tableinfo.0000000001
-rw-r--r--+ 3 hbase_bk_user hadoop 528 2017-08-09 15:13 /hbase/data/default/bk_test14/.tabledesc/.tableinfo.0000000001
-rw-r--r--+ 3 admin supergroup 527 2017-07-18 09:32 /hbase/data/default/ucps/.tabledesc/.tableinfo.0000000001
We also found that the regions repeatedly falling into RIT (region-in-transition) state were all on the same RegionServer; after restarting that RegionServer alone, the service returned to normal.
The service is healthy for now. If the failure recurs, the next step is to consider deleting the corrupt file identified above.
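If deletion does become necessary, moving the file aside is safer than an outright `hadoop fs -rm`, since the bytes stay recoverable. A sketch of a hypothetical helper (our own naming, not an HBase tool) that only prints the command to run, so it can be reviewed before execution:

```shell
# Print a `hadoop fs -mv` command that quarantines a corrupt .tableinfo
# file by renaming it with a .corrupt suffix, instead of deleting it.
quarantine_cmd() {
  printf 'hadoop fs -mv %s %s.corrupt\n' "$1" "$1"
}

# Example:
#   quarantine_cmd /hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz
```

After quarantining, the descriptor would still need to be regenerated (for example by recreating it from a known-good copy) before the table is usable again; take a backup of /hbase/data first.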