HBase HMaster in an abnormal state



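Both HMasters of the HBase cluster show the state Unknown, and the health status of both is Concerning. All RegionServers are healthy. Restarting the HMasters individually, and restarting HBase as a whole, made no difference.
The HMaster log shows the following errors: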
2017-11-13 08:18:25,907 | FATAL | hdtdtest2:21300.activeMasterManager | Failed to become active master | org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1621)
java.lang.IllegalArgumentException: Table qualifier must not be empty
        at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:179)
        at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:149)
        at org.apache.hadoop.hbase.TableName.<init>(TableName.java:321)
        at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:357)
        at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:417)
        at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:1045)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
        at org.apache.hadoop.hbase.HTableDescriptor.parseFrom(HTableDescriptor.java:1562)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:526)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:511)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:487)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:172)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:209)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:638)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1617)
        at java.lang.Thread.run(Thread.java:745)
2017-11-13 08:18:25,910 | FATAL | hdtdtest2:21300.activeMasterManager | Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.JMXListener] | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1992)
2017-11-13 08:18:25,910 | FATAL | hdtdtest2:21300.activeMasterManager | Unhandled exception. Starting shutdown. | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1995)
java.lang.IllegalArgumentException: Table qualifier must not be empty
        at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:179)
        at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:149)
        at org.apache.hadoop.hbase.TableName.<init>(TableName.java:321)
        at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:357)
        at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:417)
        at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:1045)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
        at org.apache.hadoop.hbase.HTableDescriptor.parseFrom(HTableDescriptor.java:1562)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:526)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:511)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:487)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:172)
        at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:209)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:638)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1617)
        at java.lang.Thread.run(Thread.java:745)

       
First adjustment: set hbase.master.preload.tabledescriptors to false, then restart the HBase service.

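After changing the setting and restarting, the two HMasters kept flipping between (Unknown/Unknown) and (Unknown/Active). Even while one HMaster was Active, its native web UI could not be opened, and commands such as list in hbase shell still failed with "HMaster is initializing".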

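The cause is corrupt table descriptor information in the HBase metadata. Before the change, the failure occurred while the HMaster preloaded the table descriptors during startup.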
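
The message in the stack trace comes from a name validation step: a table name is split into a namespace and a qualifier, and an empty qualifier is rejected. The following is a minimal sketch of that kind of check. It is a simplified stand-in, not HBase's TableName class; the class name, the method name, and the sample name ns1:orders are hypothetical.

```java
// Simplified stand-in for the validation that raised the exception in the log.
// NOT HBase's TableName implementation; all names here are hypothetical.
class QualifierCheckSketch {

    // Splits "namespace:qualifier" (or a bare qualifier) and rejects an empty qualifier.
    static String qualifierOf(String fullName) {
        int idx = fullName.lastIndexOf(':');
        String qualifier = idx < 0 ? fullName : fullName.substring(idx + 1);
        if (qualifier.isEmpty()) {
            throw new IllegalArgumentException("Table qualifier must not be empty");
        }
        return qualifier;
    }

    public static void main(String[] args) {
        System.out.println(qualifierOf("ns1:orders"));
        try {
            // An empty name is roughly what undecodable descriptor bytes could yield.
            qualifierOf("");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A descriptor file whose bytes do not decode to a valid table name would plausibly end up on this path, which fits the readTableDescriptor frames in the trace.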

With preloading disabled, reading the table descriptors now fails in the balancer phase of startup instead. Second adjustment: set
hbase.master.loadbalancer.class to org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer

After changing the balancer setting and restarting the HBase service, both HMasters were still in the Unknown state.

Next, list the table descriptor files on HDFS:
hadoop fs -ls /hbase/data/*/*/.tabledesc
Are any of these .tableinfo files 0 bytes in size?
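
With many tables, scanning the listing by eye is error-prone. The sketch below parses saved output of the ls command above and flags files that are 0 bytes or whose name is not ".tableinfo." followed by ten digits. That naming rule is an assumption based on the healthy files in this cluster, and the class and method names are hypothetical. The first two sample lines are taken from this cluster's listing; the third, zero-byte line is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TableInfoListingCheck {
    // Assumption: a healthy descriptor file is named ".tableinfo." plus ten digits.
    static final Pattern EXPECTED_NAME = Pattern.compile("\\.tableinfo\\.\\d{10}");

    // Returns the paths of listed files that are empty or unexpectedly named.
    static List<String> suspicious(List<String> lsLines) {
        List<String> hits = new ArrayList<>();
        for (String line : lsLines) {
            // Fields: permissions, replication, owner, group, size, date, time, path
            String[] f = line.trim().split("\\s+");
            if (f.length < 8 || f[0].startsWith("d")) {
                continue; // skip "Found N items" headers and directories
            }
            long size = Long.parseLong(f[4]);
            String path = f[7];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (size == 0 || !EXPECTED_NAME.matcher(name).matches()) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "-rw-r--r--+  3 hbase_bk_user hadoop        495 2017-08-08 21:47 /hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz",
            "-rw-rwxr--+  3 hbase_bk_user hadoop        943 2017-06-06 09:49 /hbase/data/default/CUSTOMER/.tabledesc/.tableinfo.0000000001",
            // Hypothetical zero-byte entry, not from the real cluster:
            "-rw-r--r--   3 hbase hadoop          0 2017-01-01 00:00 /hbase/data/default/t_empty/.tabledesc/.tableinfo.0000000001");
        for (String path : suspicious(lines)) {
            System.out.println("check: " + path);
        }
    }
}
```

The parser splits on whitespace, so it assumes paths without spaces, which holds for HBase table directories.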

To find out which file is bad, patch hbase-server-1.0.2.jar and replace the deployed jar with the patched build.
The change:
Before:
private static HTableDescriptor readTableDescriptor(FileSystem fs, FileStatus status,
      boolean rewritePb) throws IOException {
    int len = Ints.checkedCast(status.getLen());
    byte [] content = new byte[len];
    FSDataInputStream fsDataInputStream = fs.open(status.getPath());
    try {
      fsDataInputStream.readFully(content);
    } finally {
      fsDataInputStream.close();
    }
    HTableDescriptor htd = null;
    try {
      htd = HTableDescriptor.parseFrom(content);
    } catch (DeserializationException e) {
      // we have old HTableDescriptor here
      try {
        HTableDescriptor ohtd = HTableDescriptor.parseFrom(content);
        LOG.warn("Found old table descriptor, converting to new format for table " +
          ohtd.getTableName());
        htd = new HTableDescriptor(ohtd);
        if (rewritePb) rewriteTableDescriptor(fs, status, htd);
      } catch (DeserializationException e1) {
        throw new IOException("content=" + Bytes.toShort(content), e1);
      }
    }
    if (rewritePb && !ProtobufUtil.isPBMagicPrefix(content)) {
      // Convert the file over to be pb before leaving here.
      rewriteTableDescriptor(fs, status, htd);
    }
    return htd;
  }

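After (the catch is widened to Throwable so that the unchecked IllegalArgumentException, which the DeserializationException handler does not catch, is intercepted and the file path is logged):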
private static HTableDescriptor readTableDescriptor(FileSystem fs, FileStatus status,
                                                        boolean rewritePb) throws IOException {
        int len = Ints.checkedCast(status.getLen());
        byte[] content = new byte[len];
        FSDataInputStream fsDataInputStream = fs.open(status.getPath());
        try {
            fsDataInputStream.readFully(content);
        } finally {
            fsDataInputStream.close();
        }
        HTableDescriptor htd = null;
        try {
            htd = HTableDescriptor.parseFrom(content);
        } catch (Throwable t) {
            System.out.println("path is " + status.getPath());
            LOG.error("path is " + status.getPath());
            System.out.println("content=" + Bytes.toShort(content));
            LOG.error("content=" + Bytes.toShort(content));
            System.out.println("exception is " + t);
            LOG.error("exception is ", t);
            throw new IOException("content=" + Bytes.toShort(content), t);
        }
        if (rewritePb && !ProtobufUtil.isPBMagicPrefix(content)) {
            // Convert the file over to be pb before leaving here.
            rewriteTableDescriptor(fs, status, htd);
        }
        return htd;
    }
   
   
The log then printed the specific bad file:
    2017-11-17 14:49:26,440 | ERROR | hdtdtest3:21300.activeMasterManager | path is hdfs://hacluster/hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz | org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:538)
-rw-r--r--+  3 hbase_bk_user hadoop        495 2017-08-08 21:47 /hbase/data/RS6000_CW/biz_test_mi/.tabledesc/.tableinfo.0000000001.gz
-rw-rwxr--+  3 hbase_bk_user hadoop        943 2017-06-06 09:49 /hbase/data/default/CUSTOMER/.tabledesc/.tableinfo.0000000001
-rw-r--r--+  3 hbase_bk_user hadoop        528 2017-08-09 15:13 /hbase/data/default/bk_test14/.tabledesc/.tableinfo.0000000001
-rw-r--r--+  3 admin         supergroup        527 2017-07-18 09:32 /hbase/data/default/ucps/.tabledesc/.tableinfo.0000000001
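
The .gz suffix on the flagged file suggests gzip-compressed content rather than a serialized descriptor. One way to confirm this is to look at the file's leading bytes. The sketch below classifies a byte prefix as gzip (magic bytes 0x1f 0x8b) or as protobuf-prefixed. It assumes that the prefix tested by ProtobufUtil.isPBMagicPrefix is the four ASCII bytes "PBUF"; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DescriptorMagicSketch {
    // Assumption: HBase's protobuf magic prefix is the ASCII bytes "PBUF".
    static final byte[] PB_MAGIC = "PBUF".getBytes(StandardCharsets.US_ASCII);

    // Classifies the first bytes of a descriptor file.
    static String classify(byte[] head) {
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip"; // standard gzip magic number
        }
        if (head.length >= PB_MAGIC.length) {
            boolean match = true;
            for (int i = 0; i < PB_MAGIC.length; i++) {
                if (head[i] != PB_MAGIC[i]) {
                    match = false;
                }
            }
            if (match) {
                return "protobuf";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Typical start of a gzip stream: magic, deflate method, flags.
        System.out.println(classify(new byte[] {0x1f, (byte) 0x8b, 0x08, 0x00}));
        System.out.println(classify("PBUF".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The leading bytes themselves could be read with hadoop fs -cat on the file and a hex dump of the first few bytes.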
   
   
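It also turned out that the regions taking turns falling into the RIT (region-in-transition) state were all on the same RegionServer. After restarting that RegionServer alone, the service recovered.
The service is currently healthy. If it fails again, consider deleting the bad file identified above.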
   
