HMaster startup failure: troubleshooting notes

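Environment
A simple big-data development and test environment built with Docker on a single cloud server: 1 master and 2 slaves, with the services started and orchestrated by docker-compose.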

HMaster fails to start, and its log shows the following exception:

2018-08-19 13:49:29,121 FATAL [78e7a081d8b6:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-914767151-172.18.0.4-1533896696236:blk_1073741825_1001 file=/hbase/hbase.version
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:976)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:632)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:874)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:926)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:200)
        at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:608)
        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:691)
        at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:509)
        at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:741)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:205)
        at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:2023)
        at java.lang.Thread.run(Thread.java:748)

Checking the block information of the files on HDFS with fsck shows files marked CORRUPT, each with its block MISSING:

[root@78e7a081d8b6 logs]# hdfs fsck / -files -blocks
18/08/19 13:52:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&path=%2F
FSCK started by root (auth:SIMPLE) from /172.18.0.3 for path / at Sun Aug 19 13:52:55 UTC 2018
/ 
/hbase 
/hbase/.tmp 
/hbase/MasterProcWALs 
/hbase/MasterProcWALs/state-00000000000000000015.log 0 bytes, 0 block(s):  OK

/hbase/WALs 
/hbase/WALs/zhaocan_slave1_1.zhaocan_default,16020,1534214426421 
/hbase/WALs/zhaocan_slave2_1.zhaocan_default,16020,1534214426422 
/hbase/archive 
/hbase/corrupt 
/hbase/data 
/hbase/data/default 
/hbase/data/default/soy_test1 
/hbase/data/default/soy_test1/.tabledesc 
/hbase/data/default/soy_test1/.tabledesc/.tableinfo.0000000001 289 bytes, 1 block(s): 
/hbase/data/default/soy_test1/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741857
 MISSING 1 blocks of total size 289 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741857_1033 len=289 MISSING!

/hbase/data/default/soy_test1/.tmp 
/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb 
/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb/.regioninfo 44 bytes, 1 block(s): 
/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb/.regioninfo: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741858
 MISSING 1 blocks of total size 44 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741858_1034 len=44 MISSING!

/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb/data 
/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb/recovered.edits 
/hbase/data/default/soy_test1/e332a1b73e8bcac9b69e446999e834fb/recovered.edits/4.seqid 0 bytes, 0 block(s):  OK

/hbase/data/default/t1 
/hbase/data/default/t1/.tabledesc 
/hbase/data/default/t1/.tabledesc/.tableinfo.0000000001 766 bytes, 1 block(s): 
/hbase/data/default/t1/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741853
 MISSING 1 blocks of total size 766 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741853_1029 len=766 MISSING!

/hbase/data/default/t1/.tmp 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/.regioninfo 37 bytes, 1 block(s): 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/.regioninfo: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741854
 MISSING 1 blocks of total size 37 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741854_1030 len=37 MISSING!

/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/f1 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/f1/37aeca77a262451f89bc9745de640e31 4925 bytes, 1 block(s): 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/f1/37aeca77a262451f89bc9745de640e31: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741859
 MISSING 1 blocks of total size 4925 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741859_1035 len=4925 MISSING!

/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/f2 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/f3 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/recovered.edits 
/hbase/data/default/t1/9384715a6b039dd6db92d729703a01d8/recovered.edits/14.seqid 0 bytes, 0 block(s):  OK

/hbase/data/hbase 
/hbase/data/hbase/meta 
/hbase/data/hbase/meta/.tabledesc 
/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001 397 bytes, 1 block(s): 
/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741828
 MISSING 1 blocks of total size 397 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741828_1004 len=397 MISSING!

/hbase/data/hbase/meta/.tmp 
/hbase/data/hbase/meta/1588230740 
/hbase/data/hbase/meta/1588230740/.regioninfo 32 bytes, 1 block(s): 
/hbase/data/hbase/meta/1588230740/.regioninfo: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741827
 MISSING 1 blocks of total size 32 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741827_1003 len=32 MISSING!

/hbase/data/hbase/meta/1588230740/.tmp 
/hbase/data/hbase/meta/1588230740/info 
/hbase/data/hbase/meta/1588230740/info/b4a2d55521614033bd88eeeb82c3fd4d 9013 bytes, 1 block(s): 
/hbase/data/hbase/meta/1588230740/info/b4a2d55521614033bd88eeeb82c3fd4d: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741928
 MISSING 1 blocks of total size 9013 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741928_1110 len=9013 MISSING!

/hbase/data/hbase/meta/1588230740/recovered.edits 
/hbase/data/hbase/meta/1588230740/recovered.edits/31.seqid 0 bytes, 0 block(s):  OK

/hbase/data/hbase/namespace 
/hbase/data/hbase/namespace/.tabledesc 
/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001 312 bytes, 1 block(s): 
/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741834
 MISSING 1 blocks of total size 312 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741834_1010 len=312 MISSING!

/hbase/data/hbase/namespace/.tmp 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/.regioninfo 42 bytes, 1 block(s): 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/.regioninfo: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741835
 MISSING 1 blocks of total size 42 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741835_1011 len=42 MISSING!

/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/info 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/info/f642f548417241e3a1fbbe34103507a2 4963 bytes, 1 block(s): 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/info/f642f548417241e3a1fbbe34103507a2: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741850
 MISSING 1 blocks of total size 4963 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741850_1026 len=4963 MISSING!

/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/recovered.edits 
/hbase/data/hbase/namespace/23dc1dbbc536758979b2bcdfb7b6d556/recovered.edits/16.seqid 0 bytes, 0 block(s):  OK

/hbase/hbase.id 42 bytes, 1 block(s): 
/hbase/hbase.id: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741826
 MISSING 1 blocks of total size 42 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741826_1002 len=42 MISSING!

/hbase/hbase.version 7 bytes, 1 block(s): 
/hbase/hbase.version: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741825
 MISSING 1 blocks of total size 7 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741825_1001 len=7 MISSING!

/hbase/oldWALs 
/root 
/root/hive 
/root/hive/root 
/root/hive/root/16e2f965-2d21-4504-992e-4d64079f5a51 
/root/hive/root/16e2f965-2d21-4504-992e-4d64079f5a51/_tmp_space.db 
/root/hive/root/3a835fae-a5d9-4cfd-8f14-b2fc1b35e984 
/root/hive/root/3a835fae-a5d9-4cfd-8f14-b2fc1b35e984/_tmp_space.db 
/root/hive/root/4e754087-66f8-471e-9717-5ecc7bebc29b 
/root/hive/root/4e754087-66f8-471e-9717-5ecc7bebc29b/_tmp_space.db 
/root/hive/root/e6fe8961-9813-4e26-9366-b323d5a16479 
/root/hive/root/e6fe8961-9813-4e26-9366-b323d5a16479/_tmp_space.db 
/root/hive/warehouse 
/root/hive/warehouse/t1 
/root/hive/warehouse/testdb.db 
/root/hive/warehouse/testdb.db/soy_test1 
/test 
Status: CORRUPT
 Total size:    20869 B (Total open files size: 249 B)
 Total dirs:    56
 Total files:   18
 Total symlinks:        0 (Files currently being written: 3)
 Total blocks (validated):  13 (avg. block size 1605 B) (Total open file blocks (not validated): 3)
  ********************************
  UNDER MIN REPL'D BLOCKS:  13 (100.0 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:    13
  MISSING BLOCKS:   13
  MISSING SIZE:     20869 B
  CORRUPT BLOCKS:   13
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 0.0
 Corrupt blocks:        13
 Missing replicas:      0
 Number of data-nodes:      2
 Number of racks:       1
FSCK ended at Sun Aug 19 13:52:55 UTC 2018 in 8 milliseconds


The filesystem under path '/' is CORRUPT
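
The CORRUPT entries in a report like the one above can be collected programmatically. The following is a minimal Python sketch and not part of the original troubleshooting session. The function name corrupt_files and the regular expression are assumptions derived from the line format shown in this report; other Hadoop versions may print it differently.

```python
import re
from typing import Dict, List

# Matches fsck lines of the form seen in the report above, for example:
# "/hbase/hbase.id: CORRUPT blockpool BP-... block blk_1073741826"
CORRUPT_LINE = re.compile(
    r"^(?P<path>/\S+): CORRUPT blockpool (?P<pool>\S+) block (?P<block>blk_\d+)"
)


def corrupt_files(fsck_output: str) -> Dict[str, List[str]]:
    """Map each corrupt file path to the block IDs fsck reported for it."""
    found: Dict[str, List[str]] = {}
    for line in fsck_output.splitlines():
        match = CORRUPT_LINE.match(line.strip())
        if match:
            found.setdefault(match.group("path"), []).append(match.group("block"))
    return found


# A short excerpt copied from the report above.
SAMPLE = """\
/hbase/hbase.id 42 bytes, 1 block(s):
/hbase/hbase.id: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741826
 MISSING 1 blocks of total size 42 B
0. BP-914767151-172.18.0.4-1533896696236:blk_1073741826_1002 len=42 MISSING!

/hbase/hbase.version 7 bytes, 1 block(s):
/hbase/hbase.version: CORRUPT blockpool BP-914767151-172.18.0.4-1533896696236 block blk_1073741825
"""

if __name__ == "__main__":
    for path, blocks in corrupt_files(SAMPLE).items():
        print(path, blocks)
```

Feeding it the full report would list all 13 corrupt files, including /hbase/hbase.version, the file named in the BlockMissingException.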

Since this is a development environment, I simply deleted the corrupted files with fsck -delete and then started HBase again:

[root@78e7a081d8b6 logs]# hdfs fsck -delete
18/08/19 13:54:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=root&delete=1&path=%2F
FSCK started by root (auth:SIMPLE) from /172.18.0.3 for path / at Sun Aug 19 13:54:55 UTC 2018
.....Status: HEALTHY
 Total size:    0 B (Total open files size: 249 B)
 Total dirs:    56
 Total files:   5
 Total symlinks:        0 (Files currently being written: 3)
 Total blocks (validated):  0 (Total open file blocks (not validated): 3)
 Minimally replicated blocks:   0
 Over-replicated blocks:    0
 Under-replicated blocks:   0
 Mis-replicated blocks:     0
 Default replication factor:    3
 Average block replication: 0.0
 Corrupt blocks:        0
 Missing replicas:      0
 Number of data-nodes:      2
 Number of racks:       1
FSCK ended at Sun Aug 19 13:54:55 UTC 2018 in 4 milliseconds


The filesystem under path '/' is HEALTHY
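
To confirm what the cleanup did, the summary section of the report can be reduced to a few counters. This is a hedged Python sketch under the assumption that the summary keeps the "Key: value" layout shown above; parse_fsck_summary and the choice of keys are illustrative names, not an HDFS API.

```python
import re
from typing import Dict, Union


def parse_fsck_summary(report: str) -> Dict[str, Union[str, int]]:
    """Extract the overall status and a few counters from an fsck report."""
    summary: Dict[str, Union[str, int]] = {}
    # The status may be preceded by progress dots, e.g. ".....Status: HEALTHY".
    status = re.search(r"Status:\s*(\w+)", report)
    if status:
        summary["status"] = status.group(1)
    for key in ("Total files", "Corrupt blocks", "Number of data-nodes"):
        match = re.search(r"^\s*" + re.escape(key) + r":\s*(\d+)", report, re.MULTILINE)
        if match:
            summary[key] = int(match.group(1))
    return summary


# Excerpt copied from the report printed after the delete.
AFTER_DELETE = """\
.....Status: HEALTHY
 Total size:    0 B (Total open files size: 249 B)
 Total dirs:    56
 Total files:   5
 Corrupt blocks:        0
 Number of data-nodes:      2
"""

if __name__ == "__main__":
    print(parse_fsck_summary(AFTER_DELETE))
```

Comparing the two reports this way shows the file count dropping from 18 to 5, which matches the 13 corrupt files listed before the delete.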

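HMaster now fails with the exception below. fsck -delete removed /hbase/hbase.version together with the other corrupt files (the total file count dropped from 18 to 5), so HBase can no longer read its layout version. In a development environment the whole directory can simply be deleted with hdfs dfs -rm -r /hbase, which discards all HBase data.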

2018-08-19 13:56:13,652 FATAL [78e7a081d8b6:16000.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded. You have version null and I want version 8. Consult http://hbase.apache.org/book.html for further information about upgrading HBase. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'.
        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:712)
        at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:509)
        at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:741)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:205)
        at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:2023)
        at java.lang.Thread.run(Thread.java:748)
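
Before restarting HMaster it can help to check whether the root directory and its version file are present. The sketch below wraps the real hdfs dfs -test -e command in Python. The helper names (hdfs_path_exists, diagnose_hbase_root), the default /hbase root, and the wording of the returned messages are assumptions made for illustration, and the command runner is injectable so the logic can be tried without a cluster.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], int]


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit status (requires hdfs on PATH)."""
    return subprocess.run(cmd).returncode


def hdfs_path_exists(path: str, run: Optional[Runner] = None) -> bool:
    """Return True if `hdfs dfs -test -e <path>` exits with status 0."""
    runner = run if run is not None else _run
    return runner(["hdfs", "dfs", "-test", "-e", path]) == 0


def diagnose_hbase_root(root: str = "/hbase", run: Optional[Runner] = None) -> str:
    """Describe the state of the HBase root directory (hypothetical helper)."""
    if not hdfs_path_exists(root, run):
        return "root missing: HMaster is expected to create a fresh layout"
    if not hdfs_path_exists(root + "/hbase.version", run):
        # This is the state after fsck -delete in this note; HMaster may
        # refuse to start with FileSystemVersionException, as logged above.
        return "root exists but hbase.version is missing"
    return "hbase.version present"


if __name__ == "__main__":
    print(diagnose_hbase_root())
```

In the situation described here the second branch applies, which is why removing the whole /hbase directory, rather than only the corrupt files, gets HMaster past the version check.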
