HDFS Block Corruption and Repair

This post simulates a damaged block, then shows how to locate and repair it.

1. Create a file and upload it to HDFS

[root@ruozedata001 ~]# hdfs dfs -mkdir /blockrecover
[root@ruozedata001 ~]# echo "xiaolinzi" > blocktest.md
[root@ruozedata001 ~]# hdfs dfs -put blocktest.md /blockrecover
[root@ruozedata001 ~]# hdfs dfs -ls /blockrecover
Found 2 items
-rw-r--r--   3 root hadoop         10 2019-08-22 15:00 /blockrecover/blocktest.md
-rw-r--r--   3 root hadoop         18 2019-08-21 10:52 /blockrecover/ruozedata.md
# Check the health of HDFS
[root@ruozedata001 subdir0]# hdfs fsck /
Connecting to namenode via http://ruozedata002:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /172.16.128.58 for path / at Thu Aug 22 15:12:55 CST 2019
...Status: HEALTHY
 Total size:	11033 B
 Total dirs:	2
 Total files:	3
 Total symlinks:		0
 Total blocks (validated):	3 (avg. block size 3677 B)
 Minimally replicated blocks:	3 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		3
 Number of racks:		1
FSCK ended at Thu Aug 22 15:12:55 CST 2019 in 2 milliseconds
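Instead of reading the fsck summary by eye, the relevant counters can be checked from a script. A minimal sketch in bash, assuming the `Label:<whitespace>number` layout shown above; `fsck_field` and `fsck_is_clean` are hypothetical helper names, not Hadoop commands:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs fsck` output on stdin and print the first
# number after the given summary label, e.g. "Corrupt blocks" -> 0.
# Assumes the "Label:<spaces/tabs>number" layout of the fsck summary.
fsck_field() {
  awk -v label="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # fsck indents the summary lines
      if (index(line, label ":") == 1) {  # the label must start the line
        rest = substr(line, length(label) + 2)
        if (match(rest, /[0-9]+(\.[0-9]+)?/)) {
          print substr(rest, RSTART, RLENGTH)
          exit
        }
      }
    }'
}

# Hypothetical helper: succeed (exit status 0) only if fsck reported
# "Status: HEALTHY" with zero corrupt and zero under-replicated blocks.
fsck_is_clean() {
  local report corrupt under
  report="$(cat)"
  corrupt="$(printf '%s\n' "$report" | fsck_field "Corrupt blocks")"
  under="$(printf '%s\n' "$report" | fsck_field "Under-replicated blocks")"
  printf '%s\n' "$report" | grep -q "Status: HEALTHY" &&
    [ "${corrupt:-1}" = "0" ] && [ "${under:-1}" = "0" ]
}
```

For example, `hdfs fsck / | fsck_field "Under-replicated blocks"` would print 1 for the report taken after the replica is deleted in step 2, and `hdfs fsck / | fsck_is_clean && echo clean` prints `clean` only when the status, the corrupt count and the under-replication count all look good.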

**2. Delete one replica of the file's block directly on a DataNode**

# Get the block name
[root@ruozedata001 ~]# hdfs fsck /blockrecover/blocktest.md -files -blocks 
Connecting to namenode via http://ruozedata002:50070/fsck?ugi=root&files=1&blocks=1&path=%2Fblockrecover%2Fblocktest.md
FSCK started by root (auth:SIMPLE) from /172.16.128.58 for path /blockrecover/blocktest.md at Thu Aug 22 15:00:59 CST 2019
/blockrecover/blocktest.md 10 bytes, 1 block(s):  OK
0. BP-1856248125-172.16.128.58-1566189843078:blk_1073741827_1003 len=10 Live_repl=3

Status: HEALTHY
 Total size:	10 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	1 (avg. block size 10 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		3
 Number of racks:		1
FSCK ended at Thu Aug 22 15:00:59 CST 2019 in 6 milliseconds
The filesystem under path '/blockrecover/blocktest.md' is HEALTHY
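# Locate the block on the DataNode's local disk (this pattern only matches the .meta file; the data file blk_1073741827 is in the same directory)
[root@ruozedata001 ~]# find ./ -name "*blk_1073741827_1003*"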
./data/dfs/data/current/BP-1856248125-172.16.128.58-1566189843078/current/finalized/subdir0/subdir0/blk_1073741827_1003.meta
# Enter the directory
[root@ruozedata001 ~]# cd ./data/dfs/data/current/BP-1856248125-172.16.128.58-1566189843078/current/finalized/subdir0/subdir0
# List its contents
[root@ruozedata001 subdir0]# ll
total 32
-rw-r--r-- 1 root root 11005 Aug 21 10:04 blk_1073741825
-rw-r--r-- 1 root root    95 Aug 21 10:04 blk_1073741825_1001.meta
-rw-r--r-- 1 root root    18 Aug 21 10:52 blk_1073741826
-rw-r--r-- 1 root root    11 Aug 21 10:52 blk_1073741826_1002.meta
-rw-r--r-- 1 root root    10 Aug 22 14:59 blk_1073741827
-rw-r--r-- 1 root root    11 Aug 22 14:59 blk_1073741827_1003.meta
# Delete the block data file blk_1073741827 and its checksum file blk_1073741827_1003.meta (this DataNode's replica)
[root@ruozedata001 subdir0]# rm -rf blk_1073741827 blk_1073741827_1003.meta
[root@ruozedata001 subdir0]# ll
total 24
-rw-r--r-- 1 root root 11005 Aug 21 10:04 blk_1073741825
-rw-r--r-- 1 root root    95 Aug 21 10:04 blk_1073741825_1001.meta
-rw-r--r-- 1 root root    18 Aug 21 10:52 blk_1073741826
-rw-r--r-- 1 root root    11 Aug 21 10:52 blk_1073741826_1002.meta
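The listing shows how an fsck block name maps to files on a DataNode: `blk_1073741827_1003` (block ID plus generation stamp) is stored as the data file `blk_1073741827` and the checksum file `blk_1073741827_1003.meta`, which is why the `find` pattern above only matched the `.meta` file. `hdfs fsck <path> -files -blocks -locations` additionally lists which DataNodes hold each replica. A minimal sketch that derives both local file names; the helper names and the default data directory `./data/dfs/data` (taken from this walkthrough) are assumptions, so point it at your own `dfs.datanode.data.dir`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip an optional "BP-...:" block pool prefix and print
# the data file name and the .meta file name for an fsck block name.
block_local_files() {
  local name="${1##*:}"        # e.g. blk_1073741827_1003
  local data_file="${name%_*}" # drop the generation stamp -> blk_1073741827
  printf '%s\n%s\n' "$data_file" "${name}.meta"
}

# Hypothetical helper: search a DataNode data directory for both files.
# The default directory is an assumption copied from this walkthrough.
find_block_files() {
  local name="${1##*:}"
  local data_dir="${2:-./data/dfs/data}"
  find "$data_dir" -type f \( -name "${name%_*}" -o -name "${name}.meta" \)
}
```

Run from `~` on ruozedata001 before step 2, `find_block_files BP-1856248125-172.16.128.58-1566189843078:blk_1073741827_1003` would list both files that the `rm` above removed.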
# Restart HDFS so the missing replica shows up right away, then check with fsck:
-bash-4.2$ hdfs fsck /
Connecting to namenode via http://ruozedata002:50070/fsck?ugi=root&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /172.16.128.58 for path / at Thu Aug 22 15:15:55 CST 2019
.
/blockrecover/blocktest.md: Under replicated BP-1856248125-172.16.128.58-1566189843078:blk_1073741827_1003. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
...............................................................................Status: HEALTHY
 Total size: 50194618424 B
 Total dirs: 354
 Total files: 1079
 Total symlinks: 0
 Total blocks (validated): 992 (avg. block size 50599413 B)
 Minimally replicated blocks: 992 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 1 (0.10080645 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 2.998992
 Corrupt blocks: 0
 Missing replicas: 1 (0.033602152 %)
 Number of data-nodes: 3
 Number of racks: 1
FSCK ended at Sun Mar 03 16:02:04 CST 2019 in 148 milliseconds
The filesystem under path '/' is HEALTHY
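On a real cluster the under-replicated blocks are buried in a long fsck report like the one above. A minimal sketch that pulls out the affected path and block name, assuming the `<path>: Under replicated <pool>:<block>. Target Replicas ...` line format shown here; `list_under_replicated` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs fsck` output on stdin and print
# "<hdfs path> <block name>" for every "Under replicated" line.
# Assumes each such line starts with the HDFS path, as in the report above.
list_under_replicated() {
  sed -n 's/^\(.*\): Under replicated [^:]*:\(blk_[0-9]*_[0-9]*\)\..*$/\1 \2/p'
}
```

For the report above, `hdfs fsck / | list_under_replicated` would print `/blockrecover/blocktest.md blk_1073741827_1003`; appending `| wc -l` gives the number of affected blocks.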

3. Manual repair

# Before the repair
[root@ruozedata001 subdir0]# ll
total 24
-rw-r--r-- 1 root root 11005 Aug 22 15:17 blk_1073741825
-rw-r--r-- 1 root root     6 Aug 22 15:23 blk_1073741825_1001.meta
-rw-r--r-- 1 root root    18 Aug 22 15:17 blk_1073741826
-rw-r--r-- 1 root root    11 Aug 22 15:17 blk_1073741826_1002.meta
# Repair the data manually with the hdfs debug command
[root@ruozedata001 subdir0]# hdfs debug recoverLease -path /blockrecover/blocktest.md -retries 10
recoverLease SUCCEEDED on /blockrecover/blocktest.md
# After the repair
[root@ruozedata001 subdir0]# ll
total 32
-rw-r--r-- 1 root root 11005 Aug 22 15:17 blk_1073741825
-rw-r--r-- 1 root root    95 Aug 22 15:23 blk_1073741825_1001.meta
-rw-r--r-- 1 root root    18 Aug 22 15:17 blk_1073741826
-rw-r--r-- 1 root root    11 Aug 22 15:17 blk_1073741826_1002.meta
-rw-r--r-- 1 root root    10 Aug 22 15:25 blk_1073741827
-rw-r--r-- 1 root root    11 Aug 22 15:25 blk_1073741827_1003.meta
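A note of caution: `hdfs debug recoverLease` is documented as recovering the lease on a file (closing a file left open by a failed writer). For a file that is already closed, the replica that reappears above may instead come from the NameNode re-replicating the under-replicated block after the restart, so it is worth confirming the result with fsck either way. A minimal sketch that runs the same recoverLease command and then polls `hdfs fsck` on the path; the function name, attempt count and wait time are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the manual repair from step 3, then poll fsck on the
# same path until it no longer reports an under-replicated block.
# Arguments: HDFS path, max fsck checks (default 10), seconds between checks (default 30).
repair_and_verify() {
  local path="$1" attempts="${2:-10}" wait_s="${3:-30}" i=1 report
  # Same command as in step 3 above.
  hdfs debug recoverLease -path "$path" -retries 10 || return 1
  while [ "$i" -le "$attempts" ]; do
    report="$(hdfs fsck "$path" 2>/dev/null)"
    # Healthy and no "Under replicated" line means every replica is back.
    if printf '%s\n' "$report" | grep -q "Status: HEALTHY" &&
       ! printf '%s\n' "$report" | grep -q "Under replicated"; then
      echo "fully replicated: $path"
      return 0
    fi
    # Give the NameNode time to schedule re-replication before the next check.
    [ "$i" -lt "$attempts" ] && sleep "$wait_s"
    i=$((i + 1))
  done
  echo "still under-replicated after $attempts checks: $path" >&2
  return 1
}
```

Usage: `repair_and_verify /blockrecover/blocktest.md 10 30` returns 0 once fsck on that path is clean, and 1 if it is still under-replicated after ten checks.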

4. Automatic repair by HDFS

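When a block replica is damaged, the DataNode does not notice until it runs its directory scan,
which by default happens every 6 hours:
dfs.datanode.directoryscan.interval : 21600 (seconds)
The block is not recovered until the DataNode sends its block report to the NameNode,
which by default also happens every 6 hours:
dfs.blockreport.intervalMsec : 21600000 (milliseconds)
Only after the NameNode receives the block report does it start recovering the block.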
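The two keys use different units, which is easy to misread. To see the values configured on a cluster rather than the defaults, `hdfs getconf -confKey <key>` prints a key as seen by the local client configuration (which may differ from what a given DataNode was started with). A minimal sketch that prints both intervals in hours; the helper names are hypothetical, and it assumes both keys hold plain numbers without time-unit suffixes:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert the two intervals into whole hours
# (integer division, so values under one hour print as 0).
# dfs.datanode.directoryscan.interval is in seconds,
# dfs.blockreport.intervalMsec is in milliseconds.
secs_to_hours() { echo $(( $1 / 3600 )); }
msecs_to_hours() { echo $(( $1 / 3600000 )); }

# Read the effective values from the local configuration and print them in hours.
show_recovery_intervals() {
  local scan report
  scan="$(hdfs getconf -confKey dfs.datanode.directoryscan.interval)" || return 1
  report="$(hdfs getconf -confKey dfs.blockreport.intervalMsec)" || return 1
  echo "directory scan every $(secs_to_hours "$scan") h"
  echo "block report every $(msecs_to_hours "$report") h"
}
```

With the defaults listed above, `show_recovery_intervals` would report 6 hours for both.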

Summary

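In production I generally prefer the manual repair, but it requires first deleting the damaged block by hand.
Remember: delete the damaged block file and its meta file, not the HDFS file.
Alternatively, you can first download the file with get, delete it from HDFS, and then upload it again.
Never run hdfs fsck / -delete for this: it deletes the corrupted files, so the data is simply lost. Only do that if losing the data does not matter,
or if you are confident you can restore the data to HDFS from somewhere else!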
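The "download, delete, re-upload" alternative can be scripted so that the HDFS file is only removed after a complete local copy exists. A minimal sketch; the function name is hypothetical, and it assumes the file fits on local disk and that nothing else writes to the path in the meantime. `hdfs dfs -rm` without `-skipTrash` moves the file to the trash when trash is enabled, which keeps one more fallback:

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "get, delete, re-upload" approach.
# Stops at the first failing step, so the HDFS file is never removed
# unless a local copy of the same size exists.
reupload_hdfs_file() {
  local src="$1" tmpdir local_copy expected actual
  tmpdir="$(mktemp -d)" || return 1
  local_copy="$tmpdir/$(basename "$src")"
  # 1. Download a local copy (the HDFS client can read from another replica if one is bad).
  hdfs dfs -get "$src" "$local_copy" || { rm -rf "$tmpdir"; return 1; }
  # 2. Make sure the local copy is complete before touching HDFS.
  expected="$(hdfs dfs -stat %b "$src")" || { rm -rf "$tmpdir"; return 1; }
  actual="$(wc -c < "$local_copy" | tr -d ' ')"
  if [ "$expected" != "$actual" ]; then
    echo "size mismatch ($actual vs $expected bytes), aborting" >&2
    rm -rf "$tmpdir"
    return 1
  fi
  # 3. Remove the HDFS file (moved to the trash if trash is enabled) ...
  hdfs dfs -rm "$src" || { rm -rf "$tmpdir"; return 1; }
  # 4. ... and upload the local copy to the same path.
  if ! hdfs dfs -put "$local_copy" "$src"; then
    echo "re-upload failed, local copy kept at $local_copy" >&2
    return 1
  fi
  rm -rf "$tmpdir"
}
```

For example, `reupload_hdfs_file /blockrecover/blocktest.md`. The re-uploaded file is owned by the user running the script, so ownership and permissions may need to be restored afterwards.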
