A Record of an HDFS Storage Exception

Error message

[2022-03-02 09:54:52,932] {bash_operator.py:123} INFO - 22/03/02 09:54:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-spark2:38546 (size: 4.3 KB, free: 366.3 MB)
[2022-03-02 09:54:52,933] {bash_operator.py:123} INFO - 22/03/02 09:54:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-spark2:35910 (size: 4.3 KB, free: 366.3 MB)
[2022-03-02 09:54:53,155] {bash_operator.py:123} INFO - 22/03/02 09:54:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-spark2:35824 (size: 4.3 KB, free: 366.3 MB)
[2022-03-02 09:54:53,773] {bash_operator.py:123} INFO - 22/03/02 09:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6332 ms on hadoop-spark6 (executor 1) (1/12)
[2022-03-02 09:54:53,774] {bash_operator.py:123} INFO - 22/03/02 09:54:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 5678 ms on hadoop-spark6 (executor 2) (2/12)
[2022-03-02 09:54:54,584] {bash_operator.py:123} INFO - 22/03/02 09:54:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-spark2:38546 (size: 31.7 KB, free: 366.3 MB)
[2022-03-02 09:54:54,732] {bash_operator.py:123} INFO - 22/03/02 09:54:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-spark2:35910 (size: 31.7 KB, free: 366.3 MB)
[2022-03-02 09:54:54,914] {bash_operator.py:123} INFO - 22/03/02 09:54:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-spark2:35824 (size: 31.7 KB, free: 366.3 MB)
[2022-03-02 09:54:55,028] {bash_operator.py:123} INFO - 22/03/02 09:54:55 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 4827 ms on hadoop-spark3 (executor 12) (3/12)
[2022-03-02 09:54:55,989] {bash_operator.py:123} INFO - 22/03/02 09:54:55 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 5825 ms on hadoop-spark3 (executor 11) (4/12)
[2022-03-02 09:54:56,153] {bash_operator.py:123} INFO - 22/03/02 09:54:56 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 7812 ms on hadoop-spark2 (executor 3) (5/12)
[2022-03-02 09:54:57,482] {bash_operator.py:123} INFO - 22/03/02 09:54:57 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 7125 ms on hadoop-spark2 (executor 5) (6/12)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO - 22/03/02 09:54:57 WARN scheduler.TaskSetManager: Lost task 11.0 in stage 0.0 (TID 11, hadoop-spark2, executor 9): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-779817639-172.16.7.104-1520944478795:blk_1664682890_590950773; getBlockSize()=37182; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[172.16.7.107:50010,DS-47215880-d30c-430d-a939-774339e8c1e7,DISK], DatanodeInfoWithStorage[172.16.7.108:50010,DS-3ef8e36a-fb16-4799-ac8d-a5ef7b717dbb,DISK], DatanodeInfoWithStorage[172.16.17.214:50010,DS-0db3957b-91f9-4f07-a741-76ee95660781,DISK]]}
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:431)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:337)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:273)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:265)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1616)
[2022-03-02 09:54:57,640] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:338)
[2022-03-02 09:54:57,641] {bash_operator.py:123} INFO -     at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:334)
[2022-03-02 09:54:57,641] {bash_operator.py:123} INFO -     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
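The "Cannot obtain block length for LocatedBlock" error usually means the last block of a file is still in the under-construction state because the writer (here, Flume) was restarted without closing the file. If you only have the block ID from the stack trace, fsck can map it back to the owning file; this is just a sketch, and the -blockId option is only available in Hadoop 2.7 and later (block ID taken from the log above):

hdfs fsck -blockId blk_1664682890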

Resolution steps

1. Run hdfs fsck /data/logs/ -openforwrite to check which files are still open for write and were never closed; it turned out that the Flume agent restarted yesterday had left some HDFS block files in an abnormal state.

hdfs fsck /flume_log/backend/ -openforwrite
If the output is too verbose, you can filter it with grep, for example:
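Files that are still open for write are marked OPENFORWRITE in the fsck output, so a filter like the following (a rough sketch; adjust the path to your own layout) narrows the listing down to just the affected files:

hdfs fsck /flume_log/backend/ -openforwrite | grep "OPENFORWRITE"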

2. Repair the affected file with hdfs debug recoverLease; recovering the lease forces the file to be closed so its block length can be read again:

hdfs debug recoverLease -path /flume_log/backend/2021-09-12/test.1631414795673.log -retries 3
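If more than one file was left open, the per-file command above can be wrapped in a small loop over the fsck output. This is only a sketch and assumes the file path is the first field on each OPENFORWRITE line, so review the list of paths before running it:

hdfs fsck /flume_log/backend/ -openforwrite 2>/dev/null \
  | grep "OPENFORWRITE" \
  | awk '{print $1}' \
  | while read -r f; do
      # recover the lease on each still-open file
      hdfs debug recoverLease -path "$f" -retries 3
    done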

Reference: https://blog.csdn.net/jiandequn/article/details/103292966
