Hive job fails with CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock

The error log is as follows:

Caused by: org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-438308737--1615993069368:blk_1073893685_152906; getBlockSize()=949; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[:9866,DS-3c8778e1-fc6a-46f8-b774-a270aa727cce,DISK], DatanodeInfoWithStorage[:9866,DS-1cbdb041-f016-4b4a-8054-2f10171f2856,DISK], DatanodeInfoWithStorage[:9866,DS-843c0421-6683-4070-a6a1-1b761ac0ad28,DISK]]}
        at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:364)

Root cause analysis:
This error was thrown while I was running a Hive query, in the MapReduce phase that reads the input data, and it indicates a problem with one of the input file's data blocks. Specifically, the file's last block is still marked as under construction, so DFSInputStream.readBlockLength has to ask the datanodes for the block's visible length, and that request fails.

In my case this turned out to be a pitfall of writing data to HDFS with Flume: I had shut down HDFS for a while, so the files Flume was writing at the time were never closed properly and their leases were never released. We need to find these files and recover them.
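A quick first pass, before running a full fsck, is to list leftover .tmp files under the previous day's partition. Flume's HDFS sink writes with a .tmp in-use suffix by default and renames the file on close, so a .tmp file from a past day usually means the close never happened. This is only a sketch, assuming the dt=YYYY-MM-DD partition layout used in this post and GNU date on the client:

# List .tmp files left over in yesterday's partition.
hdfs dfs -ls /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin/dt=$(date -d yesterday +%Y-%m-%d) | grep '\.tmp$'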

Solution:

1. Run hdfs fsck /user/hive -openforwrite to find files that are still open for write and were never closed (adjust the path to the directory you actually need to check; if you know which directory is affected, you can narrow the scope further). The output will show some files in the OPENFORWRITE state. Be careful: some of them are files Flume is writing right now. Only the stale files from the previous day need fixing; the current day's files are being written normally and must be left alone. A sketch for filtering the stale entries out of the fsck output follows the sample session below.

hdfs fsck /user/hive -openforwrite
[root@bigdata-agent01 usr]# hdfs fsck /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin -openforwrite
Connecting to namenode via http://online-bigdata-manager:9870/fsck?ugi=root&openforwrite=1&path=%2Fuser%2Fhive%2Fwarehouse%2Fods_tmp_t.db%2Fo_flume_kafka_data_origin
FSCK started by root (auth:SIMPLE) from /10.88.10.65 for path /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin at Sun Mar 27 18:34:21 CST 2022
/user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin/dt=2022-03-26/log_20220326_11.1648377005444.tmp 18469 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:
/user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin/dt=2022-03-27/log_20220327_18.1648377005927.tmp 19902 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:
Status: HEALTHY
 Number of data-nodes:  4
 Number of racks:               1
 Total dirs:                    69
 Total symlinks:                0

Replicated Blocks:
 Total size:    100606289256 B
 Total files:   18528
 Total blocks (validated):      18544 (avg. block size 5425274 B)
 Minimally replicated blocks:   18542 (99.98921 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9996765
 Missing blocks:                0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Blocks queued for replication: 0

Erasure Coded Block Groups:
 Total size:    0 B
 Total files:   0
 Total block groups (validated):        0
 Minimally erasure-coded block groups:  0
 Over-erasure-coded block groups:       0
 Under-erasure-coded block groups:      0
 Unsatisfactory placement block groups: 0
 Average block group size:      0.0
 Missing block groups:          0
 Corrupt block groups:          0
 Missing internal blocks:       0
 Blocks queued for replication: 0
FSCK ended at Sun Mar 27 18:34:21 CST 2022 in 210 milliseconds


The filesystem under path '/user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin' is HEALTHY
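Because the fsck output mixes stale files with files Flume is actively writing, it helps to keep only the OPENFORWRITE lines and exclude the current day's partition. A minimal sketch, assuming the dt=YYYY-MM-DD partition naming shown above (the openforwrite_files.txt name is just an example):

# Keep only OPENFORWRITE entries, drop today's still-active files,
# and save the HDFS paths for step 2.
hdfs fsck /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin -openforwrite 2>/dev/null \
  | grep OPENFORWRITE \
  | grep -v "dt=$(date +%Y-%m-%d)" \
  | awk '{print $1}' > openforwrite_files.txt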

2. Recover each affected file's lease with the following command:

hdfs debug recoverLease -path /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin/dt=2022-03-26/log_20220326_11.1648377005444.tmp  -retries 3
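If there are many stale files, you can loop recoverLease over the list collected in step 1 and then re-run fsck to confirm nothing is still open. A sketch, assuming the openforwrite_files.txt produced above:

# Recover the lease on every stale file collected in step 1.
while read -r f; do
  hdfs debug recoverLease -path "$f" -retries 3
done < openforwrite_files.txt

# Re-check: the stale files should no longer appear as OPENFORWRITE.
hdfs fsck /user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin -openforwrite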

