Uploading gzip-compressed files to HDFS and reading them with Hive
Task with the most failures(4):
-----
Task ID:
task_1456816082333_1354_m_000339
URL:
http://xxxx:8088/taskdetails.jsp?jobid=job_1456816082333_1354&tipid=task_1456816082333_1354_m_000339
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
... 13 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 358 Cumulative CPU: 5067.56 sec HDFS Read: 2749408714 HDFS Write: 2747850422 FAIL
Total MapReduce CPU Time Spent: 0 days 1 hours 24 minutes 27 seconds 560 msec
This error clearly indicates a corrupt compressed file; the fix is to delete the corresponding file on HDFS and upload it again.
However, the Hive table (or partition) contains tens of thousands of files, so the broken gzip file cannot be identified directly. Ways to track it down:
While the Hive job is running, open the map-task detail list in the 8088 web UI (RUNNING MAP attempts in job_1456816082333_1354) and note which node the failing map attempt ran on; either click its logs link on that page to view the detailed log, or log into that node and go to Hadoop's logs/userlogs directory and find the directory matching the job id, application_1456816082333_1354. The log there records the id of the broken file. Then delete the corresponding corrupt file from HDFS and re-upload it.
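If the failing split cannot be pinned down from the YARN logs, a brute-force alternative is to stream every gzip file in the table's directory through `gzip -t`, which exits non-zero on a truncated or corrupt stream (the same condition that raises "Unexpected end of input stream"). A minimal sketch, assuming a hypothetical warehouse path; replace it with the table's or partition's actual location:

```shell
#!/usr/bin/env bash
# Scan every .gz file under an HDFS directory and report corrupt ones.
# The path below is a hypothetical example -- substitute your table/partition dir.
HDFS_DIR=/user/hive/warehouse/mydb.db/mytable

hadoop fs -ls "$HDFS_DIR" | awk '/\.gz$/ {print $NF}' | while read -r f; do
  # gzip -t tests the stream read from stdin and fails on a truncated file
  if ! hadoop fs -cat "$f" | gzip -t 2>/dev/null; then
    echo "CORRUPT: $f"
    # Once confirmed, delete the bad file and re-upload a good copy:
    # hadoop fs -rm "$f"
    # hadoop fs -put "/local/backup/$(basename "$f")" "$HDFS_DIR"/
  fi
done
```

Scanning tens of thousands of files this way reads the whole dataset once, so it is slower than reading the task log, but it finds every corrupt file in one pass rather than only the one that happened to fail first.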