16/02/19 01:45:51 INFO mapreduce.Job: map 100% reduce 14%
16/02/19 01:47:13 INFO mapreduce.Job: map 100% reduce 15%
16/02/19 01:56:38 INFO mapreduce.Job: map 100% reduce 16%
16/02/19 02:03:24 INFO mapreduce.Job: Task Id : attempt_1455782779710_0002_r_000000_2, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for hadoop/local/usercache/hadoop/appcache/application_1455782779710_0002/output/attempt_1455782779710_0002_r_000000_2/map_1261.out.merged
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:536)
at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
16/02/19 02:03:25 INFO mapreduce.Job: map 100% reduce 0%
16/02/19 02:07:01 INFO mapreduce.Job: map 100% reduce 100%
16/02/19 02:07:03 INFO mapreduce.Job: Job job_1455782779710_0002 failed with state FAILED due to: Task failed task_1455782779710_0002_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1
16/02/19 02:07:03 INFO mapreduce.Job: Counters: 41
File System Counters
FILE: Number of bytes read=593719782228
FILE: Number of bytes written=1187744681008
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=419668257188
HDFS: Number of bytes written=0
HDFS: Number of read operations=9384
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Failed map tasks=4
Failed reduce tasks=4
Killed map tasks=1
Launched map tasks=3133
Launched reduce tasks=4
Other local map tasks=4
Data-local map tasks=3128
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1313120912
Total time spent by all reduces in occupied slots (ms)=233555400
Total time spent by all map tasks (ms)=164140114
Total time spent by all reduce tasks (ms)=29194425
Total vcore-seconds taken by all map tasks=164140114
Total vcore-seconds taken by all reduce tasks=29194425
Total megabyte-seconds taken by all map tasks=1231050855000
Total megabyte-seconds taken by all reduce tasks=239160729600
Map-Reduce Framework
Map input records=17406457000
Map output records=17406457000
Map output bytes=558906793186
Map output materialized bytes=593719725954
Input split bytes=320002
Combine input records=0
Spilled Records=34812914000
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=880264
CPU time spent (ms)=160551590
Physical memory (bytes) snapshot=6802434662400
Virtual memory (bytes) snapshot=26562819182592
Total committed heap usage (bytes)=5921830338560
File Input Format Counters
Bytes Read=419667937186
Job name: ConvertTime
Job succeeded: no
Input rows: 17406457000
Output rows: 17406457000
Skipped rows: 0
Job start: 2016-02-18 16:25:07
Job end: 2016-02-19 02:07:03
Job duration: 581.9432 minutes
real 581m58.079s
user 1m3.796s
sys 1m43.812s
After the failure, I split the input file into three parts and processed them with three separate jobs (roughly as sketched below), and none of them reported this error. As I understand it, though, there should be a way to handle everything in a single job, even if it runs more slowly. Has anyone run into this problem?
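For reference, the split workaround looked roughly like the following. This is only a sketch: the jar name, driver class, and HDFS paths are placeholders, since the actual ConvertTime invocation is not shown above.

# Workaround sketch: run the same job three times, once per part of the input.
# ConvertTime.jar, the ConvertTime driver class, and the paths below are
# placeholder names, not the real ones used in the failing run.
for part in part1 part2 part3; do
    time hadoop jar ConvertTime.jar ConvertTime \
        /user/hadoop/input/${part} \
        /user/hadoop/output/${part}
done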