Examining task ID: task_1468807885116_444914_r_000469 (and more) from job job_1468807885116_444914
Examining task ID: task_1468807885116_444914_r_000490 (and more) from job job_1468807885116_444914
Task with the most failures(1):
-----
Task ID:
task_1468807885116_444914_r_000075
URL:
http://hadoop-jr-nn02.pekdc1.jdfin.local:8088/taskdetails.jsp?jobid=job_1468807885116_444914&tipid=task_1468807885116_444914_r_000075
-----
Diagnostic Messages for this Task:
Task KILL is received. Killing attempt!
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 19008 Reduce: 501 Cumulative CPU: 173568.6 sec HDFS Read: 128356858364 HDFS Write: 2411144083 FAIL
Total MapReduce CPU Time Spent: 2 days 0 hours 12 minutes 48 seconds 600 msec
Searching around online suggested that exit code 143 is usually caused by insufficient memory, so the first attempt was to increase the memory settings:
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx13106M;
set mapred.map.child.java.opts=-Xmx13106M;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx13106M; -- about 0.8 * mapreduce.reduce.memory.mb
set mapreduce.task.io.sort.mb=512;
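As a sanity check on these numbers (my own reading, not part of the original notes): the JVM heap (-Xmx) is kept at roughly 80% of the container size so that off-heap and JVM overhead still fit inside the container:
16384 MB * 0.8 = 13107 MB, hence the -Xmx13106M setting, leaving about 3 GB of headroom per container.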
After these changes the job still failed with exit code 143, so my gut feeling was that memory was not the real problem. Looking at the logs, the reduce tasks started running very early, and in the end a large batch of tasks was killed by the ApplicationMaster, which made me suspect the parallelism was too high. So the next change was to delay the point at which the reduce tasks start. The log excerpt below shows the reducers launching while the maps were still far from finished:
2016-07-29 11:01:55,483 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 70616.67 sec
2016-07-29 11:02:02,433 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 73860.9 sec
2016-07-29 11:02:03,594 Stage-1 map = 49%, reduce = 2%, Cumulative CPU 75543.31 sec
2016-07-29 11:02:04,735 Stage-1 map = 51%, reduce = 8%, Cumulative CPU 80289.2 sec
2016-07-29 11:02:11,624 Stage-1 map = 61%, reduce = 18%, Cumulative CPU 95621.34 sec
2016-07-29 11:02:12,772 Stage-1 map = 64%, reduce = 18%, Cumulative CPU 98355.46 sec
2016-07-29 11:02:30,284 Stage-1 map = 82%, reduce = 27%, Cumulative CPU 121605.43 sec
2016-07-29 11:02:33,633 Stage-1 map = 85%, reduce = 28%, Cumulative CPU 124751.68 sec
2016-07-29 11:02:39,631 Stage-1 map = 88%, reduce = 29%, Cumulative CPU 127854.76 sec
2016-07-29 11:02:44,473 Stage-1 map = 90%, reduce = 30%, Cumulative CPU 130540.72 sec
2016-07-29 11:02:48,743 Stage-1 map = 94%, reduce = 31%, Cumulative CPU 133265.67 sec
2016-07-29 11:05:10,438 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 188791.68 sec
[Fatal Error] total number of created files now is 50062, which exceeds 50000. Killing the job.
MapReduce Total cumulative CPU time: 2 days 4 hours 26 minutes 31 seconds 680 msec
Ended Job = job_1468807885116_443473 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1468807885116_443473_m_005492 (and more) from job job_1468807885116_443473
Examining task ID: task_1468807885116_443473_m_003818 (and more) from job job_1468807885116_443473
Examining task ID: task_1468807885116_443473_m_001767 (and more) from job job_1468807885116_443473
………………
Examining task ID: task_1468807885116_443473_m_002210 (and more) from job job_1468807885116_443473
Make the reduce tasks start only after 80% of the map tasks have completed:
set mapreduce.job.reduce.slowstart.completedmaps=0.8;
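As an aside (quoted from memory, not from the original log): the stock default for this property in Hadoop 2.x is 0.05, which is why the reducers above were scheduled when only a few percent of the maps had finished. You can confirm the value a session is actually using by issuing set with no value in the Hive CLI:
set mapreduce.job.reduce.slowstart.completedmaps;
-- prints the current value, e.g. mapreduce.job.reduce.slowstart.completedmaps=0.05 before the change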
After this change, the number of tasks killed by the ApplicationMaster dropped sharply, but a new error showed up, as follows:
2016-07-29 12:44:32,119 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 178926.55 sec
[Fatal Error] total number of created files now is 50074, which exceeds 50000. Killing the job.
MapReduce Total cumulative CPU time: 2 days 1 hours 42 minutes 6 seconds 550 msec
Ended Job = job_1468807885116_449213 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
This job performs a dynamic-partition insert, and because the data volume is so large, the number of files produced by dynamic partitioning exceeded the cluster's limit.
set hive.exec.max.created.files=60000;
Raising the cap on the number of created files finally resolved the problem.
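For context, a minimal sketch of the kind of statement that runs into this limit (table and column names here are made up, not the actual job): with dynamic partitioning, each reducer can open one output file per partition value it receives, so the number of created files can approach reducers x partitions and exceed hive.exec.max.created.files on a large insert.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- hypothetical tables; the partition column dt comes from the last column of the select list
insert overwrite table target_table partition (dt)
select col_a, col_b, dt
from source_table;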
The final job parameters are attached below:
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
set io.compression.codecs=com.hadoop.compression.lzo.LzopCodec;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx13106M;
set mapred.map.child.java.opts=-Xmx13106M;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx13106M;
set mapreduce.job.reduce.slowstart.completedmaps=0.8;
set hive.exec.max.created.files=60000;
set hive.merge.mapredfiles=true; -- merge small files when the map-reduce job finishes
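If the merge step itself needs tuning, Hive also exposes hive.merge.smallfiles.avgsize (the average output-file size below which a merge pass is triggered) and hive.merge.size.per.task (the target size of the merged files). The values below are illustrative, not taken from this job:
set hive.merge.smallfiles.avgsize=16000000;  -- ~16 MB; trigger a merge pass when the average file size is smaller
set hive.merge.size.per.task=256000000;      -- ~256 MB target size for merged files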