While running a Hive script (which executes as a MapReduce job), I hit the following error:
Ended Job = job_1406698610363_0394 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1406698610363_0394_m_000014 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000006 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000022 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000030 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000034 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000040 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000041 (and more) from job job_1406698610363_0394
Examining task ID: task_1406698610363_0394_m_000039 (and more) from job job_1406698610363_0394
Task with the most failures(4):
-----
Task ID:
task_1406698610363_0394_m_000008
URL:
http://e3basemaster2:50030/taskdetails.jsp?jobid=job_1406698610363_0394&tipid=task_1406698610363_0394_m_000008
-----
Diagnostic Messages for this Task:
Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-965200530-172.21.3.170-1400216975207:blk_-1762002543329523353_80392 file=/user/bdsdata/.staging/job_1406698610363_0394/job.split
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:734)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:448)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:518)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.io.Text.readString(Text.java:446)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:345)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:375)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 159 Reduce: 10 Cumulative CPU: 551.26 sec HDFS Read: 2885868030 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 9 minutes 11 seconds 260 msec
The error is clear: the data for block BP-965200530-172.21.3.170-1400216975207:blk_-1762002543329523353_80392 (file=/user/bdsdata/.staging/job_1406698610363_0394/job.split) could not be obtained. The node monitoring page showed that two datanodes were down (the cluster originally had three datanodes and two namenodes), and hdfs-site.xml had dfs.replication set to 2. With only two replicas and two dead datanodes, both copies of the block could be unreachable, which explains why the task could not read its split file. Restarting the dead datanodes resolved the problem. A datanode communication failure is also a possible cause, but it is less likely here.
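Before restarting anything, the same diagnosis can be confirmed from the command line. A minimal sketch using the standard Hadoop 2.x `hdfs` CLI (the staging path is taken from the error above; run these on a node with cluster access):

```shell
# List live and dead datanodes; with dfs.replication=2 and two
# datanodes down, some blocks may have zero live replicas.
hdfs dfsadmin -report

# Show the replication factor actually in effect (2 on this cluster).
hdfs getconf -confKey dfs.replication

# Check the health of the staging directory the failed task was
# reading from; unreachable blocks are reported as MISSING/CORRUPT.
hdfs fsck /user/bdsdata/.staging -files -blocks -locations
```

If `fsck` reports missing blocks that reappear after the dead datanodes are restarted, node failure (rather than a network issue) was the cause.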