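Fixing Hive "return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask"

Overview

Upgrading CDH to 5.7.1 introduced a Hive bug. The details are as follows.

The bug affects partitioned tables stored as ORC that had new columns added after data was already loaded.

The scenario is as follows: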

create table foobar ( foo string, bar string ) partitioned by (dt string) stored as orc;
alter table foobar add partition( dt='20160620' ) ;
alter table foobar add columns(goo string );
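
To see the mismatch this scenario creates, it can help to compare the table-level schema with the schema recorded for the partition that existed before the column was added. The sketch below uses the foobar table from above. That the new column goo is missing from the partition's column list is an assumption about how Hive handles add columns without cascade, not something confirmed in this post.

```sql
-- Table-level schema: expected to list foo, bar, goo and the partition column dt.
describe foobar;

-- Schema recorded for the partition created before the column was added.
-- Assumption: because the alter did not use CASCADE, goo may be absent here.
describe formatted foobar partition (dt='20160620');
```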

Once a table is in this state, running SQL such as the following triggers the failure:

-- Exact-match query
select create_time, 
       real_app_id, 
       channel_id, 
       plugin_ver, 
       network_type, 
       plugin_package_name, 
       users
from dim.test a
where day_key='20160601'
and plugin_package_name='yahu';
-- Aggregation
select count(1)
from dim.test a
where day_key='20160601'
and plugin_package_name='yahu';

Operations like the ones above all fail with the following error:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

The log shows the following error details:

Query ID = dbs_20160704162222_91d0eceb-c25b-4c68-a182-a7a5580bed2c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1467533369404_5863, Tracking URL = http://master:8088/proxy/application_1467533369404_5863/
Kill Command = /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop/bin/hadoop job  -kill job_1467533369404_5863
Hadoop job information for Stage-1: number of mappers: 16; number of reducers: 0
2016-07-04 16:23:01,939 Stage-1 map = 0%,  reduce = 0%
2016-07-04 16:23:30,006 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1467533369404_5863 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1467533369404_5863_m_000009 (and more) from job job_1467533369404_5863
Examining task ID: task_1467533369404_5863_m_000012 (and more) from job job_1467533369404_5863
Examining task ID: task_1467533369404_5863_m_000006 (and more) from job job_1467533369404_5863
Examining task ID: task_1467533369404_5863_m_000010 (and more) from job job_1467533369404_5863
Examining task ID: task_1467533369404_5863_m_000002 (and more) from job job_1467533369404_5863

Task with the most failures(4): 
-----
Task ID:
  task_1467533369404_5863_m_000011

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1467533369404_5863&tipid=task_1467533369404_5863_m_000011
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error creating a batch
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:111)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:83)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:154)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: No type found for column type entry 13
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:604)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
    ... 13 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 16   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

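Problem analysis

After consulting a large amount of documentation, we concluded that the problem is caused by Hive failing to parse the ORC data correctly (this still needs to be verified by tracing through the source code). See the following issues: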

  • HIVE-10598
  • HIVE-11981
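
The stack trace passes through VectorizedOrcInputFormat and VectorizedRowBatchCtx, which suggests the failing queries run on the vectorized execution path. One way to check this, as a sketch only, is to inspect the query plan. In Hive versions of this era the map-side plan usually contains a line such as "Execution mode: vectorized" when vectorization applies; treat that exact wording as an assumption about your version's output.

```sql
-- Show the plan for the failing aggregation without running it.
-- Look for an "Execution mode: vectorized" line under the map stage (assumed wording).
explain
select count(1)
from dim.test a
where day_key='20160601'
and plugin_package_name='yahu';
```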

Solution

For the aggregation queries involved

Replace count(1) with count(*).

For where-clause filter queries

No good workaround has been found for this case yet.
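
Since the exception is raised inside the vectorized ORC reader, one option that may be worth testing for the filter case is to turn vectorization off for the session before running the query. This is only a sketch: hive.vectorized.execution.enabled is a standard Hive setting, but whether disabling it avoids this bug on CDH 5.7.1 has not been verified in this post, and queries will likely run slower without vectorization.

```sql
-- Assumption: the failure is specific to the vectorized read path.
-- Disable vectorized execution for the current session only.
set hive.vectorized.execution.enabled=false;

-- Re-run the filter query that failed earlier.
select create_time,
       real_app_id,
       channel_id,
       plugin_ver,
       network_type,
       plugin_package_name,
       users
from dim.test a
where day_key='20160601'
and plugin_package_name='yahu';
```

If the query then succeeds, that supports the analysis above. The setting can be switched back on afterwards for queries against tables whose schema has not changed.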

Detailed error log:


2016-07-04 17:22:57,241 FATAL [IPC Server handler 4 on 60625] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1467533369404_5970_m_000001_0 - exited : java.lang.RuntimeException: Error creating a batch
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:111)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:83)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:154)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: No type found for column type entry 13
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:604)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
    ... 13 more

2016-07-04 17:22:57,241 INFO [IPC Server handler 4 on 60625] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1467533369404_5970_m_000001_0: Error: java.lang.RuntimeException: Error creating a batch
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:111)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:83)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:154)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: No type found for column type entry 13
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:604)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
    ... 13 more
