A bug triggered by ORC storage when using hive-0.12.0 on hadoop-2.2.0

Environment:
Hadoop version: hadoop-2.2.0 (downloaded from the official site and compiled as a 64-bit build)
Hive version: hive-0.12.0 (downloaded from the official site and unpacked)
The cluster was healthy; ordinary Hive queries and plain MapReduce jobs both ran successfully.

To test the new Hive ORC storage format, the steps were as follows:

create external table text_test (id string,text string)  row format delimited fields terminated by '\t' STORED AS textfile LOCATION '/user/hive/warehouse/text_test';

create external table orc_test (id string,text string) row format delimited fields terminated by '\t' STORED AS orc LOCATION '/user/hive/warehouse/orc_test';
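
For reference, the three sample rows queried below can be loaded into text_test with a LOAD DATA statement; a minimal sketch, assuming a hypothetical local tab-separated file /tmp/sample.txt:

-- /tmp/sample.txt holds three tab-separated rows: 1 myew / 2 ccsd / 3 33
hive> LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE text_test;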

hive> desc text_test;
OK
id                      string                  None                
text                    string                  None    

hive> desc orc_test;
OK
id                      string                  from deserializer   
text                    string                  from deserializer 

hive> select * from text_test;
OK
1       myew
2       ccsd
3       33

hive> insert overwrite table orc_test select * from text_test;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1394433490694_0016, Tracking URL = http://zw-34-69:8088/proxy/application_1394433490694_0016/
Kill Command = /opt/hadoop/hadoop/bin/hadoop job  -kill job_1394433490694_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:00:49,899 Stage-1 map = 0%,  reduce = 0%
2014-03-13 17:01:10,097 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1394433490694_0016 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1394433490694_0016_m_000000 (and more) from job job_1394433490694_0016

Task with the most failures(4):
-----
Task ID:
  task_1394433490694_0016_m_000000

URL:
  http://zw-34-69:8088/taskdetails.jsp?jobid=job_1394433490694_0016&tipid=task_1394433490694_0016_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
        at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
        at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
        at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
        at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
        at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
        at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
        ... 8 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


A long search across Google, Baidu, and Bing followed, and a solution finally turned up: http://web.archiveorange.com/archive/v/S2z2uV6yqpmtC3rgpsrs
Thanks to the two earlier investigators for their diligent research.

Root cause, in short:
hadoop-2.2.0 is built against protobuf-2.5.0, while hive-0.12.0 is built against protobuf-2.4.1, and the two versions conflict at runtime.
Solution:
Rebuild hive-0.12.0 against protobuf-2.5.0, following the steps below (a condensed script is sketched after the list).
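
Before rebuilding, one quick way to confirm the mismatch is to compare the protobuf-java jars shipped with the two distributions; a sketch, assuming HADOOP_HOME and HIVE_HOME point at the installations described above:

# hadoop-2.2.0 should show protobuf-java-2.5.0.jar, hive-0.12.0 should show protobuf-java-2.4.1.jar
find $HADOOP_HOME -name 'protobuf-java-*.jar'
find $HIVE_HOME/lib -name 'protobuf-java-*.jar'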

1. Install protobuf:
   Download: https://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
   Unpack: tar -xzvf protobuf-2.5.0.tar.gz
   Enter the directory: cd protobuf-2.5.0
   Build and install:
     1. ./configure
     2. make
     3. make check
     4. make install  (requires root privileges)
2. Check out the Hive source: svn checkout http://svn.apache.org/repos/asf/hive/tags/release-0.12.0/
3. Install ant: download it from http://ant.apache.org/bindownload.cgi ; I used version 1.9.2 (apache-ant-1.9.2-bin.tar.gz).
   (Version 1.9.3 fails to build Hive: http://www.mailinglistarchive.com/html/[email protected]/2014-01/msg00009.html)
   Unpack: tar -xzvf apache-ant-1.9.2-bin.tar.gz
   Add it to PATH: vi ~/.bash_profile
     export ANT_HOME=/opt/hadoop/apache-ant-1.9.2
     PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ANT_HOME/bin:$PATH
     export PATH
   Save and exit, then run: . ~/.bash_profile  to apply the changes.
4. Change the protobuf version used by the ant build:
   In release-0.12.0/ivy/libraries.properties, change protobuf.version=2.4.1 to protobuf.version=2.5.0.
5. Regenerate the protobuf code inside the Hive tree:
   cd release-0.12.0
   ant protobuf
6. Build Hive:
   In the release-0.12.0 directory, run: ant clean package
   Then comes a long wait... (network access is required).
7. The build output ends up in release-0.12.0/build/dist/.
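
For convenience, steps 1 through 7 can be condensed into one shell session; a sketch with example paths (adjust them to your own layout):

# 1. build and install protobuf 2.5.0 (make install needs root)
tar -xzvf protobuf-2.5.0.tar.gz && cd protobuf-2.5.0
./configure && make && make check && make install
protoc --version                     # expect: libprotoc 2.5.0

# 2-4. check out Hive 0.12.0 and switch the ivy dependency to protobuf 2.5.0
cd /opt/hadoop
svn checkout http://svn.apache.org/repos/asf/hive/tags/release-0.12.0/
sed -i 's/protobuf.version=2.4.1/protobuf.version=2.5.0/' release-0.12.0/ivy/libraries.properties

# 5-7. regenerate the protobuf sources and rebuild Hive (network access required)
cd release-0.12.0
ant protobuf
ant clean package
ls build/dist                        # rebuilt Hive distribution; deploy this in place of the old one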

With the rebuilt Hive in place, re-running insert overwrite table orc_test select * from text_test; now succeeded.


hive> select * from orc_test;
OK
1       myew
2       ccsd
3       33

hive --orcfiledump /user/hive/warehouse/orc_test/000000_0

Rows: 3
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string>

Statistics:
  Column 0: count: 3
  Column 1: count: 3 min: 1 max: 3
  Column 2: count: 3 min: 33 max: myew

Stripes:
  Stripe: offset: 3 data: 31 rows: 3 tail: 50 index: 59
    Stream: column 0 section ROW_INDEX start: 3 length 9
    Stream: column 1 section ROW_INDEX start: 12 length 23
    Stream: column 2 section ROW_INDEX start: 35 length 27
    Stream: column 1 section DATA start: 62 length 6
    Stream: column 1 section LENGTH start: 68 length 5
    Stream: column 2 section DATA start: 73 length 13
    Stream: column 2 section LENGTH start: 86 length 7
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2
