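Our production cluster runs CDH 4.6.0 with Hive 0.13.1. Hive has supported ORC files since 0.11. Hive 0.13.1 is built against protobuf 2.5.0, while CDH 4.6.0 ships protobuf 2.4.0a. When we tested ORC files in production, creating the table worked, but inserting data failed with the following error: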

java.lang.VerifyError: class org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
                at java.lang.ClassLoader.defineClass1(Native Method)
                at java.lang.ClassLoader.defineClass(Unknown Source)
                at java.security.SecureClassLoader.defineClass(Unknown Source)
                at java.net.URLClassLoader.defineClass(Unknown Source)
                at java.net.URLClassLoader.access$100(Unknown Source)
                at java.net.URLClassLoader$1.run(Unknown Source)
                at java.net.URLClassLoader$1.run(Unknown Source)
                at java.security.AccessController.doPrivileged(Native Method)
                at java.net.URLClassLoader.findClass(Unknown Source)
                at java.lang.ClassLoader.loadClass(Unknown Source)
                at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
                at java.lang.ClassLoader.loadClass(Unknown Source)
                at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:129)
                at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
                at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:103)
                at org.apache.hadoop.hive.ql.exec.Utilities.createEmptyFile(Utilities.java:3065)
                at org.apache.hadoop.hive.ql.exec.Utilities.createDummyFileForEmptyPartition(Utilities.java:3089)
                at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3013)
                at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:369)
                at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
                at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
                at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
                at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
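
A VerifyError about overriding a final method usually means a class was compiled against one version of a library and loaded against another. As a rough first check, the sketch below lists the standalone protobuf-java jars under a set of directories. The function name and the example directories are hypothetical, and it will not see copies of protobuf that are bundled inside another jar, as is the case for hive-exec.

```shell
# List every protobuf-java jar found under the given directories, so that
# conflicting versions (for example 2.4.0a next to 2.5.0) are easy to spot.
# The directories passed in are placeholders, not paths from the original setup.
find_protobuf_jars() {
  for dir in "$@"; do
    # Skip arguments that are not existing directories.
    [ -d "$dir" ] || continue
    find "$dir" -name 'protobuf-java-*.jar'
  done | sort
}

# Example (hypothetical environment variables):
#   find_protobuf_jars "$HADOOP_HOME" "$HIVE_HOME/lib"
```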

The error shows that this is a protobuf compatibility problem, so we tried building Hive 0.13.1 against protobuf 2.4.0a:

1. In pom.xml, change the protobuf version from

2.5.0

to

2.4.0a
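
The version change in step 1 can also be scripted. This is a minimal sketch that assumes the version is declared as a `<protobuf.version>` property in the pom; that property name is an assumption here and should be checked against the actual file. The helper name `set_protobuf_version` is hypothetical.

```shell
# Hypothetical helper: rewrite the protobuf version property in a pom.xml.
# Assumes the version is declared as <protobuf.version>...</protobuf.version>.
set_protobuf_version() {
  pom="$1"
  new_version="$2"
  tmp_pom="$(mktemp)" || return 1
  # Write to a temp file first so the pom is only replaced if sed succeeds.
  sed "s|<protobuf\.version>[^<]*</protobuf\.version>|<protobuf.version>${new_version}</protobuf.version>|" \
    "$pom" > "$tmp_pom" && mv "$tmp_pom" "$pom"
}

# Example, run from the Hive source root:
#   set_protobuf_version pom.xml 2.4.0a
```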

2. Build with mvn:

mvn clean package -DskipTests -Phadoop-2 -Pdist

The build fails with errors like the following:

[ERROR] /home/caiguangguang/hive_0.13_debug/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java:[9828,38] cannot find symbol
symbol  : class Parser
location: package com.google.protobuf
[ERROR] /home/caiguangguang/hive_0.13_debug/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java:[9839,31] cannot find symbol
symbol  : class Parser
location: package com.google.protobuf
....
[ERROR] /home/caiguangguang/hive_0.13_debug/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java:[169,9] getUnknownFields() in org.apache.hadoop.hive.ql.io.orc.OrcProto.IntegerStatistics cannot override getUnknownFields() in com.google.protobuf.GeneratedMessage; overridden method is final
[ERROR] /home/caiguangguang/hive_0.13_debug/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java:[189,20] cannot find symbol
symbol  : method parseUnknownField(com.google.protobuf.CodedInputStream,com.google.protobuf.UnknownFieldSet.Builder,com.google.protobuf.ExtensionRegistryLite,int)

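3. The file

hive_0.13_debug/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java

was generated by protoc 2.5.0, but the protobuf jar on the compile classpath is now 2.4.0a, so the generated code no longer compiles.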

4. Regenerate the Java code manually with protoc 2.4.0a:

protoc --version
libprotoc 2.4.0a

Two proto files are involved:

protoc ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto --java_out=ql/src/gen/protobuf/gen-java/
protoc ./hcatalog/storage-handlers/hbase/src/protobuf/org/apache/hcatalog/hbase/snapshot/RevisionManagerEndpoint.proto --java_out=hcatalog/storage-handlers/hbase/src/gen-java/
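
Because the whole problem comes from generated code and runtime jar disagreeing on the protobuf version, it can help to guard the regeneration with a version check. The following is a sketch only: the function name and the `PROTOC` override variable are hypothetical, and it relies on `protoc --version` printing a line of the form `libprotoc 2.4.0a` as shown above.

```shell
# Fail early unless protoc reports the expected version.
# PROTOC may be set to point at a specific binary (an assumption of this sketch);
# otherwise the protoc found on PATH is used.
require_protoc_version() {
  expected="$1"
  # "protoc --version" prints e.g. "libprotoc 2.4.0a"; take the second field.
  actual="$("${PROTOC:-protoc}" --version 2>/dev/null | awk '{print $2}')"
  if [ "$actual" != "$expected" ]; then
    echo "protoc version mismatch: expected $expected, found ${actual:-none}" >&2
    return 1
  fi
}

# Example, run from the Hive source root:
#   require_protoc_version 2.4.0a && \
#     protoc ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto \
#       --java_out=ql/src/gen/protobuf/gen-java/
```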

5. The rebuild now succeeds. After unpacking the hive-exec jar, pom.properties shows that the bundled protobuf version is 2.4.0a:

META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
#Generated by Maven
#Thu Mar 10 11:50:31 CST 2011
version=2.4.0a
groupId=com.google.protobuf
artifactId=protobuf-java

After replacing the hive-exec jar, inserting data into the ORC table also works as expected.
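
The check in step 5 can be repeated after each rebuild with a small helper. This sketch assumes the jar has already been unpacked into a directory; the function name, the jar file name, and the target directory in the example are hypothetical. The pom.properties path is the one shown in step 5.

```shell
# Print the protobuf-java version recorded inside an unpacked jar directory.
bundled_protobuf_version() {
  props="$1/META-INF/maven/com.google.protobuf/protobuf-java/pom.properties"
  if [ ! -f "$props" ]; then
    echo "not found: $props" >&2
    return 1
  fi
  # pom.properties holds key=value lines; print the value of "version".
  sed -n 's/^version=//p' "$props"
}

# Example (hypothetical jar name and target directory):
#   unzip -q hive-exec-0.13.1.jar -d /tmp/hive-exec
#   bundled_protobuf_version /tmp/hive-exec
```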