hive 优化遇到的一个问题:hive.auto.convert.join

hive的join 有一种优化的方式:map join

但是,使用这种优化的时候要小心一点,先说一下优化配置的参数:

set hive.optimize.correlation=true
set hive.auto.convert.join=true

当运行一个比较大的join时候,出现了下面的问题:

at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
        ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.hadoop.io.Text.set(Text.java:225)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
        at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
        at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:234)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
        ... 9 more
网上查了一圈,貌似这还是个bug:

https://issues.apache.org/jira/i#browse/HIVE-4502

将 hive.auto.convert.join 设置成false,重新运行,问题就不出现了。

有一篇文件可以看一下:

http://www.gemini5201314.net/hadoop/hadoop-%E4%B8%AD%E7%9A%84%E4%B8%A4%E8%A1%A8join.html

hive 0.11 版的bug 也要注意一下。



你可能感兴趣的:(hive)