The Four Modes of Hive GenericUDAF Explained

Definition of the modes

apache-hive-1.2.1-src\apache-hive-1.2.1-src\ql\src\java\org\apache\hadoop\hive\ql\udf\generic\GenericUDAFEvaluator.java
The source code is as follows:

  public static enum Mode {
    /**
     * PARTIAL1: from original data to partial aggregation data: iterate() and
     * terminatePartial() will be called.
     */
    PARTIAL1,
    /**
     * PARTIAL2: from partial aggregation data to partial aggregation data:
     * merge() and terminatePartial() will be called.
     */
    PARTIAL2,
    /**
     * FINAL: from partial aggregation to full aggregation: merge() and
     * terminate() will be called.
     */
    FINAL,
    /**
     * COMPLETE: from original data directly to full aggregation: iterate() and
     * terminate() will be called.
     */
    COMPLETE
  };

Data-processing functions a UDAF must implement

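iterate(): no return value
terminatePartial(): returns a value
merge(): no return value
terminate(): returns a value
The functions without a return value are invoked through aggregate().
The functions with a return value are invoked through evaluate().
In the code this looks as follows: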

  /**
   * This function will be called by GroupByOperator when it sees a new input
   * row.
   * 
   * @param agg
   *          The object to store the aggregation result.
   * @param parameters
   *          The row, can be inspected by the OIs passed in init().
   */
  public void aggregate(AggregationBuffer agg, Object[] parameters) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
      iterate(agg, parameters);
    } else {
      assert (parameters.length == 1);
      merge(agg, parameters[0]);
    }
  }


  /**
   * This function will be called by GroupByOperator when it sees a new input
   * row.
   * 
   * @param agg
   *          The object to store the aggregation result.
   */
  public Object evaluate(AggregationBuffer agg) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
      return terminatePartial(agg);
    } else {
      return terminate(agg);
    }
  }
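
To make the dispatch above concrete, here is a minimal self-contained sketch that chains the four modes for an average. MiniAvgEvaluator, AvgBuffer and run() are hypothetical names invented for this illustration and are not Hive classes; only the two if/else branches mirror aggregate() and evaluate() as quoted above. A partial result is assumed to be a double[] holding {sum, count}.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for GenericUDAFEvaluator (not a Hive class).
class MiniAvgEvaluator {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Aggregation buffer: running sum and row count.
    static final class AvgBuffer {
        double sum;
        long count;
    }

    private final Mode mode;

    MiniAvgEvaluator(Mode mode) {
        this.mode = mode;
    }

    // Original row -> buffer.
    void iterate(AvgBuffer agg, double value) {
        agg.sum += value;
        agg.count++;
    }

    // Partial result {sum, count} -> buffer.
    void merge(AvgBuffer agg, double[] partial) {
        agg.sum += partial[0];
        agg.count += (long) partial[1];
    }

    // Buffer -> partial result {sum, count}.
    double[] terminatePartial(AvgBuffer agg) {
        return new double[] { agg.sum, agg.count };
    }

    // Buffer -> final average, or null when no rows were seen.
    Double terminate(AvgBuffer agg) {
        return agg.count == 0 ? null : agg.sum / agg.count;
    }

    // Same branching as aggregate() in the quoted source.
    void aggregate(AvgBuffer agg, Object input) {
        if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
            iterate(agg, (Double) input);
        } else {
            merge(agg, (double[]) input);
        }
    }

    // Same branching as evaluate() in the quoted source.
    Object evaluate(AvgBuffer agg) {
        if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
            return terminatePartial(agg);
        } else {
            return terminate(agg);
        }
    }

    // Runs one stage: feed every input through aggregate(), then evaluate().
    static Object run(Mode mode, List<?> inputs) {
        MiniAvgEvaluator eval = new MiniAvgEvaluator(mode);
        AvgBuffer agg = new AvgBuffer();
        for (Object in : inputs) {
            eval.aggregate(agg, in);
        }
        return eval.evaluate(agg);
    }

    public static void main(String[] args) {
        // Two PARTIAL1 stages over separate slices of the original data.
        Object p1 = run(Mode.PARTIAL1, Arrays.asList(1.0, 2.0));
        Object p2 = run(Mode.PARTIAL1, Arrays.asList(6.0));
        // PARTIAL2 combines partial results into another partial result.
        Object combined = run(Mode.PARTIAL2, Arrays.asList(p1, p2));
        // FINAL turns the partial result into the full aggregation.
        Object viaFinal = run(Mode.FINAL, Arrays.asList(combined));
        // COMPLETE goes from original data straight to the full aggregation.
        Object direct = run(Mode.COMPLETE, Arrays.asList(1.0, 2.0, 6.0));
        System.out.println(viaFinal + " " + direct);
    }
}
```

Both routes give the same average, which is the point of the design: as long as merge() and terminatePartial() agree on the shape of the partial result, the staged path and the direct path are interchangeable.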

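The figure below shows, for each of the four modes, the input data and the data-processing functions that may be called.
[Figure 1: input data and data-processing functions called in each of the four GenericUDAF modes]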

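Within a given mode the type of the input data never changes, and each mode calls a pair of data-processing functions: one on the aggregate side and one on the evaluate side.
PARTIAL1: the input can only be original data;
PARTIAL2: the input can only be partial aggregation results;
FINAL: the input is partial aggregation data;
COMPLETE: the input is original data.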

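terminatePartial() and terminate() can each be reached from two kinds of input, so they must be handled according to the mode: terminatePartial() follows iterate() in PARTIAL1 and merge() in PARTIAL2, while terminate() follows merge() in FINAL and iterate() in COMPLETE.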
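
The per-mode rules above reduce to two boolean checks, sketched below. ModeFacts and its method names are hypothetical and not part of Hive; the two conditions are the same ones used in aggregate() and evaluate(). An evaluator could plausibly apply the same checks in its init() method to decide which input and output types to expect, though that usage is an assumption here rather than something shown in the quoted source.

```java
// Hypothetical helper (not part of Hive) that restates the per-mode rules as code.
class ModeFacts {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // True when the stage reads original rows, i.e. takes the iterate() path.
    static boolean readsOriginalData(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.COMPLETE;
    }

    // True when the stage emits a partial aggregation, i.e. ends in terminatePartial().
    static boolean emitsPartialResult(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.PARTIAL2;
    }

    // One-line summary of what a mode consumes and produces.
    static String describe(Mode m) {
        String in = readsOriginalData(m)
            ? "original data -> iterate()"
            : "partial aggregation -> merge()";
        String out = emitsPartialResult(m)
            ? "terminatePartial() -> partial aggregation"
            : "terminate() -> full aggregation";
        return m + ": " + in + ", then " + out;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(describe(m));
        }
    }
}
```

Keeping the input check and the output check separate makes the four modes read as the four combinations of two independent questions: what comes in, and what goes out.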
