Hive: UDF and UDAF

1. UDF
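package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * A simple UDF that converts a string to lower case.
 * Hive calls evaluate() once per row.
 */
public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    // A NULL input produces a NULL output.
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}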

add jar my_jar.jar; 

create temporary function my_lower as 'com.example.hive.udf.Lower';  

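The steps above show how to build a UDF: first implement the UDF class, then compile it into a jar and add that jar to Hive's classpath (add jar), and finally register a temporary function name (create temporary function) so that the function can be called from Hive queries.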
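Before packaging the jar, the logic of evaluate can be checked on its own. The following is only a minimal sketch, not the UDF itself: the class name LowerSketch is hypothetical, and it uses String in place of Text so that it runs without Hadoop or Hive on the classpath.

```java
/** Hypothetical stand-in for Lower.evaluate; uses String so no Hadoop jar is needed. */
final class LowerSketch {

  /** Mirrors the UDF logic: NULL in, NULL out; otherwise lower-case the value. */
  static String evaluate(String s) {
    if (s == null) { return null; }
    return s.toLowerCase();
  }

  public static void main(String[] args) {
    System.out.println(evaluate("Hello WORLD"));
    System.out.println(evaluate(null));
  }
}
```

Once the real UDF is registered, Hive calls evaluate for each row in the same way, passing Text values instead of String.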

2. UDAF
package org.apache.hadoop.hive.contrib.udaf.example;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * This is a simple UDAF that calculates average.
 *
 * It should be very easy to follow and can be used as an example for writing
 * new UDAFs.
 *
 * Note that Hive internally uses a different mechanism (called GenericUDAF) to
 * implement built-in aggregation functions, which are harder to program but
 * more efficient.
 */
public final class UDAFExampleAvg extends UDAF {

  /**
   * The internal state of an aggregation for average.
   *
   * Note that this is only needed if the internal state cannot be represented
   * by a primitive.
   *
   * The internal state can also contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFAvgState {
    private long mCount;
    private double mSum;
  }

  /**
   * The actual class for doing the aggregation. Hive will automatically look
   * for all internal classes of the UDAF that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFAvgState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFAvgState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      state.mSum = 0;
      state.mCount = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and type of arguments need to be the same as when we call
     * this UDAF from the Hive command line.
     *
     * This function should always return true.
     */
    public boolean iterate(Double o) {
      if (o != null) {
        state.mSum += o;
        state.mCount++;
      }
      return true;
    }

    /**
     * Terminate a partial aggregation and return the state. If the state is a
     * primitive, just return primitive Java classes like Integer or String.
     */
    public UDAFAvgState terminatePartial() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument which has the same
     * type as the return value of terminatePartial().
     */
    public boolean merge(UDAFAvgState o) {
      if (o != null) {
        state.mSum += o.mSum;
        state.mCount += o.mCount;
      }
      return true;
    }

    /**
     * Terminate the aggregation and return the final result.
     */
    public Double terminate() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : Double.valueOf(state.mSum
          / state.mCount);
    }
  }

  private UDAFExampleAvg() {
    // prevent instantiation
  }

}

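Points to note when developing a UDAF:

1. You need to import org.apache.hadoop.hive.ql.exec.UDAF and org.apache.hadoop.hive.ql.exec.UDAFEvaluator; both classes are required.

2. The function class must extend UDAF, and its inner Evaluator class must implement the UDAFEvaluator interface.

3. The Evaluator must implement the methods init, iterate, terminatePartial, merge, and terminate:

    1) init is similar to a constructor: it initializes (resets) the aggregation state of the UDAF.

    2) iterate receives the arguments passed to the function, one row at a time, and accumulates them into the internal state. Its return type is boolean.

    3) terminatePartial takes no arguments. Once iterate has finished consuming its rows, terminatePartial returns the partial aggregation state. Together, iterate and terminatePartial play a role similar to a Hadoop Combiner.

    4) merge receives the value returned by terminatePartial and merges it into the current state. Its return type is boolean.

    5) terminate returns the final result of the aggregate function.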
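The call order of these five methods can be traced without a Hive installation. The sketch below is a plain-Java simulation under stated assumptions: AvgLifecycleSketch, State, Evaluator, aggregateSplit, and average are hypothetical names, the Evaluator only copies the logic of UDAFExampleAvgEvaluator without implementing the Hive interface, and the two lists stand in for the rows that two separate map tasks might see. How Hive actually schedules these calls depends on the query plan.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical plain-Java simulation of the UDAF call order; does not depend on Hive. */
final class AvgLifecycleSketch {

  /** Partial aggregation state, mirroring UDAFAvgState. */
  static final class State {
    long count;
    double sum;
  }

  /** Same five methods as the evaluator, without the Hive interface. */
  static final class Evaluator {
    private final State state = new State();

    Evaluator() { init(); }

    void init() { state.sum = 0; state.count = 0; }

    boolean iterate(Double value) {
      if (value != null) { state.sum += value; state.count++; }
      return true;
    }

    State terminatePartial() { return state.count == 0 ? null : state; }

    boolean merge(State other) {
      if (other != null) { state.sum += other.sum; state.count += other.count; }
      return true;
    }

    Double terminate() {
      return state.count == 0 ? null : Double.valueOf(state.sum / state.count);
    }
  }

  /** Map side: feed one split's rows through iterate, then emit the partial state. */
  static State aggregateSplit(List<Double> rows) {
    Evaluator evaluator = new Evaluator();
    for (Double row : rows) {
      evaluator.iterate(row);
    }
    return evaluator.terminatePartial();
  }

  /** Reduce side: merge every partial state, then produce the final average. */
  static Double average(List<List<Double>> splits) {
    Evaluator reducer = new Evaluator();
    for (List<Double> split : splits) {
      reducer.merge(aggregateSplit(split));
    }
    return reducer.terminate();
  }

  public static void main(String[] args) {
    List<Double> first = Arrays.asList(1.0, 2.0, null); // the null row is skipped
    List<Double> second = Arrays.asList(3.0, 6.0);
    System.out.println(average(Arrays.asList(first, second))); // prints 3.0
  }
}
```

The partial state carries both the sum and the count because an average cannot be merged from averages alone: the two splits above have partial averages 1.5 and 4.5, and combining them correctly requires knowing how many rows each one covers.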
