A Simple Hive UDAF Example

       In an earlier post I showed a small example that used a generic UDTF to compute total scores; below, the same job is done with a UDAF.
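
       The query in step 3 reads from a studentScore table with a name column and a score column. The table's DDL and data are not shown in this post; a minimal sketch of what the setup might look like (the file path, delimiter, and data layout are assumptions):

-- Assumed schema; the column names match the query used in step 3.
CREATE TABLE studentScore (name STRING, score INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Hypothetical local file with one "name,score" record per line;
-- a student may appear on several lines.
LOAD DATA LOCAL INPATH '/tmp/studentScore.txt' INTO TABLE studentScore;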

       1. Write the UDAF.

         

package com.wz.udf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

import java.util.HashMap;
import java.util.Map;

public class helloUDAF extends UDAF {

    public static class Evaluator implements UDAFEvaluator {
        // Running total score per student.
        private Map<String, Integer> ret;

        public Evaluator() {
            super();
            init();
        }

        // Called before aggregation starts; resets the state.
        public void init() {
            ret = new HashMap<String, Integer>();
        }

        // Map side: called once per input row.
        public boolean iterate(String strStudent, int nScore) {
            if (ret.containsKey(strStudent)) {
                ret.put(strStudent, ret.get(strStudent) + nScore);
            } else {
                ret.put(strStudent, nScore);
            }
            return true;
        }

        // Returns the partial aggregation handed to the combiner/reducer.
        public Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Reduce side: folds another task's partial result into this one.
        public boolean merge(Map<String, Integer> other) {
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                if (ret.containsKey(e.getKey())) {
                    ret.put(e.getKey(), ret.get(e.getKey()) + e.getValue());
                } else {
                    ret.put(e.getKey(), e.getValue());
                }
            }
            return true;
        }

        // Returns the final result of the aggregation.
        public Map<String, Integer> terminate() {
            return ret;
        }
    }
}
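
       Hive drives the evaluator itself, but the call sequence is easy to trace by hand: init() resets the state, iterate() consumes rows on the map side, terminatePartial() hands the partial map downstream, merge() folds partial maps together on the reduce side, and terminate() returns the final result. A standalone sketch of that sequence (not part of the original example; the names and scores are made-up test data):

package com.wz.udf;

import java.util.Map;

// Hypothetical local driver that mimics the order in which Hive calls
// the evaluator methods; useful only for checking the logic by hand.
public class helloUDAFDriver {
    public static void main(String[] args) {
        // Map side: one evaluator per map task consumes the input rows.
        helloUDAF.Evaluator mapSide = new helloUDAF.Evaluator();
        mapSide.iterate("A", 90);
        mapSide.iterate("A", 70);
        mapSide.iterate("B", 80);
        Map<String, Integer> partial = mapSide.terminatePartial();

        // Reduce side: another evaluator merges the partial map and finishes.
        helloUDAF.Evaluator reduceSide = new helloUDAF.Evaluator();
        reduceSide.merge(partial);
        System.out.println(reduceSide.terminate()); // prints {A=160, B=80}
    }
}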


       2. Compile the class and package it into a jar.

javac -classpath /home/wangzhun/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/wangzhun/hive/hive-0.8.1/lib/hive-exec-0.8.1.jar -d . helloUDAF.java

          jar cvf helloUDAF.jar com/wz/udf/helloUDAF*.class
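
       The Hive session in step 3 adds the jar from Hive's lib directory, so the freshly built jar would first be copied there (the target path simply mirrors the one used below):

cp helloUDAF.jar /home/wangzhun/hive/hive-0.8.1/lib/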

       3. In Hive, add the jar, create a temporary function, and run the query to get the result.

        

hive> add jar /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar;                
Added /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar to class path
Added resource: /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar
hive> create temporary function helloudaf as 'com.wz.udf.helloUDAF';           
OK
Time taken: 0.02 seconds
hive> select helloudaf(studentScore.name,studentScore.score) from studentScore;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201311282251_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201311282251_0009
Kill Command = /home/wangzhun/hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201311282251_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-11-29 00:34:01,290 Stage-1 map = 0%,  reduce = 0%
2013-11-29 00:34:04,316 Stage-1 map = 100%,  reduce = 0%
2013-11-29 00:34:13,403 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201311282251_0009
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 40 HDFS Write: 12 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{"A":290,"B":325}
Time taken: 32.275 seconds
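
       For comparison, the same per-student totals can also be computed with the built-in sum() and GROUP BY, which returns one row per student instead of a single map value:

select name, sum(score) from studentScore group by name;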


