In an earlier post I walked through a small example that used a generic UDTF to compute total scores; below, the same job is done with a UDAF.
1. Write the UDAF.
package com.wz.udf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

import java.util.HashMap;
import java.util.Map;

public class helloUDAF extends UDAF {
    public static class Evaluator implements UDAFEvaluator {
        // Running total score per student. Each evaluator instance must own
        // its state, so the field must not be static.
        private Map<String, Integer> ret;

        public Evaluator() {
            super();
            init();
        }

        // Initialize; Hive also calls this to reset the evaluator.
        public void init() {
            ret = new HashMap<String, Integer>();
        }

        // Map phase: called once per input row.
        public boolean iterate(String strStudent, int nScore) {
            if (ret.containsKey(strStudent)) {
                ret.put(strStudent, ret.get(strStudent) + nScore);
            } else {
                ret.put(strStudent, nScore);
            }
            return true;
        }

        // End of the map phase: emit this task's partial aggregation.
        public Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Reduce phase: fold a partial result into this evaluator's state.
        // Scores for the same student must be added, not overwritten.
        public boolean merge(Map<String, Integer> other) {
            if (other == null) {
                return true;
            }
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                if (ret.containsKey(e.getKey())) {
                    ret.put(e.getKey(), ret.get(e.getKey()) + e.getValue());
                } else {
                    ret.put(e.getKey(), e.getValue());
                }
            }
            return true;
        }

        // Return the final result.
        public Map<String, Integer> terminate() {
            return ret;
        }
    }
}
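Before moving on, the evaluator logic can be sanity-checked off-cluster. The class below is a hypothetical test harness I am adding for illustration, not something Hive calls: it simulates two map-side evaluators feeding made-up rows through iterate(), then a reduce-side evaluator combining their partial maps via merge(). Compile it with the same classpath as in step 2.

package com.wz.udf;

import java.util.Map;

// Hypothetical local sanity check -- not invoked by Hive. It simulates the
// iterate -> terminatePartial -> merge -> terminate call sequence that Hive
// drives across the map and reduce phases.
public class helloUDAFTest {
    public static void main(String[] args) {
        // Two "map tasks", each seeing part of the input (sample scores are made up).
        helloUDAF.Evaluator mapper1 = new helloUDAF.Evaluator();
        mapper1.iterate("A", 90);
        mapper1.iterate("B", 85);

        helloUDAF.Evaluator mapper2 = new helloUDAF.Evaluator();
        mapper2.iterate("A", 80);
        mapper2.iterate("B", 95);

        // "Reduce task": merge the two partial maps and emit the final result.
        helloUDAF.Evaluator reducer = new helloUDAF.Evaluator();
        reducer.merge(mapper1.terminatePartial());
        reducer.merge(mapper2.terminatePartial());

        Map<String, Integer> result = reducer.terminate();
        System.out.println(result); // expected {A=170, B=180}; key order may vary
    }
}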
2. Compile it and package the classes into a jar.
javac -classpath /home/wangzhun/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/wangzhun/hive/hive-0.8.1/lib/hive-exec-0.8.1.jar helloUDAF.java
jar cvf helloUDAF.jar com/wz/udf/helloUDAF*.class
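As a quick optional check, running `jar tf helloUDAF.jar` should list com/wz/udf/helloUDAF.class and com/wz/udf/helloUDAF$Evaluator.class, confirming the jar layout matches the package declaration in the source.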
3. In Hive, add the jar, create a temporary function, and run the query to get the result.
hive> add jar /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar;
Added /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar to class path
Added resource: /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar
hive> create temporary function helloudaf as 'com.wz.udf.helloUDAF';
OK
Time taken: 0.02 seconds
hive> select helloudaf(studentScore.name,studentScore.score) from studentScore;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201311282251_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201311282251_0009
Kill Command = /home/wangzhun/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201311282251_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-11-29 00:34:01,290 Stage-1 map = 0%, reduce = 0%
2013-11-29 00:34:04,316 Stage-1 map = 100%, reduce = 0%
2013-11-29 00:34:13,403 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201311282251_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 40 HDFS Write: 12 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{"A":290,"B":325}
Time taken: 32.275 seconds
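The query implies the studentScore table has a string name column and an integer score column. Because the UDAF aggregates the whole table into a single map, the query returns exactly one row: the per-student totals, which Hive prints in a JSON-like form as {"A":290,"B":325}.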