(1) Create a Map/Reduce project in Eclipse named GeoMeanPro. Before creating it, copy the jar files from the hive/lib directory into the hadoop/lib directory.
(2) Add a class to the project: create the package com.hive.geomean.udaf and, inside it, GeoMean.java.
(3) The code of GeoMean.java is:
package com.hive.geomean.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class GeoMean extends UDAF {

    public static class GeoMeanUDAFEval implements UDAFEvaluator {

        // Partial aggregation state: the running product and the value count.
        public static class PartialResult {
            double sum;  // running product of the input values
            long count;  // number of values aggregated so far
        }

        private PartialResult pResult;

        @Override
        public void init() {
            pResult = null;
        }

        // Entry point for each input row on the map side.
        public boolean iterate(IntWritable value) {
            if (value == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= value.get();
            pResult.count++;
            return true;
        }

        // Returns the partial state to be shipped to the combiner/reducer.
        public PartialResult terminatePartial() {
            return pResult;
        }

        // Merges the partial result of another task into this one.
        public boolean merge(PartialResult other) {
            if (other == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= other.sum;
            pResult.count += other.count;
            return true;
        }

        // Final result: the count-th root of the product.
        public Double terminate() {
            if (pResult == null) {
                return null;
            }
            return Double.valueOf(Math.pow(pResult.sum, 1.0 / pResult.count));
        }
    }
}
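Hive drives the evaluator through init/iterate on the map side, terminatePartial to hand off the state, then merge and terminate on the reduce side. The call sequence can be sketched with a standalone simulation (this is a hypothetical self-contained copy of the evaluator logic using plain ints instead of IntWritable, not the Hive-linked class):

```java
public class GeoMeanSimulation {
    // Minimal stand-in for the evaluator's partial state.
    static class Partial { double product = 1; long count = 0; }

    // iterate(): fold one value into a partial state.
    static void iterate(Partial p, int value) { p.product *= value; p.count++; }

    // merge(): combine another task's partial state into this one.
    static void merge(Partial into, Partial other) {
        into.product *= other.product;
        into.count += other.count;
    }

    // terminate(): count-th root of the running product.
    static double terminate(Partial p) { return Math.pow(p.product, 1.0 / p.count); }

    public static void main(String[] args) {
        // Two "map tasks" each see part of the grade column: {90, 80} and {70}.
        Partial map1 = new Partial();
        iterate(map1, 90);
        iterate(map1, 80);
        Partial map2 = new Partial();
        iterate(map2, 70);

        // The reduce side merges the partials and produces the final value.
        merge(map1, map2);
        System.out.println(terminate(map1)); // 79.58114415792782
    }
}
```

Note that the running product can overflow a double for large inputs; the implementation above follows the document's approach and does not guard against that.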
(4) Export the project as a jar, name it geomean.jar, and upload it to the /home/hadoop/class directory.
(5) The UDAF is used in Hive as follows:
hive> add jar /home/hadoop/class/geomean.jar;
Added /home/hadoop/class/geomean.jar to class path
Added resource: /home/hadoop/class/geomean.jar
hive> create temporary function geomean as 'com.hive.geomean.udaf.GeoMean';
OK
Time taken: 0.038 seconds
hive> select * from grade;
OK
1 90
2 80
3 70
Time taken: 0.112 seconds
hive> select geomean (grade) from grade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201503221120_0057, Tracking URL = http://Masterpc.hadoop:50030/jobdetails.jsp?jobid=job_201503221120_0057
Kill Command = /usr/hadoop/libexec/../bin/hadoop job -kill job_201503221120_0057
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-03-23 22:36:57,988 Stage-1 map = 0%, reduce = 0%
2015-03-23 22:37:04,042 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2015-03-23 22:37:05,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
...
2015-03-23 22:37:22,264 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.87 sec
MapReduce Total cumulative CPU time: 3 seconds 870 msec
Ended Job = job_201503221120_0057
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.87 sec HDFS Read: 228 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 870 msec
OK
79.58114415792782
Time taken: 44.677 seconds
hive>
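As a sanity check, the value Hive returned is the cube root of 90 × 80 × 70 = 504000, which can be verified directly:

```java
public class GeoMeanCheck {
    public static void main(String[] args) {
        // Geometric mean of the three grades: (90 * 80 * 70)^(1/3).
        double product = 90.0 * 80.0 * 70.0;   // 504000.0
        double geomean = Math.pow(product, 1.0 / 3);
        System.out.println(geomean);           // 79.58114415792782, matching the query result
    }
}
```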