Hive UDAF example

Finding the maximum value of a single column


1. Source code

package com.hive.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

// Legacy (now deprecated) UDAF interface: Hive resolves the evaluator by
// method signature, so this function only accepts INT arguments.
public class Maximum extends UDAF {

	public static class MaximumIntUDAFEvaluator implements UDAFEvaluator {
		private IntWritable result;

		// Reset the aggregation state before (re)use.
		public void init() {
			result = null;
		}

		// Called once per input row on the map side.
		public boolean iterate(IntWritable value) {
			if (value == null) {
				return true; // ignore NULLs
			}
			if (result == null) {
				result = new IntWritable(value.get());
			} else {
				result.set(Math.max(result.get(), value.get()));
			}
			return true;
		}

		// Partial result handed from a map task to the reducer.
		public IntWritable terminatePartial() {
			return result;
		}

		// Combine a partial result from another task; max is
		// associative, so merging is just another iterate step.
		public boolean merge(IntWritable other) {
			return iterate(other);
		}

		// Final result of the aggregation.
		public IntWritable terminate() {
			return result;
		}
	}

}
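To make the call sequence concrete, here is a Hadoop-free sketch of the same lifecycle Hive drives (init / iterate / terminatePartial / merge / terminate), using plain Integer instead of IntWritable. The class name MaxEvaluatorSketch is illustrative only and is not part of the UDAF above.

```java
// Hadoop-free analogue of MaximumIntUDAFEvaluator, for illustration.
public class MaxEvaluatorSketch {
    private Integer result;

    public void init() { result = null; }                 // reset state

    public boolean iterate(Integer value) {               // one row at a time
        if (value == null) return true;                   // ignore NULLs
        result = (result == null) ? value : Math.max(result, value);
        return true;
    }

    public Integer terminatePartial() { return result; }  // map-side partial

    public boolean merge(Integer other) {                 // combine partials;
        return iterate(other);                            // max is associative
    }

    public Integer terminate() { return result; }         // final answer

    public static void main(String[] args) {
        // Simulate two map tasks and one reduce over the sample ages.
        MaxEvaluatorSketch m1 = new MaxEvaluatorSketch();
        m1.init();
        for (Integer v : new Integer[]{10, 20, 30, 56}) m1.iterate(v);

        MaxEvaluatorSketch m2 = new MaxEvaluatorSketch();
        m2.init();
        for (Integer v : new Integer[]{60, 70, 80, 88}) m2.iterate(v);

        MaxEvaluatorSketch r = new MaxEvaluatorSketch();
        r.init();
        r.merge(m1.terminatePartial());
        r.merge(m2.terminatePartial());
        System.out.println(r.terminate()); // prints 88
    }
}
```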

2. Export the jar file


3. Register the jar in Hive

hive --auxpath /usr/hive/myjar/

(Alternatively, run "add jar <path-to-jar>;" from inside the Hive CLI.)


4. Create an alias for the Java class

hive> create temporary function maxvalue as "com.hive.udaf.Maximum";


5. Testing

hive> select age from userinfos;

10
20
30
56
60
70
80
88


hive> select maxvalue(age) from userinfos;

Result: 88


6. Error summary

Initially, the userinfos table had been created by a direct Sqoop import, and running the query failed with the following error:

hive> select maxvalue(age) from userinfos;
FAILED: NoMatchingMethodException No matching method for class com.hive.udaf.Maximum with (bigint). Possible choices: _FUNC_(int)


The message indicates a parameter type mismatch. Inspecting the Hive table schema showed that the direct Sqoop import had changed the column type: age is INTEGER in the MySQL table, but after syncing to Hive it became BIGINT. After dropping the userinfos table, recreating it by hand, and reloading the data, the test passed.
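If dropping and recreating the table is not an option, casting the column to the type the evaluator accepts should also resolve the signature mismatch. This workaround is a sketch, not from the original post:

```sql
-- Cast the BIGINT column down to INT so it matches iterate(IntWritable).
-- Safe here only because all ages fit comfortably in an INT.
select maxvalue(cast(age as int)) from userinfos;
```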


Why Sqoop changes column types when syncing a MySQL schema to Hive still needs investigation (one possible cause: Sqoop derives Hive types from JDBC metadata, and a MySQL INT UNSIGNED column is reported over JDBC as BIGINT). If any reader knows the details, please leave a comment. Thanks!

