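In big-data projects that use Hive, some scenarios call for custom functions, for example date handling.
So how do you define a custom UDF in Hive? Define a class Myudf that extends org.apache.hadoop.hive.ql.udf.generic.GenericUDF, package it into a JAR, upload the JAR to the machine running Hive or to HDFS, log in to Hive, and create the function with this command: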
create function myfun1 as "<package name>.Myudf" using jar "hdfs:/jars/myudf_jar";
GenericUDF is an abstract class with three abstract methods that must be implemented (getDisplayString, initialize, evaluate). They are used as follows:
Initialization checks: the initialize method
Type check | ois[0].getCategory() != ObjectInspector.Category.PRIMITIVE | ((PrimitiveObjectInspector) ois[0]).getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.INT (or STRING, LONG, depending on the expected type) |
---|---|---|
Argument-to-converter mapping | Prepare the converter array: converters = new ObjectInspectorConverters.Converter[4] | ObjectInspectorConverters.getConverter(ois[0], PrimitiveObjectInspectorFactory.javaIntObjectInspector) (or javaLongObjectInspector, javaStringObjectInspector) |
The method Hive calls for each row: evaluate, which parses the arguments and produces the new value
Parse the arguments | Return the result |
---|---|
int offset = (Integer)converters[0].convert(arguments[0].get()) | return XXX |
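import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class FormatByDayUDF extends GenericUDF {
    // Field: one converter per argument (the function accepts up to four arguments)
    private final ObjectInspectorConverters.Converter[] converters = new ObjectInspectorConverters.Converter[4];
    // Method 1: validate the argument types and prepare the converters
    @Override
    public ObjectInspector initialize(ObjectInspector[] ois) throws UDFArgumentException {
        // Check the argument type (INT here; STRING and LONG are checked the same way)
        if (ois[0].getCategory() != ObjectInspector.Category.PRIMITIVE
                || ((PrimitiveObjectInspector) ois[0]).getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.INT) {
            throw new UDFArgumentException("the first argument must be an int");
        }
        // Map the argument to its converter (javaLongObjectInspector / javaStringObjectInspector for other types)
        converters[0] = ObjectInspectorConverters.getConverter(ois[0], PrimitiveObjectInspectorFactory.javaIntObjectInspector);
        // The function returns the formatted date as a string
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }
    // Method 2: called by Hive for every row
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // One argument: formatDay(-1)
        int offset = (Integer) converters[0].convert(arguments[0].get());
        Calendar cal = Calendar.getInstance();
        cal.setTime(new Date());
        cal.add(Calendar.DAY_OF_MONTH, offset); // shift the date n days back or forward
        return new SimpleDateFormat("yyyyMMdd").format(cal.getTime()); // formatted date, e.g. "20180901"
        // Two arguments: formatDay(-1, 'yyyy/MM/dd')
        // Three arguments: formatDay(154xxxx, -1, 'yyyy/MM/dd')
        // Four arguments: formatDay('2018/12/12', 'yyyy/MM/dd', -1, 'yyyy/MM/dd')
    }
    // Method 3: the text shown for this function in EXPLAIN output
    @Override
    public String getDisplayString(String[] children) {
        return "formatbyday(" + String.join(", ", children) + ")";
    }
}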
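The comments in evaluate list four call shapes but only the one-argument case is implemented. One way to organise the rest is to write one plain Java overload per call shape and have evaluate pick one based on arguments.length. The following is a minimal sketch under that assumption; the class name FormatDayOverloads, the default pattern constant, and the reading of the three-argument form's first value as epoch milliseconds are all hypothetical and not taken from the original code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Hypothetical helper class: one overload per call shape listed in the UDF comments.
final class FormatDayOverloads {
    private static final String DEFAULT_FMT = "yyyyMMdd";

    // formatDay(-1): offset from today, default pattern
    static String formatDay(int offset) {
        return formatDay(new Date(), offset, DEFAULT_FMT);
    }

    // formatDay(-1, "yyyy/MM/dd"): offset from today, caller-supplied pattern
    static String formatDay(int offset, String fmt) {
        return formatDay(new Date(), offset, fmt);
    }

    // formatDay(millis, -1, "yyyy/MM/dd"): assumes the first value is epoch milliseconds
    static String formatDay(long epochMillis, int offset, String fmt) {
        return formatDay(new Date(epochMillis), offset, fmt);
    }

    // formatDay("2018/12/12", "yyyy/MM/dd", -1, "yyyy/MM/dd"): parse, shift, re-format
    static String formatDay(String day, String inFmt, int offset, String outFmt) {
        try {
            return formatDay(new SimpleDateFormat(inFmt).parse(day), offset, outFmt);
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse '" + day + "' with pattern " + inFmt, e);
        }
    }

    // Shared implementation: shift the date by `offset` days and format it
    static String formatDay(Date date, int offset, String fmt) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        cal.add(Calendar.DAY_OF_MONTH, offset);
        return new SimpleDateFormat(fmt).format(cal.getTime());
    }

    public static void main(String[] args) {
        // prints 2018/12/11
        System.out.println(formatDay("2018/12/12", "yyyy/MM/dd", -1, "yyyy/MM/dd"));
        // prints yesterday's date in yyyyMMdd form
        System.out.println(formatDay(-1));
    }
}
```

Inside the UDF, initialize would then need one converter per argument position, which is presumably why the converters array has four slots.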
String formatDay(Date date, int offset, String fmt) { // n days before or after
    Calendar cal = Calendar.getInstance();
    cal.setTime(date);
    // add or subtract on the day field
    cal.add(Calendar.DAY_OF_MONTH, offset);
    return new SimpleDateFormat(fmt).format(cal.getTime());
}
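String formatWeek(Date date, int offset, String fmt) { // start of the week n weeks before or after
    Calendar cal = Calendar.getInstance();
    cal.setTime(date);
    // position of the given date within its week (1 = Sunday ... 7 = Saturday)
    int n = cal.get(Calendar.DAY_OF_WEEK);
    // move back to the first day of that week, then shift by whole weeks
    cal.add(Calendar.DAY_OF_MONTH, -(n - 1));
    cal.add(Calendar.DAY_OF_MONTH, offset * 7);
    // the original snippet had no return statement, so it did not compile
    return new SimpleDateFormat(fmt).format(cal.getTime());
}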
String formatMonth(Date date, int offset, String fmt) { // n months before or after
    Calendar cal = Calendar.getInstance();
    cal.setTime(date);
    cal.add(Calendar.MONTH, offset);
    return new SimpleDateFormat(fmt).format(cal.getTime());
}
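The same three offsets can also be written with java.time, whose LocalDate and DateTimeFormatter are immutable, unlike Calendar and SimpleDateFormat. This is a swapped-in alternative for illustration, not the approach the helpers above take. The class name DateOffsets is hypothetical, and Sunday is assumed to be the first day of the week so that the result matches the Calendar.DAY_OF_WEEK arithmetic in formatWeek.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;

// Hypothetical java.time variant of formatDay, formatWeek and formatMonth.
final class DateOffsets {
    // n days before or after
    static String formatDay(LocalDate date, int offset, String fmt) {
        return date.plusDays(offset).format(DateTimeFormatter.ofPattern(fmt));
    }

    // start of the week (assumed to be Sunday) n weeks before or after
    static String formatWeek(LocalDate date, int offset, String fmt) {
        LocalDate weekStart = date.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY));
        return weekStart.plusWeeks(offset).format(DateTimeFormatter.ofPattern(fmt));
    }

    // n months before or after
    static String formatMonth(LocalDate date, int offset, String fmt) {
        return date.plusMonths(offset).format(DateTimeFormatter.ofPattern(fmt));
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2018, 9, 5); // a Wednesday
        System.out.println(formatDay(d, -1, "yyyyMMdd"));   // prints 20180904
        System.out.println(formatWeek(d, -1, "yyyyMMdd"));  // prints 20180826
        System.out.println(formatMonth(d, -1, "yyyyMMdd")); // prints 20180805
    }
}
```

Using a fixed LocalDate instead of the current time makes the helpers easy to check before wiring them into a UDF.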
Active-user statistics: daily, weekly, and monthly active users
select
count( distinct
concat(
appid,
deviceid,
appplatform,
brand,
devicestyle,
ostype,
appversion
)
)
from appstartuplogs
where appid is not null
and deviceid is not null
-- daily active users for the given day
and concat(ym,day)=formatbyday(-1,"yyyyMMdd");
-- weekly active users for the given week: and concat(ym,day)=formatbyweek(-1,"yyyyMMdd");
-- monthly active users for the given month: and concat(ym,day)=formatbymonth(-1,"yyyyMMdd");