UDF Functions
sum, count, … are the built-in basics
Built-in functions alone cannot cover real production business logic
==> extend Hive with functions of our own
Migration scenario: RDBMS ==> cloud (moving the workload onto big data: Hive/Spark)
Rework the existing business logic with Hive syntax
Develop functions with the same names as in the RDBMS, so existing SQL keeps running
DIFF
UDF: User-Defined Function
    one-to-one, e.g. upper, substr(ename, ...)
UDAF: User-Defined Aggregation Function, e.g. sum, count, max
    many-to-one
UDTF: User-Defined Table-Generating Function, e.g. explode
    one-to-many
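The three shapes can be sketched in plain Java (a sketch for intuition only; the method names here are illustrative, not the Hive API):

```java
import java.util.Arrays;
import java.util.List;

public class FunctionArity {
    // UDF: one row in, one row out (like upper or substr)
    static String udfUpper(String s) {
        return s.toUpperCase();
    }

    // UDAF: many rows in, one row out (like sum, count, max)
    static int udafSum(List<Integer> rows) {
        int total = 0;
        for (int r : rows) total += r;
        return total;
    }

    // UDTF: one row in, many rows out (like explode(split(...)))
    static List<String> udtfExplode(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(udfUpper("pxj"));                 // one-to-one: PXJ
        System.out.println(udafSum(Arrays.asList(1, 2, 3))); // many-to-one: 6
        System.out.println(udtfExplode("a,b,c"));            // one-to-many: [a, b, c]
    }
}
```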
you need to create a new class that extends UDF,
with one or more methods named evaluate (method overloading is supported)
compile your code to a jar
add the jar to the Hive classpath:
add jar …
then register it: CREATE FUNCTION
package com.ccj.pxj.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Strips the random "N_" prefix added for data-skew salting: "1_pxj" -> "pxj"
public class UDFRemoveRandomPrefix extends UDF {
    public String evaluate(String name) {
        return name.substring(name.lastIndexOf("_") + 1);
    }

    public static void main(String[] args) {
        UDFRemoveRandomPrefix udf = new UDFRemoveRandomPrefix();
        System.out.println(udf.evaluate("1_pxj")); // pxj
    }
}
package com.ccj.pxj.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import java.util.Random;

// Prepends a random 0-9 prefix ("pxj" -> e.g. "7_pxj") to spread a hot key
public class UDFAddRandomPrefix extends UDF {
    public String evaluate(String name) {
        int prefix = new Random().nextInt(10);
        return prefix + "_" + name;
    }

    public static void main(String[] args) {
        UDFAddRandomPrefix udf = new UDFAddRandomPrefix();
        for (int i = 0; i < 10; i++) {
            System.out.println(udf.evaluate("pxj"));
        }
    }
}
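The two classes form a salting pair for handling data skew: UDFAddRandomPrefix spreads a hot key across reducers for a first-stage aggregation, and UDFRemoveRandomPrefix restores the original key for the final aggregation. A plain-Java sketch of the round trip (no Hive dependency; the method names are illustrative):

```java
import java.util.Random;

public class SaltRoundTrip {
    private static final Random RANDOM = new Random();

    // same logic as UDFAddRandomPrefix.evaluate: "pxj" -> e.g. "7_pxj"
    static String addPrefix(String name) {
        return RANDOM.nextInt(10) + "_" + name;
    }

    // same logic as UDFRemoveRandomPrefix.evaluate: "7_pxj" -> "pxj"
    // note: lastIndexOf assumes the base name itself contains no '_'
    static String removePrefix(String name) {
        return name.substring(name.lastIndexOf("_") + 1);
    }

    public static void main(String[] args) {
        String salted = addPrefix("pxj");
        System.out.println(salted + " -> " + removePrefix(salted)); // always ends in "pxj"
    }
}
```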
hive (default)> add jar /home/pxj/lib/hivecode-1.0-SNAPSHOT.jar;
Added [/home/pxj/lib/hivecode-1.0-SNAPSHOT.jar] to class path
Added resources: [/home/pxj/lib/hivecode-1.0-SNAPSHOT.jar]
hive (default)> create temporary function pxj as 'com.ccj.pxj.udf.PxjUDF';
OK
Time taken: 0.01 seconds
hive (default)> select pxj(ename) from emp;
OK
_c0
smith
allen
ward
jones
martin
blake
clark
scott
king
turner
adams
james
ford
miller
hive
hive (default)> show functions;
OK
tab_name
pxj
create table a(
    id int,
    name string,
    subject array<string>
) row format delimited fields terminated by ','
collection items terminated by ':';
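With those delimiters, a raw line such as 1,pxj,math:english (a made-up sample row) splits on ',' into the three columns, and the last column splits again on ':' to form the array<string>. A plain-Java sketch of that tokenization:

```java
import java.util.Arrays;

public class DelimitedRowParse {
    // fields terminated by ',', collection items terminated by ':'
    static String[] parseSubjects(String line) {
        String[] fields = line.split(",");
        return fields[2].split(":"); // the array<string> column
    }

    public static void main(String[] args) {
        String line = "1,pxj,math:english"; // hypothetical raw row for table a
        System.out.println(Arrays.toString(parseSubjects(line))); // [math, english]
    }
}
```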
add jar + CREATE TEMPORARY FUNCTION only lasts for the current session. To register a permanent function, put the jar on HDFS and use CREATE FUNCTION ... USING JAR:
[pxj@pxj /home/pxj/lib]$hadoop fs -mkdir -p /pxj60/lib
20/02/02 00:58:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[pxj@pxj /home/pxj/lib]$ll
total 856
-rw-r--r--. 1 pxj pxj   2647 Jan 31 21:57 hivecode-1.0-SNAPSHOT.jar
-rw-r--r--. 1 pxj pxj 872303 Jan 18 11:46 mysql-connector-java-5.1.27-bin.jar
[pxj@pxj /home/pxj/lib]$hadoop fs -put hivecode-1.0-SNAPSHOT.jar /pxj60/lib
[pxj@pxj /home/pxj/lib]$
hive (default)> CREATE FUNCTION pxj AS 'com.ccj.pxj.udf.PxjUDF' USING JAR 'hdfs://pxj:9000/pxj60/lib/hivecode-1.0-SNAPSHOT.jar';
converting to local hdfs://pxj:9000/pxj60/lib/hivecode-1.0-SNAPSHOT.jar
Added [/tmp/c1c645e9-1770-468c-8fef-dcc874c260a1_resources/hivecode-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs://pxj:9000/pxj60/lib/hivecode-1.0-SNAPSHOT.jar]
OK
Time taken: 1.527 seconds
hive (default)> select pxj(ename) from emp;
OK
_c0
smith
allen
ward
jones
martin
blake
clark
scott
king
turner
adams
james
ford
miller
hive
Time taken: 3.848 seconds, Fetched: 15 row(s)
To register the UDF as a genuine built-in instead, modify FunctionRegistry in the Hive source:
system.registerUDF("pxj", PxjUDF.class, false);
Compiling Hive:
download the 1.1.0-cdh5.16.2 source, then
mvn clean package
In the rebuilt hive,
show functions
now lists it among the built-ins.
hive (default)> create table pxj_word(
> word string
> );
OK
Time taken: 0.097 seconds
hive (default)> select * from pxj_word;
OK
pxj_word.word
pxj,pxj,pxj
wfy,wfy
ccj
hive (default)> select
> word,
> count(1) cnt
> from
> (
> select explode(split(word,',')) word from pxj_word
> ) t
> group by word;
Total MapReduce CPU Time Spent: 3 seconds 600 msec
OK
word cnt
ccj 1
pxj 3
wfy 2
Time taken: 28.574 seconds, Fetched: 3 row(s)
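The explode(split(...)) + group by pipeline above is equivalent to this plain-Java sketch, run on the same three rows of pxj_word:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExplodeWordCount {
    // split each row on ',' (the explode step), then count per word (the group by step)
    static Map<String, Integer> count(List<String> rows) {
        Map<String, Integer> cnt = new TreeMap<>();
        for (String row : rows) {
            for (String word : row.split(",")) {
                cnt.merge(word, 1, Integer::sum);
            }
        }
        return cnt;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("pxj,pxj,pxj", "wfy,wfy", "ccj");
        System.out.println(count(rows)); // {ccj=1, pxj=3, wfy=2}
    }
}
```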
/data/hive/mulit_file/1.txt
/data/hive/mulit_file/sub_dir/2.txt
==> the directory holding the data may itself contain further subdirectories
Input: /data/hive/mulit_file
Goal: run word count (wc) over all files, including those under subdirectories
[pxj@pxj /home/pxj/app/hadoop/etc/hadoop]$vim mapred-site.xml
<property>
    <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
    <value>true</value>
</property>
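Setting mapreduce.input.fileinputformat.input.dir.recursive=true makes the job descend into subdirectories of the input path. The effect is analogous to walking the tree yourself, sketched here in plain Java against a throwaway local directory (not HDFS):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveInput {
    // collect every regular file under root, including nested subdirectories
    static List<Path> listRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // mimic mulit_file/1.txt and mulit_file/sub_dir/2.txt locally
        Path root = Files.createTempDirectory("mulit_file");
        Files.createFile(root.resolve("1.txt"));
        Files.createFile(Files.createDirectories(root.resolve("sub_dir")).resolve("2.txt"));
        System.out.println(listRecursively(root).size()); // 2: both files are found
    }
}
```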
[pxj@pxj /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -mkdir -p /data/hive/mulit_file
[pxj@pxj /home/pxj/app/hadoop/etc/hadoop]$hadoop fs -mkdir -p /data/hive/mulit_file/sub_dir
[pxj@pxj /home/pxj/app/hadoop]$hadoop jar \
> ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar \
> wordcount \
> /data/hive/mulit_file /data/hive/mulit_file/output
[pxj@pxj /home/pxj/app/hadoop]$hadoop fs -text /data/hive/mulit_file/output/part-r-00000
ccj 1
pxj 3
pxj,ccj 1
pxj,wfy 1
wfy 3
wi 1
wo 1
woo 1
Author: pxj (潘陈)
Date: 2020-02-02, 2:02 AM