Example from the official site:
Step 1:
Create a Maven project in IDEA and add the dependencies to the pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>ramos</groupId>
  <artifactId>hive-udf-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>hive-udf-test</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0</hadoop.version>
    <hive.version>1.1.0</hive.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Code:
package ramos.hive_udf_test;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * LowerUDF
 * evaluate() must not return void; returning null is allowed.
 */
public class LowerUDF extends UDF {
    public Text evaluate(Text str) {
        // validate: Hive passes null for NULL column values
        if (str == null) {
            return null;
        }
        // lowercase the input
        return new Text(str.toString().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(new LowerUDF().evaluate(new Text("HIVE")));
    }
}
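The null handling in evaluate() can be exercised outside Hive. The sketch below is a plain-String stand-in for the method (no Hadoop Text wrapper, so it runs without hadoop-common on the classpath); the class name is just for illustration:

```java
public class LowerSketch {
    // Plain-String version of the UDF logic: null in, null out;
    // otherwise return the lowercased input.
    static String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("HIVE")); // prints "hive"
        System.out.println(evaluate(null));   // prints "null"
    }
}
```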
Package the project and place the jar on the server:
[root@sparkproject1 hiveTestJar]# ll
total 4
-rw-r--r-- 1 root root 3092 Jun 9 2019 hive-udf-test-0.0.1-SNAPSHOT.jar
[root@sparkproject1 hiveTestJar]#
[root@sparkproject1 hiveTestJar]#
[root@sparkproject1 hiveTestJar]# pwd
/usr/local/hive/hiveTestJar
[root@sparkproject1 hiveTestJar]#
Add the jar inside Hive:
add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;
create temporary function my_lower as "ramos.hive_udf_test.LowerUDF";
hive> add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;
Usage: add [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null
hive>
Reference: https://blog.csdn.net/u011495642/article/details/84327256
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
hive>
Upload the jar file to HDFS (using a local path kept failing for me; make sure the directory is correct!):
dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
Create and register the function:
Example 1:
create function my_lower as 'ramos.hive_udf_test.LowerUDF' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';
Example 2:
hive>
>
> create function hive2kafka as'ramos.hive_udf_test.Hive2Kafka' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar
Added /tmp/a0a81812-1c55-4ee5-921d-c118691ef134_resources/hive-udf-test-0.0.1.jar to class path
Added resource: /tmp/a0a81812-1c55-4ee5-921d-c118691ef134_resources/hive-udf-test-0.0.1.jar
OK
Time taken: 2.576 seconds
hive>
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/test-jar/hive-udf-test-0.0.1-SNAPSHOT.jar;
put: Parent path is not a directory: /user/hive/warehouse/test-jar test-jar
Command failed with exit code = 1
Query returned non-zero code: 1, cause: null
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
hive> create function my_lower as'ramos.hive_udf_test.LowerUDF' using jar '/user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. Hive warehouse is non-local, but /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar specifies file on local filesystem. Resources on non-local warehouse should specify a non-local scheme/path
hive>
>
>
> create function my_lower as'ramos.hive_udf_test.LowerUDF' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar
Added /tmp/1214f72c-84fd-42f0-a04f-14dae9ae003e_resources/hive-udf-test-0.0.1-SNAPSHOT.jar to class path
Added resource: /tmp/1214f72c-84fd-42f0-a04f-14dae9ae003e_resources/hive-udf-test-0.0.1-SNAPSHOT.jar
OK
Time taken: 0.456 seconds
hive>
Check that the custom function is now registered:
show functions;
Use the custom function:
hive> select default.my_lower(name) lowername from student2;
FAILED: SemanticException [Error 10001]: Line 1:45 Table not found 'student2'
hive> select default.my_lower(name) lowername from db_hive_edu.student2;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1559981808383_0012, Tracking URL = http://sparkproject1:8088/proxy/application_1559981808383_0012/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1559981808383_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-06-09 05:44:34,951 Stage-1 map = 0%, reduce = 0%
2019-06-09 05:44:42,765 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.75 sec
MapReduce Total cumulative CPU time: 1 seconds 750 msec
Ended Job = job_1559981808383_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.75 sec HDFS Read: 431 HDFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
zhangsan2
lisi2
wangwu2
zhaoliu
zhangsan2
lisi2
wangwu2
zhaoliu
tangqi
Time taken: 19.323 seconds, Fetched: 9 row(s)
hive>
As the output shows, the uppercase letters in the column have been converted to lowercase.
A second example:
1. Create it as a Maven project
2. JDK 1.8 was used
3. Code:
Java:
package com.huay;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Created by tang on 2019/01/07
 */
public class Udf_doubleMinSalary extends UDF {
    public String evaluate(String a) {
        return a + "____udf";
    }

    public static void main(String[] args) {
        // System.out.println(new Udf_doubleMinSalary().evaluate("test"));
    }
}
pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>huay</groupId>
  <artifactId>udf_doubleMinSalary</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
4. Package: right-click the project -> Run As -> Maven install
5. Upload the jar to a directory on Linux, e.g. /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar
6. Create the UDF:
add jar /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar/udf_doubleMinSalary-0.0.1-SNAPSHOT.jar;
Create a temporary function named doubleMinSalary:
create temporary function doubleMinSalary as 'com.huay.Udf_doubleMinSalary';
Creating a permanent UDF:
First upload the jar to HDFS:
hadoop fs -put /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar /user/hive/warehouse/ods.db/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar
Then create the permanent function:
CREATE FUNCTION udf_hive2kafka AS 'com.huay.Hive2KakfaUDF'
USING JAR 'hdfs:///user/hive/warehouse/ods.db/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar';
show functions;
Run SQL such as:
SELECT g,default.udf_hive2kafka('lienidata001:9092','bobizlist_tzb',collect_list(map(
'bo_id',bo_id,
'full_name', full_name,
'simple_name',simple_name,
'source',source,
'company_id',company_id,
'contact',contact,
'position',position,
'mobile_phone',mobile_phone,
'phone',phone,
'email',email,
'contact_source',contact_source,
'request_host',request_host,
'request_url',request_url,
'insert_time',insert_time
))) AS result
FROM
(
SELECT r1,pmod(ABS(hash(r1)),100) AS g,bo_id,full_name,simple_name,source,company_id,contact,position,mobile_phone,phone,email,contact_source,request_host,request_url,insert_time
FROM dws_bo_final_spider_contact
LIMIT 10000
) tmp
GROUP BY g;
Running the last statement above reported an error.
Reference: http://www.k6k4.com/blog/show/aaaxzznf21469153102000
Older Hive versions do not support collect_list, which is why this failed.
=====================================================================================
Since the Hadoop, Kafka, and Hive versions installed on my home cluster are all fairly old, I hit a lot of pitfalls. The most troublesome ones were the following.
Error 1:
Creating the function after packaging and uploading to HDFS failed:
Class not found (make sure the path and class name are correct).
Or a message that the class does not extend UDF; check that too.
Creating the function after packaging and uploading to HDFS reported:
hive>
>
> CREATE FUNCTION hive2kafkaSimple AS 'ramos.hive_udf_test.HiveToKakfaSimple'
> USING JAR 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar
Added /tmp/5ef75554-8a71-43f4-b8f4-e413df882e49_resources/hive-udf-test-0.0.1.jar to class path
Added resource: /tmp/5ef75554-8a71-43f4-b8f4-e413df882e49_resources/hive-udf-test-0.0.1.jar
java.lang.UnsupportedClassVersionError: ramos/hive_udf_test/HiveToKakfaSimple : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:313)
at org.apache.hadoop.hive.ql.exec.FunctionTask.createPermanentFunction(FunctionTask.java:138)
at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:155)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. ramos/hive_udf_test/HiveToKakfaSimple : Unsupported major.minor version 52.0
hive> You have new mail in /var/spool/mail/root
Reference: https://www.cnblogs.com/jpfss/p/9036645.html
Unsupported major.minor version 52.0: seeing "Unsupported", you might recall that a newer JDK can run classes compiled by an older JDK, but not the other way around. That is exactly what this means: the local JDK is too old to run a project compiled with JDK 1.8 (class file version 52.0).
Fix: build with the matching version. In my case the cluster runs 1.7, so the jar has to be compiled for 1.7.
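One way to do this, even when the IDE defaults to a newer JDK, is to pin the compiler plugin in the pom. A minimal sketch (adjust 1.7 to whatever your cluster's JRE supports):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>
```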
IDEA
Java code:
package ramos.hive_udf_test;

import com.alibaba.fastjson.JSONObject;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import kafka.serializer.StringEncoder;
import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Created by tang on 2019/01/07
 */
public class HiveToKakfaSimple extends UDF {
    public String evaluate(String zkList, String brokerList, String topic, String id, String name) {
        Producer<String, String> producer = createProducer(zkList, brokerList);
        Map<String, String> params = new HashMap<String, String>();
        params.put("id", id);
        params.put("name", name);
        String json = JSONObject.toJSON(params).toString();
        producer.send(new KeyedMessage<String, String>(topic, json));
        return json;
    }

    private static Producer<String, String> createProducer(String zkList, String brokerList) {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", zkList); // ZooKeeper quorum; comma-separate multiple hosts
        properties.put("serializer.class", StringEncoder.class.getName());
        properties.put("metadata.broker.list", brokerList); // Kafka broker list
        return new Producer<String, String>(new ProducerConfig(properties));
    }
}
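One caveat with the class above: Hive calls evaluate() once per row, so it builds (and never closes) a new Producer for every record. A common fix is to create the producer lazily once per JVM and reuse it. The sketch below shows that double-checked lazy-init pattern with a plain Object standing in for the Kafka Producer, so it runs without the Kafka jars; the class name is illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProducerHolder {
    // Counts how many times the expensive resource was built (for demonstration only).
    static final AtomicInteger created = new AtomicInteger();
    // volatile is required for double-checked locking to be safe.
    private static volatile Object producer;

    static Object getProducer() {
        if (producer == null) {
            synchronized (ProducerHolder.class) {
                if (producer == null) {
                    created.incrementAndGet();
                    // Real code would do: new Producer<String, String>(new ProducerConfig(props))
                    producer = new Object();
                }
            }
        }
        return producer;
    }

    public static void main(String[] args) {
        Object first = getProducer();
        Object second = getProducer();
        System.out.println(first == second); // prints "true": the same instance is reused
    }
}
```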
pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>ramos</groupId>
  <artifactId>hive-udf-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>hive-udf-test</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.8.2.0</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.2.46</version>
    </dependency>
    <dependency>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
      <version>20160212</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.10-FINAL</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.10-FINAL</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Run mvn package, upload the jar to a Linux directory, then push it to HDFS.
Upload the file to HDFS (run inside the Hive client):
dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1.jar /user/hive/warehouse/hive-udf-test-0.0.1.jar; (the second argument is the HDFS path)
To delete the file from HDFS:
hadoop fs -rm -r -skipTrash /user/hive/warehouse/hive-udf-test-0.0.1.jar
Create the function:
CREATE FUNCTION hive2kafkaSimple AS 'ramos.hive_udf_test.HiveToKakfaSimple' USING JAR 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';
Run show functions; to check that the function is now listed.
Use the function to push Hive data to Kafka:
select default.hive2kafkasimple('sparkproject1:2181','sparkproject1:9092','TestTopic',id,name) from db_hive_edu.student2 limit 10000 ;
select default.hive2kafkasimple('192.168.124.110:2181','192.168.124.110:9092','TestTopic',id,name) from db_hive_edu.student2 limit 10000 ;
select default.hive2kafkasimple('192.168.124.110:2181,192.168.124.111:2181,192.168.124.112:2181','192.168.124.110:9092','TestTopic',id,name) from db_hive_edu.student2 limit 10000 ;
Kafka received the messages:
[2019-06-12 02:10:16,485] INFO Accepted socket connection from /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:16,497] INFO Client attempting to establish new session at /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:16,508] INFO Established session 0x16b47372e1f000a with negotiated timeout 6000 for client /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:19,069] INFO Got user-level KeeperException when processing sessionid:0x16b47372e1f0008 type:create cxid:0x26 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/consumers/console-consumer-31723/owners/TestTopic Error:KeeperErrorCode = NoNode for /consumers/console-consumer-31723/owners/TestTopic (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-06-12 02:10:19,071] INFO Got user-level KeeperException when processing sessionid:0x16b47372e1f0008 type:create cxid:0x27 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/consumers/console-consumer-31723/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-31723/owners (org.apache.zookeeper.server.PrepRequestProcessor)
{"id":"1","name":"zhangsan2"}
{"id":"3","name":"wangwu2"}
{"id":"4","name":"zhaoliu"}
{"id":"3","name":"wangwu2"}
{"id":"4","name":"zhaoliu"}
{"id":"2","name":"lisi2"}
{"id":"1","name":"zhangsan2"}
{"id":"2","name":"lisi2"}
{"id":"5","name":"TANGQI"}