Hive: custom UDFs - more examples for reference

Official documentation examples:

(Screenshots: the UDF examples from the official Hive documentation.)

Example 1: a custom uppercase-to-lowercase function

Step 1:

Create a Maven project in IDEA and add the dependencies to the pom:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>ramos</groupId>
  <artifactId>hive-udf-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>hive-udf-test</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0</hadoop.version>
    <hive.version>1.1.0</hive.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Code:

package ramos.hive_udf_test;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * LowerUDF
 * evaluate() must not return void; returning null is allowed.
 */
public class LowerUDF extends UDF {

    public Text evaluate(Text str) {
        // validate: guard against a null input before touching it
        if (str == null || str.toString() == null) {
            return null;
        }

        // lower-case the value
        return new Text(str.toString().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(new LowerUDF().evaluate(new Text("HIVE")));
    }
}


Package the project:

(Screenshot: packaging the project in the IDE.)
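Equivalently, from the command line (a minimal sketch, run from the project root; the jar is written to target/):

mvn clean package
# => target/hive-udf-test-0.0.1-SNAPSHOT.jar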

Upload the jar to Linux:

[root@sparkproject1 hiveTestJar]# ll
total 4
-rw-r--r-- 1 root root 3092 Jun  9  2019 hive-udf-test-0.0.1-SNAPSHOT.jar
[root@sparkproject1 hiveTestJar]# 
[root@sparkproject1 hiveTestJar]# 
[root@sparkproject1 hiveTestJar]# pwd
/usr/local/hive/hiveTestJar
[root@sparkproject1 hiveTestJar]# 

Add the jar in Hive:

add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;

create temporary function my_lower as "ramos.hive_udf_test.LowerUDF";

hive> add jar /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar;
Usage: add [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null
hive> 

As shown above: with a local Linux path the jar could not be added; uploading the jar to HDFS and using the HDFS path instead worked.

Reference: https://blog.csdn.net/u011495642/article/details/84327256

hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;
hive> 

Upload the jar file to HDFS (I kept getting errors with local paths; also make sure the target directory is correct!):

dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar; 

(Screenshot: the jar uploaded to HDFS.)
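A side note of mine (not from the original post): if a UDF jar is needed in every session, Hive can also load it automatically through the hive.aux.jars.path property in hive-site.xml, instead of running add jar each time:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar</value>
</property>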

Create and register the function:

Case 1:

create function my_lower as 'ramos.hive_udf_test.LowerUDF' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';

Case 2 (a fuller transcript, including the mistakes I made along the way):

hive> 
    > 
    > create function hive2kafka as'ramos.hive_udf_test.Hive2Kafka' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';         
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar
Added /tmp/a0a81812-1c55-4ee5-921d-c118691ef134_resources/hive-udf-test-0.0.1.jar to class path
Added resource: /tmp/a0a81812-1c55-4ee5-921d-c118691ef134_resources/hive-udf-test-0.0.1.jar
OK
Time taken: 2.576 seconds
hive> 
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/test-jar/hive-udf-test-0.0.1-SNAPSHOT.jar;
put: Parent path is not a directory: /user/hive/warehouse/test-jar test-jar
Command failed with exit code = 1
Query returned non-zero code: 1, cause: null
hive> dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1-SNAPSHOT.jar /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar;         
hive> create function my_lower as'ramos.hive_udf_test.LowerUDF' using jar '/user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';         
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. Hive warehouse is non-local, but /user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar specifies file on local filesystem. Resources on non-local warehouse should specify a non-local scheme/path
hive> 
    > 
    > 
    > create function my_lower as'ramos.hive_udf_test.LowerUDF' using jar 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar';
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1-SNAPSHOT.jar
Added /tmp/1214f72c-84fd-42f0-a04f-14dae9ae003e_resources/hive-udf-test-0.0.1-SNAPSHOT.jar to class path
Added resource: /tmp/1214f72c-84fd-42f0-a04f-14dae9ae003e_resources/hive-udf-test-0.0.1-SNAPSHOT.jar
OK
Time taken: 0.456 seconds
hive> 

Check that the custom function is now available:

show functions

(Screenshots: show functions output, including the newly registered function.)

Use the custom function:

(Screenshot: querying with the custom function.)

hive> select default.my_lower(name) lowername from student2; 
FAILED: SemanticException [Error 10001]: Line 1:45 Table not found 'student2'
hive> select default.my_lower(name) lowername from db_hive_edu.student2;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1559981808383_0012, Tracking URL = http://sparkproject1:8088/proxy/application_1559981808383_0012/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1559981808383_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-06-09 05:44:34,951 Stage-1 map = 0%,  reduce = 0%
2019-06-09 05:44:42,765 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.75 sec
MapReduce Total cumulative CPU time: 1 seconds 750 msec
Ended Job = job_1559981808383_0012
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.75 sec   HDFS Read: 431 HDFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
zhangsan2
lisi2
wangwu2
zhaoliu
zhangsan2
lisi2
wangwu2
zhaoliu
tangqi
Time taken: 19.323 seconds, Fetched: 9 row(s)
hive> 

As the output shows, the uppercase letters in the column have been converted to lowercase.
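To tidy up afterwards (a standard HiveQL aside, not part of the original walkthrough): a temporary function disappears when the session ends, while a permanent one has to be dropped explicitly:

drop temporary function my_lower;    -- the temporary variant
drop function default.my_lower;      -- the permanent variant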


Example 2: create a UDF that pushes Hive data to Kafka

1. Create it as a Maven project.

2. JDK 1.8 is used.

3. Code:

Java:

package com.huay;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Created by tang on 2019/01/07
 */
public class Udf_doubleMinSalary extends UDF {

    public String evaluate(String a) {
        // append a marker so the transformed values are easy to spot
        return a + "____udf";
    }

    public static void main(String[] args) {
        System.out.println(new Udf_doubleMinSalary().evaluate("6"));
    }
}

pom:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>huay</groupId>
  <artifactId>udf_doubleMinSalary</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>


4. Package: right-click the project -> Run As -> Maven install.

5. Upload the jar to a directory on Linux, e.g. /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar.

6. Create the UDF function:

add jar /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar/udf_doubleMinSalary-0.0.1-SNAPSHOT.jar;


Create a temporary function named doubleMinSalary:

create temporary function doubleMinSalary as 'com.huay.Udf_doubleMinSalary';
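A quick smoke test (hedged: the table and column names below are placeholders, not from the original post):

-- every value should come back suffixed with "____udf"
select doubleMinSalary(name) from some_db.some_table limit 5;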


Creating a permanent UDF:

First push the jar to HDFS:
hadoop fs -put /var/lib/hadoop-hdfs/spride_sqoop_beijing/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar /user/hive/warehouse/ods.db/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar

Then create the permanent function:
CREATE FUNCTION udf_hive2kafka  AS 'com.huay.Hive2KakfaUDF'
 USING JAR 'hdfs:///user/hive/warehouse/ods.db/udf_jar/udf_hive2kafka-0.0.1-SNAPSHOT.jar';

Check with show functions:

(Screenshot: show functions output including the new function.)

Run SQL like the following. The inner query spreads the rows over 100 buckets with pmod(ABS(hash(r1)), 100); the outer query groups by that bucket, gathers each group into a list of maps with collect_list(map(...)), and hands the batch to the UDF, which pushes it to Kafka:

SELECT g,default.udf_hive2kafka('lienidata001:9092','bobizlist_tzb',collect_list(map(
    'bo_id',bo_id,
    'full_name', full_name,
    'simple_name',simple_name,
    'source',source,
    'company_id',company_id,
    'contact',contact,
    'position',position,
    'mobile_phone',mobile_phone,
    'phone',phone,
    'email',email,
    'contact_source',contact_source,
    'request_host',request_host,
    'request_url',request_url,
    'insert_time',insert_time
    ))) AS result
FROM 
(
SELECT r1,pmod(ABS(hash(r1)),100) AS g,bo_id,full_name,simple_name,source,company_id,contact,position,mobile_phone,phone,email,contact_source,request_host,request_url,insert_time
FROM dws_bo_final_spider_contact
LIMIT 10000
) tmp
GROUP BY g;


The last statement reports an error.

Reference: http://www.k6k4.com/blog/show/aaaxzznf21469153102000

Older Hive versions do not support collect_list (it was added in Hive 0.13.0, and support for non-primitive arguments such as map came even later).
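If upgrading Hive is not an option, one possible workaround (my suggestion, not from the original post; it assumes the Brickhouse UDF jar is available locally) is Brickhouse's collect UDAF, which behaves like collect_list:

add jar /path/to/brickhouse.jar;   -- illustrative path
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';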

=====================================================================================

Testing again:

The Hadoop, Kafka, and Hive versions on my home cluster are all fairly old, so I hit plenty of pitfalls; the most troublesome ones are below:

Error 1:

Creating the function after packaging and uploading the jar to HDFS failed in several ways:

Class not found (make sure the fully qualified class name and jar path are written correctly).

A complaint that the class does not extend UDF; this is worth checking too.

And, after packaging and uploading to HDFS, creating the function reported the following:

hive> 
    > 
    > CREATE FUNCTION hive2kafkaSimple  AS 'ramos.hive_udf_test.HiveToKakfaSimple'
    >  USING JAR 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';         
converting to local hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar
Added /tmp/5ef75554-8a71-43f4-b8f4-e413df882e49_resources/hive-udf-test-0.0.1.jar to class path
Added resource: /tmp/5ef75554-8a71-43f4-b8f4-e413df882e49_resources/hive-udf-test-0.0.1.jar
java.lang.UnsupportedClassVersionError: ramos/hive_udf_test/HiveToKakfaSimple : Unsupported major.minor version 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:270)
	at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:313)
	at org.apache.hadoop.hive.ql.exec.FunctionTask.createPermanentFunction(FunctionTask.java:138)
	at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:84)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:155)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. ramos/hive_udf_test/HiveToKakfaSimple : Unsupported major.minor version 52.0
hive> You have new mail in /var/spool/mail/root

Reference: https://www.cnblogs.com/jpfss/p/9036645.html

Unsupported major.minor version 52.0: seeing "Unsupported", you might recall that a newer JDK can run classes compiled by an older one, but not the other way around. That is exactly what this means: the local JDK is too old to run a project compiled with JDK 1.8.

Fix: build with the matching version. In my case the cluster runs Java 7, so the project has to be compiled and packaged with 1.7.
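Concretely, that means pinning the Maven compiler plugin to the cluster's Java version, which is the same configuration the first example's pom already carries:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- compile for the Java version the cluster actually runs -->
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>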


Below is the concrete example and how to use it:

IDEA

Java code:

package ramos.hive_udf_test;

import com.alibaba.fastjson.JSONObject;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import kafka.serializer.StringEncoder;
import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Created by tang on 2019/01/07
 */
public class HiveToKakfaSimple extends UDF {

    public String evaluate(String zklist, String brokerlist, String topic, String id, String name) {
        Producer<String, String> producer = createProducer(zklist, brokerlist);
        Map<String, String> params = new HashMap<String, String>();
        params.put("id", id);
        params.put("name", name);

        // serialize the row to JSON and push it to the Kafka topic
        Object o = JSONObject.toJSON(params);
        producer.send(new KeyedMessage<String, String>(topic, o.toString()));
        return o.toString();
    }

    private static Producer<String, String> createProducer(String zklist, String brokerlist) {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", zklist);        // ZooKeeper quorum; comma-separate multiple hosts
        properties.put("serializer.class", StringEncoder.class.getName());
        properties.put("metadata.broker.list", brokerlist); // Kafka broker list
        return new Producer<String, String>(new ProducerConfig(properties));
    }
}
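One caveat worth flagging (my observation, not from the original post): evaluate() creates a brand-new Producer for every row, which is expensive on large scans. A hedged sketch of a common mitigation is to cache a single producer per JVM (i.e. per map task):

// sketch only: reuse one producer per JVM instead of building one per row
private static Producer<String, String> cachedProducer;

private static synchronized Producer<String, String> getProducer(String zklist, String brokerlist) {
    if (cachedProducer == null) {
        cachedProducer = createProducer(zklist, brokerlist);
    }
    return cachedProducer;
}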

pom: 


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>ramos</groupId>
  <artifactId>hive-udf-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>hive-udf-test</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.8.2.0</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.2.46</version>
    </dependency>
    <dependency>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
      <version>20160212</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.10-FINAL</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.10-FINAL</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Run package, upload the jar to a Linux directory, and then put it on HDFS:

Upload the file to HDFS. Open the hive client and run:

dfs -put /usr/local/hive/hiveTestJar/hive-udf-test-0.0.1.jar /user/hive/warehouse/hive-udf-test-0.0.1.jar;   (the second argument is the HDFS path)

To delete the file from HDFS:

hadoop fs -rm -r -skipTrash /user/hive/warehouse/hive-udf-test-0.0.1.jar

Create the function:

CREATE FUNCTION hive2kafkaSimple  AS 'ramos.hive_udf_test.HiveToKakfaSimple' USING JAR 'hdfs:///user/hive/warehouse/hive-udf-test-0.0.1.jar';

Run show functions; to check that the function is now there.

Use the function to push Hive data to Kafka (the three statements below differ only in the ZooKeeper and broker addresses):


select default.hive2kafkasimple('sparkproject1:2181','sparkproject1:9092','TestTopic',id,name)  from db_hive_edu.student2 limit 10000 ;

select default.hive2kafkasimple('192.168.124.110:2181','192.168.124.110:9092','TestTopic',id,name)  from db_hive_edu.student2 limit 10000 ;

select default.hive2kafkasimple('192.168.124.110:2181,192.168.124.111:2181,192.168.124.112:2181','192.168.124.110:9092','TestTopic',id,name)  from db_hive_edu.student2 limit 10000 ;
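To watch the messages arrive, a sketch using the Kafka 0.8-era console consumer (host and port taken from the queries above; this old consumer reads offsets from ZooKeeper):

bin/kafka-console-consumer.sh --zookeeper 192.168.124.110:2181 --topic TestTopic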


(Screenshots: the query running and its output.)


Kafka received the messages:

[2019-06-12 02:10:16,485] INFO Accepted socket connection from /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:16,497] INFO Client attempting to establish new session at /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:16,508] INFO Established session 0x16b47372e1f000a with negotiated timeout 6000 for client /192.168.124.112:43697 (org.apache.zookeeper.server.NIOServerCnxn)
[2019-06-12 02:10:19,069] INFO Got user-level KeeperException when processing sessionid:0x16b47372e1f0008 type:create cxid:0x26 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/consumers/console-consumer-31723/owners/TestTopic Error:KeeperErrorCode = NoNode for /consumers/console-consumer-31723/owners/TestTopic (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-06-12 02:10:19,071] INFO Got user-level KeeperException when processing sessionid:0x16b47372e1f0008 type:create cxid:0x27 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/consumers/console-consumer-31723/owners Error:KeeperErrorCode = NoNode for /consumers/console-consumer-31723/owners (org.apache.zookeeper.server.PrepRequestProcessor)
{"id":"1","name":"zhangsan2"}
{"id":"3","name":"wangwu2"}
{"id":"4","name":"zhaoliu"}
{"id":"3","name":"wangwu2"}
{"id":"4","name":"zhaoliu"}
{"id":"2","name":"lisi2"}
{"id":"1","name":"zhangsan2"}
{"id":"2","name":"lisi2"}
{"id":"5","name":"TANGQI"}

