Hive从入门到精通8:Hive自定义函数(UDF)

1.什么是Hive UDF?

Hive中的用户自定义函数(即User Defined Function,简称UDF),是用户对一些列Hive操作进行封装以实现特定的功能的函数。比如:在Hive的UDF中,可以直接使用select语句,对查询结果按照一定的格式输出。

2.Hive UDF如何实现

实现一个Hive UDF需要满足以下两点要求:

  • Hive UDF类需要继承自org.apache.hadoop.hive.sql.exec.UDF父类;
  • 需要实现evaluate函数,evaluate函数支持重载。

3.Hive UDF案例

(1)定义一个UDF类ConcatString :按指定格式拼接字符串

package demo.udf;
import org.apache.hadoop.hive.sql.exec.UDF;
//条件1:继承UDF父类
public class ConcatString extends UDF {
    //条件2:实现evaluate方法
    public String evaluate(String a, String b) {
        return a + "====" + b;
    }
    //可以重载
    public String evaluate(String a, String b, String cat) {
        return a + cat + b; 
    }
}

(2)定义一个UDF类SalaryGrade :判断员工表中的工资级别

package demo.udf;
import org.apache.hadoop.hive.sql.exec.UDF;
//条件1:继承UDF
public class SalaryGrade extends UDF {
    //条件2:实现evaluate方法
    public String evaluate(String salary) {
        int sal = Integer.parseInt(salary.trim());
        if(sal <= 1000){
            return "Grade_A";
        }else if(sal >1000 && sal <= 3000){
            return "Grade_B";
        }else{
            return "Grade_C";
        }
    }
}

4.Hive UDF的部署

(1)把编写好的程序打包放到目标机器上

[root@bigdata ~]# ls /root/input/myudf.jar

(2)进入Hive客户端,添加myudf.jar包到当前环境

hive> add jar /root/input/myudf.jar;

(3)创建临时函数

hive> create temporary function concatString as 'demo.udf.ConcatString';
hive> create temporary function salaryGrade as 'demo.udf.SalaryGrade';

5.Hive UDF的调用

测试数据:

hive> select * from emp_in;
emp_in.empno emp_in.ename emp_in.job emp_in.mgr emp_in.hiredate emp_in.sal emp_in.comm emp_in.deptno
7369 SMITH CLERK 7902 1980/12/17 800 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600 300 30
7521 WARD SALESMAN 7698 1981/2/22 1250 500 30
7566 JONES MANAGER 7839 1981/4/2 2975 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250 1400 30
7698 BLAKE MANAGER 7839 1981/5/1 2850 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500 0 30
7876 ADAMS CLERK 7788 1987/5/23 1100 NULL 20
7900 JAMES CLERK 7698 1981/12/3 950 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300 NULL 10

(1)查询HQL语句

连接两个字段:默认连接符用“====”

hive> select concatString(ename, job) from emp_in;
SMITH====CLERK
ALLEN====SALESMAN
WARD====SALESMAN
JONES====MANAGER
MARTIN====SALESMAN
BLAKE====MANAGER
CLARK====MANAGER
SCOTT====ANALYST
KING====PRESIDENT
TURNER====SALESMAN
ADAMS====CLERK
JAMES====CLERK
FORD====ANALYST
MILLER====CLERK

连接两个字段:自定义连接符

hive> select concatString(ename, job, ''s job is ') from emp_in;
SMITH's job is CLERK
ALLEN's job is SALESMAN
WARD's job is SALESMAN
JONES's job is MANAGER
MARTIN's job is SALESMAN
BLAKE's job is MANAGER
CLARK's job is MANAGER
SCOTT's job is ANALYST
KING's job is PRESIDENT
TURNER's job is SALESMAN
ADAMS's job is CLERK
JAMES's job is CLERK
FORD's job is ANALYST
MILLER's job is CLERK

hive> select ename,sal,salaryGrade(sal) salarylevel from emp_in;
ename sal salarylevel
SMITH 800 Grade_A
ALLEN 1600 Grade_B
WARD 1250 Grade_B
JONES 2975 Grade_B
MARTIN 1250 Grade_B
BLAKE 2850 Grade_B
CLARK 2450 Grade_B
SCOTT 3000 Grade_B
KING 5000 Grade_C
TURNER 1500 Grade_B
ADAMS 1100 Grade_B
JAMES 950 Grade_A
FORD 3000 Grade_B
MILLER 1300 Grade_B

(2)销毁临时函数

hive> drop temporary function concatString;
hive> drop temporary function salaryGrade;
hive (default)> select ename,sal,salaryGrade(sal) salaryLevel from emp_in;
FAILED: SemanticException [Error 10011]: Invalid function salaryGrade
hive (default)> select concatString(ename, job, '****') from emp_in;
FAILED: SemanticException [Error 10011]: Invalid function concatString

你可能感兴趣的:(Hive从入门到精通8:Hive自定义函数(UDF))