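Notes:
Hive version 2.1.1
Starting with Hive 2.0, Hive ships a stored-procedure solution: HPL/SQL (Procedural SQL on Hadoop). It works not only with Hive but also with Spark SQL, other NoSQL stores, and even RDBMSs, and provides functionality similar to Oracle PL/SQL. This makes data developers' work much easier: many things that used to be hard to do in Hive, such as user-defined variables, cursors over a result set, and loops, are now straightforward.
Testing the hplsql command: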
[root@hp1 sql]# find / -name '*hplsql*'
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/lib/hive-hplsql-2.1.1-cdh6.3.1.jar
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/lib/hive-hplsql.jar
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/ext/hplsql.sh
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql.cmd
/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-hplsql-2.1.1-cdh6.3.1.jar
[root@hp1 sql]#
[root@hp1 sql]# /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hive/bin/hplsql -e "CURRENT_DATE + 1"
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2020-12-18
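Why choose HPL/SQL
Hadoop plays a major role in data warehousing. But to implement comprehensive ETL, reporting, analytics, and data-mining processes, you need not only a distributed processing engine (such as MapReduce, Spark, or Tez) but also a way to express comprehensive business rules.
Business-logic-driven processing and advanced error handling
HPL/SQL lets you implement business logic with variables, expressions, control-flow statements, and iteration. It supports error handling through exceptions and condition handlers. You can develop programs that manage and control distributed processes without becoming a bottleneck for the system.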
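As a rough illustration of the features just listed (variables, iteration, control flow, and a condition handler), a minimal sketch might look like the following. The names total and too_large are hypothetical and chosen only for this example; the script touches no tables, and it has not been run against the CDH 6.3.1 build used in this post.

```sql
-- Hypothetical standalone script; total and too_large are illustrative names.
DECLARE total INT DEFAULT 0;
DECLARE too_large CONDITION;

-- Exit handler: runs when the too_large condition is signaled.
DECLARE EXIT HANDLER FOR too_large
  PRINT 'total exceeded the limit';

-- Iteration: add the integers 1 through 5.
FOR i IN 1..5 LOOP
  SET total = total + i;
END LOOP;

-- Control flow: branch on the accumulated value.
IF total > 10 THEN
  SIGNAL too_large;
ELSE
  PRINT 'total = ' || total;
END IF;
```

Saved to a file, a script like this would be run the same way as the other examples in this post, with hplsql -f.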
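Making SQL on Hadoop more dynamic
One of the key features of HPL/SQL is that it makes SQL more dynamic. You can use advanced expressions, a variety of built-in functions, and conditions to generate SQL on the fly based on user configuration, the results of earlier queries, data from files or non-Hadoop data sources, and so on.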
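One way to picture dynamically generated SQL is a statement assembled from a variable and run with EXECUTE IMMEDIATE. The sketch below assumes a Hive connection is configured and that the table test.emp (queried elsewhere in this post) exists; tabname and cnt are hypothetical variable names.

```sql
-- tabname and cnt are illustrative names, not part of the original post.
DECLARE tabname STRING DEFAULT 'test.emp';
DECLARE cnt INT DEFAULT 0;

-- Build the statement text at run time and capture the single result value.
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || tabname INTO cnt;

PRINT 'Rows in ' || tabname || ': ' || cnt;
```

In a real script the table name would more likely come from configuration or from an earlier query than from a hard-coded default.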
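Leveraging existing procedural SQL skills
Traditional database management systems provide procedural SQL languages that are widely used to implement advanced data-manipulation scenarios and workflows. This approach is simple and familiar to database developers and data analysts.
Compared with Python, Java, or Linux shell scripts, HPL/SQL opens Hadoop to a wider audience of BI analysts and developers.
ETL framework
HPL/SQL provides functions and statements that make typical ETL development more productive.
Readability and maintainability
Compared with Bash scripts or Java, Python, or Scala programs, HPL/SQL is concise, readable, and maintainable for BI/SQL developers.
Integration and polyglot persistence
Hadoop extends traditional data warehouses built on RDBMS products. This means you have to integrate multiple systems, including Hadoop, RDBMSs, NoSQL stores, and others.
HPL/SQL lets you work with multiple systems in a single script, so you can take the best approach for each type of workload and integrate them easily.
Compatibility and migration
HPL/SQL tries to support the syntax of all widely used procedural languages as far as possible. You do not need to learn a new procedural language from scratch. This helps both with developing new code and with migrating existing code bases to Hadoop.
Quick start with Hadoop
HPL/SQL offers the fastest way to get started with Hadoop. Later you can redesign and reimplement advanced data-processing workflows using Spark, Tez, Storm, Flink, and other frameworks, but right now you can run business logic on Hadoop with your current skills and existing code.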
[root@hp1 ~]# more b.sql
select count(*) from test.ods_fact_sale;
Run b.sql:
[root@hp1 ~]# hplsql -f b.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (248 ms)
Starting query
Query executed successfully (444.87 sec)
767830000
Testing an example with a function and a FOR loop
Code:
CREATE FUNCTION hello(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
FOR item IN (
SELECT empno,ename FROM test.emp limit 10
)
LOOP
PRINT item.empno || '|' || item.ename || '|' || hello(item.empno);
END LOOP;
Test record:
[root@hp1 ~]# hplsql -f a.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (691 ms)
Starting query
Query executed successfully (639 ms)
7369|smith|Hello, 7369!
7499|allen|Hello, 7499!
7876|adams|Hello, 7876!
7900|james|Hello, 7900!
7902|ford|Hello, 7902!
7934|miller|Hello, 7934!
7521|ward|Hello, 7521!
7566|jones|Hello, 7566!
7654|martin|Hello, 7654!
7698|blake|Hello, 7698!
[root@hp1 ~]#
[root@hp1 ~]# more set_message.sql
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
print 'name='||name;
SET result = 'Hello, ' || name || '!';
END;
[root@hp1 ~]# more include.sql
INCLUDE set_message.sql
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;
[root@hp1 ~]#
Test record:
[root@hp1 ~]# hplsql -f include.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
name=world
Hello, world!
[root@hp1 ~]#
Code:
create or replace package users as
session_count int := 0;
function get_count() return int;
procedure add(name varchar(100));
end;
create or replace package body users as
function get_count() return int
is
begin
return session_count;
end;
procedure add(name varchar(100))
is
begin
-- ...
session_count = session_count + 1;
end;
end;
users.add('John');
users.add('Sarah');
users.add('Paul');
print 'Number of users: ' || users.get_count();
Code:
CREATE PROCEDURE spOpenIssues
DYNAMIC RESULT SETS 1
BEGIN
DECLARE cur CURSOR WITH RETURN FOR
SELECT empno,ename FROM test.emp;
OPEN cur;
END;
DECLARE id INT;
DECLARE name VARCHAR(30);
CALL spOpenIssues;
ALLOCATE c1 CURSOR FOR PROCEDURE spOpenIssues;
FETCH c1 INTO id, name;
WHILE (SQLCODE = 0)
DO
PRINT id || ' - ' || name;
FETCH c1 INTO id, name;
END WHILE;
CLOSE c1;
Test record:
[root@hp1 ~]# hplsql -f cursor_test.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (255 ms)
Starting query
Query executed successfully (102 ms)
7369 - smith
7499 - allen
7876 - adams
7900 - james
7902 - ford
7934 - miller
7521 - ward
7566 - jones
7654 - martin
7698 - blake
7782 - clark
7788 - scott
7839 - king
7844 - turner
Writing the cursor the way I am used to in PL/SQL
Code:
vim c.sql
cursor emp_cur is
select empno,ename from test.emp;
for rc in emp_cur loop
dbms_output.put_line(rc.empno||'-'||rc.ename);
end loop;
Test record:
[root@hp1 ~]# hplsql -f c.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (250 ms)
Starting query
Query executed successfully (111 ms)
7369 smith
7499 allen
7876 adams
7900 james
7902 ford
7934 miller
7521 ward
7566 jones
7654 martin
7698 blake
7782 clark
7788 scott
7839 king
7844 turner
-
[root@hp1 ~]#
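Oddly, there is an extra '-' on the last line, and none of the preceding output lines contain the separator.
Replacing the script with the following makes the problem go away: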
for rc in (select empno,ename from test.emp) loop
dbms_output.put_line(rc.empno||'-'||rc.ename);
end loop;
Test record:
[root@hp1 ~]# hplsql -f c.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open connection: jdbc:hive2://localhost:10000 (235 ms)
Starting query
Query executed successfully (97 ms)
7369-smith
7499-allen
7876-adams
7900-james
7902-ford
7934-miller
7521-ward
7566-jones
7654-martin
7698-blake
7782-clark
7788-scott
7839-king
7844-turner
[root@hp1 ~]#
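If a named cursor is still wanted instead of the inline SELECT, one option is an explicit DECLARE / OPEN / FETCH / CLOSE loop, reusing the WHILE (SQLCODE = 0) pattern from the spOpenIssues example earlier in this post. This is only a sketch: it has not been run on this cluster, and the variable names id and name and the cursor name emp_cur are illustrative.

```sql
-- Explicit cursor over the same query used in c.sql.
DECLARE id INT;
DECLARE name VARCHAR(30);
DECLARE emp_cur CURSOR FOR
  SELECT empno, ename FROM test.emp;

OPEN emp_cur;
FETCH emp_cur INTO id, name;

-- SQLCODE stays 0 while FETCH keeps returning rows.
WHILE (SQLCODE = 0)
DO
  PRINT id || '-' || name;
  FETCH emp_cur INTO id, name;
END WHILE;

CLOSE emp_cur;
```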
1. http://www.hplsql.org/home