hive之hplsql学习

最近需要搞这方面的工作,首先百度了一下,发现重复率太高,于是自己去官网查看了,附上官网地址

http://www.hplsql.org/doc

这个人应该最开始发表文章的,写的不错

http://lxw1234.com/archives/category/hive

hive之hplsql学习_第1张图片

这段话翻译一下 注意时间是2016年2月就有了

1、hive2.0.0 实现了hpl/sql功能,实际上就是类似于mysql、oracle的存储过程

2、hpl/sql很牛b

一、commandline

hplsql -e 'query' | -f file 
      [-main procname]
      [-d | --define | -hiveconf | -hivevar var=value ...] 
      [-version | --version]
      [-trace | --trace]
      [-H | --help]

Parameters:

Parameter Description HPL/SQL Version
-e 'query' SQL statements to execute 0.1
-f file Execute SQL statements from file 0.1
-main procname Entry point (procedure or function name) 0.3.7
-d var=value Variable definition 0.1
--define var=value 0.1
-hiveconf var=value 0.1
-hivevar var=value 0.1
-version --version Print HPL/SQL version 0.1
-trace --trace Print trace information 0.1
-H --help Print help information and exit 0.1

hive之hplsql学习_第2张图片

进入hive的bin安装目录就可以看到hplsql这个脚本,版本是cdh6.3.0,HPL/SQL 0.3.31

Notes:

  • -e and -f cannot be specified together

  • if -main option is not specified, HPL/SQL starts executing all statements from the beginning of the script

  • You can use single 'query' and double “query” quotes with -e option

  • Currently -d, --define, -hivevar and -hiveconf are equivalent and allow you to define input variables.

翻译下

1、-e 和-f 不能同时使用

2、如果-main选项没有指定,hplsql也就是你写的存储过程将执行所有的内容

3、-e的时候,你可以使用单引号和双引号

4、-d, --define, -hivevar 和-hiveconf都可以用来传递参数

Example 1:

Executing HPL/SQL statements from a script:

hplsql -f script.sql 

Example 2:

Executing HPL/SQL statements from command line:

hplsql -e "NVL(MAX_PARTITION_DATE(db.sales, local_dt, code='A'), CURRENT_DATE)" 

这句话暂时还不太理解,这个函数目前没看到,而且里面的参数也不明白,这个官方示例还不太懂

Example 3:

Using variables:

hplsql -e "PRINT a || ', ' || b" -d a=Hello -d b=world 

Result:

Hello, world

就目前来说./hplsql 和./hive的功能基本一样

——————————————————————————————————————————————————————

下面主要学习存储过程基础、写法和使用

Operators

  • Addition +

  • Subtraction -

  • Equality and Comparison =, <>, !=, <, >, <=, >=

  • String Concatenation ||  这个是字符串连接符

 

操作符

Expression Result Result Type
3 + 1 4 Integer
'Ab' + 'c' 'Abc' String
DATE '2014-12-31' + 1 DATE '2015-01-01' DATE
3 = 3 True Boolean
3 = 1 False Boolean
'CA' = NULL NULL Boolean
3 == 3 True Boolean
3 <> 3 False Boolean
3 != 3 False Boolean
3 > 1 True Boolean
3 < 1 False Boolean
3 >= 1 True Boolean
3 <= 1 False Boolean
'a' || 'b' || 'c' 'abc' String
'a' || 1 || 'c' 'a1c' String
'a' || NULL || 'c' 'ac' String
NULL || NULL NULL String

 许多示例使用

./hplsql -e "PRINT a || ', ' || b" --hivevar a=Hello --hivevar b=lxw1234.com

也没人说使用原因,hplsql中 a b是字符串,然后用||连接符连接 ', '注意这个,是字符串,所以一定要引起来

使用位置:

比如我现在用hplsql写个脚本,每天把a表的数据插入到b表

vim test.sh

hive -e 'insert into tableA as select * from  impala.'||tableB||' limit 10';

—————————————————————————————————————————————————————————

Attributes and Built-in Variables

HPL/SQL attributes and built-in variables:

ACTIVITY_COUNT Number of rows affected by last SQL statement //是否有数据,有为1没有为0
ERRORCODE Return code of the last SQL statement //上句sql写的对不对 0正确,正数warning 负数error
HOSTCODE Return code of the last OS command  //
SQLCODE Return code of the last SQL statement  //0正确,正数warning 负数error
SQLSTATE Return status of the last SQL statement //算了感觉都差不多

 

这个的具体作用主要就是判断上一句sql 的查询插入删除等情况是否正常运行,从而来输出日志,同时判断下一步走向

————————————————————————————————————————————————————————

 

Statements

HPL/SQL statements:

ALLOCATE CURSOR Allocate cursor for procedure result set
ASSOCIATE RESULT SET LOCATOR Define locators for procedure result sets
BREAK Exit a loop
CALL Execute a stored procedure
CLOSE Close a cursor
CMP Compare data in tables
COPY Copy data between tables and files
COPY FROM FTP Copy FTP files to Hadoop compatible file system
COPY FROM LOCAL Copy local files to Hadoop compatible file system
CREATE DATABASE Create a database
CREATE FUNCTION Create a user-defined SQL function
CREATE LOCAL TEMPORARY TABLE Create a session-level temporary table
CREATE PACKAGE Create a program package
CREATE PROCEDURE Create a user-defined SQL procedure
CREATE TABLE Create a table
CREATE VOLATILE TABLE Create a session-level temporary table
DECLARE Declare a variable
DECLARE CONDITION Declare a condition
DECLARE CURSOR Declare a cursor
DECLARE HANDLER Declare a condition handler
DECLARE TEMPORARY TABLE Declare a temporary table
DESCRIBE Describe a database object
DROP DATABASE Drop a database
DROP TABLE Drop a table
EXEC EXECUTE EXECUTE IMMEDIATE Execute a dynamic SQL statement or procedure
EXIT WHEN Exit a loop
FETCH Fetch the next row from a cursor
FOR cursor FOR statement (Cursor loop)
FOR range FOR statement (Integer range)
GET DIAGNOSTICS Get execution information
HOST Execute an OS command or run an external process
IF IF statement
INCLUDE Include statements from another script
INSERT INSERT statement
INSERT DIRECTORY Write query results to a file
LEAVE Exit a loop
OPEN Open a cursor
LOOP Unconditional loop
MAP OBJECT Map object name to a connection profile
NULL No operation (no-op) statement
PRINT Print a line
RESIGNAL Resignal the exception
RETURN Return from a routine
SELECT SELECT statement
SELECT INTO Assign values from a query
SIGNAL Raise a condition or exception
SUMMARY Summary stats for a table or result set
SET Assign a value to a variable
SET Session Option Set a session option
TRUNCATE Truncate a table
UPDATE UPDATE statement
USE Change the default database
VALUES INTO Assign a value to a variable
WHILE While loop

这个是存储过程中的语法,以前看过oracle的存储过程感觉挺像的。

CREATE PROCEDURE set_message(IN arg STRING)
BEGIN
 SET result = 'Hello, lxw的大数据田地!';
 print arg;
 PRINT result || ', ' || arg;
END;
 
PRINT 'Hello, lxw1234.com! at beginning .. ';
CALL set_message(' by CALL ..');

借用文章开头那位大神的例子说明下(大神觉得简单,基本没介绍)

首先CREATE PROCEDURE 声明了一个存储过程,其实类似与java里的方法一样,调用的方式是call procedure,这个就相当于main方法。(中文可能有乱码,无所谓)

第一种 hplsql -f xxx.sql

          该方法会执行print 和call set_message(' by CALL ..');

          输出Hello, lxw1234.com! at beginning ..

                 by CALL ..

                 Hello, lxw的大数据田地! , by CALL ..

第二种 hplsql -f xxx.sql -main set_message

          直接调用存储过程 call set_message(null)

          输出 null

                 Hello, lxw的大数据田地! ,

 这里调用的时候是没有参数的,这个时候我就在想肯定不行吧,一般作为脚本肯定需要时间传递

那么肯定有解决办法:内置的时间参数类似与${YYYY-mm-dd} 或者可以传入时间参数类似 call set_message(2020-04-13)

1、使用-d参数传递

./hplsql -f cc_test/hplsql_test -d yyyy=2020

脚本里的print yyyy 就会在屏幕上输出 2020

2、内置时间参数暂时还没找到。。。。。。。。。。。。。。

一些简单的使用demo

hplsql的使用demo

查询数据库里的内容

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat  select.sql 
select * from impala.impala_test_middle limit 3 --hive的库名和表名

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f select.sql

Starting query
Query executed successfully (893 ms)
col1_2 col2_2 col3_2 col4_2 col5_2 col6_2 col7_2 col8_2 col9_2 part10
col1_2 col2_2 col3_2 col4_2 col5_2 col6_2 col7_2 col8_2 col9_2 part1
col1_4 col2_4 col3_4 col4_4 col5_4 col6_4 col7_4 col8_4 col9_4 part10

使用简单的函数

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat function_hello.sql 
CREATE FUNCTION hello2(text STRING)
RETURNS STRING
BEGIN
RETURN 'Hello, ' || text || '!';
END;
-- Call the function
PRINT hello2('world');

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f function_hello.sql

Hello, world!

创建存储过程

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat proc_print.sql 
CREATE PROCEDURE set_message(IN arg STRING)
BEGIN
SET result = 'Hello chenchi!';
print arg;
PRINT result || ', ' || arg;
END;

PRINT 'Hello, chenchi! at beginning .. ';
CALL set_message(' by CALL ..');

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql

Hello, chenchi! at beginning .. 
by CALL ..
Hello chenchi!, by CALL ..

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f proc_print.sql -main set_message

null
Hello chenchi!,

使用include引用其他文件里的存储过程

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat set_message.sql 
CREATE PROCEDURE set_message(IN name STRING, OUT result STRING)
BEGIN
print 'name='||name
SET result = 'Hello, ' || name || '!';
END;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat include.sql 
INCLUDE set_message.sql
DECLARE str STRING;
CALL set_message('world', str);
PRINT str;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f include.sql

name=world
Hello, world!

使用package

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat package_users.sql
create or replace package users as
session_count int := 0;
function get_count() return int; 
procedure add(name varchar(100));
end;

create or replace package body users as
function get_count() return int
is
begin
return session_count;
end; 
procedure add(name varchar(100))
is 
begin
-- ...
session_count = session_count + 1;end;
end;
users.add('John');
users.add('Sarah');
users.add('Paul');
print 'Number of users: ' || users.get_count();

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f package_users.sql

Number of users: 3

使用游标

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ cat cursor.sql 
CREATE PROCEDURE spOpenIssues 
DYNAMIC RESULT SETS 1
BEGIN
DECLARE cur CURSOR WITH RETURN FOR
SELECT col1,col2 FROM impala.impala_test_middle;
OPEN cur;
END;

DECLARE id INT;
DECLARE name VARCHAR(30);

CALL spOpenIssues;
ALLOCATE c1 CURSOR FOR PROCEDURE spOpenIssues;

FETCH c1 INTO id, name;
WHILE (SQLCODE = 0)
DO
PRINT id || ' - ' || name;
FETCH c1 INTO id, name;
END WHILE;
CLOSE c1;

[devuser@node04 ~/hplsql/hplsql-0.3.31/cc_test_hplsql]$ ../hplsql -f cursor.sql

- col2_2
- col2_4
- col2_4

更多demo自己看http://www.hplsql.org/doc

 

       

 

你可能感兴趣的:(hive,hplsql)