Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack

0.Hive内嵌表生成函数

Built-in Table-Generating Functions (UDTF)

普通的用户定义函数,如concat(),接受单个输入行并输出单个输出行。相反,表生成函数将单个输入行转换为多个输出行

Row-set columns types

Name(Signature)

Description

T

explode(ARRAY a)

Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

将数组分解为多行。返回一个单列(col)的行集,数组中的每个元素对应一行。

Tkey,Tvalue

explode(MAP m)

Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.).

int,T posexplode(ARRAY a)

Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

数组分解为多行,其中包含int类型的附加位置列(原始数组中的位置,从0开始),意思行转列以后,还多一列数组元素对应下标的列

T1,...,Tn

inline(ARRAY> a)

Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

将一个结构数组分解为多行。返回一个包含N列的行集(N =结构中顶级元素的数量),数组中的每个结构中有一行。

T1,...,Tn/r stack(int r,T1 V1,...,Tn/r Vn)

Breaks up n values V1,...,Vn into rows. Each row will have n/r columns. must be constant.

分解n值V1,…,将Vn分成r行。每一行都有n/r列。r必须是常数

string1,...,stringn

json_tuple(string jsonStr,string k1,...,string kn)

Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

string 1,...,stringn

parse_url_tuple(string urlStr,string p1,...,string pn)

Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:.

1.案例演示

1.1 explode,一般和lateral view explode一起使用

    一般用来实现表的行转列

select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第1张图片

select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第2张图片

1.2. posexplode的使用


select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第3张图片

1.3 inline的使用

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第4张图片

1.4 stack的使用

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第5张图片

1.5 json_turple 的使用

       在Hive 0.7中引入了一个新的json_tuple() ,是UDTF。它的参数是一个JSON字符串和一组key键,并使用一个函数返回一个元组值。这比调用GET_JSON_OBJECT从一个JSON字符串中检索多个键要高效得多。相当于加强版的get_JSON_OBJECT。它可以一次返回多个object,以元组的形式存储。


select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;

1.6 parse_url_tuple的使用

         parse_url_tuple() 是UDTF函数,类似于UDF中的parse_url(),但是可以同时提取给定URL的多个部分,以元组的形式返回数据。例如parse_url_tuple('http://facebook.com/path1/p.php?(k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2')返回一个值为'v1','v2'的元组。这比多次调用parse_url()更有效。所有输入参数和输出列类型都是string。

Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack_第6张图片

 

你可能感兴趣的:(Hive编程和数据仓库,UDTF函数,hive函数,explode,函数使用,get_json_turple,hive表生成函数)