hive 列转行 行转列

1. explode

explode就是将hive一行中复杂的 array 或者 map 结构拆分成多行。

hive wiki对于expolde的解释如下:

explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows.

UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

eg1.

myTable

hive 列转行 行转列_第1张图片

Then running the query:

SELECT explode(myCol) AS myNewCol FROM myTable;

will produce: 

hive 列转行 行转列_第2张图片

eg2.

 

2.lateral view

hive wiki 上的解释如下:

Lateral View Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,’ columnAlias)* 
fromClause: FROM baseTable (lateralView)*

Description

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

An example table with two rows: 
hive 列转行 行转列_第3张图片

SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

The resulting output will be 
hive 列转行 行转列_第4张图片

 

-----行转列
create table test_1102 ( name string,  marks string );

insert into test_1102 values ( 'jason', 'english'); 
insert into test_1102 values ( 'jason', 'chinese');
insert into test_1102 values ( 'jason', 'math');
insert into test_1102 values ( 'tracy', 'jjj');
insert into test_1102 values ( 'tracy', 'kkkkk');

select
   name,
   concat_ws(',', collect_set(concat(marks,'""""" """"ooooo')))
from  test_1102
group by name;

 

inline: 

SELECT *
from (
select array(struct('2019-07-01','2019-07-10')) as date_array
)a
lateral view inline(date_array) tf as start_dt, end_dt
a.date_array

 

tf.start_dt

 

tf.end_dt

 

[{"col1":"2019-07-01","col2":"2019-07-10"}] 2019-07-01 2019-07-10

 

SELECT *
from (
select  array(struct('2019-06-25',last_day('2019-06-25')),struct(date_add(last_day('2019-06-25'),1),'2019-07-03')) as date_array
)a
lateral view inline(date_array) tf as start_dt, end_dt
a.date_array

 

tf.start_dt

 

tf.end_dt

 

[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}] 2019-06-25 2019-06-30
[{"col1":"2019-06-25","col2":"2019-06-30"},{"col1":"2019-07-01","col2":"2019-07-03"}] 2019-07-01 2019-07-03

 

SELECT posexplode(split(space(7-1),' ')) as (pos,val)

pos

 

val

 

0  
1  
2  
3  
4  
5  
6

 

Built-in Table-Generating Functions (UDTF)

Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.

Row-set columns types

Name(Signature)

Description

T

explode(ARRAY a)

Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

Tkey,Tvalue

explode(MAP m)

Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.).

int,T posexplode(ARRAY a) Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

T1,...,Tn

inline(ARRAY> a)

Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

T1,...,Tn/r stack(int r,T1 V1,...,Tn/r Vn) Breaks up n values V1,...,Vn into rows. Each row will have n/r columns. must be constant.
     

string1,...,stringn

json_tuple(string jsonStr,string k1,...,string kn)

Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

string 1,...,stringn

parse_url_tuple(string urlStr,string p1,...,string pn)

Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:.

from : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

 

ref: 

https://blog.csdn.net/bitcarmanlee/article/details/51926530

http://www.cnblogs.com/judylucky/p/3713774.html

http://chengjianxiaoxue.iteye.com/blog/2268371

https://stackoverflow.com/questions/38064412/converting-columns-to-rows-unpivot-in-hiveql

https://stackoverflow.com/questions/39353425/how-to-convert-columns-into-rows-in-hive

你可能感兴趣的:(hive 列转行 行转列)