Hive 行转列 & 列转行

前言

hive中行转列和列转行比较常见，一般是用于中间数据的生产。

行转列

行转列一般采用IF进行判断，筛选出本列对应的数据进行统计。

基础数据

unique_id	sex	age
561635f8aa0cc1195238dbaf0c2044e3	Female	20
5ef957bb74e46a0f7b61d56d80f01689	Male	30
fbb891c344394007e00b3bd51ed65e2e	Female	25

ETL转化

SELECT age
    , count(if(sex = 'Female', 1, NULL)) AS Female
    , count(if(sex = 'Male', 1, NULL)) AS Male
FROM person_basic_info_all
GROUP BY age
ORDER BY age

输出结果

age	Female	Male
10	10	20
20	30	30
30	25	25

炸裂函数explode，将数组转换成多行

基础数据

unique_id	certificate_id
34f228cc5176d87907cbefc9ed5d01f6	01095583,P0104378,30342-01
c3db65dca36fa59a368204fd18f5bc82	51425538,EC6311054,0073774-01
21e4e7491e3ef58307ade2ac8e5e3c90	01092425,P0058397,1045-12

ETL转化

SELECT explode(split(certificate_id, ',')) AS certificate_id FROM certification_info

输出结果

certificate_id

01095583
EC6311054
0073774-01
P0058397

如果我们直接

SELECT unique_id, explode(split(certificate_id, ',')) AS certificate_id FROM certification_info

就会报错

Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

那么，我们由此引入LATERAL VIEW函数

1.Lateral View 用于和UDTF函数【explode,split】结合来使用。
2.首先通过UDTF函数将数据拆分成多行，再将多行结果组合成一个支持别名的虚拟表。
3..主要解决在select使用UDTF做查询的过程中查询只能包含单个UDTF，不能包含其它字段以及多个UDTF的情况。
4.语法：LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

列转行(Lateral View + explode)

对一列进行炸裂，其他列保持不变

SELECT unique_id, certificate_id FROM certification_info
LATERAL VIEW explode(certificate_id) tmpTable AS certificate_id

输出结果

unique_id	certificate_id
34f228cc5176d87907cbefc9ed5d01f6	01095583
34f228cc5176d87907cbefc9ed5d01f6	P0104378
34f228cc5176d87907cbefc9ed5d01f6	30342-01
c3db65dca36fa59a368204fd18f5bc82	51425538
c3db65dca36fa59a368204fd18f5bc82	EC6311054
21e4e7491e3ef58307ade2ac8e5e3c90	01092425

列转行(Lateral View + posexplode)

对多列进行炸裂，其他列保持不变

基础数据

unique_id	ic_number	passport_number	birth_certificate_id
561635f8aa0cc1195238dbaf0c2044e3	01052408	C0544935	36066
5ef957bb74e46a0f7b61d56d80f01689	01107410	P0093218	40067-01
fbb891c344394007e00b3bd51ed65e2e	01041475	C0474512	22612

ETL转化

SELECT unique_id, certificate_type, certificate_id
FROM (
    SELECT unique_id, certificate_type, certificate_id
    FROM (
        SELECT unique_id
            , concat_ws(',', 'ic_number', 'passport_number', 'birth_certificate_id') AS certificate_type_tuple
            , concat_ws(',', ic_number, passport_number, birth_certificate_id) AS certificate_id_tuple
        FROM person_basic_info_all
    ) person
        LATERAL VIEW posexplode(split(certificate_type_tuple, ',')) certificate_type_tuple AS idx_type, certificate_type
        LATERAL VIEW posexplode(split(certificate_id_tuple, ',')) certificate_id_tuple AS idx_id, certificate_id
    WHERE idx_type = idx_id
) tmp

输出结果

unique_id	certificate_type	certificate_id
5ef957bb74e46a0f7b61d56d80f01689	passport_number	X419700
81536afe4de7de3d33e0c9d864b54f3a	ic_number	50687170
d45af8297c01ebea84782d2b3e72cb0c	birth_certificate_id	0090801-01

结

行和列转换的问题主要解决存储和计算过程中关联关系不对等的问题。

Hive 行转列 & 列转行

前言

行转列

炸裂函数explode，将数组转换成多行

列转行(Lateral View + explode)

列转行(Lateral View + posexplode)

结

你可能感兴趣的:(Hive 行转列 & 列转行)