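When processing data with Hive, you often need to convert between rows and columns. This post summarizes the common row/column conversion scenarios and the syntax for each.
Every statement here can be copied and run in your own Hive environment to check the result.
Start the hive or beeline client, then run
desc function explode;
to view a function's description: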
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
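Expands an array into multiple rows, or a map into multiple rows with a key column and a value column. It is a built-in Hive UDTF (user-defined table-generating function).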
split(str, regex) - Splits str around occurances that match regex
Splits a string on a regular expression and returns an array of strings.
collect_list(x) - Returns a list of objects with duplicates
Returns a collection that keeps duplicates.
collect_set(x) - Returns a set of objects with duplicate elements eliminated
Returns a collection with duplicates removed.
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator
Returns the strings joined into one string with the given separator.
max(expr) - Returns the maximum value of expr
Returns the maximum value of the expression.
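The functions above can be tried out on literal values before touching any table. This is a minimal sketch, assuming a Hive version that allows select without a from clause (the statements later in this post rely on the same feature); the aliases item, k, v, x and t are made up for illustration.

```sql
-- explode on an array: one row per element (3 rows here)
select explode(array('a', 'b', 'c')) as item;

-- explode on a map: one row per entry, with a key column and a value column
select explode(map('english', 70, 'math', 80)) as (k, v);

-- split returns an array; concat_ws joins strings with a separator
select split('a,b,c', ',');          -- ["a","b","c"]
select concat_ws('-', 'a', 'b');     -- a-b

-- collect_list keeps duplicates, collect_set removes them
select size(collect_list(x)) as list_size,   -- 3
       size(collect_set(x))  as set_size     -- 2
from (select 1 as x union all
      select 1 as x union all
      select 2 as x) t;
```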
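Create a final exam score table with the columns name, subject and score, holding each student's score in each subject.
You could also write the data to a text file and load it, but building the table directly with a SQL statement is quicker and more convenient for testing.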
create table school_final_test as
select 'jack' as name, 'english' as subject, 70 as score union all
select 'jack' as name, 'math' as subject, 80 as score union all
select 'jack' as name, 'chinese' as subject, 90 as score union all
select 'tim' as name, 'english' as subject, 10 as score union all
select 'tim' as name, 'math' as subject, 20 as score union all
select 'tim' as name, 'chinese' as subject, 30 as score;
Table 1 data:
name | subject | score |
---|---|---|
jack | english | 70 |
jack | math | 80 |
jack | chinese | 90 |
tim | english | 10 |
tim | math | 20 |
tim | chinese | 30 |
create table school_final_test1 as
select 'jack' as name, 70 as english,80 as math, 90 as chinese union all
select 'tim' as name, 10 as english,20 as math, 30 as chinese;
Table 2 data:
name | english | math | chinese |
---|---|---|---|
jack | 70 | 80 | 90 |
tim | 10 | 20 | 30 |
Multiple rows to multiple columns (source: Table 1)
Result table:
name | english | math | chinese |
---|---|---|---|
jack | 70 | 80 | 90 |
tim | 10 | 20 | 30 |
group by + max + case when
select name,
max(case subject when 'english' then score else 0 end) as english,
max(case subject when 'math' then score else 0 end) as math,
max(case subject when 'chinese' then score else 0 end) as chinese
from school_final_test
group by name;
-- max also works on strings: it compares them lexicographically and ignores NULL
select max(str) from
(select 'str' as str union all
select 'sts' as str union all
select null as str)t1; -- result : sts
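Because max ignores NULL, the pivot can also be written without the else 0 branch. The sketch below is a variant, not the original author's query: a case expression with no else returns NULL, so a student who has no row for a subject would show NULL rather than a 0 that could be mistaken for a real score. On the Table 1 data the result is the same as the query above.

```sql
-- Pivot rows to columns; a missing subject yields NULL instead of 0
select name,
       max(case when subject = 'english' then score end) as english,
       max(case when subject = 'math'    then score end) as math,
       max(case when subject = 'chinese' then score end) as chinese
from school_final_test
group by name;
```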
Multiple rows to a single column (source: Table 1)
Result table:
name | scores |
---|---|
jack | english:70,math:80,chinese:90 |
tim | english:10,math:20,chinese:30 |
group by + collect_list + concat_ws
select name,concat_ws(',',
collect_list(
concat_ws(':',subject,cast(score as string))
)
) as scores
from school_final_test
group by name;
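One thing to keep in mind with the query above: the order of elements inside collect_list is generally not guaranteed when the job runs in parallel. If a stable order matters, one option is to sort the array before joining it. This is a sketch using the built-in sort_array function, which sorts the strings in ascending lexicographic order, so the subjects come out alphabetically rather than in insertion order.

```sql
-- Same aggregation, but with a deterministic element order
select name,
       concat_ws(',',
         sort_array(
           collect_list(concat_ws(':', subject, cast(score as string)))
         )
       ) as scores
from school_final_test
group by name;
-- jack | chinese:90,english:70,math:80
-- tim  | chinese:30,english:10,math:20
```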
Multiple columns to multiple rows (source: Table 2)
Result table:
name | subject | score |
---|---|---|
jack | english | 70 |
jack | math | 80 |
jack | chinese | 90 |
tim | english | 10 |
tim | math | 20 |
tim | chinese | 30 |
select name,'english' as subject,english as score from school_final_test1
union all
select name,'math' as subject,math as score from school_final_test1
union all
select name,'chinese' as subject,chinese as score from school_final_test1;
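The union all approach reads the table once per column. As an alternative sketch (not from the original post), the columns can be packed into a map and expanded with lateral view explode, which needs only one pass over the table. The alias t is arbitrary, and the order of the output rows is not guaranteed.

```sql
-- Unpivot columns to rows in a single scan of school_final_test1
select name, t.subject, t.score
from school_final_test1
lateral view explode(
  map('english', english, 'math', math, 'chinese', chinese)
) t as subject, score;
```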
Single column to multiple rows
Source data:
create table school_final_test2 as
select name,concat_ws(',',
collect_list(
concat_ws(':',subject,cast(score as string))
)
) as scores
from school_final_test
group by name;
name | scores |
---|---|
jack | english:70,math:80,chinese:90 |
tim | english:10,math:20,chinese:30 |
Result table:
name | scores |
---|---|
jack | english:70 |
jack | math:80 |
jack | chinese:90 |
tim | english:10 |
tim | math:20 |
tim | chinese:30 |
split + explode
select name,table1.scores as scores
from school_final_test2
lateral view explode(split(scores,',')) table1 as scores;
Split each item on ':' and then use array subscripts (array[index]) to get the corresponding fields:
select name,split(scores,':')[0] as subject,split(scores,':')[1] as score from (
select name,table1.scores as scores
from school_final_test2
lateral view explode(split(scores,',')) table1 as scores
)t1;
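The two-step split can also be collapsed into one step with the built-in str_to_map function, which takes the text, the delimiter between pairs, and the delimiter between key and value. This is a sketch under the assumption that each subject appears at most once per student, since a map keeps only one value per key; the cast turns the score back into a number because str_to_map returns strings.

```sql
-- Parse 'english:70,math:80,chinese:90' into (subject, score) rows
select name, t.subject, cast(t.score as int) as score
from school_final_test2
lateral view explode(str_to_map(scores, ',', ':')) t as subject, score;
```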