[TOC]

用Hive处理数据的时候经常会遇到行列互相转换的需求，总结并记录一下行列转行的常见场景和操作语法
其中所有的操作都可以直接复制语句去自己的hive执行，查看结果

前置知识

执行hive或beeline进入，执行 desc function explode;查看函数说明；

explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
    可以将数组炸开成多行，或者将map炸开成多行多列，是Hive内置的UDTF

split(str, regex) - Splits str around occurances that match regex
    按照正则规则去切割字符串
    
collect_list(x) - Returns a list of objects with duplicates
    返回不去重的集合
    
collect_set(x) - Returns a set of objects with duplicate elements eliminated
    返回一个去重的集合

concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator
    返回一个特定分隔符的拼接字符串

max(expr) - Returns the maximum value of expr
    返回表达式的最大值

准备数据

创建一个name,subject,score的期末考试成绩表，代表每个同学每个学科的成绩
导入数据也可以自定义文本然后load也行，sql语句一步到位，测试简单方便

用于行专列的数据，后称表1

create table school_final_test as
select 'jack' as name, 'english' as subject, 70 as score union all
select 'jack' as name, 'math' as subject, 80 as score union all
select 'jack' as name, 'chinese' as subject, 90 as score union all
select 'tim' as name, 'english' as subject, 10 as score union all
select 'tim' as name, 'math' as subject, 20 as score union all
select 'tim' as name, 'chinese' as subject, 30 as score;

表1数据:

name	subject	score
jack	english	70
jack	math	80
jack	chinese	90
tim	english	10
tim	math	20
tim	chinese	30

用于列转行的数据，后称表2

create table school_final_test1 as
select 'jack' as name, 70 as english,80 as math, 90 as chinese union all
select 'tim' as name, 10 as english,20 as math, 30 as chinese;

表2数据：

name	english	math	chinese
jack	70	80	90
tim	10	20	30

开始测试

需求1

多行转多列，数据源是表1

结果表：

name	english	math	chinese
jack	70	80	90
tim	10	20	30

group by + max + case when语法
巧妙的使用了数值型和0之间的大小关系

select name,
max(case subject when 'english' then score else 0 end) as english,
max(case subject when 'math' then score else 0 end) as math,
max(case subject when 'chinese' then score else 0 end) as chinese
from school_final_test
group by name;

疑问: 如果需要多行转多列的不是数值型，而是str类型是否仍可行呢？

select max(str) from
    (select 'str' as str union all
    select 'sts' as str union all
    select null as str)t1; -- result : sts

很明显，如果是字符串，可以使用null作为默认值，那么max仍然可行

需求2

多行转单列，数据源是表1

结果表：

name	scores
jack	english:70,math:80,chinese:90
tim	english:10,math:20,chinese:30

group by + collect_list + concat_ws语法
注意：函数中要求的类型

select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;

需求3

多列转多行，数据源是表2

结果表：

name	subject	score
jack	english	70
jack	math	80
jack	chinese	90
tim	english	10
tim	math	20
tim	chinese	30

将每列需要的数据单独提取出来合并即可
注意： union和union all的区别是是否去重

select name,'english' as subject,english as score from school_final_test1 
union all 
select name,'math' as subject,math as score from school_final_test1 
union all 
select name,'chinese' as subject,chinese as score from school_final_test1;

需求4

单行转多列

数据源：

create table school_final_test2 as
select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;

name	scores
jack	english:70,math:80,chinese:90
tim	english:10,math:20,chinese:30

结果表：

name	scores
jack	english:70
jack	math:80
jack	chinese:90
tim	english:10
tim	math:20
tim	chinese:30

split + explode语法

Lateral View是一个连接用户自定义函数(UDFs)的连接词，可以有多个;如果炸裂的列中有null值但是需要显示可以在view后面加outer，类似表的左外连接;

其中table1表示开窗得到的新表别名，必须写但可不使用(不使用时必须保证没有重复的列名)

select name,table1.scores as scores
from school_final_test2
lateral view explode(split(scores,',')) table1 as scores;

如果需要将结果表转换成表1的形式，可以继续对scores字段按照:进行拆分，然后用数组角标方式{{{arrya[index]}}}获取对应字段

select name,split(scores,':')[0] as subject,split(scores,':')[1] as score from (
    select name,table1.scores as scores
    from school_final_test2
    lateral view explode(split(scores,',')) table1 as scores
)t1;

需求5

将字符串"keyl=valuel&key2=value2...keyn=valuen"进行分割放到一个字段里面，可以查出任意一个keyx对应的valuex值

核心思路就是将字符串内数据疯转在map结构中，可以使用内置函数str_to_map，如果有其他更复杂的操作可自定义UDF；
str_to_map(text, delimiter1, delimiter2) - Creates a map by parsing text; Split text into key-value pairs using two delimiters. The first delimiter seperates pairs, and the second delimiter sperates key and value. If only one parameter is given, default delimiters are used: ',' as delimiter1 and '=' as delimiter2.*

select result['key1'] from 
    (select str_to_map('key1=value1&key2=value2&keyn=valuen','&','=') as result)t1;

Hive_语法_行列转换

前置知识

准备数据

开始测试

需求1

需求2

需求3

需求4

需求5

你可能感兴趣的:(hive,大数据)