Hive row/column transformation examples

0 Data in the stu table:

stu:
id         name
hello,you  zm2008
hello,me   zm2015


1 Word count:

1.0 Split the data into an array

select split(id,',') from stu;    -- returns an array
[hello,you]
[hello,me]
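To make the semantics concrete, here is a plain-Python model of this step (not Hive code). Hive's split(str, pat) treats pat as a regular expression, so re.split is used as a stand-in; the in-memory stu list is a hypothetical copy of the table above.

```python
import re

# Hypothetical in-memory copy of the stu table: (id, name) rows.
stu = [("hello,you", "zm2008"), ("hello,me", "zm2015")]


def hive_split(s, pattern):
    # Hive's split(str, pat) treats pat as a regular expression;
    # re.split is a close stand-in (edge cases such as trailing
    # empty strings can differ from Hive's Java-based split).
    return re.split(pattern, s)


arrays = [hive_split(id_, ",") for id_, _name in stu]
print(arrays)  # [['hello', 'you'], ['hello', 'me']]
```

Because the pattern is a regex, a delimiter with special meaning, such as a dot, has to be escaped (here `r"\."`).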

 

1.1 Explode the array (Hive's explode function turns each array element into a separate row)

select explode(split(id,',')) from stu;   -- explode is a table-generating function (UDTF)
hello
you
hello
me
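A minimal plain-Python sketch of what explode does, assuming the same hypothetical in-memory stu list: every element of each row's array becomes its own output row, so two rows in produce four rows out.

```python
# Hypothetical in-memory copy of the stu table: (id, name) rows.
stu = [("hello,you", "zm2008"), ("hello,me", "zm2015")]


def explode(array):
    # Table-generating behaviour: one output row per array element.
    for element in array:
        yield element


rows = [word for id_, _name in stu for word in explode(id_.split(","))]
print(rows)  # ['hello', 'you', 'hello', 'me']
```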

 

 

1.2 Group and count:

select t1.c1, count(1) from (select explode(split(id,',')) as c1 from stu) t1 group by t1.c1;
hello 2
you 1
me 1
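The whole word count can be modelled in plain Python as a sketch (not Hive itself): the list comprehension plays the role of the subquery t1, and Counter plays the role of group by t1.c1 with count(1). The stu list is again a hypothetical in-memory copy of the table.

```python
from collections import Counter

stu = [("hello,you", "zm2008"), ("hello,me", "zm2015")]

# t1: the exploded words, one per row (step 1.1).
t1 = [word for id_, _name in stu for word in id_.split(",")]

# group by t1.c1 + count(1): count the rows in each group.
counts = Counter(t1)
for word, n in counts.items():
    print(word, n)
```

Hive does not guarantee the order of grouped output rows; Counter simply keeps first-seen order here.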

 

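In the statement above, t1 is the result of step 1.1.


2 Column to row: decide which field to group by, then collapse that column's values within each group into a single row with

Pattern: concat_ws(",", collect_set(column_to_collapse)) group by grouping_column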

 

2.0 Table data

Column to row:
user
id name
1 zhangsan
2 lisi
3 wangwu

address
name     addr
zhangsan beijing
zhangsan shanghai
lisi tianjin
wangwu nanjing

Expected result:
1 zhangsan beijing,shanghai
2 lisi tianjin
3 wangwu nanjing

 

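2.1 Functions used:

collect_set(x)    aggregates a column into an array, duplicates removed
collect_list(x)   aggregates a column into an array, duplicates kept
concat_ws         concatenates values with a separator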
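A plain-Python sketch of the three functions, to show the difference between set and list semantics. These are illustrative helpers named after the Hive functions, not Hive's implementation; in particular, Hive does not guarantee the element order of collect_set, and first-seen order is kept here only so the output is deterministic.

```python
def collect_list(values):
    # Keeps every value, including duplicates.
    return list(values)


def collect_set(values):
    # Drops duplicates; first-seen order is kept only for determinism
    # (Hive makes no ordering guarantee).
    return list(dict.fromkeys(values))


def concat_ws(sep, values):
    # Joins the elements with the separator.
    return sep.join(values)


addrs = ["beijing", "shanghai", "beijing"]
print(collect_list(addrs))                 # ['beijing', 'shanghai', 'beijing']
print(collect_set(addrs))                  # ['beijing', 'shanghai']
print(concat_ws(",", collect_set(addrs)))  # beijing,shanghai
```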

 

 

2.2 Steps:

2.2.0: Join the two tables on name:

select user.id, user.name, address.addr from user join address on user.name = address.name;
1 zhangsan beijing
1 zhangsan shanghai
2 lisi tianjin
3 wangwu nanjing
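A plain-Python model of this inner join (a nested loop for clarity, not how Hive executes it), using in-memory copies of the user and address tables from 2.0. It shows why zhangsan appears twice, both times with id 1.

```python
user = [(1, "zhangsan"), (2, "lisi"), (3, "wangwu")]
address = [("zhangsan", "beijing"), ("zhangsan", "shanghai"),
           ("lisi", "tianjin"), ("wangwu", "nanjing")]

# Inner join on user.name = address.name.
joined = [(uid, uname, addr)
          for uid, uname in user
          for aname, addr in address
          if uname == aname]
for row in joined:
    print(*row)
```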

 

2.2.1: Group by name and collect each user's addresses into an array:

select max(user.id), user.name, collect_set(address.addr) from user join address on user.name = address.name group by user.name;
2 lisi [tianjin]
3 wangwu [nanjing]
1 zhangsan [shanghai,beijing]

 

2.2.2: Group by name, collapse the addr values for each name into a single row, and separate multiple values with commas:

select max(user.id) as id, user.name, concat_ws(",",collect_set(address.addr)) from user join address on user.name = address.name group by user.name order by id;
1 zhangsan shanghai,beijing
2 lisi tianjin
3 wangwu nanjing
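The query in 2.2.2 can be sketched in plain Python as a single pass over the joined rows: track max(user.id) and a duplicate-free address list per name, then join with commas and sort by id. This is an illustrative model only. Because Hive's collect_set has no guaranteed order, Hive may print shanghai,beijing where this sketch prints beijing,shanghai.

```python
joined = [(1, "zhangsan", "beijing"), (1, "zhangsan", "shanghai"),
          (2, "lisi", "tianjin"), (3, "wangwu", "nanjing")]

max_id = {}   # name -> max(user.id)
addrs = {}    # name -> collect_set(address.addr), first-seen order
for uid, name, addr in joined:
    max_id[name] = max(uid, max_id.get(name, uid))
    bucket = addrs.setdefault(name, [])
    if addr not in bucket:  # set semantics: skip duplicates
        bucket.append(addr)

# concat_ws(",", ...) per group, then order by id.
result = sorted(
    ((max_id[name], name, ",".join(addrs[name])) for name in max_id),
    key=lambda row: row[0],
)
for row in result:
    print(*row)
```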

 

 

2.3 Column-to-row example 2:

I. Problem

How can Hive turn

a       b       1
a       b       2
a       b       3
c       d       4
c       d       5
c       d       6

into:

a       b       1,2,3
c       d       4,5,6


II. Data

test.txt

a       b       1
a       b       2
a       b       3
c       d       4
c       d       5
c       d       6

III. Solution

1. Create the table

drop table if exists tmp_jiangzl_test;
create table tmp_jiangzl_test
(
col1 string,
col2 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;


load data local inpath '/home/jiangzl/shell/test.txt' into table tmp_jiangzl_test;

2. Process the data

select col1,col2,concat_ws(',',collect_set(col3)) 
from tmp_jiangzl_test  
group by col1,col2;
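A plain-Python sketch of the same grouping, with the lines of test.txt inlined as tab-separated strings (matching fields terminated by '\t'). It groups on the (col1, col2) pair and joins the collected col3 values with commas; it models the query, it is not Hive code.

```python
# Contents of test.txt from above, inlined: three tab-separated fields per line.
lines = ["a\tb\t1", "a\tb\t2", "a\tb\t3", "c\td\t4", "c\td\t5", "c\td\t6"]

groups = {}  # (col1, col2) -> collected col3 values
for line in lines:
    col1, col2, col3 = line.split("\t")  # fields terminated by '\t'
    values = groups.setdefault((col1, col2), [])
    if col3 not in values:  # collect_set drops duplicates
        values.append(col3)

result = [(c1, c2, ",".join(vals)) for (c1, c2), vals in groups.items()]
for row in result:
    print("\t".join(row))
```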

 

 

 

3 Row to column:

Pattern: select col1, col2, new_col from table_name lateral view explode(split(column_to_split, delimiter)) tt as new_col;

  ---> tt is the table alias of the lateral view explode(...)

 

create table ua as
select max(user.id) as id, user.name, concat_ws(",",collect_set(address.addr))
from user join address on user.name = address.name group by user.name order by id;


hive>desc ua;
id string
name string
_c2 string


hive>select id, name, `_c2` from ua;
1 zhangsan shanghai,beijing
2 lisi tianjin
3 wangwu nanjing

hive>alter table ua change `_c2` address string;   ----> here the column name _c2 must be wrapped in backticks (``), otherwise an identifier starting with _ is not recognized


Row to column:
select split(address,',') from ua;
[shanghai,beijing]
[tianjin]
[nanjing]

select explode(split(address,',')) from ua;
shanghai
beijing
tianjin
nanjing

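select ua.id, ua.name, explode(split(address,',')) from ua;
--->error: explode (picture the array "exploding" so that each element is scattered into its own row) must stand alone in the select list; no other columns can be selected alongside it

Solution:
lateral view: a lateral (side) view

select ua.id, ua.name, addr1 from ua lateral view explode(split(address,',')) a as addr1;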
1 zhangsan shanghai
1 zhangsan beijing
2 lisi tianjin
3 wangwu nanjing
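A lateral view applies the table-generating function to each row of the base table and then joins the generated rows back to the row they came from. A plain-Python sketch of that behaviour, using an in-memory copy of ua after the column rename (illustrative only, not Hive code):

```python
# ua after the rename: (id, name, address) rows.
ua = [(1, "zhangsan", "shanghai,beijing"), (2, "lisi", "tianjin"),
      (3, "wangwu", "nanjing")]

# lateral view explode(split(address, ',')) a as addr1:
# every source row is paired with each element exploded from its own array.
result = [
    (uid, name, addr1)
    for uid, name, address in ua
    for addr1 in address.split(",")
]
for row in result:
    print(*row)
```

This pairing of each source row with its own generated rows is what a bare explode in the select list cannot express, which is why the earlier query fails.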

 

 

 

 

 

 

 

 

 
