1、字符串转为map
str_to_map(text[, delimiter1, delimiter2])
使用两个分隔符将文本拆分为键值对。 Delimiter1将文本分成K-V对,Delimiter2分割每个K-V对。对于delimiter1默认分隔符是',',对于delimiter2默认分隔符是'='。
示例:
select str_to_map('aaa:11&bbb:22', '&', ':');
select str_to_map('aaa:11&bbb:22', '&', ':')['aaa'];
select str_to_map('device_ds:2&uid_cnt:1','&',',') --键值分割不到,值会出现Null
综合使用示范:
select a1.appkey,a1.appsource,index_key,index_value
from tab_sum a1
lateral view explode(str_to_map(concat('device_ds:',a1.device_ds_cnt,'&','uid_cnt:',a1.uid_cnt),'&',':')) mid_list_tab as index_key,index_value;
2、字符串转为array
分割字符串函数: split
语法: split(string str, stringpat)
返回值: array
说明: 按照pat字符串分割str,会返回分割后的字符串数组
举例:
select split('aaa:11:bbb:22',':');
["aaa","11","bbb","22"]
select split('aaa:11:bbb:22',':')[0];
aaa
3、字符字段去重汇总转成array
collect_set函数:该函数的作用是将某字段的值进行去重汇总,产生Array类型字段。
drop table if exists xxxxx_tabletest;
CREATE TABLE xxxxx_tabletest(
id string,
name string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',');
insert into xxxxx_tabletest(id,name)
values
('1','A'),
('1','C'),
('1','B'),
('2','B'),
('2','C'),
('2','D'),
('3','B'),
('3','C'),
('3','D');
select id,collect_set(name) from xxxxx_tabletest group by id;
OK
1 ["A","C","B"]
2 ["B","C","D"]
3 ["B","C","D"]
Time taken: 36.966 seconds, Fetched: 3 row(s)