name   | constellation | blood_type
孙悟空 | 白羊座        | A
大海   | 射手座        | A
宋宋   | 白羊座        | B
猪八戒 | 白羊座        | A
凤姐   | 射手座        | A
[hadoop@hadoop112 datas]$ vi constellation.txt
孙悟空 白羊座 A
大海 射手座 A
宋宋 白羊座 B
猪八戒 白羊座 A
凤姐 射手座 A
苍老师 白羊座 B
-- columns in constellation.txt are tab-separated
create table person_info(
    name string,
    constellation string,
    blood_type string)
row format delimited fields terminated by "\t";

load data local inpath "/opt/module/datas/constellation.txt" into table person_info;
0: jdbc:hive2://hadoop112:10000> select * from person_info;
+-------------------+----------------------------+-------------------------+--+
| person_info.name | person_info.constellation | person_info.blood_type |
+-------------------+----------------------------+-------------------------+--+
| 孙悟空 | 白羊座 | A |
| 大海 | 射手座 | A |
| 宋宋 | 白羊座 | B |
| 猪八戒 | 白羊座 | A |
| 凤姐 | 射手座 | A |
| 苍老师 | 白羊座 | B |
+-------------------+----------------------------+-------------------------+--+
6 rows selected (0.058 seconds)
0: jdbc:hive2://hadoop112:10000>
Execute the following HQL:
select
    concat(constellation, ",", blood_type) xx,
    count(*) geshu
from person_info
group by constellation, blood_type;
The result is as follows:
+--------+--------+--+
| xx | geshu |
+--------+--------+--+
| 射手座,A | 2 |
| 白羊座,A | 2 |
| 白羊座,B | 2 |
+--------+--------+--+
0: jdbc:hive2://hadoop112:10000> desc function collect_list;
+--------------------------------------------------------------+--+
| tab_name |
+--------------------------------------------------------------+--+
| collect_list(x) - Returns a list of objects with duplicates |
+--------------------------------------------------------------+--+
1 row selected (0.023 seconds)
0: jdbc:hive2://hadoop112:10000>
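For contrast, collect_set performs the same aggregation but removes duplicates. A minimal sketch against person_info (the expected output is inferred from the sample rows above; element order is not guaranteed):

select collect_set(blood_type) from person_info;
-- expected: ["A","B"], whereas collect_list(blood_type) would keep all six values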
Execute the following HQL:
select
    concat(constellation, ",", blood_type) xx,
    collect_list(name) names
from person_info
group by constellation, blood_type;
The result is as follows:
+--------+----------------+--+
| xx | names |
+--------+----------------+--+
| 射手座,A | ["大海","凤姐"] |
| 白羊座,A | ["孙悟空","猪八戒"] |
| 白羊座,B | ["宋宋","苍老师"] |
+--------+----------------+--+
0: jdbc:hive2://hadoop112:10000> desc function concat_ws;
+-------------------------------------------------------------------------------------------------------------------------+--+
| tab_name |
+-------------------------------------------------------------------------------------------------------------------------+--+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. |
+-------------------------------------------------------------------------------------------------------------------------+--+
1 row selected (0.023 seconds)
Execute the following HQL:
select
    concat_ws(",", constellation, blood_type) xx,
    concat_ws(",", collect_list(name)) names
from person_info
group by constellation, blood_type;
The result is as follows:
+--------+----------+--+
| xx | names |
+--------+----------+--+
| 射手座,A | 大海,凤姐 |
| 白羊座,A | 孙悟空,猪八戒 |
| 白羊座,B | 宋宋,苍老师 |
+--------+----------+--+
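A note on NULL handling (general Hive behavior, not shown in the session above): concat returns NULL as soon as any argument is NULL, whereas concat_ws simply skips NULL arguments, which makes it the safer choice when a column may be missing. A minimal sketch:

select concat("白羊座", null, "A");          -- NULL
select concat_ws(",", "白羊座", null, "A");  -- 白羊座,A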
Usage: lateral view udtf(expression) tableAlias AS columnAlias
Explanation: lateral view is used together with UDTFs such as split and explode. It expands one column into multiple rows, and the expanded rows can then be aggregated.
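As a minimal sketch of this syntax (the literal "a,b,c" is a made-up input, not taken from the tables above), split builds the array and explode expands it into one row per element:

select t.s, tbl.item
from (select "a,b,c" as s) t
lateral view explode(split(t.s, ",")) tbl as item;
-- returns three rows: (a,b,c, a), (a,b,c, b), (a,b,c, c)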
movie         | category
《疑犯追踪》  | 悬疑,动作,科幻,剧情
《Lie to me》 | 悬疑,警匪,动作,心理,剧情
《战狼2》     | 战争,动作,灾难
[hadoop@hadoop112 datas]$ vi movie.txt
《疑犯追踪》 悬疑,动作,科幻,剧情
《Lie to me》 悬疑,警匪,动作,心理,剧情
《战狼2》 战争,动作,灾难
[hadoop@hadoop112 datas]$
-- category holds multiple values per movie; the element type must be declared
create table movie_info(
    movie string,
    category array<string>)
row format delimited fields terminated by "\t"
collection items terminated by ",";

load data local inpath "/opt/module/datas/movie.txt" into table movie_info;
0: jdbc:hive2://hadoop112:10000> select * from movie_info;
+-------------------+-----------------------------+--+
| movie_info.movie | movie_info.category |
+-------------------+-----------------------------+--+
| 《疑犯追踪》 | ["悬疑","动作","科幻","剧情"] |
| 《Lie to me》 | ["悬疑","警匪","动作","心理","剧情"] |
| 《战狼2》 | ["战争","动作","灾难"] |
+-------------------+-----------------------------+--+
3 rows selected (0.054 seconds)
0: jdbc:hive2://hadoop112:10000>
0: jdbc:hive2://hadoop112:10000> select explode(category) from movie_info;
+------+--+
| col |
+------+--+
| 悬疑 |
| 动作 |
| 科幻 |
| 剧情 |
| 悬疑 |
| 警匪 |
| 动作 |
| 心理 |
| 剧情 |
| 战争 |
| 动作 |
| 灾难 |
+------+--+
12 rows selected (0.051 seconds)
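Besides arrays, explode also accepts a map and emits one row per entry, with two columns for key and value. A minimal sketch using a literal map rather than the movie_info table:

select explode(map("a", 1, "b", 2)) as (k, v);
-- two rows: (a, 1) and (b, 2)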
Next, try selecting another column together with the UDTF:
select movie, explode(category) from movie_info;
This fails with the following error:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
In that case, lateral view is needed to achieve the desired result. The HQL is as follows:
select
    movie_info.movie,
    tbl.c_name
from movie_info
lateral view explode(category) tbl as c_name;
The query result is as follows:
+-------------------+-------------+--+
| movie_info.movie | tbl.c_name |
+-------------------+-------------+--+
| 《疑犯追踪》 | 悬疑 |
| 《疑犯追踪》 | 动作 |
| 《疑犯追踪》 | 科幻 |
| 《疑犯追踪》 | 剧情 |
| 《Lie to me》 | 悬疑 |
| 《Lie to me》 | 警匪 |
| 《Lie to me》 | 动作 |
| 《Lie to me》 | 心理 |
| 《Lie to me》 | 剧情 |
| 《战狼2》 | 战争 |
| 《战狼2》 | 动作 |
| 《战狼2》 | 灾难 |
+-------------------+-------------+--+
12 rows selected (0.058 seconds)
0: jdbc:hive2://hadoop112:10000>
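One caveat before the exercises (general lateral view behavior, not triggered by this dataset): if a row's array is empty or NULL, plain lateral view drops that row entirely, while lateral view outer keeps it with a NULL column value. A minimal sketch:

select
    movie_info.movie,
    tbl.c_name
from movie_info
lateral view outer explode(category) tbl as c_name;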
1. Find the movies that belong to each category.
The HQL is as follows (note that collect_set, unlike collect_list, removes duplicate entries):
select
    categories,
    collect_set(movie) movies
from (
    select
        movie_info.movie,
        tbl.categories
    from movie_info
    lateral view explode(category) tbl as categories) t1
group by categories;
The query result is as follows:
+-------------+-----------------------------------+--+
| categories | movies |
+-------------+-----------------------------------+--+
| 剧情 | ["《疑犯追踪》","《Lie to me》"] |
| 动作 | ["《疑犯追踪》","《Lie to me》","《战狼2》"] |
| 心理 | ["《Lie to me》"] |
| 悬疑 | ["《疑犯追踪》","《Lie to me》"] |
| 战争 | ["《战狼2》"] |
| 灾难 | ["《战狼2》"] |
| 科幻 | ["《疑犯追踪》"] |
| 警匪 | ["《Lie to me》"] |
+-------------+-----------------------------------+--+
8 rows selected (16.83 seconds)
2. Use concat_ws to convert the movies column above into a single string.
The HQL is as follows:
select
    categories,
    concat_ws("|", collect_set(movie)) movies
from (
    select
        movie_info.movie,
        tbl.categories
    from movie_info
    lateral view explode(category) tbl as categories) t1
group by categories;
The query result is as follows:
+-------------+---------------------------+--+
| categories | movies |
+-------------+---------------------------+--+
| 剧情 | 《疑犯追踪》|《Lie to me》 |
| 动作 | 《疑犯追踪》|《Lie to me》|《战狼2》 |
| 心理 | 《Lie to me》 |
| 悬疑 | 《疑犯追踪》|《Lie to me》 |
| 战争 | 《战狼2》 |
| 灾难 | 《战狼2》 |
| 科幻 | 《疑犯追踪》 |
| 警匪 | 《Lie to me》 |
+-------------+---------------------------+--+
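A related UDTF is posexplode, which additionally emits each element's 0-based position within the array; a minimal sketch against movie_info:

select
    movie_info.movie,
    tbl.pos,
    tbl.c_name
from movie_info
lateral view posexplode(category) tbl as pos, c_name;
-- pos is the index of each category within its movie's array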