Hive连接查询join用法详解

2019独角兽企业重金招聘Python工程师标准>>> hot3.png

先来个总的概括,Hive的join用法基本与SQL一致

建表语句不说了,直接看看两个表中的数据

表一:cl_student

hive (test)> select * from cl_student;
OK
1	chenli	21
2	xuzeng	22
3	xiaodan	23
4	hua	24

表2:cl_stu_sub

hive (test)> select * from cl_stu_sub;
OK
1	chinese
2	english
3	science
5	nature

1、内连接。inner join,即基于on语句,仅列出表1和表2符合连接条件的数据。

hive (test)> select a.*,b.* from cl_student a join cl_stu_sub b on a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science

hive (test)> select a.*,b.* from cl_student a,cl_stu_sub b where a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science

第一条语句与第二条SQL条语句的效果是一样的

 

2、左连接。左连接是显示左边的表的所有数据,如果有右边表与之对应,则显示;否则显示NULL

hive (test)> select a.*,b.* from cl_student a left outer join cl_stu_sub b on a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science
4	hua	24	NULL	NULL

深入了解,比较两者的不同,在上述语句中增加了一个where条件或者是and的条件

on条件会产生一个临时表,where条件是对这个临时表进行过滤

hive (test)> select a.*,b.* from cl_student a left outer join cl_stu_sub b on a.id=b.id where b.name='chinese';
1	chenli	21	1	chinese

hive (test)>select a.*,b.* from cl_student a left outer join cl_stu_sub b on a.id=b.id where a.name='chenli';
1	chenli	21	1	chinese

标准查询关键字执行顺序为 from->where->group by->having->order by

on是先对表进行筛选后再关联的,left关联则on对右表才有效的,左表都是要选出来的

如果是要条件查询后才连接应该把查询件放置于ON后。如果是想再连接完毕后才筛选就应把条件放置于where后面,对主表的帅选要用where条件

hive (test)>select a.*,b.* from cl_student a left outer join cl_stu_sub b where a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science

hive (test)>select a.*,b.* from cl_student a left outer join cl_stu_sub b on a.id=b.id and a.name='chenli';
1	chenli	21	1	chinese
2	xuzeng	22	NULL	NULL
3	xiaodan	23	NULL	NULL
4	hua	24	NULL	NULL

hive (test)>select a.*,b.* from cl_student a left outer join cl_stu_sub b on a.id=b.id and b.name='chinese';
1	chenli	21	1	chinese
2	xuzeng	22	NULL	NULL
3	xiaodan	23	NULL	NULL
4	hua	24	NULL	NULL

 

 

3、右连接.与左连接同理

hive (test)> select a.*,b.* from cl_student a right outer join cl_stu_sub b on a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science
NULL	NULL	NULL	5	nature

 

4、全连接。相当于表1和表2的数据都显示,如果没有对应的数据,则显示NULL

hive (test)> select a.*,b.* from cl_student a full outer join cl_stu_sub b on a.id=b.id;
1	chenli	21	1	chinese
2	xuzeng	22	2	english
3	xiaodan	23	3	science
4	hua	24	NULL	NULL
NULL	NULL	NULL	5	nature

 

5、左半开连接。left semi join,语法与左连接不一样,只能选择出左边表的数据,此数据符合on后面的条件。

semi join通常比inner join的效率更高

hive (test)>select a.* from cl_student a left semi join cl_stu_sub b on a.id=b.id;
1	chenli	21
2	xuzeng	22
3	xiaodan	23


hive (test)>select * from cl_student a left semi join cl_stu_sub b on a.id=b.id;
1	chenli	21
2	xuzeng	22
3	xiaodan	23

hive (test)>select a.*,b.* from cl_student a left semi join cl_stu_sub b on a.id=b.id;
FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b'

 

6、Hive不支持右半开连接

 

7、笛卡尔积join

笛卡尔积是一种连接,表示左边表的行数乘以右边表的行数。

hive (test)>select a.*,b.* from cl_student a join cl_stu_sub b;
1	chenli	21	1	chinese
1	chenli	21	2	english
1	chenli	21	3	science
1	chenli	21	5	nature
2	xuzeng	22	1	chinese
2	xuzeng	22	2	english
2	xuzeng	22	3	science
2	xuzeng	22	5	nature
3	xiaodan	23	1	chinese
3	xiaodan	23	2	english
3	xiaodan	23	3	science
3	xiaodan	23	5	nature
4	hua	24	1	chinese
4	hua	24	2	english
4	hua	24	3	science
4	hua	24	5	nature

 

转载于:https://my.oschina.net/iamchenli/blog/845850

你可能感兴趣的:(Hive连接查询join用法详解)