References:
http://blog.csdn.net/luanwpp/article/details/7565258 Detailed examples of join, semi join, and outer join in Hive
Excerpt from that article:
hive> select * from zz0;
OK
111111
222222
888888
Time taken: 0.147 seconds
hive> select * from zz1;
OK
111111
333333
444444
888888
hive> select * from zz0 join zz1 on zz0.uid = zz1.uid;
Result:
111111 111111
888888 888888
hive> select * from zz0 left outer join zz1 on zz0.uid = zz1.uid;
Result:
111111 111111
222222 NULL
888888 888888
hive> select * from zz0 full outer join zz1 on zz0.uid = zz1.uid;
OK
NULL
111111 111111
222222 NULL
NULL 333333
NULL 444444
888888 888888
hive> select * from zz0 left semi join zz1 on zz0.uid = zz1.uid;
111111 111111
888888 888888
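The inner, left outer, and full outer results above can be reproduced with a small Python sketch over the same uid values. The function names are illustrative only, and this is not how Hive executes joins; it only mirrors the row-matching semantics. The lone NULL line in the full outer output is not modeled here.

```python
# Illustrative only: plain-Python versions of the join semantics shown above.
zz0 = ["111111", "222222", "888888"]
zz1 = ["111111", "333333", "444444", "888888"]


def inner_join(left, right):
    # Keep only the pairs whose uid appears on both sides.
    return [(l, r) for l in left for r in right if l == r]


def left_outer_join(left, right):
    rows = []
    for l in left:
        matches = [r for r in right if r == l]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            # No partner on the right: pad with None (NULL in Hive).
            rows.append((l, None))
    return rows


def full_outer_join(left, right):
    # Left outer join plus the right-side rows that found no partner.
    rows = left_outer_join(left, right)
    left_keys = set(left)
    rows.extend((None, r) for r in right if r not in left_keys)
    # Sort by whichever side is present, to match the listing order above.
    return sorted(rows, key=lambda row: row[0] if row[0] is not None else row[1])


inner_rows = inner_join(zz0, zz1)
left_rows = left_outer_join(zz0, zz1)
full_rows = full_outer_join(zz0, zz1)

for name, rows in [("inner", inner_rows), ("left outer", left_rows), ("full outer", full_rows)]:
    print(name, rows)
```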
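Also reposted from http://blog.itpub.net/post/901/12680, an in-depth analysis of join, outer-join, semi-join, and anti-join:
The following is a join:
select ename,dname from emp,dept where emp.deptno=dept.deptno;
The key values of the two data sources are compared one by one, and the set of mutually matching records is returned.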
for example: nested loop join
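A minimal Python sketch of a nested loop join may make the "compare key values one by one" idea concrete. The sample rows in emp and dept are hypothetical (the source does not list table contents), and nested_loop_join is an illustrative name, not a database API.

```python
# Hypothetical sample rows; the source does not list the table contents.
emp = [
    {"ename": "SMITH", "deptno": 20},
    {"ename": "ALLEN", "deptno": 30},
    {"ename": "KING", "deptno": 10},
]
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 30, "dname": "SALES"},
    {"deptno": 40, "dname": "OPERATIONS"},
]


def nested_loop_join(outer_rows, inner_rows):
    """Compare every outer row with every inner row; emit each matching pair."""
    result = []
    for e in outer_rows:
        for d in inner_rows:
            # The inner loop keeps scanning after a match: a plain join
            # must return every matching pair, not just the first one.
            if e["deptno"] == d["deptno"]:
                result.append((e["ename"], d["dname"]))
    return result


joined = nested_loop_join(emp, dept)
print(joined)
```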
semi-join
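select dname from dept where exists (select null from emp where emp.deptno=dept.deptno);
Mostly used in EXISTS subqueries. For each key value in the outer row source, the lookup returns as soon as the first matching key value is found in the inner row source; once a match is found, the remaining key values of the inner row source do not need to be examined.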
for example: nested loop semi-join
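The early exit described above can be sketched in Python as follows. The sample rows are hypothetical and nested_loop_semi_join is an illustrative name; the point is only that the inner scan stops at the first match, so each outer row is returned at most once, however many inner rows match it.

```python
# Hypothetical sample rows; two employees share deptno 20 on purpose.
emp = [
    {"ename": "SMITH", "deptno": 20},
    {"ename": "JONES", "deptno": 20},
    {"ename": "ALLEN", "deptno": 30},
]
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 30, "dname": "SALES"},
    {"deptno": 40, "dname": "OPERATIONS"},
]


def nested_loop_semi_join(outer_rows, inner_rows):
    """Return outer rows that have at least one match in the inner rows."""
    result = []
    for d in outer_rows:
        for e in inner_rows:
            if e["deptno"] == d["deptno"]:
                result.append(d["dname"])
                break  # first match is enough; skip the remaining inner rows
    return result


departments_with_staff = nested_loop_semi_join(dept, emp)
print(departments_with_staff)
```

RESEARCH appears once even though two employees belong to department 20, which is the difference from the plain join, where each matching pair would produce its own row.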