> ) ;
1. cross join (the cross keyword can be omitted): no join condition is required
hive (test)> select * from tmp_a a join tmp_b b ;
a.aid a.anum b.bid b.bname
1 a20050111 1 2006032401
2 a20050112 1 2006032401
3 a20050113 1 2006032401
4 a20050114 1 2006032401
5 a20050115 1 2006032401
1 a20050111 2 2006032402
2 a20050112 2 2006032402
3 a20050113 2 2006032402
4 a20050114 2 2006032402
5 a20050115 2 2006032402
1 a20050111 3 2006032403
2 a20050112 3 2006032403
3 a20050113 3 2006032403
4 a20050114 3 2006032403
5 a20050115 3 2006032403
1 a20050111 4 2006032404
2 a20050112 4 2006032404
3 a20050113 4 2006032404
4 a20050114 4 2006032404
5 a20050115 4 2006032404
1 a20050111 8 2006032408
2 a20050112 8 2006032408
3 a20050113 8 2006032408
4 a20050114 8 2006032408
5 a20050115 8 2006032408
Time taken: 15.056 seconds, Fetched: 25 row(s)
2. join with an on condition
select * from tmp_a a join tmp_b b on a.aid =b.bid ;
a.aid a.anum b.bid b.bname
1 a20050111 1 2006032401
2 a20050112 2 2006032402
3 a20050113 3 2006032403
4 a20050114 4 2006032404
Time taken: 14.459 seconds, Fetched: 4 row(s)
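As the output shows, the result of join ... on is a subset of the result of a join without a condition.
A join without a join condition produces the largest possible result set. For that reason, the join key in Hive should be specified in the ON clause rather than in WHERE; otherwise Hive may first build the Cartesian product and only then filter it. In addition, when the join involves a non-equality condition, that condition can be placed in the WHERE clause to filter the rows and reduce the result set.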
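The placements described above can be compared side by side. This is a minimal sketch against the same tmp_a and tmp_b tables; the `a.aid < b.bid` comparison is an illustrative assumption and does not come from the original session. Whether the second form really runs as a Cartesian product depends on the Hive version and optimizer settings.

```sql
-- Join key in ON: Hive can plan this as an equi-join on aid = bid.
select * from tmp_a a join tmp_b b on a.aid = b.bid ;

-- Join key only in WHERE: logically the same rows, but with no ON
-- condition Hive may build the Cartesian product first and filter afterwards.
select * from tmp_a a join tmp_b b where a.aid = b.bid ;

-- Non-equality condition (hypothetical example): placed in WHERE so that
-- the 25-row product is filtered down before it is returned.
select * from tmp_a a join tmp_b b where a.aid < b.bid ;
```

Running `explain` on each statement is one way to check which plan a given Hive installation actually chooses.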
3. full join (the outer keyword can be omitted)
select * from tmp_a a full outer join tmp_b b on a.aid =b.bid ;
a.aid a.anum b.bid b.bname
1 a20050111 1 2006032401
2 a20050112 2 2006032402
3 a20050113 3 2006032403
4 a20050114 4 2006032404
5 a20050115 NULL NULL
NULL NULL 8 2006032408
4. left join
select * from tmp_a a left join tmp_b b on a.aid =b.bid ;
a.aid a.anum b.bid b.bname
1 a20050111 1 2006032401
2 a20050112 2 2006032402
3 a20050113 3 2006032403
4 a20050114 4 2006032404
5 a20050115 NULL NULL
Time taken: 12.028 seconds, Fetched: 5 row(s)
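left join keeps every row from the left table; where the join condition does not match, the right-hand columns are filled with NULL. Its result is a subset of the full outer join result.
The same holds for right join, except that every row of the right table is kept.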
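For completeness, the right join counterpart can be sketched as follows. The query was not part of the original session, so no output is reproduced here; judging from the full outer join result above, it should keep all five tmp_b rows, with NULL in a.aid and a.anum for bid = 8.

```sql
-- Mirror of the left join: every row of tmp_b is kept, and the tmp_a
-- columns are NULL where no tmp_a row matches the join condition.
select * from tmp_a a right join tmp_b b on a.aid = b.bid ;
```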
5. left semi join
select * from tmp_a a left semi join tmp_b b on a.aid =b.bid ;
a.aid a.anum
1 a20050111
2 a20050112
3 a20050113
4 a20050114
Time taken: 16.078 seconds, Fetched: 4 row(s)
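For this data, where each bid appears only once in tmp_b, this is equivalent to the following query (with duplicate bid values, the left join would return the matching tmp_a row once per match, while left semi join returns it only once):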
select a.aid ,a.anum from tmp_a a left join tmp_b b on a.aid =b.bid where b.bid is not null ;
a.aid a.anum
1 a20050111
2 a20050112
3 a20050113
4 a20050114
Time taken: 10.473 seconds, Fetched: 4 row(s)
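A left semi join can also be written as a subquery, which may read more naturally to people coming from other SQL dialects. The sketch below assumes Hive 0.13 or later, where IN and EXISTS subqueries in the WHERE clause are supported; these queries were not run in the original session.

```sql
-- Same rows as the left semi join, written with an IN subquery.
select a.aid, a.anum from tmp_a a where a.aid in (select b.bid from tmp_b b) ;

-- Same again with a correlated EXISTS subquery.
select a.aid, a.anum from tmp_a a
where exists (select 1 from tmp_b b where b.bid = a.aid) ;
```

As with left semi join, only columns from tmp_a can appear in the select list, and each tmp_a row is returned at most once no matter how many tmp_b rows match it.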