Joins in Hive

Original link: http://www.cnblogs.com/rocky-AGE-24/p/6929636.html

Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
0: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
0: jdbc:hive2://localhost:10000> select * from a;
+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
| 1     | qq      |
| 2     | ww      |
| 3     | ee      |
| 4     | rr      |
| 5     | tt      |
| 6     | yy      |
| 7     | aa      |
| 8     | ss      |
| 11    | zz      |
+-------+---------+--+
9 rows selected (1.881 seconds)
0: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id  | b.name  |
+-------+---------+--+
| 1     | qq      |
| 2     | 22      |
| 3     | dd      |
| 4     | rr      |
| 6     | fgf     |
| 7     | as      |
| 9     | 23      |
| 12    | ww      |
| 34    | 3       |
| 23    | 34      |
| 12    | 45      |
| 26    | 4r      |
+-------+---------+--+
12 rows selected (0.147 seconds)
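
The session above does not show how the rows got into a and b. One way to reproduce the sample data is sketched below. It assumes Hive 0.14 or later, where INSERT ... VALUES is available; the original author may just as well have loaded comma-delimited text files.

```sql
-- Sketch: recreate the sample rows shown above (assumes Hive 0.14+).
use myjoin;

insert into table a values
  (1, 'qq'), (2, 'ww'), (3, 'ee'), (4, 'rr'), (5, 'tt'),
  (6, 'yy'), (7, 'aa'), (8, 'ss'), (11, 'zz');

insert into table b values
  (1, 'qq'), (2, '22'), (3, 'dd'), (4, 'rr'), (6, 'fgf'), (7, 'as'),
  (9, '23'), (12, 'ww'), (34, '3'), (23, '34'), (12, '45'), (26, '4r');
```

Note that id 12 appears twice in b, which matters once duplicate join keys come into play.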
Result of inner join, which is the same as a plain join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO  : Execution completed successfully
INFO  : MapredLocal task succeeded
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1496277833427_0007
INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO  : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0007
INFO  : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
INFO  : 2017-06-01 16:32:03,138 Stage-3 map = 0%,  reduce = 0%
INFO  : 2017-06-01 16:32:26,221 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 5.05 sec
INFO  : MapReduce Total cumulative CPU time: 5 seconds 50 msec
INFO  : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
| 1     | qq      | 1     | qq      |
| 2     | ww      | 2     | 22      |
| 3     | ee      | 3     | dd      |
| 4     | rr      | 4     | rr      |
| 6     | yy      | 6     | fgf     |
| 7     | aa      | 7     | as      |
+-------+---------+-------+---------+--+
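
Because join and inner join are the same operation in Hive, the query can be written either way. The log for the inner join shows a MapredLocal task and zero reducers, which is consistent with Hive having converted the join into a map-side join. That reading of the log is an assumption, not something the original post states. A small sketch:

```sql
-- Same result as the "inner join" query: the keyword "inner" is optional.
select a.*, b.*
from a
join b on a.id = b.id;

-- Assumption: the map-side conversion is controlled by hive.auto.convert.join.
-- Turning it off should make Hive plan a regular reduce-side join instead.
set hive.auto.convert.join=false;
select a.*, b.*
from a
join b on a.id = b.id;
```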

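full outer join: rows from both tables appear in the result; where a row has no match under the ON condition, the columns from the other table are shown as NULL.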

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1496277833427_0008
INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO  : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0008
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO  : 2017-06-01 16:34:05,413 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-06-01 16:35:05,889 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-06-01 16:37:35,521 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-06-01 16:38:46,061 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 6.52 sec
INFO  : 2017-06-01 16:38:49,443 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 9.17 sec
INFO  : 2017-06-01 16:39:25,252 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12.65 sec
INFO  : MapReduce Total cumulative CPU time: 12 seconds 650 msec
INFO  : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
| 1     | qq      | 1     | qq      |
| 2     | ww      | 2     | 22      |
| 3     | ee      | 3     | dd      |
| 4     | rr      | 4     | rr      |
| 5     | tt      | NULL  | NULL    |
| 6     | yy      | 6     | fgf     |
| 7     | aa      | 7     | as      |
| 8     | ss      | NULL  | NULL    |
| NULL  | NULL    | 9     | 23      |
| 11    | zz      | NULL  | NULL    |
| NULL  | NULL    | 12    | 45      |
| NULL  | NULL    | 12    | ww      |
| NULL  | NULL    | 23    | 34      |
| NULL  | NULL    | 26    | 4r      |
| NULL  | NULL    | 34    | 3       |
+-------+---------+-------+---------+--+
15 rows selected (371.304 seconds)
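
Since a full outer join leaves NULL on the unmatched side, two common follow-ups are merging the two key columns into one and isolating the rows that found no partner. The queries below are a sketch against the same tables; the aliases id, a_name and b_name are chosen here for illustration only.

```sql
-- Merge the key columns: coalesce returns the first non-NULL argument,
-- so every output row gets an id regardless of which side it came from.
select coalesce(a.id, b.id) as id,
       a.name               as a_name,
       b.name               as b_name
from a
full outer join b on a.id = b.id;

-- Keep only the rows that exist in exactly one of the two tables.
select a.*, b.*
from a
full outer join b on a.id = b.id
where a.id is null or b.id is null;
```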

select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

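left semi join replaces the EXISTS / IN idiom. The result contains only the left half of the inner join result, that is, the columns of table a: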

+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
| 1     | qq      |
| 2     | ww      |
| 3     | ee      |
| 4     | rr      |
| 6     | yy      |
| 7     | aa      |
+-------+---------+--+
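
For comparison, the same rows can be requested with IN or EXISTS subqueries. This is a sketch that assumes Hive 0.13 or later, where such subqueries are accepted in the WHERE clause. In all three forms table b only acts as a filter, which is why its columns cannot be selected.

```sql
-- left semi join form, as used above
select a.* from a left semi join b on a.id = b.id;

-- Equivalent IN form (assumes Hive 0.13+)
select a.* from a
where a.id in (select b.id from b);

-- Equivalent EXISTS form with a correlated subquery (assumes Hive 0.13+)
select a.* from a
where exists (select 1 from b where b.id = a.id);
```

Unlike an inner join, a semi join returns each row of a at most once, even when several rows in b share the same id.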

Hive has no right semi join.
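
The effect of a right semi join can still be obtained by swapping the two tables, so that the table whose rows should be kept stands on the left. A minimal sketch with the same tables:

```sql
-- Keep the rows of b that have a matching id in a.
-- With the sample data these are the b rows with id 1, 2, 3, 4, 6 and 7.
select b.*
from b
left semi join a on b.id = a.id;
```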

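left semi join is an efficient implementation of EXISTS / IN and is generally more efficient than inner join, because for each row of the left table it only needs to find one matching row on the right.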
