hive> create table lpx_mapjoin as
> SELECT '2012-04-17' as stat_date
> ,b.admin_member_id
> ,a.category_level2_id
> ,b.keywords
> ,sum(shownum) as sum_prod_show_cnt
> ,sum(clicknum) as sum_prod_click_cnt
> ,sysdate('yyyy-MM-dd hh:mm:ss')as dw_ins_date
> FROM (SELECT category_id
> ,category_level2_id
> FROM pub_en_category_dimt0_h
> WHERE dw_begin_date <= '2012-04-17'
> AND hp_dw_end_date > '2012-04-17'
> AND category_type = '02'
> ) a
> JOIN (SELECT admin_member_id
> ,lower(trim(regexp_replace(keyword,'[\\\\t|\\\\s]+', ' '))) as keywords
> ,coalesce(shownum,cast(0 as bigint)) as shownum
> ,coalesce(clicknum,cast(0 as bigint)) as clicknum
> ,post_category_id
> FROM idl_en_ctr_click_sdt0
> WHERE hp_stat_date = '2012-04-17'
> AND keyword IS NOT NULL
> ) b
> ON (a.category_id = b.post_category_id)
> GROUP BY b.admin_member_id
> ,a.category_level2_id
> ,b.keywords ;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified.Estimated from input data size: 3
In order to change the average loadfor a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum numberof reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number ofreducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: InputSize (= 2511131447) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:markid missing!
Starting Job = job_201205151343_1062053,Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062053
Kill Command = /dhwdata/hadoop/bin/../bin/hadoopjob -Dmapred.job.tracker=hdpjt:9001 -killjob_201205151343_1062053
Hadoop job information for Stage-1: number of mappers: 111; number of reducers: 3
2012-05-25 13:24:05,753 Stage-1 map= 26%, reduce = 0%
2012-05-25 13:24:23,957 Stage-1 map= 77%, reduce = 0%
2012-05-25 13:24:41,265 Stage-1 map= 92%, reduce = 0%
2012-05-25 13:25:00,561 Stage-1 map= 100%, reduce = 14%
2012-05-25 13:25:11,320 Stage-1 map= 100%, reduce = 23%
2012-05-25 13:25:29,255 Stage-1 map= 100%, reduce = 32%
2012-05-25 13:25:48,604 Stage-1 map= 100%, reduce = 57%
2012-05-25 13:26:07,861 Stage-1 map= 100%, reduce = 70%
2012-05-25 13:26:31,523 Stage-1 map= 100%, reduce = 73%
2012-05-25 13:26:54,908 Stage-1 map= 100%, reduce = 77%
2012-05-25 13:27:16,370 Stage-1 map= 100%, reduce = 80%
2012-05-25 13:27:31,626 Stage-1 map= 100%, reduce = 82%
2012-05-25 13:27:43,934 Stage-1 map= 100%, reduce = 84%
2012-05-25 13:27:59,381 Stage-1 map= 100%, reduce = 87%
2012-05-25 13:28:12,944 Stage-1 map= 100%, reduce = 88%
2012-05-25 13:28:32,217 Stage-1 map= 100%, reduce = 92%
2012-05-25 13:28:50,428 Stage-1 map= 100%, reduce = 94%
2012-05-25 13:29:05,792 Stage-1 map= 100%, reduce = 96%
2012-05-25 13:29:10,207 Stage-1 map= 100%, reduce = 97%
2012-05-25 13:29:16,808 Stage-1 map= 100%, reduce = 98%
2012-05-25 13:29:23,570 Stage-1 map= 100%, reduce = 99%
2012-05-25 13:29:33,150 Stage-1 map= 100%, reduce = 100%
Ended Job = job_201205151343_1062053
Launching Job 2 out of 3
Number of reduce tasks not specified.Estimated from input data size: 1
In order to change the average loadfor a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum numberof reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number ofreducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size(= 927172565) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:markid missing!
Starting Job = job_201205151343_1062532,Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062532
Kill Command = /dhwdata/hadoop/bin/../bin/hadoopjob -Dmapred.job.tracker=hdpjt:9001 -killjob_201205151343_1062532
Hadoop job information for Stage-2: number of mappers: 15; number of reducers: 1
2012-05-25 13:30:10,087 Stage-2 map= 8%, reduce = 0%
2012-05-25 13:30:24,216 Stage-2 map= 40%, reduce = 0%
2012-05-25 13:30:37,083 Stage-2 map= 63%, reduce = 0%
2012-05-25 13:30:52,854 Stage-2 map= 87%, reduce = 0%
2012-05-25 13:31:29,057 Stage-2 map= 99%, reduce = 0%
2012-05-25 13:31:38,701 Stage-2 map= 100%, reduce = 0%
2012-05-25 13:31:47,734 Stage-2 map= 100%, reduce = 4%
2012-05-25 13:31:54,239 Stage-2 map= 100%, reduce = 11%
2012-05-25 13:31:59,466 Stage-2 map= 100%, reduce = 18%
2012-05-25 13:32:04,989 Stage-2 map= 100%, reduce = 22%
2012-05-25 13:32:15,136 Stage-2 map= 100%, reduce = 29%
2012-05-25 13:32:22,236 Stage-2 map= 100%, reduce = 38%
2012-05-25 13:32:31,750 Stage-2 map= 100%, reduce = 43%
2012-05-25 13:32:41,773 Stage-2 map= 100%, reduce = 59%
2012-05-25 13:32:51,002 Stage-2 map= 100%, reduce = 67%
2012-05-25 13:33:02,976 Stage-2 map= 100%, reduce = 68%
2012-05-25 13:33:17,175 Stage-2 map= 100%, reduce = 69%
2012-05-25 13:33:34,641 Stage-2 map= 100%, reduce = 70%
2012-05-25 13:33:53,767 Stage-2 map= 100%, reduce = 71%
2012-05-25 13:34:15,878 Stage-2 map= 100%, reduce = 72%
2012-05-25 13:34:38,170 Stage-2 map= 100%, reduce = 73%
2012-05-25 13:34:49,164 Stage-2 map= 100%, reduce = 74%
2012-05-25 13:35:13,119 Stage-2 map= 100%, reduce = 75%
2012-05-25 13:35:26,087 Stage-2 map= 100%, reduce = 76%
2012-05-25 13:35:46,155 Stage-2 map= 100%, reduce = 77%
2012-05-25 13:36:04,623 Stage-2 map= 100%, reduce = 78%
2012-05-25 13:36:26,206 Stage-2 map= 100%, reduce = 79%
2012-05-25 13:36:54,454 Stage-2 map= 100%, reduce = 80%
2012-05-25 13:37:18,056 Stage-2 map= 100%, reduce = 81%
2012-05-25 13:37:37,048 Stage-2 map= 100%, reduce = 82%
2012-05-25 13:37:49,247 Stage-2 map= 100%, reduce = 83%
2012-05-25 13:38:20,496 Stage-2 map= 100%, reduce = 84%
2012-05-25 13:38:33,526 Stage-2 map= 100%, reduce = 85%
2012-05-25 13:38:58,196 Stage-2 map= 100%, reduce = 86%
2012-05-25 13:39:13,534 Stage-2 map= 100%, reduce = 87%
2012-05-25 13:39:39,946 Stage-2 map= 100%, reduce = 88%
2012-05-25 13:39:56,969 Stage-2 map= 100%, reduce = 89%
2012-05-25 13:40:39,779 Stage-2 map= 100%, reduce = 91%
2012-05-25 13:41:20,402 Stage-2 map= 100%, reduce = 92%
2012-05-25 13:41:27,308 Stage-2 map= 100%, reduce = 93%
2012-05-25 13:41:46,528 Stage-2 map= 100%, reduce = 94%
2012-05-25 13:42:14,809 Stage-2 map= 100%, reduce = 95%
2012-05-25 13:42:31,821 Stage-2 map= 100%, reduce = 96%
2012-05-25 13:42:48,402 Stage-2 map= 100%, reduce = 97%
2012-05-25 13:43:26,587 Stage-2 map= 100%, reduce = 98%
2012-05-25 13:43:49,918 Stage-2 map= 100%, reduce = 99%
2012-05-25 13:44:05,737 Stage-2 map= 100%, reduce = 100%
Ended Job = job_201205151343_1062532
Ended Job = 528296150, job is filteredout (removed at runtime).
Moving data to: hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-21-45_780_2062757607370764899/-ext-10001
Moving data to: /group/alibaba-dw-icbu/hive/lpx_mapjoin
40843215 Rows loaded to hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-21-45_780_2062757607370764899/-ext-10000
OK
Time taken: 1360.901 seconds
hive> create table lpx_mapjoin1as
> SELECT /*+mapjoin(a)*/
> '2012-04-17' as stat_date
> ,b.admin_member_id
> ,a.category_level2_id
> ,b.keywords
> ,sum(shownum) as sum_prod_show_cnt
> ,sum(clicknum) as sum_prod_click_cnt
> ,sysdate('yyyy-MM-dd hh:mm:ss')as dw_ins_date
> FROM (SELECT category_id
> ,category_level2_id
> FROM pub_en_category_dimt0_h
> WHERE dw_begin_date <= '2012-04-17'
> AND hp_dw_end_date > '2012-04-17'
> AND category_type = '02'
> ) a
> JOIN (SELECT admin_member_id
> ,lower(trim(regexp_replace(keyword,'[\\\\t|\\\\s]+', ' '))) as keywords
> ,coalesce(shownum,cast(0 as bigint)) as shownum
> ,coalesce(clicknum,cast(0 as bigint)) as clicknum
> ,post_category_id
> FROM idl_en_ctr_click_sdt0
> WHERE hp_stat_date = '2012-04-17'
> AND keyword IS NOT NULL
> ) b
> ON (a.category_id = b.post_category_id)
> GROUP BY b.admin_member_id
> ,a.category_level2_id
> ,b.keywords ;
Total MapReduce jobs = 2
plan = file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10007/plan.xml
2012-05-25 01:30:35 Starting to launch local task to process mapjoin; maximum memory = 932118528
2012-05-25 01:30:50 Processing rows: 4739 Hashtable size: 4739 Memory usage: 3523352 rate: 0.004
2012-05-25 01:30:50 Dump the hashtable into file: file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10005/HashTable-Stage-1/-0--.hashtable
2012-05-25 01:30:51 Upload 1 File to: file:/tmp/dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-local-10005/HashTable-Stage-1/-0--.hashtable File size:227587
2012-05-25 01:30:51 End of local task; Time Taken: 16.016 sec.
Mapred Local Task Succeeded . Convertthe Join into MapJoin
Launching Job 1 out of 2
Number of reduce tasks not specified.Estimated from input data size: 3
In order to change the average loadfor a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum numberof reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number ofreducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size(= 2510886581) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:markid missing!
Starting Job = job_201205151343_1062638,Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201205151343_1062638
Kill Command = /dhwdata/hadoop/bin/../bin/hadoopjob -Dmapred.job.tracker=hdpjt:9001 -killjob_201205151343_1062638
Hadoop job information for Stage-1: number of mappers: 45; number of reducers: 3
2012-05-25 13:31:38,418 Stage-1 map= 0%, reduce = 0%
2012-05-25 13:32:19,988 Stage-1 map= 6%, reduce = 0%
2012-05-25 13:32:29,559 Stage-1 map= 20%, reduce = 0%
2012-05-25 13:32:40,713 Stage-1 map= 44%, reduce = 0%
2012-05-25 13:32:50,025 Stage-1 map= 51%, reduce = 0%
2012-05-25 13:32:55,730 Stage-1 map= 60%, reduce = 0%
2012-05-25 13:33:02,644 Stage-1 map= 74%, reduce = 0%
2012-05-25 13:33:08,201 Stage-1 map= 82%, reduce = 0%
2012-05-25 13:33:13,908 Stage-1 map= 88%, reduce = 0%
2012-05-25 13:33:18,679 Stage-1 map= 95%, reduce = 0%
2012-05-25 13:33:22,685 Stage-1 map= 98%, reduce = 0%
2012-05-25 13:33:24,247 Stage-1 map= 99%, reduce = 0%
2012-05-25 13:33:29,223 Stage-1 map= 100%, reduce = 0%
2012-05-25 13:33:37,668 Stage-1 map= 100%, reduce = 12%
2012-05-25 13:33:45,733 Stage-1 map= 100%, reduce = 13%
2012-05-25 13:33:51,046 Stage-1 map= 100%, reduce = 20%
2012-05-25 13:33:53,741 Stage-1 map= 100%, reduce = 21%
2012-05-25 13:33:56,426 Stage-1 map= 100%, reduce = 22%
2012-05-25 13:34:05,551 Stage-1 map= 100%, reduce = 29%
2012-05-25 13:34:11,259 Stage-1 map= 100%, reduce = 32%
2012-05-25 13:34:15,869 Stage-1 map= 100%, reduce = 46%
2012-05-25 13:34:19,307 Stage-1 map= 100%, reduce = 67%
2012-05-25 13:34:34,205 Stage-1 map= 100%, reduce = 68%
2012-05-25 13:34:38,535 Stage-1 map= 100%, reduce = 69%
2012-05-25 13:34:40,661 Stage-1 map= 100%, reduce = 70%
2012-05-25 13:34:46,251 Stage-1 map= 100%, reduce = 71%
2012-05-25 13:34:53,543 Stage-1 map= 100%, reduce = 72%
2012-05-25 13:35:01,525 Stage-1 map= 100%, reduce = 73%
2012-05-25 13:35:05,963 Stage-1 map= 100%, reduce = 74%
2012-05-25 13:35:15,245 Stage-1 map= 100%, reduce = 75%
2012-05-25 13:35:18,322 Stage-1 map= 100%, reduce = 76%
2012-05-25 13:35:26,102 Stage-1 map= 100%, reduce = 77%
2012-05-25 13:35:32,168 Stage-1 map= 100%, reduce = 78%
2012-05-25 13:35:37,227 Stage-1 map= 100%, reduce = 79%
2012-05-25 13:35:44,590 Stage-1 map= 100%, reduce = 80%
2012-05-25 13:35:51,719 Stage-1 map= 100%, reduce = 81%
2012-05-25 13:36:00,501 Stage-1 map= 100%, reduce = 82%
2012-05-25 13:36:13,369 Stage-1 map= 100%, reduce = 84%
2012-05-25 13:36:21,199 Stage-1 map= 100%, reduce = 85%
2012-05-25 13:36:39,548 Stage-1 map= 100%, reduce = 86%
2012-05-25 13:36:48,621 Stage-1 map= 100%, reduce = 88%
2012-05-25 13:36:59,733 Stage-1 map= 100%, reduce = 89%
2012-05-25 13:37:08,042 Stage-1 map= 100%, reduce = 90%
2012-05-25 13:37:15,360 Stage-1 map= 100%, reduce = 91%
2012-05-25 13:37:23,589 Stage-1 map= 100%, reduce = 92%
2012-05-25 13:37:35,594 Stage-1 map= 100%, reduce = 94%
2012-05-25 13:37:42,345 Stage-1 map= 100%, reduce = 95%
2012-05-25 13:37:48,968 Stage-1 map= 100%, reduce = 96%
2012-05-25 13:37:54,187 Stage-1 map= 100%, reduce = 97%
2012-05-25 13:37:58,186 Stage-1 map= 100%, reduce = 98%
2012-05-25 13:38:15,289 Stage-1 map= 100%, reduce = 99%
2012-05-25 13:38:24,615 Stage-1 map= 100%, reduce = 100%
Ended Job = job_201205151343_1062638
Ended Job = 1570692836, job is filteredout (removed at runtime).
Moving data to: hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-ext-10001
Moving data to: /group/alibaba-dw-icbu/hive/lpx_mapjoin1
27223002 Rows loaded to hdfs://hdpnn:9000/group/alibaba-dw-icbu/tmp/hive-dwapp/hive_2012-05-25_13-30-29_191_6555140894519831501/-ext-10000
OK
Time taken: 519.457 seconds
总结:Mapjoin会把小表全部载入内存,在map阶段直接拿另一个表的数据和内存中表数据进行匹配。由于在map中直接完成JOIN操作,省略了reduce操作,因此运行效率很高。
Join 操作在 Map 阶段完成,不再需要Reduce,前提条件是需要的数据在 Map的过程中可以访问到。