SQL:
select '$v_date', '$v_prov', 'ps', cell_cnt, misidn_cnt, imsi_cnt, imei_cnt, total_cnt, A.rantype
from
  (select /*+mapjoin(tb2)*/ 'aa' tt, count(*) cell_cnt, rantype
   from snl_dwd.dwd_d_sa_basic_normal tb1
   join snl_dwa.dim_all_station_cellid tb2
     on tb1.month_id='$v_month' and tb1.day_id='$v_day' and tb1.prov_id='$v_prov'
        and tb1.sa_type='ps' and tb1.current_lac=tb2.lac and tb1.current_ci=tb2.cellid
   group by rantype) A
join
  (select 'aa' tt,
          sum(case when is_num(misidn)=true then 1 else 0 end) misidn_cnt,
          sum(case when is_num(imsi)=true then 1 else 0 end) imsi_cnt,
          sum(case when is_num(imei)=true then 1 else 0 end) imei_cnt,
          sum(1) total_cnt,
          rantype
   from snl_dwd.dwd_d_sa_basic_normal
   where month_id='$v_month' and day_id='$v_day' and prov_id='$v_prov' and sa_type='ps'
   group by rantype) B
  on A.tt=B.tt and A.rantype=B.rantype;
Here the small table is snl_dwa.dim_all_station_cellid and the large table is snl_dwd.dwd_d_sa_basic_normal.
Error output:
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/07/06 17:35:21 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Logging initialized using configuration in file:/opt/beh/core/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/beh/core/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
OK
Time taken: 0.497 seconds
Added /data/all_signal_dev/UDF_wuy.jar to class path
Added resource: /data/all_signal_dev/UDF_wuy.jar
OK
Time taken: 0.392 seconds
Total MapReduce jobs = 9
Launching Job 1 out of 9
Number of reduce tasks not specified. Estimated from input data size: 9
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Stage-16 is selected by condition resolver.
Stage-1 is filtered out by condition resolver.
Starting Job = job_1460530329895_286504, Tracking URL = http://DSJ-signal-4T-206:23188/proxy/application_1460530329895_286504/
Kill Command = /opt/beh/core/hadoop/bin/hadoop job -kill job_1460530329895_286504
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/beh/core/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/06 17:35:28 WARN conf.Configuration: file:/tmp/all_signal_dev/hive_2016-07-06_17-35-22_865_2627282251966464053-1/-local-10016/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
16/07/06 17:35:28 WARN conf.Configuration: file:/tmp/all_signal_dev/hive_2016-07-06_17-35-22_865_2627282251966464053-1/-local-10016/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/07/06 17:35:28 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/all_signal_dev/all_signal_dev_20160706173535_b9069bd9-dd43-4e2f-9873-c2b832d73eff.log
2016-07-06 05:35:29 Starting to launch local task to process map join; maximum memory = 514850816
Hadoop job information for Stage-6: number of mappers: 111; number of reducers: 9
2016-07-06 17:35:30,239 Stage-6 map = 0%, reduce = 0%
2016-07-06 05:35:30 Processing rows: 200000 Hashtable size: 199999 Memory usage: 154090568 percentage: 0.299
2016-07-06 05:35:30 Processing rows: 300000 Hashtable size: 299999 Memory usage: 64363592 percentage: 0.125
2016-07-06 05:35:30 Processing rows: 400000 Hashtable size: 399999 Memory usage: 81483296 percentage: 0.158
2016-07-06 05:35:30 Processing rows: 500000 Hashtable size: 499999 Memory usage: 91823584 percentage: 0.178
2016-07-06 05:35:31 Processing rows: 600000 Hashtable size: 599999 Memory usage: 104748984 percentage: 0.203
2016-07-06 05:35:31 Processing rows: 700000 Hashtable size: 699999 Memory usage: 117674344 percentage: 0.229
2016-07-06 05:35:31 Processing rows: 800000 Hashtable size: 799999 Memory usage: 136403224 percentage: 0.265
2016-07-06 05:35:31 Processing rows: 900000 Hashtable size: 899999 Memory usage: 149328584 percentage: 0.29
2016-07-06 05:35:31 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 162253984 percentage: 0.315
2016-07-06 05:35:31 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 172594272 percentage: 0.335
2016-07-06 05:35:31 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 185519632 percentage: 0.36
2016-07-06 05:35:32 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 190954904 percentage: 0.371
2016-07-06 05:35:32 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 203699304 percentage: 0.396
2016-07-06 05:35:32 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 216443704 percentage: 0.42
2016-07-06 05:35:32 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 243416472 percentage: 0.473
2016-07-06 05:35:32 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 256160872 percentage: 0.498
2016-07-06 05:35:32 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 268905272 percentage: 0.522
2016-07-06 05:35:33 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 281649664 percentage: 0.547
2016-07-06 05:35:33 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 291845192 percentage: 0.567
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-16
Logs:
/opt/beh/logs/hive/all_signal_dev/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Launching Job 3 out of 9
Number of reduce tasks not specified. Estimated from input data size: 9
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_1460530329895_286507, Tracking URL = http://DSJ-signal-4T-206:23188/proxy/application_1460530329895_286507/
Kill Command = /opt/beh/core/hadoop/bin/hadoop job -kill job_1460530329895_286507
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 9
2016-07-06 17:35:39,056 Stage-1 map = 0%, reduce = 0%
2016-07-06 17:35:41,011 Stage-6 map = 1%, reduce = 0%, Cumulative CPU 4.9 sec
2016-07-06 17:35:42,064 Stage-6 map = 2%, reduce = 0%, Cumulative CPU 11.78 sec
2016-07-06 17:35:43,091 Stage-6 map = 2%, reduce = 0%, Cumulative CPU 26.8 sec
2016-07-06 17:35:44,122 Stage-6 map = 4%, reduce = 0%, Cumulative CPU 86.76 sec
2016-07-06 17:35:45,152 Stage-6 map = 4%, reduce = 0%, Cumulative CPU 89.92 sec
2016-07-06 17:35:46,179 Stage-6 map = 5%, reduce = 0%, Cumulative CPU 96.0 sec
2016-07-06 17:35:47,209 Stage-6 map = 7%, reduce = 0%, Cumulative CPU 188.29 sec
2016-07-06 17:35:48,239 Stage-6 map = 7%, reduce = 0%, Cumulative CPU 262.25 sec
2016-07-06 17:35:49,269 Stage-6 map = 8%, reduce = 0%, Cumulative CPU 346.72 sec
2016-07-06 17:35:50,299 Stage-6 map = 10%, reduce = 0%, Cumulative CPU 492.01 sec
2016-07-06 17:35:51,329 Stage-6 map = 10%, reduce = 0%, Cumulative CPU 594.49 sec
2016-07-06 17:35:52,357 Stage-6 map = 11%, reduce = 0%, Cumulative CPU 724.09 sec
2016-07-06 17:35:52,515 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 71.99 sec
2016-07-06 17:35:53,386 Stage-6 map = 17%, reduce = 0%, Cumulative CPU 898.56 sec
2016-07-06 17:35:53,541 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 127.15 sec
Solution: set hive.auto.convert.join=false;
By default, Hive automatically loads the left table into memory as the small table. The /*+mapjoin(tb2)*/ hint above was meant to force tb2 into memory as the small table instead, but here it does not appear to take effect.
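In session settings, the fix looks like this (a minimal sketch; the property name is standard Hive, the comment restates the hint from the query above):

```sql
-- Disable automatic mapjoin conversion so Hive falls back to a regular
-- shuffle join instead of building the hash table in a local task:
set hive.auto.convert.join=false;

-- The original hint form, which in this case did not stop the local
-- hash-table build from exhausting memory:
--   select /*+mapjoin(tb2)*/ ... from big_tb tb1 join small_tb tb2 on ...
```

The trade-off is that the shuffle join is slower but does not depend on the small table fitting in the local task's heap.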
Execution failed with exit status: 3
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Hive converted the join into a locally executed, faster 'mapjoin', but ran out of memory while building the hash table. Two bugs can be responsible for this.
Bug 1)
Hive's metric for deciding when to convert a join miscalculates the required amount of memory. This is especially true for compressed and ORC files, because Hive uses the on-disk file size as its metric, but compressed tables need considerably more memory in their uncompressed in-memory representation.
You can either decrease 'hive.smalltable.filesize' to tune the metric, or increase 'hive.mapred.local.mem' to allow the local task to allocate more memory.
The latter option may trigger bug number two if you happen to have an affected Hadoop version.
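The two knobs from Bug 1 can be set per session (a sketch; the values are illustrative, not tuned for this workload):

```sql
-- Lower the size threshold (in bytes) below which Hive treats a table as
-- small enough to mapjoin; 25 MB shown here as an example value:
set hive.smalltable.filesize=25000000;
-- Newer Hive versions spell this property as:
--   set hive.mapjoin.smalltable.filesize=25000000;

-- Or give the local mapjoin task a larger heap (value in MB), subject to
-- Bug 2 below on affected Hadoop versions:
set hive.mapred.local.mem=1024;
```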
Bug 2)
Hive/Hadoop ignores 'hive.mapred.local.mem'! (More precisely: a bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem value; see https://issues.apache.org/jira/browse/HADOOP-10245.)
There are three workarounds for this bug; workarounds 2) and 3) can be set in Big-Bench/engines/hive/conf/hiveSettings.sql.
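As a stopgap, client-side heap can also be raised via environment variables before launching Hive (a sketch; values are illustrative, and on builds affected by HADOOP-10245 a duplicated -Xmx appended later in hadoop-env may still override these, so verify the effective setting on the running process):

```shell
# Raise the heap for the Hive client and locally launched tasks (illustrative).
export HADOOP_HEAPSIZE=2048
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
```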