Disabling Hive's automatic map join conversion

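Hive's map join loads the small table into memory and performs the join there, which can greatly speed up Hive queries. Before Hive 0.11, you had to enable this optimization explicitly by marking the join with a MAPJOIN hint. Since Hive 0.11, it is enabled by default, so the hint is no longer needed: Hive converts an ordinary JOIN into a MapJoin on its own whenever it decides the optimization applies. In practice, however, I ran into the following problem: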

Launching Job 2 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Selecting local mode for task: Stage-10
Job running in-process (local Hadoop)
2021-02-02 09:34:02,323 Stage-10 map = 0%,  reduce = 0%
2021-02-02 09:34:04,325 Stage-10 map = 100%,  reduce = 0%
Ended Job = job_local1553976964_0023
Stage-11 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/software_tools/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/software_tools/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-02-02 09:34:12	Starting to launch local task to process map join;	maximum memory = 477626368
2021-02-02 09:34:14	Processing rows:	200000	Hashtable size:	199999	Memory usage:	135134360	percentage:	0.283
2021-02-02 09:34:14	Processing rows:	300000	Hashtable size:	299999	Memory usage:	177934256	percentage:	0.373
2021-02-02 09:34:14	Dump the side-table for tag: 1 with group count: 336680 into file: file:/tmp/root/310054d9-60d9-49ca-afd8-39aa8052b6e2/hive_2021-02-02_09-33-49_434_7693544935790785889-1/-local-10005/HashTable-Stage-8/MapJoin-mapfile131--.hashtable
2021-02-02 09:34:16	Uploaded 1 File to: file:/tmp/root/310054d9-60d9-49ca-afd8-39aa8052b6e2/hive_2021-02-02_09-33-49_434_7693544935790785889-1/-local-10005/HashTable-Stage-8/MapJoin-mapfile131--.hashtable (20807331 bytes)
2021-02-02 09:34:16	End of local task; Time Taken: 3.263 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Selecting local mode for task: Stage-8
Job running in-process (local Hadoop)
2021-02-02 09:34:19,175 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:35:20,308 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:36:21,223 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:37:21,310 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:38:22,568 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:39:23,080 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:40:24,211 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:41:25,590 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:42:25,741 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:43:25,911 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:44:26,041 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:45:26,746 Stage-8 map = 0%,  reduce = 0%
2021-02-02 09:46:28,234 Stage-8 map = 0%,  reduce = 0%

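I suspected insufficient memory, because the local task reported its memory usage climbing while it built the hash table:
2021-02-02 09:34:14 Processing rows: 200000 Hashtable size: 199999 Memory usage: 135134360 percentage: 0.283
2021-02-02 09:34:14 Processing rows: 300000 Hashtable size: 299999 Memory usage: 177934256 percentage: 0.373
After that, the log kept printing
Stage-8 map = 0%, reduce = 0%
Stage-8 map = 0%, reduce = 0%
and the job hung without making progress (Stage-8 stayed at map = 0% for more than 12 minutes).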
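The percentage column in those lines is the hash table's memory usage (in bytes) divided by the local task's maximum memory, which the log prints at startup (maximum memory = 477626368 bytes, about 455.5 MiB). Checking the two logged values:

```latex
\text{percentage} = \frac{\text{Memory usage}}{\text{maximum memory}},
\qquad
\frac{135134360}{477626368} \approx 0.283,
\qquad
\frac{177934256}{477626368} \approx 0.373
```

For reference, Hive aborts the local task by itself only if this ratio exceeds hive.mapjoin.localtask.max.memory.usage (0.90 by default). In this run the local task stayed well below that, finished, and uploaded its hash table; it was the following map stage, Stage-8, that stalled.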

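Solution:
Disable map join:
set hive.auto.convert.join=false; (turns off automatic MAPJOIN conversion)
set hive.ignore.mapjoin.hint=false; (stops Hive from ignoring MAPJOIN hints; hints are ignored by default, and this line can be omitted)
Not ignoring the MAPJOIN hint only matters for map joins you request by hand, i.e. statements like
select /*+ MAPJOIN(smallTableTwo) */ …;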
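Put together, a session that turns off automatic conversion but still map-joins one chosen small table by hand might look like the sketch below. smallTableTwo comes from the hint above; bigTable and the id/val columns are hypothetical names used only for illustration.

```sql
-- Stop Hive from converting ordinary joins into map joins on its own
SET hive.auto.convert.join=false;

-- With no value, SET prints the current setting, to confirm the change took effect
SET hive.auto.convert.join;

-- Optional: honor hand-written MAPJOIN hints (Hive 0.11+ ignores them by default)
SET hive.ignore.mapjoin.hint=false;

-- Only smallTableTwo is requested as the in-memory side of the join
-- (bigTable, id and val are hypothetical)
SELECT /*+ MAPJOIN(smallTableTwo) */ b.id, t.val
FROM bigTable b
JOIN smallTableTwo t ON b.id = t.id;
```

Prefixing the query with EXPLAIN lets you check the plan: a join that Hive executes as a map join should appear as a Map Join Operator, while with both automatic conversion and the hint out of the picture it should appear as an ordinary Join Operator.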
