场景三.合理进行并行控制
合理的使用并行化参数控制
并行化参数有以下几个:
开启任务并行执行
set hive.exec.parallel=true;
允许并行任务的最大线程数
set hive.exec.parallel.thread.number=8;
这个只是开启并行化建议,具体最终能不能并行,还是依赖于之间有无依赖关系。
有依赖关系,既使开启了优化参数也是不行的。
如何查看每个阶段间是否有依赖关系,我们可以通过 EXPLAIN 去查看
起作用的情况 ,针对于以下情况(但不限于)
1) UNION ALL 语句 :
因为 UNION ALL 的子句之间是没有依赖关系,所以开启并行参数,可以让这些STAGE 并行工作。
例如:如下SQL
SELECT
'a' AS type
,COUNT(1) AS num
FROM datacube_salary_basic_aggr AS a
UNION ALL
SELECT
'b' AS type
,COUNT(1) AS num
FROM datacube_salary_company_aggr AS b
UNION ALL
SELECT
'c' AS type
,COUNT(1) AS num
FROM datacube_salary_dep_aggr AS c
UNION ALL
SELECT
'd' AS type
,COUNT(1) AS num
FROM datacube_salary_total_aggr AS d
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1, Stage-3, Stage-4, Stage-5
Stage-3 is a root stage
Stage-4 is a root stage
Stage-5 is a root stage
Stage-0 depends on stages: Stage-2
可以看到 stage1, stage3, stage4, stage5 之间并无依赖。它们之间是并行的。
2) JOIN 语句:
当多张表 JOIN 是相同条件的时候,会转化为一个STAGE
SELECT
a.user_id
,a.mobile
,a.sex
,b.user_name
,c.type
FROM join_multi_a AS a
LEFT JOIN join_multi_b AS b
ON a.user_id = b.user_id
LEFT JOIN join_multi_c AS c
ON a.user_id = c.user_id
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: a
Statistics: Num rows: 3 Data size: 30 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: user_id (type: bigint)
sort order: +
Map-reduce partition columns: user_id (type: bigint)
Statistics: Num rows: 3 Data size: 30 Basic stats: COMPLETE Column stats: NONE
value expressions: mobile (type: string), sex (type: bigint)
TableScan
alias: b
Statistics: Num rows: 3 Data size: 42 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: user_id (type: bigint)
sort order: +
Map-reduce partition columns: user_id (type: bigint)
Statistics: Num rows: 3 Data size: 42 Basic stats: COMPLETE Column stats: NONE
value expressions: user_name (type: string)
TableScan
alias: c
Statistics: Num rows: 3 Data size: 30 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: user_id (type: bigint)
sort order: +
Map-reduce partition columns: user_id (type: bigint)
Statistics: Num rows: 3 Data size: 30 Basic stats: COMPLETE Column stats: NONE
value expressions: type (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
keys:
0 user_id (type: bigint)
1 user_id (type: bigint)
2 user_id (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col8, _col14
Statistics: Num rows: 6 Data size: 66 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: bigint), _col8 (type: string), _col14 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 6 Data size: 66 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 6 Data size: 66 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
筛选下重要的部分:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
keys:
0 user_id (type: bigint)
1 user_id (type: bigint)
2 user_id (type: bigint)
————————————————
本文声明:本文为CSDN博主「高达一号」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/u010003835/article/details/105494048