Source code overview
/**
* Implements predicate pushdown. Predicate pushdown is a term borrowed from
* relational databases even though for Hive it is predicate pushup. The basic
* idea is to process expressions as early in the plan as possible. The default
* plan generation adds filters where they are seen but in some instances some
* of the filter expressions can be pushed nearer to the operator that sees this
* particular data for the first time. e.g. select a.*, b.* from a join b on
* (a.col1 = b.col1) where a.col1 > 20 and b.col2 > 40
*
* For the above query, the predicates (a.col1 > 20) and (b.col2 > 40), without
* predicate pushdown, would be evaluated after the join processing has been
* done. Suppose the two predicates filter out most of the rows from a and b,
* the join is unnecessarily processing these rows. With predicate pushdown,
* these two predicates will be processed before the join.
*
* Predicate pushdown is enabled by setting hive.optimize.ppd to true. It is
* disable by default.
*
* The high-level algorithm is describe here - An operator is processed after
* all its children have been processed - An operator processes its own
* predicates and then merges (conjunction) with the processed predicates of its
* children. In case of multiple children, there are combined using disjunction
* (OR). - A predicate expression is processed for an operator using the
* following steps - If the expr is a constant then it is a candidate for
* predicate pushdown - If the expr is a col reference then it is a candidate
* and its alias is noted - If the expr is an index and both the array and index
* expr are treated as children - If the all child expr are candidates for
* pushdown and all of the expression reference only one alias from the
* operator's RowResolver then the current expression is also a candidate One
* key thing to note is that some operators (Select, ReduceSink, GroupBy, Join
* etc) change the columns as data flows through them. In such cases the column
* references are replaced by the corresponding expression in the input data.
*/
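The candidate check described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not Hive's actual implementation: the `Expr` class and the `analyze` function are hypothetical names introduced here, and treating a constant-only expression (zero aliases) as a candidate is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Expr:
    """Hypothetical expression node; not a Hive class."""
    kind: str                      # "const", "col", or "func"
    alias: str = ""                # table alias, used only when kind == "col"
    children: List["Expr"] = field(default_factory=list)


def analyze(expr: Expr) -> Tuple[bool, Set[str]]:
    """Return (is_pushdown_candidate, table aliases referenced)."""
    if expr.kind == "const":
        # A constant is always a candidate and references no alias.
        return True, set()
    if expr.kind == "col":
        # A column reference is a candidate; its alias is noted.
        return True, {expr.alias}
    # Any other expression: all children must be candidates and,
    # taken together, they may reference at most one alias.
    aliases: Set[str] = set()
    all_children_ok = True
    for child in expr.children:
        ok, child_aliases = analyze(child)
        all_children_ok = all_children_ok and ok
        aliases |= child_aliases
    return all_children_ok and len(aliases) <= 1, aliases


# a.col1 > 20 references only alias "a", so it is a candidate.
pred_a = Expr("func", children=[Expr("col", alias="a"), Expr("const")])
# a.col1 = b.col1 references two aliases, so it is not.
join_cond = Expr("func", children=[Expr("col", alias="a"), Expr("col", alias="b")])

print(analyze(pred_a)[0], analyze(join_cond)[0])
```

In the example query from the comment, `a.col1 > 20` and `b.col2 > 40` each touch a single alias and can move below the join, while the join condition itself cannot.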
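Enabling the optimization
Enabled by default in current Hive releases (the "disable by default" remark in the Javadoc above appears to be out of date).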
hive> set hive.optimize.ppd=true;
hive> set hive.optimize.ppd=false;
Preparing the tables and data
create table user_info(
user_id int,
dept_id int,
username string
)
row format delimited fields terminated by ',';
create table dept_info(
dept_id int,
deptname string
)
row format delimited fields terminated by ',';
INSERT INTO `dept_info` VALUES ('1', '技术部');
INSERT INTO `dept_info` VALUES ('2', '业务部');
INSERT INTO `dept_info` VALUES ('3', '运营部');
INSERT INTO `dept_info` VALUES ('4', '决策层');
INSERT INTO `user_info` VALUES ('1', '1', '张三');
INSERT INTO `user_info` VALUES ('2', '1', '李四');
INSERT INTO `user_info` VALUES ('3', '2', '王五');
INSERT INTO `user_info` VALUES ('4', '3', '赵六');
join test
- set hive.optimize.ppd=false;
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Stage: Stage-3
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Filter Operator
predicate: (_col1 = 1) (type: boolean)
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.403 seconds, Fetched: 65 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 1 (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 1 (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.472 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 dept_id (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col1, _col2, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 1.215 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: dept_id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col6 = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.431 seconds, Fetched: 65 row(s)
- set hive.optimize.ppd=true;
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 1 (type: int)
1 1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 1 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col2, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.428 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 1 (type: int)
1 1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 1 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col2, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.423 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 1 (type: int)
1 1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 1 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col2, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.411 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
u
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
u
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 1 (type: int)
1 1 (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 1 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col2, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.477 seconds, Fetched: 62 row(s)
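In the second group of inner-join plans above, every variant filters both u and d at the TableScan before the Map Join, whereas in the first group the WHERE predicate shows up as a Filter Operator after the join. The toy Python sketch below replays both orders on the sample rows to show why this is safe for an inner join: the result is the same, but fewer rows reach the join. The function names are made up for illustration, and the nested-loop join only stands in for Hive's Map Join.

```python
# Sample rows copied from the INSERT statements above.
# users: (user_id, dept_id, username); depts: (dept_id, deptname)
users = [(1, 1, "张三"), (2, 1, "李四"), (3, 2, "王五"), (4, 3, "赵六")]
depts = [(1, "技术部"), (2, "业务部"), (3, "运营部"), (4, "决策层")]


def inner_join(us, ds):
    """Nested-loop equi-join on dept_id (toy stand-in for the Map Join)."""
    return [(u, d) for u in us for d in ds if u[1] == d[0]]


def filter_after_join():
    """Join everything, then apply u.dept_id = 1 (no pushdown)."""
    joined = inner_join(users, depts)
    rows_into_join = len(users) + len(depts)
    return [(u, d) for u, d in joined if u[1] == 1], rows_into_join


def filter_before_join():
    """Apply dept_id = 1 on both sides first, then join (pushdown)."""
    us = [u for u in users if u[1] == 1]
    ds = [d for d in depts if d[0] == 1]
    return inner_join(us, ds), len(us) + len(ds)


late_rows, late_input = filter_after_join()
early_rows, early_input = filter_before_join()
print(late_rows == early_rows, late_input, early_input)  # True 8 3
```

Filtering d as well as u is valid here because the join key ties `d.dept_id` to `u.dept_id`, so the constant 1 applies to both sides.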
left outer join test
- set hive.optimize.ppd=true;
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col6 = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.507 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.873 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.41 seconds, Fetched: 62 row(s)
- set hive.optimize.ppd=false;
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col1 = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.416 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col6 = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.4 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (dept_id = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.459 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
d
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
d
TableScan
alias: d
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
filter predicates:
0 {(dept_id = 1)}
1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: u
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
filter predicates:
0 {(dept_id = 1)}
1
keys:
0 dept_id (type: int)
1 dept_id (type: int)
outputColumnNames: _col0, _col1, _col2, _col6, _col7
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.425 seconds, Fetched: 62 row(s)
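The left outer join plans above fall into a pattern. A WHERE predicate on u (the preserved side) and an ON predicate on d (the null-supplying side) end up at the TableScan. A WHERE predicate on d stays in a Filter Operator after the join, and an ON predicate on u stays inside the join as a filter predicate (that last case is only shown with hive.optimize.ppd=false). The Python sketch below replays the four cases on the sample rows to show which moves preserve the result. It is a toy model with hypothetical helper names, not Hive code.

```python
# Sample rows copied from the INSERT statements above.
# users: (user_id, dept_id, username); depts: (dept_id, deptname)
users = [(1, 1, "张三"), (2, 1, "李四"), (3, 2, "王五"), (4, 3, "赵六")]
depts = [(1, "技术部"), (2, "业务部"), (3, "运营部"), (4, "决策层")]


def left_outer_join(us, ds, on=lambda u, d: True):
    """u LEFT OUTER JOIN d ON u.dept_id = d.dept_id AND on(u, d)."""
    out = []
    for u in us:
        matches = [d for d in ds if u[1] == d[0] and on(u, d)]
        if matches:
            out.extend((u, d) for d in matches)
        else:
            out.append((u, None))  # NULL-extended row for an unmatched u
    return out


# WHERE u.dept_id = 1: filtering u before the join gives the same rows.
where_u = [(u, d) for u, d in left_outer_join(users, depts) if u[1] == 1]
where_u_pushed = left_outer_join([u for u in users if u[1] == 1], depts)

# ON ... AND d.dept_id = 1: filtering d before the join gives the same rows.
on_d = left_outer_join(users, depts, on=lambda u, d: d[0] == 1)
on_d_pushed = left_outer_join(users, [d for d in depts if d[0] == 1])

# WHERE d.dept_id = 1: filtering d before the join would keep the
# NULL-extended rows that the WHERE clause is supposed to drop.
where_d = [(u, d) for u, d in left_outer_join(users, depts)
           if d is not None and d[0] == 1]
where_d_pushed = left_outer_join(users, [d for d in depts if d[0] == 1])

# ON ... AND u.dept_id = 1: filtering u before the join would drop
# users 3 and 4, which a left outer join must still return.
on_u = left_outer_join(users, depts, on=lambda u, d: u[1] == 1)
on_u_pushed = left_outer_join([u for u in users if u[1] == 1], depts)

print(where_u == where_u_pushed, on_d == on_d_pushed)   # True True
print(where_d == where_d_pushed, on_u == on_u_pushed)   # False False
```

The two cases that print False are exactly the ones Hive leaves above or inside the join in the plans shown.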