Hive PredicatePushDown (predicate pushdown)

Source code overview
/**
 * Implements predicate pushdown. Predicate pushdown is a term borrowed from
 * relational databases even though for Hive it is predicate pushup. The basic
 * idea is to process expressions as early in the plan as possible. The default
 * plan generation adds filters where they are seen but in some instances some
 * of the filter expressions can be pushed nearer to the operator that sees this
 * particular data for the first time. e.g. select a.*, b.* from a join b on
 * (a.col1 = b.col1) where a.col1 > 20 and b.col2 > 40
 *
 * For the above query, the predicates (a.col1 > 20) and (b.col2 > 40), without
 * predicate pushdown, would be evaluated after the join processing has been
 * done. Suppose the two predicates filter out most of the rows from a and b,
 * the join is unnecessarily processing these rows. With predicate pushdown,
 * these two predicates will be processed before the join.
 *
 * Predicate pushdown is enabled by setting hive.optimize.ppd to true. It is
 * disable by default.
 *
 * The high-level algorithm is describe here - An operator is processed after
 * all its children have been processed - An operator processes its own
 * predicates and then merges (conjunction) with the processed predicates of its
 * children. In case of multiple children, there are combined using disjunction
 * (OR). - A predicate expression is processed for an operator using the
 * following steps - If the expr is a constant then it is a candidate for
 * predicate pushdown - If the expr is a col reference then it is a candidate
 * and its alias is noted - If the expr is an index and both the array and index
 * expr are treated as children - If the all child expr are candidates for
 * pushdown and all of the expression reference only one alias from the
 * operator's RowResolver then the current expression is also a candidate One
 * key thing to note is that some operators (Select, ReduceSink, GroupBy, Join
 * etc) change the columns as data flows through them. In such cases the column
 * references are replaced by the corresponding expression in the input data.
 */
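The candidate rules listed in the Javadoc above can be sketched in a few lines of Python. This is only an illustration of the described rules, not Hive's actual implementation: the Expr class, its kind values ("const", "col", "func") and the analyze function are hypothetical names, and treating an expression that references no alias at all (constants only) as a candidate is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Expr:
    """Hypothetical expression node: a constant, a column reference, or a function/operator."""
    kind: str                                   # "const", "col", or "func"
    alias: Optional[str] = None                 # table alias, only set for "col"
    children: List["Expr"] = field(default_factory=list)


def analyze(expr: Expr) -> Tuple[bool, Set[str]]:
    """Return (is_pushdown_candidate, set of table aliases referenced)."""
    if expr.kind == "const":
        # A constant is always a candidate and references no alias.
        return True, set()
    if expr.kind == "col":
        # A column reference is a candidate; its alias is noted.
        return True, {expr.alias}
    # Function, operator, or index expression: inspect every child.
    aliases: Set[str] = set()
    all_children_ok = True
    for child in expr.children:
        ok, child_aliases = analyze(child)
        all_children_ok = all_children_ok and ok
        aliases |= child_aliases
    # Candidate only if all children are candidates and at most one alias is used
    # (allowing zero aliases is an assumption of this sketch).
    return all_children_ok and len(aliases) <= 1, aliases


a_col1 = Expr("col", alias="a")
b_col1 = Expr("col", alias="b")
twenty = Expr("const")

where_pred = Expr("func", children=[a_col1, twenty])   # a.col1 > 20
join_pred = Expr("func", children=[a_col1, b_col1])    # a.col1 = b.col1

print(analyze(where_pred))
print(analyze(join_pred))
```

In this sketch, a.col1 > 20 touches only alias a and is a candidate, while a.col1 = b.col1 touches two aliases and therefore has to stay at the join.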
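Enabling predicate pushdown
Enabled by default (hive.optimize.ppd=true), although the Javadoc above still says otherwise.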
hive> set hive.optimize.ppd=true;
hive> set hive.optimize.ppd=false;
Prepare the tables and data
create table user_info(
user_id int,
dept_id int,
username string
) 
row format delimited fields terminated by ',';

create table dept_info(
dept_id int,
deptname string
) 
row format delimited fields terminated by ',';

INSERT INTO `dept_info` VALUES ('1', '技术部');
INSERT INTO `dept_info` VALUES ('2', '业务部');
INSERT INTO `dept_info` VALUES ('3', '运营部');
INSERT INTO `dept_info` VALUES ('4', '决策层');

INSERT INTO `user_info` VALUES ('1', '1', '张三');
INSERT INTO `user_info` VALUES ('2', '1', '李四');
INSERT INTO `user_info` VALUES ('3', '2', '王五');
INSERT INTO `user_info` VALUES ('4', '3', '赵六');

join test
  • set hive.optimize.ppd=false;
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
           
  Stage: Stage-3
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
              
                Filter Operator
                  predicate: (_col1 = 1) (type: boolean)
                
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.403 seconds, Fetched: 65 row(s)

hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 1 (type: int)
                  1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 1 (type: int)
                  1 dept_id (type: int)
                outputColumnNames: _col0, _col2, _col6, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.472 seconds, Fetched: 62 row(s)

hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dept_id (type: int)
                  1 1 (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 dept_id (type: int)
                  1 1 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 1.215 seconds, Fetched: 62 row(s)

hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: dept_id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)
                outputColumnNames: _col0, _col1, _col2, _col6, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Filter Operator
                  predicate: (_col6 = 1) (type: boolean)
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.431 seconds, Fetched: 65 row(s)

  • set hive.optimize.ppd=true;
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 1 (type: int)
                  1 1 (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 1 (type: int)
                  1 1 (type: int)
                outputColumnNames: _col0, _col2, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.428 seconds, Fetched: 62 row(s)

hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 1 (type: int)
                  1 1 (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 1 (type: int)
                  1 1 (type: int)
                outputColumnNames: _col0, _col2, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.423 seconds, Fetched: 62 row(s)
hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 1 (type: int)
                  1 1 (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 1 (type: int)
                  1 1 (type: int)
                outputColumnNames: _col0, _col2, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.411 seconds, Fetched: 62 row(s)

hive> explain select * from user_info u join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        u 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        u 
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 1 (type: int)
                  1 1 (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 1 (type: int)
                  1 1 (type: int)
                outputColumnNames: _col0, _col2, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.477 seconds, Fetched: 62 row(s)
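
The plans above move the dept_id = 1 filter to different places, which is only safe if the result does not change. A minimal way to check this for the inner join is to run the same queries on the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption: SQLite is not Hive, but inner join semantics are the same for these queries); the rows helper and the hand-written "pushed" subquery form are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same tables and rows as in the data preparation step, loaded into SQLite.
conn.executescript("""
    create table user_info(user_id int, dept_id int, username text);
    create table dept_info(dept_id int, deptname text);
    insert into dept_info values (1, '技术部'), (2, '业务部'), (3, '运营部'), (4, '决策层');
    insert into user_info values (1, 1, '张三'), (2, 1, '李四'), (3, 2, '王五'), (4, 3, '赵六');
""")


def rows(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())


# Filter written in WHERE (evaluated after the join when pushdown is off).
after_join = rows("""
    select * from user_info u join dept_info d
    on u.dept_id = d.dept_id where u.dept_id = 1
""")

# Filter written in the ON clause.
in_on = rows("""
    select * from user_info u join dept_info d
    on u.dept_id = d.dept_id and u.dept_id = 1
""")

# Filter applied by hand to both inputs before the join,
# mirroring the plans where both table scans show (dept_id = 1).
pushed = rows("""
    select * from (select * from user_info where dept_id = 1) u
    join (select * from dept_info where dept_id = 1) d
    on u.dept_id = d.dept_id
""")

print(after_join == in_on == pushed)
for r in pushed:
    print(r)
```

All three forms return the same two users of department 1, so for an inner join the filter can be evaluated before the join without changing the result.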


left outer join test
  • set hive.optimize.ppd=true;
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (_col6 = 1) (type: boolean)
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.507 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.873 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              Map Join Operator
                condition map:
                     Left Outer Join0 to 1
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)
                outputColumnNames: _col0, _col2, _col6, _col7
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.41 seconds, Fetched: 62 row(s)
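
In the three plans above, the filter on d stays after the join when it is written in WHERE, but is applied to the scan of d when it is written in ON, and the WHERE filter on u is applied before the join. The sketch below suggests why, again using Python's sqlite3 module as a stand-in for Hive (an assumption; only the left outer join semantics matter here). The rows helper and the hand-written subquery are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same tables and rows as in the data preparation step, loaded into SQLite.
conn.executescript("""
    create table user_info(user_id int, dept_id int, username text);
    create table dept_info(dept_id int, deptname text);
    insert into dept_info values (1, '技术部'), (2, '业务部'), (3, '运营部'), (4, '决策层');
    insert into user_info values (1, 1, '张三'), (2, 1, '李四'), (3, 2, '王五'), (4, 3, '赵六');
""")


def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Filter on the right table in WHERE: NULL-padded rows are removed after the join.
where_d = rows("""
    select * from user_info u left outer join dept_info d
    on u.dept_id = d.dept_id where d.dept_id = 1 order by u.user_id
""")

# Filter on the right table in ON: every user is kept, non-matching ones get NULLs.
on_d = rows("""
    select * from user_info u left outer join dept_info d
    on u.dept_id = d.dept_id and d.dept_id = 1 order by u.user_id
""")

# Filter on the left (preserved) table in WHERE ...
where_u = rows("""
    select * from user_info u left outer join dept_info d
    on u.dept_id = d.dept_id where u.dept_id = 1 order by u.user_id
""")

# ... is equivalent to filtering the left table before the join.
pushed_u = rows("""
    select * from (select * from user_info where dept_id = 1) u
    left outer join dept_info d on u.dept_id = d.dept_id order by u.user_id
""")

print(len(where_d), len(on_d))
print(where_u == pushed_u)
```

Because the WHERE and ON forms of the filter on d return different row counts, moving the WHERE filter on d below the left outer join would change the result, whereas the filter on u can be moved safely.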
  • set hive.optimize.ppd=false;
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (_col1 = 1) (type: boolean)
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), 1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.416 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id where d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: (_col6 = 1) (type: boolean)
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), 1 (type: int), _col7 (type: string)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.4 seconds, Fetched: 59 row(s)
hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and d.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (dept_id = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 dept_id (type: int)
                  1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.459 seconds, Fetched: 59 row(s)

hive> explain select * from user_info u left outer join dept_info d on u.dept_id = d.dept_id and u.dept_id=1;
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        d 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        d 
          TableScan
            alias: d
            Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
            HashTable Sink Operator
              filter predicates:
                0 {(dept_id = 1)}
                1 
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: u
            Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              filter predicates:
                0 {(dept_id = 1)}
                1 
              keys:
                0 dept_id (type: int)
                1 dept_id (type: int)
              outputColumnNames: _col0, _col1, _col2, _col6, _col7
              Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: int), _col1 (type: int), _col2 (type: string), _col6 (type: int), _col7 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 48 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.425 seconds, Fetched: 62 row(s)
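
The last two plans differ in where the extra ON-clause predicate ends up. `d.dept_id=1` appears as a Filter Operator directly under the TableScan of `d`, while `u.dept_id=1` stays inside the join as a filter predicate. The following is a minimal Python sketch of LEFT OUTER JOIN semantics that illustrates why filtering the right-hand (null-supplying) table early is safe but filtering the left-hand (preserved) table early is not. It is not Hive code: the helper `left_outer_join` and the sample rows are hypothetical and exist only for illustration.

```python
def left_outer_join(left, right, on):
    """Nested-loop LEFT OUTER JOIN: every left row is kept;
    left rows without a match are padded with None."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

# Hypothetical sample rows (assumed, not taken from the tables in the post).
users = [
    {"id": 1, "dept_id": 1, "name": "a"},
    {"id": 2, "dept_id": 2, "name": "b"},
]
depts = [
    {"dept_id": 1, "dept_name": "x"},
    {"dept_id": 2, "dept_name": "y"},
]

def key(u, d):
    # The equi-join condition u.dept_id = d.dept_id
    return u["dept_id"] == d["dept_id"]

# ON ... AND d.dept_id = 1: filtering d before the join gives the same rows.
on_d = left_outer_join(users, depts, lambda u, d: key(u, d) and d["dept_id"] == 1)
pre_d = left_outer_join(users, [d for d in depts if d["dept_id"] == 1], key)

# ON ... AND u.dept_id = 1: filtering u before the join would drop user 2,
# although the outer join must still emit that row padded with None.
on_u = left_outer_join(users, depts, lambda u, d: key(u, d) and u["dept_id"] == 1)
pre_u = left_outer_join([u for u in users if u["dept_id"] == 1], depts, key)

# WHERE d.dept_id = 1: evaluated on the joined rows, so padded rows are removed.
where_d = [(u, d) for u, d in left_outer_join(users, depts, key)
           if d is not None and d["dept_id"] == 1]

print(on_d == pre_d, on_u == pre_u, len(where_d))
```

In this sketch, moving the `d` predicate below the join leaves the result unchanged, whereas moving the `u` predicate below the join loses a row. That matches the plans shown, where the `u` predicate is kept as a join filter rather than applied at the TableScan of `u`.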
