The Hive Execution Process

3. The Hive Execution Process

3.1 The Hive execution flow

A query enters Hive through the CLI, a GUI, or the Thrift Server interface; the subsequent flow is:

1) The interface submits the query to the Driver and requests an execution plan.

2) The Driver hands the query to the compiler for parsing; ANTLR performs the lexical and syntactic analysis and produces an abstract syntax tree (AST).

3) The compiler builds the execution plan on behalf of the Driver.

4) To do so, the compiler requests the relevant metadata from the Metastore.

5) The Metastore returns the metadata to the compiler.

6) Using that metadata, the compiler completes its analysis and returns the finished plan to the Driver.

7) The Driver passes the execution plan to the Execution Engine.

8) For DDL statements, the Execution Engine reads and updates metadata directly in the Metastore.

9) The Execution Engine submits the job to Hadoop's JobTracker; throughout this process the query depends on Hadoop MapReduce for execution.

10) When the job completes, the results are returned to the Execution Engine.

11) The Execution Engine returns the results to the Driver.

12) The Driver returns the query results to the user.
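The twelve steps above can be sketched as a message flow between the four components. This is a minimal illustration in Python, not Hive source code; all function names and the fake metadata are assumptions made up for the sketch.

```python
# Illustrative sketch of the Hive query flow (NOT Hive's actual code).
# Each component is reduced to one function; names and data are made up.

def metastore(table):
    """Steps 4-5: return (fake) metadata for a table."""
    return {"table": table, "columns": ["ln", "tma", "l_date", "customer", "ep"]}

def compiler(query, table):
    """Steps 2-6: parse the query, fetch metadata, build a plan."""
    ast = ("TOK_QUERY", query)          # stands in for the ANTLR-produced AST
    metadata = metastore(table)         # compiler <-> Metastore round trip
    return {"ast": ast, "metadata": metadata, "stages": ["Stage-1", "Stage-0"]}

def execution_engine(plan):
    """Steps 7-11: run each stage (MapReduce jobs in real Hive)."""
    return [f"{stage} done" for stage in plan["stages"]]

def driver(query, table="raw_kafka_event_pageview_dt0"):
    """Steps 1, 7, 12: coordinate compilation and execution, return results."""
    plan = compiler(query, table)
    return execution_engine(plan)

print(driver("SELECT ..."))  # ['Stage-1 done', 'Stage-0 done']
```

The point of the sketch is the ordering: the Driver never talks to the Metastore directly for query compilation; the compiler does, and the Execution Engine only sees the finished plan.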

3.2 Diagram of the Hive execution process

(figure: the interaction between the interface, Driver, compiler, Metastore, and Execution Engine; the original image is not preserved)

3.3 The syntax tree used during Hive execution; to obtain it, run the EXPLAIN command:

EXPLAIN
SELECT ln,
       count(DISTINCT split(tma,'[.]')[0], '.', split(tma,'[.]')[1], '.', split(tma,'[.]')[2])
FROM raw_kafka_event_pageview_dt0
WHERE l_date = '2013-11-13'
  AND customer = 'Cjinrongjie'
  AND parse_url(ep, 'HOST') LIKE '%jrj.com.cn%'
GROUP BY ln
LIMIT 10;
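What the aggregate computes: split(tma,'[.]') is a regex split on '.', and indexing it with [0]/[1]/[2] picks the first three labels, so the count(DISTINCT ...) effectively counts distinct three-label prefixes of tma per ln. A quick Python stand-in for that logic (illustrative only; the sample rows are made up):

```python
import re

def first_three_labels(tma):
    # Equivalent of split(tma, '[.]')[0] . '.' . [1] . '.' . [2] in the query:
    # regex-split on '.', then rejoin the first three labels.
    parts = re.split(r"[.]", tma)
    return f"{parts[0]}.{parts[1]}.{parts[2]}"

# (ln, tma) sample rows -- invented for the demonstration.
rows = [("pv", "a.b.c.d"), ("pv", "a.b.c.e"), ("pv", "x.y.z")]
distinct = {first_three_labels(tma) for _, tma in rows}
print(len(distinct))  # 'a.b.c.d' and 'a.b.c.e' share the prefix 'a.b.c' -> 2
```

Strictly speaking, Hive's count(DISTINCT a, '.', b, '.', c) counts distinct argument tuples; since the '.' arguments are constants, that is the same as counting distinct (label0, label1, label2) triples, which the concatenation above mimics.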

The abstract syntax tree generated for this query is as follows:

ABSTRACT SYNTAX TREE:
  (TOK_QUERY
    (TOK_FROM
      (TOK_TABREF (TOK_TABNAME raw_kafka_event_pageview_dt0)))
    (TOK_INSERT
      (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
      (TOK_SELECT
        (TOK_SELEXPR (TOK_TABLE_OR_COL ln))
        (TOK_SELEXPR
          (TOK_FUNCTIONDI count
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 0) '.'
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 1) '.'
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 2))))
      (TOK_WHERE
        (AND
          (AND
            (= (TOK_TABLE_OR_COL l_date) '2013-11-13')
            (= (TOK_TABLE_OR_COL customer) 'Cjinrongjie'))
          (like (TOK_FUNCTION parse_url (TOK_TABLE_OR_COL ep) 'HOST') '%jrj.com.cn%')))
      (TOK_GROUPBY (TOK_TABLE_OR_COL ln))
      (TOK_LIMIT 10)))
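The AST is printed as an s-expression; note that the array-index operator appears as a function literally named `[`, so `([ (TOK_FUNCTION split ...) 0)` means "index the result of split with 0". To make the nesting concrete, here is a tiny s-expression reader (an illustrative toy, not Hive's parser; it assumes whitespace-separated atoms):

```python
# Toy s-expression parser for EXPLAIN-style AST text (not Hive's own code).
def parse_sexpr(text):
    # Pad parentheses so split() yields one token per atom or paren.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        node = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "(":
                child, pos = read(pos + 1)   # recurse into the subtree
                node.append(child)
            elif tok == ")":
                return node, pos + 1         # close the current subtree
            else:
                node.append(tok)             # plain atom
                pos += 1
        return node, pos

    tree, _ = read(0)
    return tree

tree = parse_sexpr("(TOK_GROUPBY (TOK_TABLE_OR_COL ln))")
print(tree)  # [['TOK_GROUPBY', ['TOK_TABLE_OR_COL', 'ln']]]
```

Reading the AST this way makes the query structure explicit: TOK_FROM, TOK_SELECT, TOK_WHERE, TOK_GROUPBY, and TOK_LIMIT are simply sibling children of TOK_INSERT under TOK_QUERY.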

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        raw_kafka_event_pageview_dt0
          TableScan
            alias: raw_kafka_event_pageview_dt0
            Filter Operator
              predicate:
                  expr: ((customer = 'Cjinrongjie') and (parse_url(ep, 'HOST') like '%jrj.com.cn%'))
                  type: boolean
              Select Operator
                expressions:
                      expr: ln
                      type: string
                      expr: tma
                      type: string
                outputColumnNames: ln, tma
                Group By Operator
                  aggregations:
                        expr: count(DISTINCT split(tma, '[.]')[0], '.', split(tma, '[.]')[1], '.', split(tma, '[.]')[2])
                  bucketGroup: false
                  keys:
                        expr: ln
                        type: string
                        expr: split(tma, '[.]')[0]
                        type: string
                        expr: '.'
                        type: string
                        expr: split(tma, '[.]')[1]
                        type: string
                        expr: split(tma, '[.]')[2]
                        type: string
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Reduce Output Operator
                    key expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                          expr: _col2
                          type: string
                          expr: _col3
                          type: string
                          expr: _col4
                          type: string
                    sort order: +++++
                    Map-reduce partition columns:
                          expr: _col0
                          type: string
                    tag: -1
                    value expressions:
                          expr: _col5
                          type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(DISTINCT KEY._col1:0._col0, KEY._col1:0._col1, KEY._col1:0._col2, KEY._col1:0._col3, KEY._col1:0._col4)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: bigint
            outputColumnNames: _col0, _col1
            Limit
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
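Notice that the Group By Operator appears twice in Stage-1: once on the map side with "mode: hash" (partial, in-memory aggregation per mapper) and once on the reduce side with "mode: mergepartial" (merging the partials into the final count). A rough sketch of that two-phase pattern follows; it is a plain-Python illustration with invented sample data, and it uses per-key sets for the DISTINCT, whereas real Hive routes distinct values through the sort keys (the "sort order: +++++" above):

```python
from collections import defaultdict

def map_side(rows):
    """'mode: hash' stand-in: partial per-key sets of the distinct expression."""
    partial = defaultdict(set)
    for ln, prefix in rows:          # (group key, distinct expression value)
        partial[ln].add(prefix)
    return partial

def reduce_side(partials):
    """'mode: mergepartial' stand-in: merge partials, then count distincts."""
    merged = defaultdict(set)
    for partial in partials:
        for key, values in partial.items():
            merged[key] |= values
    return {key: len(values) for key, values in merged.items()}

# Two mappers produce partial aggregates; one reducer merges them.
p1 = map_side([("pv", "a.b.c"), ("pv", "x.y.z")])
p2 = map_side([("pv", "a.b.c"), ("click", "a.b.c")])
print(reduce_side([p1, p2]))  # {'pv': 2, 'click': 1}
```

The map-side partial aggregation is what keeps the shuffle small: each mapper sends at most one entry per (group key, distinct value) pair instead of one per input row.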

