3. The Hive Execution Process
3.1 The Hive Execution Process
A query enters Hive through the CLI, a GUI, or the Thrift Server interface, and is then processed as follows:
1) The query is submitted to the Driver, which requests an execution plan from the compiler.
2) The compiler parses the query; ANTLR generates the syntax tree (AST).
3) The compiler builds the execution plan on behalf of the Driver.
4) The compiler requests the required metadata from the Metastore.
5) The Metastore returns the metadata to the compiler.
6) Using this metadata, the compiler completes semantic analysis and returns the finished plan to the Driver.
7) The Driver hands the execution plan to the Execution Engine.
8) For DDL statements, the Execution Engine obtains the metadata it needs directly from the Metastore (see the sketch after this list).
9) The Execution Engine asks Hadoop's JobTracker to run the job(s) that carry out the query; the whole process relies on Hadoop MapReduce.
10) When execution finishes, the results are returned to the Execution Engine.
11) The Execution Engine returns the results to the Driver.
12) The Driver returns the query results to the user.
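For step 8, a DDL statement is a useful illustration: it is satisfied entirely by Metastore metadata operations and launches no MapReduce job. A minimal sketch (the table name here is hypothetical):

-- DDL only touches the Metastore; no MapReduce job is submitted.
CREATE TABLE IF NOT EXISTS demo_pageview (ln STRING, tma STRING, ep STRING)
PARTITIONED BY (l_date STRING);

-- Reading table metadata is likewise a pure Metastore operation.
DESCRIBE FORMATTED demo_pageview;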
3.2 Diagram of the Hive Execution Process
[Figure: schematic of the flow described in 3.1, from the client interface through the Driver, compiler, and Metastore to the Execution Engine and Hadoop MapReduce.]
3.3 The syntax tree during Hive execution: to obtain a query's syntax tree, use the EXPLAIN command:
EXPLAIN
SELECT ln,
       count(DISTINCT split(tma,'[.]')[0], '.', split(tma,'[.]')[1], '.', split(tma,'[.]')[2])
FROM raw_kafka_event_pageview_dt0
WHERE l_date = '2013-11-13'
  AND customer = 'Cjinrongjie'
  AND parse_url(ep,'HOST') LIKE '%jrj.com.cn%'
GROUP BY ln
LIMIT 10;
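The DISTINCT expression counts, per ln, the distinct combinations of the first three dot-separated segments of tma (split() takes a regular expression, so the dot is escaped as '[.]'). A quick sketch of the indexing behavior, assuming a Hive version that allows SELECT without a FROM clause (0.13+; on older versions, select from any single-row table):

-- split() returns an array<string>; [n] indexes into it.
-- For 'www.jrj.com.cn' this yields 'www', 'jrj', 'com'.
SELECT split('www.jrj.com.cn', '[.]')[0],
       split('www.jrj.com.cn', '[.]')[1],
       split('www.jrj.com.cn', '[.]')[2];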
The syntax tree generated for this query is shown below (TOK_FUNCTIONDI marks a DISTINCT function call, and the '[' node is the array-index operator):
ABSTRACT SYNTAX TREE:
  (TOK_QUERY
    (TOK_FROM (TOK_TABREF (TOK_TABNAME raw_kafka_event_pageview_dt0)))
    (TOK_INSERT
      (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
      (TOK_SELECT
        (TOK_SELEXPR (TOK_TABLE_OR_COL ln))
        (TOK_SELEXPR
          (TOK_FUNCTIONDI count
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 0)
            '.'
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 1)
            '.'
            ([ (TOK_FUNCTION split (TOK_TABLE_OR_COL tma) '[.]') 2))))
      (TOK_WHERE
        (AND
          (AND
            (= (TOK_TABLE_OR_COL l_date) '2013-11-13')
            (= (TOK_TABLE_OR_COL customer) 'Cjinrongjie'))
          (like (TOK_FUNCTION parse_url (TOK_TABLE_OR_COL ep) 'HOST') '%jrj.com.cn%')))
      (TOK_GROUPBY (TOK_TABLE_OR_COL ln))
      (TOK_LIMIT 10)))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        raw_kafka_event_pageview_dt0
          TableScan
            alias: raw_kafka_event_pageview_dt0
            Filter Operator
              predicate:
                  expr: ((customer = 'Cjinrongjie') and (parse_url(ep, 'HOST') like '%jrj.com.cn%'))
                  type: boolean
              Select Operator
                expressions:
                      expr: ln
                      type: string
                      expr: tma
                      type: string
                outputColumnNames: ln, tma
                Group By Operator
                  aggregations:
                        expr: count(DISTINCT split(tma, '[.]')[0], '.', split(tma, '[.]')[1], '.', split(tma, '[.]')[2])
                  bucketGroup: false
                  keys:
                        expr: ln
                        type: string
                        expr: split(tma, '[.]')[0]
                        type: string
                        expr: '.'
                        type: string
                        expr: split(tma, '[.]')[1]
                        type: string
                        expr: split(tma, '[.]')[2]
                        type: string
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Reduce Output Operator
                    key expressions:
                          expr: _col0
                          type: string
                          expr: _col1
                          type: string
                          expr: _col2
                          type: string
                          expr: _col3
                          type: string
                          expr: _col4
                          type: string
                    sort order: +++++
                    Map-reduce partition columns:
                          expr: _col0
                          type: string
                    tag: -1
                    value expressions:
                          expr: _col5
                          type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(DISTINCT KEY._col1:0._col0, KEY._col1:0._col1, KEY._col1:0._col2, KEY._col1:0._col3, KEY._col1:0._col4)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: bigint
            outputColumnNames: _col0, _col1
            Limit
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
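Reading the plan: Stage-1 is the single MapReduce job. On the map side, the TableScan feeds a Filter Operator (l_date no longer appears in the predicate, presumably because it is a partition column and was handled by partition pruning), then a projection of ln and tma, and then a hash-mode partial Group By in which the DISTINCT expressions are folded into the shuffle key (note that the constant '.' is emitted only once). On the reduce side, a mergepartial Group By completes the count, the Limit is applied, and the output goes to a temporary file that Stage-0's Fetch Operator returns to the client. On heavily skewed data, a count(DISTINCT ...) of this shape is sometimes rewritten as a two-level aggregation; a minimal sketch against the same table (equivalent up to NULL handling, since count(DISTINCT ...) skips rows where any argument is NULL):

-- Sketch only: pre-aggregate the distinct key in a subquery, then count per ln.
SELECT ln, count(1)
FROM (
    SELECT ln,
           concat(split(tma,'[.]')[0], '.', split(tma,'[.]')[1], '.', split(tma,'[.]')[2]) AS host3
    FROM raw_kafka_event_pageview_dt0
    WHERE l_date = '2013-11-13'
      AND customer = 'Cjinrongjie'
      AND parse_url(ep,'HOST') LIKE '%jrj.com.cn%'
    GROUP BY ln, concat(split(tma,'[.]')[0], '.', split(tma,'[.]')[1], '.', split(tma,'[.]')[2])
) t
GROUP BY ln
LIMIT 10;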