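A few days ago a friend asked me how Hive's INSERT OVERWRITE actually performs the rewrite: if the overwrite fails halfway through, will data be lost?
I did not have an answer right away, but on reflection, the execution plan printed by EXPLAIN shows how it works.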
hive (temp)> explain insert overwrite table ods.ods_memberext_dd select * from temp.lhc_memberext_20130926;
OK
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME temp lhc_memberext_20130926))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME ods ods_memberext_dd))) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        lhc_memberext_20130926
          TableScan
            alias: lhc_memberext_20130926
            Select Operator
              expressions:
                    expr: id
                    type: int
                    expr: bloodtype
                    type: string
                    expr: regdate
                    type: string
                    expr: termtype
                    type: string
                    expr: channel
                    type: int
                    expr: ip
                    type: string
                    expr: clientid
                    type: string
                    expr: imei
                    type: string
                    expr: version
                    type: string
                    expr: platform
                    type: string
                    expr: model
                    type: string
                    expr: systemname
                    type: string
                    expr: systemversion
                    type: string
                    expr: channelid
                    type: int
                    expr: resolution
                    type: string
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
              File Output Operator
                compressed: false
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: ods.ods_memberext_dd

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: ods.ods_memberext_dd

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10002
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: ods.ods_memberext_dd

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://dx20:9000/tmp/hive-hadoop/hive_2014-01-08_10-58-52_023_7835826938243226729/-ext-10000

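The ordering hidden in STAGE DEPENDENCIES is easier to see once it is turned into a graph. Here is a small Python sketch that parses those lines and lists every stage that sits upstream of Stage-0, the Move Operator with replace: true. It is only an illustration: parse_dependencies and upstream are hypothetical helper names, and the sketch assumes that the branches of a conditional stage (the "consists of" list) run only after the conditional stage itself.

```python
import re

# Stage dependency lines copied from the EXPLAIN output in this post.
STAGE_DEPENDENCIES = """\
Stage-1 is a root stage
Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
Stage-4
Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
Stage-2 depends on stages: Stage-0
Stage-3
Stage-5
Stage-6 depends on stages: Stage-5
"""


def _stage_names(fragment):
    """Return every 'Stage-N' token found in a piece of text, in order."""
    return re.findall(r"Stage-\d+", fragment)


def parse_dependencies(text):
    """Map each stage to its 'depends on' and 'consists of' lists."""
    stages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name = _stage_names(line)[0]
        # Everything after "consists of" lists the conditional branches.
        head, _, tail = line.partition("consists of")
        depends_on = _stage_names(head.partition("depends on stages:")[2])
        stages[name] = {"depends_on": depends_on, "consists_of": _stage_names(tail)}
    return stages


def upstream(stages, target):
    """Return the set of stages that sit before `target` in the plan.

    Assumption: a branch of a conditional stage is treated as depending on
    that conditional stage.
    """
    parents = {name: list(info["depends_on"]) for name, info in stages.items()}
    for name, info in stages.items():
        for child in info["consists_of"]:
            parents[child].append(name)
    seen, todo = set(), [target]
    while todo:
        for parent in parents[todo.pop()]:
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


if __name__ == "__main__":
    stages = parse_dependencies(STAGE_DEPENDENCIES)
    print(sorted(upstream(stages, "Stage-0")))
```

With the plan above, the result for Stage-0 includes Stage-1, so the MapReduce job that produces the new rows is scheduled before the move that replaces the contents of ods.ods_memberext_dd.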
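The hdfs://dx20:9000/tmp/hive-hadoop/... paths in the plan sit under the directory configured by hive.exec.scratchdir (the job here runs as user hadoop).
hive.exec.scratchdir
An HDFS path used to store the execution plans of the different map/reduce stages and the intermediate output of those stages.
Property: hive.exec.scratchdir
Default value: /tmp/hive-${user.name}
Description: Scratch space for Hive jobs
The hive_2014-01-08_10-58-52_023_7835826938243226729/ directory is deleted automatically once the job has finished.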
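Read together, the plan and the scratch directory suggest a write-to-scratch-then-move pattern: new rows are written under a temporary directory first, and only a later Move Operator with replace: true touches the table. The following Python sketch is a toy model of that pattern on the local filesystem, not Hive's actual implementation. The names overwrite_via_scratch and read_table, the fail_while_writing flag, and the 000000_0 file name are all made up for illustration. It only models a failure while the new data is being written; a failure during the final move itself is not modelled.

```python
import shutil
import tempfile
from pathlib import Path


def overwrite_via_scratch(table_dir, rows, scratch_root, fail_while_writing=False):
    """Toy model: stage new rows in a scratch dir, then swap them into table_dir."""
    table_dir = Path(table_dir)
    # Per-job scratch directory, loosely playing the role of hive_<timestamp>_<id>/.
    scratch = Path(tempfile.mkdtemp(dir=scratch_root))
    try:
        staged = scratch / "-ext-10000"
        staged.mkdir()
        with open(staged / "000000_0", "w") as out:
            for i, row in enumerate(rows):
                if fail_while_writing and i == 1:
                    # Simulate a task dying halfway through the write phase.
                    raise RuntimeError("simulated task failure")
                out.write(row + "\n")
        # Reached only if the write phase succeeded: the "replace: true" move.
        if table_dir.exists():
            shutil.rmtree(table_dir)
        shutil.move(str(staged), str(table_dir))
    finally:
        # The scratch directory is removed whether the job succeeded or not.
        shutil.rmtree(scratch, ignore_errors=True)


def read_table(table_dir):
    """Return the contents of every file in the table directory, sorted."""
    return sorted(p.read_text() for p in Path(table_dir).iterdir())


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    table = root / "warehouse" / "ods_memberext_dd"
    table.mkdir(parents=True)
    (table / "000000_0").write_text("old row\n")
    scratch_root = root / "tmp"
    scratch_root.mkdir()

    try:
        overwrite_via_scratch(table, ["a", "b"], scratch_root, fail_while_writing=True)
    except RuntimeError as err:
        print("job failed:", err)
    print("after failed job:", read_table(table))

    overwrite_via_scratch(table, ["a", "b"], scratch_root)
    print("after successful job:", read_table(table))
    shutil.rmtree(root)
```

In this toy model a failure during the write phase leaves the old table files in place, because the table directory is only touched after the staged output is complete.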