Using the EXPLAIN keyword to understand how Hive works
SQL:
select count(1) from dw.fact_ord_arranged where dt = '20160101'
EXPLAIN output:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree: --------------- Map phase
TableScan
alias: fact_ord_arranged --------------- the table being scanned
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
Group By Operator
aggregations: count(1) --------------- aggregate function
mode: hash
outputColumnNames: _col0 --------------- temporary (internal) column name
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Reduce Operator Tree: --------------- Reduce phase
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat --------------- output file format
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1 --------------- the query has no LIMIT, so no limit is applied
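The operator tree above describes a two-phase count: each mapper produces a partial count(1) (Group By Operator, mode: hash), and the single reducer merges the partials (count(VALUE._col0), mode: mergepartial). A minimal sketch of that flow in Python (an illustrative model only, not Hive internals; the split sizes are made up):

```python
# Simplified model of the plan above: map-side partial aggregation,
# then reduce-side merge of the partial counts.

def map_phase(rows):
    # Group By Operator, mode: hash -> partial count(1) over this mapper's split
    return len(rows)

def reduce_phase(partials):
    # Group By Operator, mode: mergepartial -> count(VALUE._col0)
    return sum(partials)

# The real job ran 18 mappers; three hypothetical splits illustrate the flow.
splits = [["row"] * 5, ["row"] * 7, ["row"] * 6]
partials = [map_phase(s) for s in splits]   # one partial count per mapper
total = reduce_phase(partials)
print(total)  # 18
```

This is why a full-table count never ships every row to the reducer: only one bigint per mapper crosses the shuffle (value expressions: _col0 (type: bigint) in the plan).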
Query execution log:
CliDriver update main thread name to da9ea076-e1ce-4384-bdb2-e62af5482003
17/04/04 21:33:11 INFO CliDriver: CliDriver update main thread name to da9ea076-e1ce-4384-bdb2-e62af5482003
Logging initialized using configuration in file:/opt/my/versions/hive_components/all_conf/querier_cli_0.13_write/conf/hive-log4j.properties
OK
Time taken: 0.459 seconds
OK
Time taken: 0.01 seconds
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1489485669600_6760508, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760508/
Kill Command = /opt/my/hadoop/bin/hadoop job -kill job_1489485669600_6760508
Hadoop job information for Stage-1: number of mappers: 18; number of reducers: 1
2017-04-04 21:33:32,207 Stage-1 map = 0%, reduce = 0%
2017-04-04 21:33:40,441 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 5.28 sec
2017-04-04 21:33:41,470 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 26.97 sec
2017-04-04 21:33:42,498 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 43.8 sec
2017-04-04 21:33:43,526 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 49.44 sec
2017-04-04 21:33:44,556 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 55.82 sec
2017-04-04 21:33:45,584 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 62.08 sec
2017-04-04 21:33:46,611 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 65.99 sec
2017-04-04 21:33:47,639 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 70.29 sec
2017-04-04 21:33:55,852 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 72.34 sec
MapReduce Total cumulative CPU time: 1 minutes 12 seconds 340 msec
Ended Job = job_1489485669600_6760508
Copying data to local directory /opt/my/data/talos/raw_data/hive_3e456a52193b11e79f0ba4dcbe04f8c6
MapReduce Jobs Launched:
Job 0: Map: 18 Reduce: 1 Cumulative CPU: 72.34 sec HDFS Read: 1381720966 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 12 seconds 340 msec
OK
Time taken: 42.318 seconds
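The STAGE DEPENDENCIES section of an EXPLAIN can also be inspected programmatically. A small sketch that extracts the root stages from plain-text EXPLAIN output (assuming the format shown above; `root_stages` is a hypothetical helper, not a Hive API):

```python
def root_stages(explain_text):
    """Collect stage names marked 'is a root stage' in STAGE DEPENDENCIES."""
    roots = []
    in_deps = False
    for line in explain_text.splitlines():
        line = line.strip()
        if line == "STAGE DEPENDENCIES:":
            in_deps = True          # entering the dependency section
        elif line == "STAGE PLANS:":
            break                   # dependency section is over
        elif in_deps and line.endswith("is a root stage"):
            roots.append(line.split()[0])
    return roots

plan = """STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1"""
print(root_stages(plan))  # ['Stage-1', 'Stage-0']
```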
SQL:
select dt, count(1) as num from dw.fact_ord_arranged where dt = '20160101' group by dt limit 10
EXPLAIN output:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: fact_ord_arranged
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
Select Operator
expressions: dt (type: string)
outputColumnNames: dt
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
Group By Operator
aggregations: count(1)
keys: dt (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 0 Data size: 1379094784 Basic stats: PARTIAL Column stats: COMPLETE
value expressions: _col1 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: string), _col1 (type: bigint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
Limit
Number of rows: 10
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 10
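This second plan adds a grouping key (keys: dt) and a Limit operator (Number of rows: 10): mappers build per-split hash aggregates keyed by dt, the shuffle partitions on _col0, and reducers merge the partials before the limit is applied. A sketch of that flow (again a simplified model under made-up data, not Hive code):

```python
from collections import Counter

def map_phase(rows):
    # Group By Operator, keys: dt, mode: hash -> partial counts per key
    return Counter(r["dt"] for r in rows)

def reduce_phase(partials, limit):
    merged = Counter()
    for p in partials:            # mode: mergepartial
        merged.update(p)
    out = sorted(merged.items())  # keys arrive sorted (sort order: +)
    return out[:limit]            # Limit operator, Number of rows: 10

# Two hypothetical mapper splits, all rows in the same partition:
splits = [[{"dt": "20160101"}] * 4, [{"dt": "20160101"}] * 3]
result = reduce_phase([map_phase(s) for s in splits], limit=10)
print(result)  # [('20160101', 7)]
```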
Query execution log:
CliDriver update main thread name to 81e0f131-a86c-4424-97b2-8e63264d5964
17/04/04 21:33:47 INFO CliDriver: CliDriver update main thread name to 81e0f131-a86c-4424-97b2-8e63264d5964
Logging initialized using configuration in file:/opt/my/versions/hive_components/all_conf/querier_cli_0.13_write/conf/hive-log4j.properties
OK
Time taken: 1.031 seconds
OK
Time taken: 0.014 seconds
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
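The "Estimated from input data size: 2" above follows from dividing the input size by hive.exec.reducers.bytes.per.reducer (1,000,000,000 bytes by default in this Hive version), capped by hive.exec.reducers.max. A sketch of that arithmetic (an assumed simplification of Hive's estimator, not its actual code):

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # hive.exec.reducers.max
    # One reducer per bytes_per_reducer of input, at least 1, at most max_reducers.
    return min(max_reducers, max(1, math.ceil(input_bytes / bytes_per_reducer)))

# The job read 1,381,720,966 bytes from HDFS, hence 2 reducers:
print(estimate_reducers(1381720966))  # 2
```

Lowering bytes.per.reducer raises parallelism; setting mapreduce.job.reduces pins the count outright, as the log's hints describe.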
Starting Job = job_1489485669600_6760579, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760579/
Kill Command = /opt/my/hadoop/bin/hadoop job -kill job_1489485669600_6760579
Hadoop job information for Stage-1: number of mappers: 18; number of reducers: 2
2017-04-04 21:34:07,571 Stage-1 map = 0%, reduce = 0%
2017-04-04 21:34:14,962 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 1.83 sec
2017-04-04 21:34:16,012 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 12.85 sec
2017-04-04 21:34:17,063 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 47.82 sec
2017-04-04 21:34:19,162 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 57.43 sec
2017-04-04 21:34:20,213 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 61.19 sec
2017-04-04 21:34:25,463 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 66.16 sec
2017-04-04 21:34:26,513 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 73.7 sec
2017-04-04 21:34:35,946 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 77.91 sec
MapReduce Total cumulative CPU time: 1 minutes 17 seconds 910 msec
Ended Job = job_1489485669600_6760579
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1489485669600_6760647, Tracking URL = http://rz-data-hdp-rm01.rz.my.com:8088/proxy/application_1489485669600_6760647/
Kill Command = /opt/my/hadoop/bin/hadoop job -kill job_1489485669600_6760647
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-04-04 21:34:46,186 Stage-2 map = 0%, reduce = 0%
2017-04-04 21:34:56,611 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.42 sec
2017-04-04 21:35:05,949 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.82 sec
MapReduce Total cumulative CPU time: 3 seconds 820 msec
Ended Job = job_1489485669600_6760647
Copying data to local directory /opt/my/data/talos/raw_data/hive_5344c542193b11e783fca4dcbe04f8c6
MapReduce Jobs Launched:
Job 0: Map: 18 Reduce: 2 Cumulative CPU: 77.91 sec HDFS Read: 1381720966 HDFS Write: 222 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 3.82 sec HDFS Read: 915 HDFS Write: 17 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 21 seconds 730 msec
OK
Time taken: 71.645 seconds