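As we all know, Hive translates each SQL statement into MapReduce jobs when it runs. But how can we see what those MR jobs actually do? This is what the EXPLAIN keyword is for: it shows in detail the execution plan that Hive generates for a statement. The syntax is given below. The EXTENDED keyword lists the execution process in more detail.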
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
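The output of EXPLAIN consists of three parts:
1: The abstract syntax tree of the query
2: The dependencies between the stages of the plan
3: A description of each stage: specifically, it shows the operators involved and the data they operate on, for example select operators, filter operators, fetch operators and so on. Let's look at a concrete example: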
explain from emp insert overwrite table emp_explain select job,sum(substr(emp.sal,4)) group by emp.job;
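Running it prints the plan. The first part shows that the statement is split into three stages: Stage-1 is a root stage, Stage-0 depends on Stage-1, and Stage-2 depends on Stage-0. In other words, this part describes the dependencies between the stages.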
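The dependency chain described here can also be checked mechanically. The sketch below is only an illustration: the `STAGE_DEPENDENCIES` text is reconstructed from the description above in the usual shape of Hive's output (it is not copied from a real run), and the helper names `parse_stage_dependencies` and `execution_order` are made up for this example. It handles only the two simple line forms shown, not conditional stages.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: reconstructed from the prose description, not real captured output.
STAGE_DEPENDENCIES = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0
"""


def parse_stage_dependencies(text):
    """Map each stage name to the set of stages it depends on."""
    deps = {}
    for raw in text.splitlines():
        line = raw.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        child = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if child:
            # Several parents would be separated by commas.
            parents = {p.strip() for p in child.group(2).split(",")}
            deps[child.group(1)] = parents
    return deps


def execution_order(deps):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


deps = parse_stage_dependencies(STAGE_DEPENDENCIES)
print(execution_order(deps))
```

For this plan the order is simply the chain itself: the MapReduce stage runs first, then the move into the target table, then the statistics stage.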
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: emp    // operates on the emp table
            Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
            Select Operator    // select operator
              expressions: job (type: string), sal (type: double)    // selected columns and their types
              outputColumnNames: job, sal
              Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
              Group By Operator    // group by operator
                aggregations: sum(substr(sal, 4))    // aggregation
                keys: job (type: string)
                mode: hash
                outputColumnNames: _col0, _col1    // columns produced by the aggregation
                Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 6 Data size: 656 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: double)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), UDFToInteger(_col1) (type: int)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 3 Data size: 328 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: emp_dept.emp_explain

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: emp_dept.emp_explain

  Stage: Stage-2
    Stats-Aggr Operator    // statistics aggregation
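To get a quick overview of which operators each stage contains, the plan text can be scanned line by line. The following is a rough sketch under stated assumptions: `PLAN_EXCERPT` is an abridged copy of the plan above with reconstructed indentation, `operators_by_stage` is a hypothetical helper name, and the rule "a line that is `TableScan` or ends in ` Operator` names an operator" is a heuristic that fits this plan, not a general parser for Hive output.

```python
import re

# Assumption: abridged excerpt of the plan above; indentation is reconstructed.
PLAN_EXCERPT = """\
Stage: Stage-1
  Map Reduce
    Map Operator Tree:
        TableScan
          alias: emp
          Select Operator
            expressions: job (type: string), sal (type: double)
            Group By Operator
              aggregations: sum(substr(sal, 4))
              Reduce Output Operator
                key expressions: _col0 (type: string)
    Reduce Operator Tree:
      Group By Operator
        aggregations: sum(VALUE._col0)
        Select Operator
          expressions: _col0 (type: string), UDFToInteger(_col1) (type: int)
          File Output Operator
            compressed: false
Stage: Stage-0
  Move Operator
    tables:
        replace: true
"""


def operators_by_stage(plan_text):
    """Collect operator names, in order of appearance, for each stage."""
    stages = {}
    current = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Stage: (Stage-\d+)", line)
        if header:
            current = header.group(1)
            stages[current] = []
        elif current and (line == "TableScan" or line.endswith(" Operator")):
            # Heuristic: operator lines carry no "key: value" payload.
            stages[current].append(line)
    return stages


for stage, ops in operators_by_stage(PLAN_EXCERPT).items():
    print(stage, "->", " > ".join(ops))
```

Read top to bottom, the Stage-1 list mirrors the data flow: scan the table, project the columns, pre-aggregate on the map side, shuffle, then finish the aggregation and write the result on the reduce side.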
Understanding the plan in more depth requires some background in compiler theory, so we will not go any further here.