select s_age,s_score
from student_tb_seq
where s_age=20;
| Explain |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: student_tb_seq |
| filterExpr: (s_age = 20) (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 5478827365 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (s_age = 20) (type: boolean) |
| Statistics: Num rows: 2500000 Data size: 2739413682 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: 20 (type: bigint), s_score (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 2500000 Data size: 2739413682 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 2500000 Data size: 2739413682 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figure: Map Operator Tree]
select nvl(s_no,'undefine') sno,
case when s_score>20 then '高级评分' else '低级评分' end level
from student_tb_orc
where s_age in (18,19,20);
| Explain |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: student_tb_orc |
| filterExpr: (s_age) IN (18, 19, 20) (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (s_age) IN (18, 19, 20) (type: boolean) |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: if s_no is null returns'undefine' (type: string), CASE WHEN ((s_score > 20)) THEN ('高级评分') ELSE ('低级评分') END (type: string) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figure: Map Operator Tree]
select count(1)
from student_tb_seq
where s_age=20;
| Explain |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: student_tb_seq |
| filterExpr: (s_age = 20) (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 5478827365 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (s_age = 20) (type: boolean) |
| Statistics: Num rows: 2500000 Data size: 2739413682 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| Statistics: Num rows: 2500000 Data size: 2739413682 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: count(1) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col0 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| mode: mergepartial |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figures: Map Operator Tree, Reduce Operator Tree]
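If the query only counts rows (count(*), or count(1) as here), the map side can emit context.write(null, count); that is, the key can be null. This is why the Reduce Output Operator in the plan above has no key expressions and an empty sort order.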
If ..., there is no Group By Operator on the map side.
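A minimal Python sketch of the data flow in the count(1) plan above, assuming a toy dataset split across two mappers. The helpers map_count and reduce_count are hypothetical illustrations, not Hive code: each mapper pre-aggregates (Group By Operator, mode: hash) and emits a single pair with a null key, and the reducer merges the partial counts (mode: mergepartial).

```python
# Hypothetical helpers: a toy model of the count(1) plan, not Hive code.

def map_count(rows, predicate):
    # Filter Operator + Group By Operator (mode: hash): each mapper counts
    # its own matching rows and emits ONE pair. There is no GROUP BY column,
    # so the key is None (null) and the Reduce Output Operator has no key.
    partial = sum(1 for row in rows if predicate(row))
    return (None, partial)

def reduce_count(pairs):
    # Group By Operator (mode: mergepartial): every pair has the same null
    # key, so one reducer simply adds up the partial counts.
    return sum(count for _key, count in pairs)

# Two input splits, one mapper each (toy data).
splits = [
    [{"s_age": 20}, {"s_age": 19}, {"s_age": 20}],
    [{"s_age": 20}, {"s_age": 21}],
]
pairs = [map_count(split, lambda row: row["s_age"] == 20) for split in splits]
print(pairs)                # [(None, 2), (None, 1)]
print(reduce_count(pairs))  # 3
```

Without the map-side Group By Operator, each mapper would instead emit one (None, 1) pair per matching row, so the single reducer would receive far more input to merge.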
select s_age,avg(s_score) avg_score
from student_tb_orc
where s_age<20
group by s_age;
| Explain |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: student_tb_orc |
| filterExpr: (s_age < 20) (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: (s_age < 20) (type: boolean) |
| Statistics: Num rows: 1666666 Data size: 2534165653 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: avg(s_score) |
| keys: s_age (type: bigint) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1666666 Data size: 2534165653 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 1666666 Data size: 2534165653 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: struct) |
| Execution mode: vectorized |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: avg(VALUE._col0) |
| keys: KEY._col0 (type: bigint) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 833333 Data size: 1267082826 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 833333 Data size: 1267082826 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figures: Map Operator Tree, Reduce Operator Tree]
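When group by computes an average, the map-side aggregation cannot produce the final average itself: the reduce side would then have to average the per-mapper averages, which gives the wrong result whenever mappers see different numbers of rows for the same key (and rounding fractional partial averages adds further error). Instead, the map side emits a partial aggregate that carries the running count and sum, which is why the map-side output value in the plan has type struct.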
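A small Python sketch of why the partial result must be a struct, assuming two mappers that see different numbers of rows for age 18. The (count, sum) tuple is a simplified stand-in for Hive's partial struct, and map_avg / reduce_avg are hypothetical helpers, not Hive APIs.

```python
# Hypothetical helpers: a toy model of avg() with map-side partial aggregation.

def map_avg(rows):
    # Group By Operator (mode: hash): per key, keep a partial (count, sum)
    # instead of a finished average.
    partial = {}
    for age, score in rows:
        count, total = partial.get(age, (0, 0))
        partial[age] = (count + 1, total + score)
    return partial

def reduce_avg(partials):
    # Group By Operator (mode: mergepartial): merge counts and sums,
    # then divide exactly once.
    merged = {}
    for partial in partials:
        for age, (count, total) in partial.items():
            c, t = merged.get(age, (0, 0))
            merged[age] = (c + count, t + total)
    return {age: total / count for age, (count, total) in merged.items()}

mapper1 = [(18, 10), (18, 20), (18, 30)]  # three rows for age 18
mapper2 = [(18, 90)]                       # one row for age 18
partials = [map_avg(mapper1), map_avg(mapper2)]
print(reduce_avg(partials))  # {18: 37.5}

# Averaging the per-mapper averages instead gives a different (wrong) answer.
per_mapper = [total / count for p in partials for (count, total) in p.values()]
naive = sum(per_mapper) / len(per_mapper)
print(naive)  # 55.0
```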
select s_no,row_number() over(partition by s_age order by s_score) rk
from student_tb_orc;
| Explain |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: student_tb_orc |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: s_age (type: bigint), s_score (type: bigint) |
| sort order: ++ |
| Map-reduce partition columns: s_age (type: bigint) |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| value expressions: s_no (type: string) |
| Execution mode: vectorized |
| Reduce Operator Tree: |
| Select Operator |
| expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint) |
| outputColumnNames: _col0, _col3, _col5 |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| PTF Operator |
| Function definitions: |
| Input definition |
| input alias: ptf_0 |
| output shape: _col0: string, _col3: bigint, _col5: bigint |
| type: WINDOWING |
| Windowing table definition |
| input alias: ptf_1 |
| name: windowingtablefunction |
| order by: _col5 |
| partition by: _col3 |
| raw input shape: |
| window functions: |
| window function definition |
| alias: _wcol0 |
| name: row_number |
| window function: GenericUDAFRowNumberEvaluator |
| isPivotResult: true |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: string), _wcol0 (type: int) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figures: Map Operator Tree, Reduce Operator Tree]
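On the map side, the two columns referenced inside row_number() over(...), s_age and s_score, are emitted as the key, and s_no as the value.
Both key columns are sorted (sort order: ++), but only s_age is used as the partition column, so all rows with the same age reach the same reducer, where the PTF Operator assigns the row numbers.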
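A minimal Python sketch of this shuffle, assuming toy rows of (s_no, s_age, s_score) and two reducers. Python's built-in hash() stands in for Hive's partitioner (an assumption), and shuffle / row_number are hypothetical helpers: rows are partitioned on s_age only, sorted on (s_age, s_score), and numbering restarts at each new s_age.

```python
from itertools import groupby

def shuffle(rows, num_reducers):
    # Reduce Output Operator: key = (s_age, s_score), value = s_no.
    # Partition on s_age only (Map-reduce partition columns: s_age).
    partitions = [[] for _ in range(num_reducers)]
    for s_no, s_age, s_score in rows:
        partitions[hash(s_age) % num_reducers].append(((s_age, s_score), s_no))
    # sort order: ++ means each reducer's input is sorted on both key columns.
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
    return partitions

def row_number(partition):
    # PTF Operator: numbering restarts whenever s_age changes.
    out = []
    for _age, group in groupby(partition, key=lambda kv: kv[0][0]):
        for rk, (_key, s_no) in enumerate(group, start=1):
            out.append((s_no, rk))
    return out

rows = [("s1", 18, 80), ("s2", 19, 70), ("s3", 18, 60), ("s4", 19, 90), ("s5", 18, 70)]
results = [pair for part in shuffle(rows, 2) for pair in row_number(part)]
print(results)  # [('s3', 1), ('s5', 2), ('s1', 3), ('s2', 1), ('s4', 2)]
```

Partitioning on s_age alone is enough for correctness, because a row number only depends on rows with the same age; the extra sort on s_score supplies the order by.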
select a.s_no,a.s_score,b.s_score
from student_tb_orc a
inner join student_tb_orc b
on a.s_no=b.s_no;
| Explain |
| Stage-5 is a root stage , consists of Stage-1 |
| Stage-1 |
| Stage-0 depends on stages: Stage-1 |
| |
| Stage: Stage-5 |
| Conditional Operator |
| |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: s_no is not null (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: s_no is not null (type: boolean) |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: s_no (type: string) |
| sort order: + |
| Map-reduce partition columns: s_no (type: string) |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| value expressions: s_score (type: bigint) |
| TableScan |
| alias: b |
| filterExpr: s_no is not null (type: boolean) |
| Statistics: Num rows: 5000000 Data size: 7602500000 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: s_no is not null (type: boolean) |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: s_no (type: string) |
| sort order: + |
| Map-reduce partition columns: s_no (type: string) |
| Statistics: Num rows: 2500000 Data size: 3801250000 Basic stats: COMPLETE Column stats: NONE |
| value expressions: s_score (type: bigint) |
| Reduce Operator Tree: |
| Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 s_no (type: string) |
| 1 s_no (type: string) |
| outputColumnNames: _col0, _col5, _col15 |
| Statistics: Num rows: 2750000 Data size: 4181375090 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: string), _col5 (type: bigint), _col15 (type: bigint) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 2750000 Data size: 4181375090 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 2750000 Data size: 4181375090 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
[Figures: Map Operator Tree, Reduce Operator Tree]
set; this parameter defaults to true.
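The Stage-1 join plan above is a reduce-side (common) join. A minimal Python sketch of it, assuming a toy self-join on (s_no, s_score) rows; map_side and reduce_side are hypothetical helpers, not Hive code. Each side drops NULL keys (the planner adds s_no is not null because a NULL key can never satisfy an equality join), emits s_no as the key with a tag saying which input the row came from, and the reducer pairs every tag-0 value with every tag-1 value for the same key.

```python
from collections import defaultdict

def map_side(table, tag):
    # TableScan + Filter Operator + Reduce Output Operator:
    # drop NULL join keys, emit key = s_no, value = (tag, s_score).
    for s_no, s_score in table:
        if s_no is not None:  # filterExpr: s_no is not null
            yield s_no, (tag, s_score)

def reduce_side(pairs):
    # Join Operator (Inner Join 0 to 1): group values by key, then pair
    # every tag-0 (left) value with every tag-1 (right) value.
    groups = defaultdict(lambda: ([], []))
    for key, (tag, score) in pairs:
        groups[key][tag].append(score)
    out = []
    for key in sorted(groups):  # sort order: +
        left, right = groups[key]
        for a in left:
            for b in right:
                out.append((key, a, b))
    return out

a = [("s1", 80), ("s2", 70), (None, 50)]
b = [("s1", 80), ("s2", 70), (None, 50)]
pairs = list(map_side(a, 0)) + list(map_side(b, 1))
print(reduce_side(pairs))  # [('s1', 80, 80), ('s2', 70, 70)]
```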