在使用Hive 的时候，我们常常进行聚合统计运算。

聚合统计的运算函数有很多，比如我们最常用的 GROUP BY 函数。但是常常我们需要多维度统计数据，这个时候我们就会用到Hive 的聚合统计函数

这里我们讲解下 ROLLUP, GROUPING SETS, CUBE 的含义以及用法。

我们结合案例讲解下这几个函数的使用规则

数据统计背景：

我们现在有多个公司，多个部门，多位员工的薪水情况。现在我们需要按照多种维度去统计薪水。

我们看下这几个函数的用法ROLLUP, GROUPING SETS, CUBE

我们从 GROUPING SETS , ROLLUP, CUBE 一点点讨论。

GROUPING SETS

GROUPING SETS作为GROUP BY的子句，允许开发人员在GROUP BY语句后面指定多个统计选项，可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起来，下面是几个实例可以帮助我们了解.

首先我们学习下 GROUPING SETS

GROUPING SETS 用法如下：

SELECT

a,b...,f

FROM test_table

GROUP BY

a,b...,f

GROUPING SETS ((?,...,?),(xxx),(yyy))

GROUPING SETS 中间可以填写多个条件。

其中 (?,...,?) 可以为 a~f 中不重复的任意项

具体例子如下：

SELECT

a,b,SUM(c)

FROM test_table

GROUP BY

a,b

GROUPING SETS ((a),(a,b),())

等价于

SELECT

,NULL

,SUM(c)

FROM test_table

GROUP BY

UNION ALL

SELECT

,SUM(c)

FROM test_table

GROUP BY

UNION ALL

SELECT

NULL

,NULL

,SUM(c)

FROM test_table

;

实际案例，我们想按照公司，和整体去统计员工的薪水，但是我们想在一条语句中完成。

我们该如何编写SQL 呢？

SQL如下：

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

GROUPING SETS ((company_name), ())

ORDER BY

grouping__id

;

输出如下：

INFO : MapReduce Jobs Launched:

INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.13 sec HDFS Read: 11666 HDFS Write: 175 SUCCESS

INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.73 sec HDFS Read: 7060 HDFS Write: 188 SUCCESS

INFO : Total MapReduce CPU Time Spent: 7 seconds 860 msec

INFO : Completed executing command(queryId=hive_20200408032038_18e04047-b8c0-4d07-a5de-00ccbc7cb4cc); Time taken: 51.459 seconds

INFO : OK

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 1 | x.qx | NULL | NULL | 59500.00 |

| 1 | s.zh | NULL | NULL | 74000.00 |

+---------------+---------------+-----------+----------+---------------+

可以看到 grouping_id 为0，计算的是整体的薪水和,而grouping_id 为1, 计算的是分公司的薪水

我们刚才说过用 GROUPING SETS 和 GROUP BY + UNION ALL 是等价的，但是它们真的一样么。我们运行 EXPLAIN 检验一下。

下面给出两个运行结果等价的SQL 的 EXPLAIN 结果。

EXPLAIN

SELECT

FROM

(

SELECT

0 AS mark

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

UNION ALL

SELECT

1 AS mark

,company_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

) tmp

ORDER BY mark

;

SQL2

EXPLAIN

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

GROUPING SETS ((company_name), ())

ORDER BY

grouping__id

;

先贴出2个SQL的分别EXPLAIN 结果

UNION ALL 的SQL

注意： UNION ALL 中做聚合增加更多的SQL 会增加 JOB 数量。

STAGE DEPENDENCIES:

Stage-1 is a root stage

Stage-2 depends on stages: Stage-1, Stage-3

Stage-3 is a root stage

Stage-0 depends on stages: Stage-2

STAGE PLANS:

Stage: Stage-1

Map Reduce

Map Operator Tree:

TableScan

alias: datacube_salary_org

filterExpr: (pt = '20200407') (type: boolean)

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: salary (type: decimal(10,2))

outputColumnNames: salary

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Group By Operator

aggregations: sum(salary)

mode: hash

outputColumnNames: _col0

Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

sort order:

Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE

value expressions: _col0 (type: decimal(20,2))

Reduce Operator Tree:

Group By Operator

aggregations: sum(VALUE._col0)

mode: mergepartial

outputColumnNames: _col0

Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: 0 (type: int), null (type: string), _col0 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col4

Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

Stage: Stage-2

Map Reduce

Map Operator Tree:

TableScan

Union

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: _col0 (type: int), _col1 (type: string), _col4 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col4

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: int)

sort order: +

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

value expressions: _col1 (type: string), _col4 (type: decimal(20,2))

TableScan

Union

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: _col0 (type: int), _col1 (type: string), _col4 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col4

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: int)

sort order: +

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

value expressions: _col1 (type: string), _col4 (type: decimal(20,2))

Reduce Operator Tree:

Select Operator

expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), null (type: void), null (type: void), VALUE._col3 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col2, _col3, _col4

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

Statistics: Num rows: 4 Data size: 257 Basic stats: COMPLETE Column stats: NONE

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-3

Map Reduce

Map Operator Tree:

TableScan

alias: datacube_salary_org

filterExpr: (pt = '20200407') (type: boolean)

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: company_name (type: string), salary (type: decimal(10,2))

outputColumnNames: company_name, salary

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Group By Operator

aggregations: sum(salary)

keys: company_name (type: string)

mode: hash

outputColumnNames: _col0, _col1

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

value expressions: _col1 (type: decimal(20,2))

Reduce Operator Tree:

Group By Operator

aggregations: sum(VALUE._col0)

keys: KEY._col0 (type: string)

mode: mergepartial

outputColumnNames: _col0, _col1

Statistics: Num rows: 3 Data size: 145 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: 1 (type: int), _col0 (type: string), _col1 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col4

Statistics: Num rows: 3 Data size: 145 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

Stage: Stage-0

Fetch Operator

limit: -1

Processor Tree:

ListSink

GROUPING SETS 的SQL

STAGE DEPENDENCIES:

Stage-1 is a root stage

Stage-2 depends on stages: Stage-1

Stage-0 depends on stages: Stage-2

STAGE PLANS:

Stage: Stage-1

Map Reduce

Map Operator Tree:

TableScan

alias: datacube_salary_org

filterExpr: (pt = '20200407') (type: boolean)

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: company_name (type: string), dep_name (type: string), user_id (type: bigint), salary (type: decimal(10,2))

outputColumnNames: company_name, dep_name, user_id, salary

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Group By Operator

aggregations: sum(salary)

keys: company_name (type: string), dep_name (type: string), user_id (type: bigint), 0 (type: int)

mode: hash

outputColumnNames: _col0, _col1, _col2, _col3, _col4

Statistics: Num rows: 14 Data size: 680 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col3 (type: int)

sort order: ++++

Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col3 (type: int)

Statistics: Num rows: 14 Data size: 680 Basic stats: COMPLETE Column stats: NONE

value expressions: _col4 (type: decimal(20,2))

Reduce Operator Tree:

Group By Operator

aggregations: sum(VALUE._col0)

keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: bigint), KEY._col3 (type: int)

mode: mergepartial

outputColumnNames: _col0, _col1, _col2, _col3, _col4

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: _col3 (type: int), _col0 (type: string), _col1 (type: string), _col2 (type: bigint), _col4 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col2, _col3, _col4

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

Stage: Stage-2

Map Reduce

Map Operator Tree:

TableScan

Reduce Output Operator

key expressions: _col0 (type: int)

sort order: +

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: bigint), _col4 (type: decimal(20,2))

Reduce Operator Tree:

Select Operator

expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: bigint), VALUE._col3 (type: decimal(20,2))

outputColumnNames: _col0, _col1, _col2, _col3, _col4

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

Statistics: Num rows: 7 Data size: 340 Basic stats: COMPLETE Column stats: NONE

table:

input format: org.apache.hadoop.mapred.SequenceFileInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0

Fetch Operator

limit: -1

Processor Tree:

ListSink

通过以上分析比较，我们不难发现 GROUPING SETS 的作业划分数量更少，

实际情况下 : GROUPING SETS 的运行效率也要高于 UNION ALL 的 GROUP BY 形式

下面是对比的执行时间

GROUPING SETS

INFO : MapReduce Jobs Launched:

INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.62 sec HDFS Read: 11666 HDFS Write: 175 SUCCESS

INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.51 sec HDFS Read: 7060 HDFS Write: 188 SUCCESS

INFO : Total MapReduce CPU Time Spent: 7 seconds 130 msec

INFO : Completed executing command(queryId=hive_20200408045412_4ab9e09f-436e-4433-9a1f-a03d5b32ef3e); Time taken: 49.676 seconds

INFO : OK

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 1 | x.qx | NULL | NULL | 59500.00 |

| 1 | s.zh | NULL | NULL | 74000.00 |

+---------------+---------------+-----------+----------+---------------+

UNION ALL 的 GROUP BY

INFO : MapReduce Jobs Launched:

INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.7 sec HDFS Read: 10541 HDFS Write: 119 SUCCESS

INFO : Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 4.34 sec HDFS Read: 10919 HDFS Write: 152 SUCCESS

INFO : Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 12932 HDFS Write: 188 SUCCESS

INFO : Total MapReduce CPU Time Spent: 13 seconds 120 msec

INFO : Completed executing command(queryId=hive_20200408045141_2033cbf6-a457-4bdb-aaec-65900b386972); Time taken: 84.365 seconds

INFO : OK

+-----------+----------+----------+----------+-------------------+

+-----------+----------+----------+----------+-------------------+

| 1 | x.qx | NULL | NULL | 59500.00 |

| 1 | s.zh | NULL | NULL | 74000.00 |

+-----------+----------+----------+----------+-------------------+

可以看到 GROUPING SETS 的执行效率更高

ROLLUP

rollup可以实现从右到左递减多级的统计，显示统计某一层次结构的聚合。（即指定字段组合的递减汇总，如with rollup a,b,c,d 那对应的二进制只能是1111，1110，1100，1000，0000）

即

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

WITH ROLLUP

ORDER BY

grouping__id

;

等价于

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

GROUPING SETS ((company_name, dep_name, user_id), (company_name, dep_name), (company_name),())

ORDER BY

grouping__id

;

等价于

SELECT

company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

UNION ALL

SELECT

company_name

,dep_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

UNION ALL

SELECT

company_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

UNION ALL

SELECT

NULL

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

;

运算结果

INFO : MapReduce Jobs Launched:

INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.21 sec HDFS Read: 11674 HDFS Write: 563 SUCCESS

INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.91 sec HDFS Read: 7448 HDFS Write: 602 SUCCESS

INFO : Total MapReduce CPU Time Spent: 8 seconds 120 msec

INFO : Completed executing command(queryId=hive_20200408052638_740f42b9-6f08-49a6-8123-9a77aedc6b19); Time taken: 50.563 seconds

INFO : OK

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 1 | s.zh | NULL | NULL | 74000.00 |

| 1 | x.qx | NULL | NULL | 59500.00 |

| 3 | s.zh | tester | NULL | 20000.00 |

| 3 | x.qx | kiccp | NULL | 8600.00 |

| 3 | x.qx | finance | NULL | 50900.00 |

| 3 | s.zh | enginer | NULL | 54000.00 |

| 7 | x.qx | kiccp | 7 | 8600.00 |

| 7 | x.qx | finance | 6 | 13000.00 |

| 7 | x.qx | finance | 5 | 24500.00 |

| 7 | x.qx | finance | 4 | 13400.00 |

| 7 | s.zh | enginer | 2 | 26000.00 |

| 7 | s.zh | enginer | 1 | 28000.00 |

| 7 | s.zh | tester | 3 | 20000.00 |

+---------------+---------------+-----------+----------+---------------+

这里我们简要的说一下 GROUPING__ID 的计算规则

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

WITH ROLLUP

ORDER BY

grouping__id

;

可以看到 GROUP BY 中有三个字段，依次是 company_name, dep_name, user_id，可以看作3位二进制（0/1),(0/1),(0/1) ，低位对应 company_name，高位对应 user_id ，如果这一位聚合了（GROUP BY 中存在该字段），则为0，GROUPING 函数对应也返回0 ，最后 GROUPING_ID 为对应2进制转换的 10进制数。

1) ROLLUP 的全分组的子集

SELECT

company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

对应的结果

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 7 | x.qx | kiccp | 7 | 8600.00 |

| 7 | x.qx | finance | 6 | 13000.00 |

| 7 | x.qx | finance | 5 | 24500.00 |

| 7 | x.qx | finance | 4 | 13400.00 |

| 7 | s.zh | enginer | 2 | 26000.00 |

| 7 | s.zh | enginer | 1 | 28000.00 |

| 7 | s.zh | tester | 3 | 20000.00 |

+---------------+---------------+-----------+----------+---------------+

因为 GROUP BY 这三个字段都有，则为 111 ，所以 GROUPING__ID 为7

2) ROLLUP company_name, dep_name 分组的子集

SELECT

company_name

,dep_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

对应的子集

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 3 | s.zh | tester | NULL | 20000.00 |

| 3 | x.qx | kiccp | NULL | 8600.00 |

| 3 | x.qx | finance | NULL | 50900.00 |

| 3 | s.zh | enginer | NULL | 54000.00 |

+---------------+---------------+-----------+----------+---------------+

CUBE

cube简称数据魔方，可以实现hive多个任意维度的查询，cube(a,b,c)则首先会对(a,b,c)进行group by，然后依次是(a,b),(a,c),(a),(b,c),(b),(c),最后在对全表进行group by，它会统计所选列中值的所有组合的聚合（即所有字段的组合）

SELECT

grouping__id

,company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

WITH CUBE

ORDER BY

grouping__id

;

等价于

(company_name,dep_name,user_id)

SELECT

company_name

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

,user_id

UNION ALL

SELECT

company_name

,dep_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,dep_name

UNION ALL

SELECT

company_name

,NULL

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

,user_id

UNION ALL

SELECT

company_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

company_name

UNION ALL

SELECT

NULL

,dep_name

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

dep_name

,user_id

UNION ALL

SELECT

NULL

,dep_name

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

dep_name

UNION ALL

SELECT

NULL

,NULL

,user_id

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

GROUP BY

user_id

UNION ALL

SELECT

NULL

,NULL

,SUM(salary) AS total_salary

FROM datacube_salary_org

WHERE pt = '20200407'

;

结果如下：

+---------------+---------------+-----------+----------+---------------+

+---------------+---------------+-----------+----------+---------------+

| 1 | s.zh | NULL | NULL | 74000.00 |

| 1 | x.qx | NULL | NULL | 59500.00 |

| 3 | s.zh | tester | NULL | 20000.00 |

| 3 | s.zh | enginer | NULL | 54000.00 |

| 3 | x.qx | kiccp | NULL | 8600.00 |

| 3 | x.qx | finance | NULL | 50900.00 |

| 4 | NULL | NULL | 7 | 8600.00 |

| 4 | NULL | NULL | 5 | 24500.00 |

| 4 | NULL | NULL | 4 | 13400.00 |

| 4 | NULL | NULL | 3 | 20000.00 |

| 4 | NULL | NULL | 2 | 26000.00 |

| 4 | NULL | NULL | 1 | 28000.00 |

| 4 | NULL | NULL | 6 | 13000.00 |

| 5 | s.zh | NULL | 2 | 26000.00 |

| 5 | s.zh | NULL | 3 | 20000.00 |

| 5 | x.qx | NULL | 5 | 24500.00 |

| 5 | x.qx | NULL | 6 | 13000.00 |

| 5 | s.zh | NULL | 1 | 28000.00 |

| 5 | x.qx | NULL | 7 | 8600.00 |

| 5 | x.qx | NULL | 4 | 13400.00 |

| 6 | NULL | enginer | 1 | 28000.00 |

| 6 | NULL | finance | 4 | 13400.00 |

| 6 | NULL | tester | 3 | 20000.00 |

| 6 | NULL | finance | 5 | 24500.00 |

| 6 | NULL | kiccp | 7 | 8600.00 |

| 6 | NULL | enginer | 2 | 26000.00 |

| 6 | NULL | finance | 6 | 13000.00 |

| 7 | x.qx | finance | 5 | 24500.00 |

| 7 | x.qx | finance | 4 | 13400.00 |

| 7 | x.qx | kiccp | 7 | 8600.00 |

| 7 | s.zh | tester | 3 | 20000.00 |

| 7 | s.zh | enginer | 2 | 26000.00 |

| 7 | s.zh | enginer | 1 | 28000.00 |

| 7 | x.qx | finance | 6 | 13000.00 |

+---------------+---------------+-----------+----------+---------------+

总结：

grouping sets：指定字段的组合来做group by

with rollup：指定字段的组合依次从右向左来递减做group by

with cube：指定字段的全组合

————————————————

原文链接：https://blog.csdn.net/u010003835/article/details/105353510

Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数与 GROUPING 函数

ROLLUP

你可能感兴趣的:(Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数与 GROUPING 函数)

Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数 与 GROUPING 函数

ROLLUP

你可能感兴趣的:(Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数 与 GROUPING 函数)

Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数与 GROUPING 函数

你可能感兴趣的:(Hive_Hive ROLLUP, GROUPING SETS, CUBE 聚合函数与 GROUPING 函数)