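HQL aggregations can be combined with the GROUPING SETS, CUBE, and ROLLUP keywords in the GROUP BY clause.
- GROUPING SETS
This clause is equivalent to combining several GROUP BY queries with UNION ALL. Unlike that combination, it completes the processing in a single stage, so it is relatively more efficient. An empty set () in the GROUPING SETS clause computes the overall aggregate. The top-level elements of the GROUPING SETS list (those outside any inner parentheses) determine how many result sets are combined with UNION ALL and what each one is; the columns inside each inner pair of parentheses determine the GROUP BY of that result set.
Example 1.1 One element: one two-column combination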
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS((name, start_date));
--||-- is equivalent to
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date;
+---------+------------+---------+
| name | start_date | sin_cnt |
+---------+------------+---------+
| Lucy | 2010-01-03 | 1 |
| Michael | 2014-01-29 | 1 |
| Steven | 2012-11-03 | 1 |
| Will | 2013-10-02 | 1 |
+---------+------------+---------+
4 rows selected (26.3 seconds)
Example 1.2 Two elements: two single columns
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS(name, start_date);
--||-- is equivalent to
SELECT
name, null as start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name
UNION ALL
SELECT
null as name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY start_date;
+---------+------------+---------+
| name | start_date | sin_cnt |
+---------+------------+---------+
| NULL | 2010-01-03 | 1 |
| NULL | 2012-11-03 | 1 |
| NULL | 2013-10-02 | 1 |
| NULL | 2014-01-29 | 1 |
| Lucy | NULL | 1 |
| Michael | NULL | 1 |
| Steven | NULL | 1 |
| Will | NULL | 1 |
+---------+------------+---------+
8 rows selected (22.658 seconds)
Example 1.3 Two elements: one two-column combination and one single column
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS((name, start_date), name);
--||-- is equivalent to
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
UNION ALL
SELECT
name, null as start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name;
+---------+------------+---------+
| name | start_date | sin_cnt |
+---------+------------+---------+
| Lucy | NULL | 1 |
| Lucy | 2010-01-03 | 1 |
| Michael | NULL | 1 |
| Michael | 2014-01-29 | 1 |
| Steven | NULL | 1 |
| Steven | 2012-11-03 | 1 |
| Will | NULL | 1 |
| Will | 2013-10-02 | 1 |
+---------+------------+---------+
8 rows selected (22.503 seconds)
Example 1.4 Four elements: all combinations of the two columns
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS((name, start_date), name, start_date, ());
--||-- is equivalent to
SELECT
name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name, start_date
UNION ALL
SELECT
name, null as start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY name
UNION ALL
SELECT
null as name, start_date, count(sin_number) as sin_cnt
FROM employee_hr
GROUP BY start_date
UNION ALL
SELECT
null as name, null as start_date, count(sin_number) as sin_cnt
FROM employee_hr;
+---------+------------+---------+
| name | start_date | sin_cnt |
+---------+------------+---------+
| NULL | NULL | 4 |
| NULL | 2010-01-03 | 1 |
| NULL | 2012-11-03 | 1 |
| NULL | 2013-10-02 | 1 |
| NULL | 2014-01-29 | 1 |
| Lucy | NULL | 1 |
| Lucy | 2010-01-03 | 1 |
| Michael | NULL | 1 |
| Michael | 2014-01-29 | 1 |
| Steven | NULL | 1 |
| Steven | 2012-11-03 | 1 |
| Will | NULL | 1 |
| Will | 2013-10-02 | 1 |
+---------+------------+---------+
13 rows selected (24.916 seconds)
- ROLLUP
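Provides n+1 levels of aggregation, where n is the number of grouping columns. For example, GROUP BY a,b,c WITH ROLLUP is equivalent to GROUP BY a,b,c GROUPING SETS ((a,b,c),(a,b),(a),())
- CUBE
Provides 2^n levels of aggregation, where n is the number of grouping columns; that is, one level for every possible combination of the n columns. For example, GROUP BY a,b,c WITH CUBE is equivalent to GROUP BY a,b,c GROUPING SETS ((a,b,c),(a,b),(b,c),(a,c),(a),(b),(c),())
- GROUPING__ID and GROUPING functions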
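GROUPING__ID takes no arguments. Its return value identifies the aggregation level of a row: it is the numeric value of the bit vector for the combination of GROUP BY columns used to compute that row. Rows computed from the same combination of GROUP BY columns return the same numeric ID.
The GROUPING function indicates whether a GROUP BY column has been aggregated away in the current row. It returns 0 if the column is part of the grouping set for that row, and 1 if it is not (the column is aggregated and appears as NULL), which matches the output below. See the following example: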
SELECT
name, start_date, count(employee_id) as emp_id_cnt,
GROUPING__ID as gid,
grouping(name) as gp_name,
grouping(start_date) as gp_sd
FROM employee_hr
GROUP BY name, start_date
WITH CUBE ORDER BY name, start_date;
+---------+------------+------------+-----+---------+-------+
| name | start_date | emp_id_cnt | gid | gp_name | gp_sd |
+---------+------------+------------+-----+---------+-------+
| NULL | NULL | 4 | 3 | 1 | 1 |
| NULL | 2010-01-03 | 1 | 2 | 1 | 0 |
| NULL | 2012-11-03 | 1 | 2 | 1 | 0 |
| NULL | 2013-10-02 | 1 | 2 | 1 | 0 |
| NULL | 2014-01-29 | 1 | 2 | 1 | 0 |
| Lucy | NULL | 1 | 1 | 0 | 1 |
| Lucy | 2010-01-03 | 1 | 0 | 0 | 0 |
| Michael | NULL | 1 | 1 | 0 | 1 |
| Michael | 2014-01-29 | 1 | 0 | 0 | 0 |
| Steven | NULL | 1 | 1 | 0 | 1 |
| Steven | 2012-11-03 | 1 | 0 | 0 | 0 |
| Will | NULL | 1 | 1 | 0 | 1 |
| Will | 2013-10-02 | 1 | 0 | 0 | 0 |
+---------+------------+------------+-----+---------+-------+
13 rows selected (55.507 seconds)
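As a rough sketch of the ROLLUP and CUBE equivalences described above, the two shorthand forms can be written against the same employee_hr table used in the earlier examples. The column aliases follow the output table above. No output is shown because these statements were not run for this text. With two grouping columns, ROLLUP should produce n+1 = 3 grouping levels and CUBE 2^2 = 4.

```sql
-- ROLLUP over two columns: levels (name, start_date), (name), ()
SELECT
name, start_date, count(employee_id) AS emp_id_cnt,
GROUPING__ID AS gid
FROM employee_hr
GROUP BY name, start_date
WITH ROLLUP;

-- The same levels spelled out explicitly with GROUPING SETS
SELECT
name, start_date, count(employee_id) AS emp_id_cnt,
GROUPING__ID AS gid
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS((name, start_date), name, ());

-- CUBE over two columns: levels (name, start_date), (name), (start_date), ()
SELECT
name, start_date, count(employee_id) AS emp_id_cnt,
GROUPING__ID AS gid
FROM employee_hr
GROUP BY name, start_date
WITH CUBE;

-- The same levels spelled out explicitly with GROUPING SETS
SELECT
name, start_date, count(employee_id) AS emp_id_cnt,
GROUPING__ID AS gid
FROM employee_hr
GROUP BY name, start_date
GROUPING SETS((name, start_date), name, start_date, ());
```

Assuming the same bit-vector convention as in the table above, the ROLLUP query would be expected to return only the gid values 0, 1, and 3, because the level grouped by start_date alone (gid 2) is not part of a rollup. The CUBE query would be expected to return all four.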