SQL ROLLUP and CUBE (Hive)

Let's start with an example:

GROUP BY ... WITH ROLLUP

mysql> select dep,pos,avg(sal) from employee group by dep,pos with rollup;
+------+------+-----------+
| dep  | pos  | avg(sal)  |
+------+------+-----------+
| 01   | 01   | 1500.0000 |
| 01   | 02   | 1950.0000 |
| 01   | NULL | 1725.0000 |
| 02   | 01   | 1500.0000 |
| 02   | 02   | 2450.0000 |
| 02   | NULL | 2133.3333 |
| 03   | 01   | 2500.0000 |
| 03   | 02   | 2550.0000 |
| 03   | NULL | 2533.3333 |
| NULL | NULL | 2090.0000 |
+------+------+-----------+

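The rows are first split by dep into three groups: 01, 02 and 03. Take dep 01: the (01, 01) and (01, 02) rows are its per-pos aggregates, and GROUP BY ... WITH ROLLUP appends a subtotal row for the group, (01, NULL), where NULL in pos means "all positions". That row is avg(sal) recomputed over all of dep 01's employee rows, that is, the rows behind (01, 01) and (01, 02) together; it is not the average of those two averages (for dep 02, (1500 + 2450) / 2 = 1975, while the subtotal is 2133.3333). Groups 02 and 03 work the same way, and the final row, (NULL, NULL), aggregates over all groups.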
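To make the subtotal behavior concrete, here is a minimal Python sketch that emulates GROUP BY dep, pos WITH ROLLUP for avg(sal). It is an illustration of the semantics, not how MySQL implements ROLLUP. The article does not show the employee table, so the ten rows in EMPLOYEES are hypothetical, chosen only so that the averages match the output above; rollup_avg and sort_key are likewise illustrative names.

```python
from collections import defaultdict

# HYPOTHETICAL data: the article does not show the employee table. These
# ten (dep, pos, sal) rows were chosen only so that the averages match the
# MySQL output above; the real table may look different.
EMPLOYEES = [
    ("01", "01", 1500), ("01", "01", 1500),
    ("01", "02", 1950), ("01", "02", 1950),
    ("02", "01", 1500),
    ("02", "02", 2450), ("02", "02", 2450),
    ("03", "01", 2500),
    ("03", "02", 2550), ("03", "02", 2550),
]


def rollup_avg(rows):
    """Emulate `GROUP BY dep, pos WITH ROLLUP` computing avg(sal).

    The grouping sets are (dep, pos), (dep) and (); a rolled-up column is
    reported as None, standing in for SQL NULL.
    """
    result = {}
    for keep in (2, 1, 0):  # how many leading GROUP BY columns are kept
        groups = defaultdict(list)
        for dep, pos, sal in rows:
            key = (dep, pos)[:keep] + (None,) * (2 - keep)
            groups[key].append(sal)
        for key, sals in groups.items():
            # Every subtotal is recomputed from the base rows; it is not
            # the average of the finer-grained averages.
            result[key] = round(sum(sals) / len(sals), 4)
    return result


def sort_key(item):
    # Order like the MySQL output: values ascending, NULL (None) last.
    return [(v is None, v or "") for v in item[0]]


if __name__ == "__main__":
    for (dep, pos), avg in sorted(rollup_avg(EMPLOYEES).items(), key=sort_key):
        print(dep, pos, avg)
```

Running it prints the same ten groups and averages as the MySQL output, with None in place of NULL (number formatting aside).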

Hive:

GROUPING SETS

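Computes aggregates for several combinations of dimensions in a single GROUP BY query. It is equivalent to running a separate GROUP BY for each combination and combining the result sets with UNION ALL; in each result row, the columns that are not part of that row's grouping set are NULL.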
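The UNION ALL equivalence can be sketched in Python: aggregate each grouping set on its own, fill the columns outside the set with None (standing in for NULL), and concatenate the results. This is a minimal emulation under stated assumptions, not Hive's execution plan. The lxw1234 table is not shown in the article, so LOGS is hypothetical data, and grouping_sets_uv is an illustrative name.

```python
from collections import defaultdict

# HYPOTHETICAL page-view log (month, day, cookieid); the article's lxw1234
# table is not shown, so these rows are invented for illustration.
LOGS = [
    ("2015-03", "2015-03-10", "c1"),
    ("2015-03", "2015-03-10", "c2"),
    ("2015-03", "2015-03-12", "c1"),
    ("2015-04", "2015-04-12", "c3"),
    ("2015-04", "2015-04-13", "c3"),
    ("2015-04", "2015-04-13", "c4"),
]


def grouping_sets_uv(rows, dims, sets):
    """Emulate GROUP BY <dims> GROUPING SETS (<sets>) with COUNT(DISTINCT cookieid).

    Each grouping set is aggregated on its own, like a plain GROUP BY, and
    the per-set results are concatenated, like UNION ALL. Columns outside
    the current set are None (SQL NULL). The cookie id is the last field.
    """
    out = []
    for gset in sets:
        groups = defaultdict(set)
        for row in rows:
            key = tuple(row[i] if d in gset else None for i, d in enumerate(dims))
            groups[key].add(row[-1])  # distinct cookies per group
        out.extend(key + (len(cookies),) for key, cookies in groups.items())
    return out


if __name__ == "__main__":
    # Mirrors: GROUP BY month, day GROUPING SETS (month, day)
    for r in grouping_sets_uv(LOGS, ["month", "day"], [("month",), ("day",)]):
        print(r)
```

Note that a month's UV is not the sum of its days' UVs, because one cookie can appear on several days: in this toy data 2015-03 has 2 distinct cookies although its days count 2 + 1. The article's own 2015-04 rows show the same effect (6 versus 2 + 3 + 2 + 2 = 9).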

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM lxw1234 
GROUP BY month,day 
GROUPING SETS (month,day) 
ORDER BY GROUPING__ID; 
month      day            uv      GROUPING__ID
------------------------------------------------
2015-03    NULL            5       1 
2015-04    NULL            6       1
NULL       2015-03-10      4       2
NULL       2015-03-12      1       2
NULL       2015-04-12      2       2
NULL       2015-04-13      3       2
NULL       2015-04-15      2       2
NULL       2015-04-16      2       2  
This is equivalent to:
SELECT 
month,
NULL,
COUNT(DISTINCT cookieid) AS uv,
1 AS GROUPING__ID 
FROM lxw1234 
GROUP BY month 
UNION ALL 
SELECT 
NULL,
day,
COUNT(DISTINCT cookieid) AS uv,
2 AS GROUPING__ID 
FROM lxw1234 
GROUP BY day

Another example:

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM lxw1234 
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID; 
month         day             uv      GROUPING__ID
------------------------------------------------
2015-03       NULL            5       1
2015-04       NULL            6       1
NULL          2015-03-10      4       2
NULL          2015-03-12      1       2
NULL          2015-04-12      2       2
NULL          2015-04-13      3       2
NULL          2015-04-15      2       2
NULL          2015-04-16      2       2
2015-03       2015-03-10      4       3
2015-03       2015-03-12      1       3
2015-04       2015-04-12      2       3
2015-04       2015-04-13      3       3
2015-04       2015-04-15      2       3
2015-04       2015-04-16      2       3  
This is equivalent to:
SELECT month,
NULL,
COUNT(DISTINCT cookieid) AS uv,
1 AS GROUPING__ID 
FROM lxw1234 
GROUP BY month 
UNION ALL 
SELECT NULL,
day,
COUNT(DISTINCT cookieid) AS uv,
2 AS GROUPING__ID 
FROM lxw1234 
GROUP BY day
UNION ALL 
SELECT month,
day,
COUNT(DISTINCT cookieid) AS uv,
3 AS GROUPING__ID 
FROM lxw1234 
GROUP BY month,day

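GROUPING__ID indicates which grouping set a result row belongs to. In these outputs it behaves as a bit vector over the GROUP BY columns, with the first column as the lowest bit and a bit set when that column is part of the grouping set: (month) = 1, (day) = 2, (month, day) = 3, and the empty grouping set () gets 0, as in the CUBE and ROLLUP examples below. This is the convention of older Hive releases; from Hive 2.3.0 on, Hive follows the SQL-standard convention, in which a bit is set when the column has been aggregated away, so with two columns the grand-total and full-detail rows swap values (3 and 0).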
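The bit-vector reading of GROUPING__ID that matches the outputs in this article can be sketched in Python. The function grouping_id and its arguments are illustrative, not a Hive API, and the convention it assumes (bit i set when the i-th GROUP BY column is present, first column lowest) is inferred from the outputs here; it may not hold in every Hive version.

```python
def grouping_id(group_by, grouping_set):
    """Hypothetical helper reproducing GROUPING__ID as it appears in this article.

    Assumed convention: bit i is 1 when the i-th GROUP BY column is part of
    the grouping set, with the first GROUP BY column as the lowest bit.
    """
    gid = 0
    for i, col in enumerate(group_by):
        if col in grouping_set:
            gid |= 1 << i
    return gid


if __name__ == "__main__":
    for gset in [(), ("month",), ("day",), ("month", "day")]:
        print(gset, grouping_id(["month", "day"], gset))
```

With GROUP BY day, month (the last example in this article) the same rule gives 1 for the day-only rows and 3 for the (day, month) rows, matching that output.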

CUBE

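Aggregates over every combination of the GROUP BY dimensions: with n dimensions that means 2^n grouping sets, here (), (month), (day) and (month, day).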
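The combinations CUBE aggregates over can be listed with a short Python sketch; cube_sets is an illustrative helper, not a Hive function. For GROUP BY month, day it yields the four grouping sets (), (month), (day) and (month, day), which correspond to GROUPING__ID 0, 1, 2 and 3 in the output below.

```python
from itertools import combinations


def cube_sets(dims):
    """Grouping sets implied by GROUP BY <dims> WITH CUBE: every subset, 2**n in total."""
    return [combo for r in range(len(dims) + 1) for combo in combinations(dims, r)]


if __name__ == "__main__":
    print(cube_sets(["month", "day"]))
```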

SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM lxw1234 
GROUP BY month,day 
WITH CUBE 
ORDER BY GROUPING__ID;
month           day             uv      GROUPING__ID
----------------------------------------------------
NULL            NULL            7       0
2015-03         NULL            5       1
2015-04         NULL            6       1
NULL            2015-04-12      2       2
NULL            2015-04-13      3       2
NULL            2015-04-15      2       2
NULL            2015-04-16      2       2
NULL            2015-03-10      4       2
NULL            2015-03-12      1       2
2015-03         2015-03-10      4       3
2015-03         2015-03-12      1       3
2015-04         2015-04-16      2       3
2015-04         2015-04-12      2       3
2015-04         2015-04-13      3       3
2015-04         2015-04-15      2       3
This is equivalent to:
SELECT 
NULL,
NULL,
COUNT(DISTINCT cookieid) AS uv,
0 AS GROUPING__ID 
FROM lxw1234
UNION ALL 
SELECT 
month,
NULL,
COUNT(DISTINCT cookieid) AS uv,
1 AS GROUPING__ID 
FROM lxw1234 
GROUP BY month 
UNION ALL 
SELECT 
NULL,
day,
COUNT(DISTINCT cookieid) AS uv,
2 AS GROUPING__ID 
FROM lxw1234 
GROUP BY day
UNION ALL 
SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
3 AS GROUPING__ID 
FROM lxw1234 
GROUP BY month,day

ROLLUP

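A subset of CUBE: it treats the leftmost dimension as the top of a hierarchy and aggregates level by level from there. For GROUP BY month, day WITH ROLLUP, the grouping sets are (month, day), (month) and (), so there are no day-only rows.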
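A minimal Python sketch of which grouping sets ROLLUP produces, assuming the prefix rule described above: for GROUP BY with n columns, keep the first k columns for k = n down to 0. rollup_sets and cube_sets are illustrative names, not Hive functions; cube_sets is included only to show that every ROLLUP set is also a CUBE set.

```python
from itertools import combinations


def rollup_sets(dims):
    """Grouping sets implied by GROUP BY <dims> WITH ROLLUP: each leading prefix, longest first."""
    return [tuple(dims[:k]) for k in range(len(dims), -1, -1)]


def cube_sets(dims):
    """Every subset of dims, i.e. what WITH CUBE would aggregate over."""
    return [c for r in range(len(dims) + 1) for c in combinations(dims, r)]


if __name__ == "__main__":
    print(rollup_sets(["month", "day"]))  # hierarchy led by month
    print(rollup_sets(["day", "month"]))  # hierarchy led by day
    dims = ["month", "day"]
    print(set(rollup_sets(dims)) <= set(cube_sets(dims)))  # ROLLUP is a subset of CUBE
```

With n dimensions ROLLUP yields n + 1 grouping sets instead of CUBE's 2**n, and the order of the GROUP BY columns decides which hierarchy is built.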

For example, hierarchical aggregation led by month:
SELECT 
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID  
FROM lxw1234 
GROUP BY month,day
WITH ROLLUP 
ORDER BY GROUPING__ID;
month           day             uv      GROUPING__ID
----------------------------------------------------
NULL            NULL            7       0
2015-03         NULL            5       1
2015-04         NULL            6       1
2015-03         2015-03-10      4       3
2015-03         2015-03-12      1       3
2015-04         2015-04-12      2       3
2015-04         2015-04-13      3       3
2015-04         2015-04-15      2       3
2015-04         2015-04-16      2       3
This supports a drill-up path: UV per (month, day) -> UV per month -> total UV.
-- Swapping the order of month and day makes day the leading dimension of the hierarchy:
SELECT 
day,
month,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID  
FROM lxw1234 
GROUP BY day,month 
WITH ROLLUP 
ORDER BY GROUPING__ID;
day             month           uv      GROUPING__ID
----------------------------------------------------
NULL            NULL            7       0
2015-04-13      NULL            3       1
2015-03-12      NULL            1       1
2015-04-15      NULL            2       1
2015-03-10      NULL            4       1
2015-04-16      NULL            2       1
2015-04-12      NULL            2       1
2015-04-12      2015-04         2       3
2015-03-10      2015-03         4       3
2015-03-12      2015-03         1       3
2015-04-13      2015-04         3       3
2015-04-15      2015-04         2       3
2015-04-16      2015-04         2       3
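This supports a drill-up path: UV per (day, month) -> UV per day -> total UV.
(Here, grouping by (day, month) gives the same results as grouping by day alone, because day and month have a parent-child relationship: every day falls in exactly one month. For dimensions without such a relationship, the two results would differ.)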
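The parent-child argument can be checked with a small Python sketch: when each day belongs to exactly one month (a functional dependency day -> month), grouping by (day, month) yields exactly the groups and counts of grouping by day, while dropping day from (month, day) genuinely merges groups. LOGS is hypothetical data and uv_by is an illustrative helper; neither comes from the article.

```python
from collections import defaultdict

# HYPOTHETICAL rows (month, day, cookieid); each day falls in exactly one month.
LOGS = [
    ("2015-03", "2015-03-10", "c1"),
    ("2015-03", "2015-03-10", "c2"),
    ("2015-03", "2015-03-12", "c1"),
    ("2015-04", "2015-04-12", "c3"),
    ("2015-04", "2015-04-13", "c3"),
    ("2015-04", "2015-04-13", "c4"),
]


def uv_by(rows, key_fn):
    """COUNT(DISTINCT cookieid) grouped by key_fn(month, day)."""
    groups = defaultdict(set)
    for month, day, cookie in rows:
        groups[key_fn(month, day)].add(cookie)
    return {key: len(cookies) for key, cookies in groups.items()}


by_day = uv_by(LOGS, lambda m, d: d)
by_day_month = uv_by(LOGS, lambda m, d: (d, m))
by_month = uv_by(LOGS, lambda m, d: m)

# day determines month, so adding month to a day-level grouping changes nothing
assert {d: uv for (d, _m), uv in by_day_month.items()} == by_day
# month does not determine day, so the month level is genuinely coarser
assert len(by_month) < len(by_day_month)
```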
