hive GROUPING SETS通过GROUPING__ID 得到聚合的字段

Grouping_ID函数

当我们没有统计某一列时,它的值显示为null,这可能与列本身就有null值冲突,这就需要一种方法区分是没有统计还是值本来就是null。(写一个排列组合的算法,就马上理解了,grouping_id其实就是所统计各列二进制和)


Column1 (key) Column2 (value)
1 NULL
1
1
2
2
3
3
3
NULL
4
5

hql统计:

  SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP

统计结果如下:

       
NULL NULL 0     00 6
1 NULL 1     10 2
1 NULL 3     11 1
1 1 3     11 1
2 NULL 1     10 1
2 2 3     11 1
3 NULL 1     10 2
3 NULL 3     11 1
3 3 3     11 1
4 NULL 1     10 1
4 5 3     11 1




如果列中没有有null值

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       (case when fact_1_id is null then 1 else 0 end) as f1g,
       (case when fact_2_id is null then 1 else 0 end) as f2
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;
如果列中本来就有null值

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       (case when (CAST (GROUPING__ID AS INT) & 1) = 0 then 1 else 0 end) as f1g,
       (case when (CAST (GROUPING__ID AS INT) & 2) = 0 then 1 else 0 end) as f2g
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id WITH CUBE
ORDER BY fact_1_id, fact_2_id;


同理 第一个字段为1  第二字段为2 第三地段为4  第四个字段 16 

如下


(case when (CAST (GROUPING__ID AS INT) & 1) = 1 then '1 ' else 0 end),
(case when (CAST (GROUPING__ID AS INT) & 2) = 2 then '2' else 0 end) ,
(case when (CAST (GROUPING__ID AS INT) & 4) = 4 then '3' else 0 end) ,

第一位如果和1与为1,证明第一位存在同时包括本来存在的null值,

用来判断每一个字段知否为聚合的字段中,null值本来的null还是grouping set的null值。


参考

http://stackoverflow.com/questions/29577887/grouping-in-hive

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup

你可能感兴趣的:(hive GROUPING SETS通过GROUPING__ID 得到聚合的字段)