Implementing a cumulative report query in Hive

1. Requirement

Given the following visitor access statistics table t_access:
Visitor	Month	Visits
A	2015-01	5
A	2015-01	15
B	2015-01	5
A	2015-01	8
B	2015-01	25
A	2015-01	5
A	2015-02	4
A	2015-02	6
B	2015-02	10
B	2015-02	5
……	……	……
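
To follow along, the sample rows above can be loaded into a small test table. This is only a sketch: the column names username, month and count are taken from the queries later in this post, while the column types and the use of INSERT ... VALUES (available in Hive 0.14 and later) are assumptions. Only the ten rows shown above are inserted; the elided rows are left out.

```sql
-- Assumed schema: names follow the queries in this post, types are a guess.
CREATE TABLE IF NOT EXISTS t_access (
  username STRING,
  `month`  STRING,  -- 'yyyy-MM' text, so string order equals chronological order
  `count`  INT      -- number of visits recorded by this row
);

-- The ten sample rows listed in the requirement table.
INSERT INTO TABLE t_access VALUES
  ('A', '2015-01', 5),
  ('A', '2015-01', 15),
  ('B', '2015-01', 5),
  ('A', '2015-01', 8),
  ('B', '2015-01', 25),
  ('A', '2015-01', 5),
  ('A', '2015-02', 4),
  ('A', '2015-02', 6),
  ('B', '2015-02', 10),
  ('B', '2015-02', 5);
```

Storing the month as a 'yyyy-MM' string matters later on, because the cumulative step compares months with <=.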

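The goal is to output, for each visitor, the total number of visits in each month, together with the cumulative number of visits up to and including that month.
Output table

Visitor	Month	Monthly total	Cumulative total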
A	2015-01	33	33
A	2015-02	10	43
…….	…….	…….	…….
B	2015-01	30	30
B	2015-02	15	45
…….	…….	…….	…….

2. Approach

1) Step 1: compute each user's total visits per month
select username,month,sum(count) as count from t_access group by username,month;
+-----------+----------+---------+--+
| username  |  month   | count   |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+
2) Step 2: join the monthly totals with themselves (inner join) on username
select * from
(select username,month,sum(count) as count from t_access group by username,month) A 
join 
(select username,month,sum(count) as count from t_access group by username,month) B
on 
A.username=B.username;
+-------------+----------+-----------+-------------+----------+-----------+
| a.username  | a.month  | a.count   | b.username  | b.month  | b.count   |
+-------------+----------+-----------+-------------+----------+-----------+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+

3) Step 3: group the result of the previous step by a.username and a.month. The cumulative value is the sum of b.count over the rows where b.month <= a.month. The monthly total is a.count itself; it is identical on every row of a group, so max(a.count) recovers it.
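
To make step 3 concrete, the sketch below lists only the joined rows that survive the b.month <= a.month filter, before any grouping. It assumes the source table is t_access with the columns username, month and count; the output aliases month_total, b_month and b_count are hypothetical names chosen for this illustration.

```sql
-- Joined rows kept by the filter; each a-row is paired with every
-- b-row of the same user whose month is not later than a.month.
SELECT a.username,
       a.month,
       a.count AS month_total,
       b.month AS b_month,
       b.count AS b_count
FROM (SELECT username, month, SUM(count) AS count
      FROM t_access GROUP BY username, month) a
JOIN (SELECT username, month, SUM(count) AS count
      FROM t_access GROUP BY username, month) b
  ON a.username = b.username
WHERE b.month <= a.month
ORDER BY a.username, a.month, b_month;

-- For the sample data this keeps 6 of the 8 joined rows:
-- A  2015-01  33  2015-01  33
-- A  2015-02  10  2015-01  33
-- A  2015-02  10  2015-02  10
-- B  2015-01  30  2015-01  30
-- B  2015-02  15  2015-01  30
-- B  2015-02  15  2015-02  15
```

Summing b_count within each (username, month) group then gives 33 and 43 for A, and 30 and 45 for B.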

3. HQL

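-- A.count is the same on every row of a group, so max() simply picks it up;
-- sum(B.count) adds the totals of all months up to and including A.month.
select A.username,A.month,max(A.count) as month_total,sum(B.count) as cumulative_total
from 
(select username,month,sum(count) as count from t_access group by username,month) A 
inner join 
(select username,month,sum(count) as count from t_access group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;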
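
As an alternative to the self-join, and not the method used in this post, the same running total can probably be expressed with a window function, which Hive has supported since 0.11. The sketch below assumes the same t_access table; the aliases month_total and cumulative_total and the subquery alias m are illustrative names.

```sql
-- Monthly totals first, then a running sum per user ordered by month.
SELECT username,
       month,
       month_total,
       SUM(month_total) OVER (
         PARTITION BY username
         ORDER BY month
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_total
FROM (
  SELECT username, month, SUM(count) AS month_total
  FROM t_access
  GROUP BY username, month
) m
ORDER BY username, month;
```

This variant aggregates the table once instead of twice and avoids the join, whose size grows with the square of the number of months per user. For the sample rows it should produce the same four lines as the output table in section 1.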

