Business scenario:
The table t_access_times records the amount of every visit, one row per visit:
Visitor | Month   | Amount
A       | 2015-01 | 5
A       | 2015-01 | 15
B       | 2015-01 | 5
A       | 2015-01 | 8
B       | 2015-01 | 25
A       | 2015-01 | 5
A       | 2015-02 | 4
A       | 2015-02 | 6
B       | 2015-02 | 10
B       | 2015-02 | 5
……      | ……      | ……
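Required output report: t_access_times_accumulate. For each visitor and month it shows the total for that month and the cumulative amount up to and including that month:
Visitor | Month   | Monthly total | Cumulative amount
A       | 2015-01 | 33            | 33
A       | 2015-02 | 10            | 43
……      | ……      | ……            | ……
B       | 2015-01 | 30            | 30
B       | 2015-02 | 15            | 45
……      | ……      | ……            | ……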
Implementation steps:
1. Create the table and load the data:
create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
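The DDL above expects a plain-text file with one record per line and fields separated by commas. As a minimal sketch, assuming the file holds exactly the sample rows listed at the top (the elided "……" rows are left out), the snippet below builds such content in memory and loads it into an in-memory SQLite table that stands in for the Hive table. SQLite and the helper name `load_access_times` are illustrative assumptions, not part of the original setup.

```python
import csv
import io
import sqlite3

# Assumed file content: the sample rows from the table above, comma-delimited,
# one record per line. The elided "……" rows are omitted.
SAMPLE_DAT = """A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
"""


def load_access_times(conn: sqlite3.Connection, dat_text: str) -> int:
    """Hypothetical helper: create t_access_times and load comma-delimited rows.

    SQLite stands in for Hive; the column names mirror the Hive DDL, with
    Hive's 'string'/'int' mapped to SQLite's 'text'/'integer'.
    Returns the number of rows loaded.
    """
    conn.execute(
        "create table t_access_times(username text, month text, salary integer)"
    )
    # Same convention as the Hive table: fields terminated by ','.
    rows = [
        (username, month, int(salary))
        for username, month, salary in csv.reader(io.StringIO(dat_text))
    ]
    conn.executemany("insert into t_access_times values (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_access_times(conn, SAMPLE_DAT))  # prints 10
```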
2. Step one: compute each user's total amount per month:
select username,month,sum(salary) as salary from t_access_times group by username,month;
Result:
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
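Conceptually, `group by username,month` with `sum(salary)` collapses all rows that share a (visitor, month) key into a single total. A plain-Python sketch of that step, using the sample rows from the top of the article (elided rows omitted) and a hypothetical `monthly_totals` function. The sort exists only to reproduce the row order shown above; Hive does not guarantee any order without `order by`.

```python
from collections import defaultdict

# Sample rows from the input table (username, month, salary); "……" rows omitted.
ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]


def monthly_totals(rows):
    """Equivalent of: select username, month, sum(salary) ... group by username, month."""
    totals = defaultdict(int)
    for username, month, salary in rows:
        totals[(username, month)] += salary
    # Sort by (username, month) so the output matches the result table above.
    return [(u, m, s) for (u, m), s in sorted(totals.items())]


for row in monthly_totals(ROWS):
    print(row)
# ('A', '2015-01', 33)
# ('A', '2015-02', 10)
# ('B', '2015-01', 30)
# ('B', '2015-02', 15)
```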
Step two: join the monthly-total table with itself (a self-join on username):
select A.*,B.* FROM
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username;
Result:
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
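The inner join on username pairs every monthly row of a visitor with every monthly row of the same visitor, so a visitor with n months contributes n × n joined rows (two months give the four rows per visitor shown above). A plain-Python sketch of that pairing, with the step-one totals hard-coded and a hypothetical `self_join_on_username` function:

```python
# Monthly totals produced by step one (username, month, salary).
MONTHLY = [
    ("A", "2015-01", 33),
    ("A", "2015-02", 10),
    ("B", "2015-01", 30),
    ("B", "2015-02", 15),
]


def self_join_on_username(table):
    """Equivalent of: A inner join B on A.username = B.username.

    Each output row is A's three columns followed by B's three columns.
    """
    return [a + b for a in table for b in table if a[0] == b[0]]


for row in self_join_on_username(MONTHLY):
    print(row)
# ('A', '2015-01', 33, 'A', '2015-01', 33)
# ('A', '2015-01', 33, 'A', '2015-02', 10)
# ('A', '2015-02', 10, 'A', '2015-01', 33)
# ('A', '2015-02', 10, 'A', '2015-02', 10)
# ('B', '2015-01', 30, 'B', '2015-01', 30)
# ('B', '2015-01', 30, 'B', '2015-02', 15)
# ('B', '2015-02', 15, 'B', '2015-01', 30)
# ('B', '2015-02', 15, 'B', '2015-02', 15)
```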
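3. Step three: take the result of the previous step and group it by a.username and a.month.
For each group, the cumulative amount is the sum of every b.salary whose b.month <= a.month. Because month is stored as a zero-padded 'YYYY-MM' string, string comparison gives the same order as calendar order.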
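Step three can be sketched in plain Python by filtering the joined rows on `b_month <= a_month` and summing `b_salary` per (a_username, a_month). This is an illustration only; the function name `accumulate` is hypothetical, and the monthly totals are hard-coded from step one.

```python
# Monthly totals produced by step one (username, month, salary).
MONTHLY = [
    ("A", "2015-01", 33),
    ("A", "2015-02", 10),
    ("B", "2015-01", 30),
    ("B", "2015-02", 15),
]


def accumulate(table):
    """Group the self-join by (a.username, a.month), summing b.salary where b.month <= a.month."""
    joined = [a + b for a in table for b in table if a[0] == b[0]]
    groups = {}
    for a_user, a_month, a_salary, _b_user, b_month, b_salary in joined:
        if b_month <= a_month:  # 'YYYY-MM' strings compare in calendar order
            key = (a_user, a_month)
            _, running = groups.get(key, (a_salary, 0))
            # a_salary is the same for every row in a group, which is why the
            # SQL can carry it through the group by with max(A.salary).
            groups[key] = (a_salary, running + b_salary)
    return [(u, m, total, running) for (u, m), (total, running) in sorted(groups.items())]


for row in accumulate(MONTHLY):
    print(row)
# ('A', '2015-01', 33, 33)
# ('A', '2015-02', 10, 43)
# ('B', '2015-01', 30, 30)
# ('B', '2015-02', 15, 45)
```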
Combining the three steps gives the following SQL:
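-- A.salary is identical for every row in an (A.username, A.month) group,
-- so max(A.salary) simply carries the monthly total through the GROUP BY.
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
-- keep only B's months up to and including A's month
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;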
Output:
Visitor | Month   | Monthly total | Cumulative amount
A       | 2015-01 | 33            | 33
A       | 2015-02 | 10            | 43
……      | ……      | ……            | ……
B       | 2015-01 | 30            | 30
B       | 2015-02 | 15            | 45
……      | ……      | ……            | ……
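To check the combined query end to end, the sketch below runs it against an in-memory SQLite database loaded with the sample rows from the top of the article (elided rows omitted). SQLite is only a stand-in for Hive here: the query text is kept as written (minus the trailing semicolon), while the table is created with SQLite's `text`/`integer` types instead of Hive's `string`/`int`. The helper name `run_accumulate` is hypothetical.

```python
import sqlite3

# Sample rows from the input table (username, month, salary); "……" rows omitted.
SAMPLE_ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

# The combined query from the article.
ACCUMULATE_SQL = """
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month
"""


def run_accumulate(rows):
    """Load rows into a SQLite stand-in for t_access_times and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t_access_times(username text, month text, salary integer)"
    )
    conn.executemany("insert into t_access_times values (?, ?, ?)", rows)
    result = conn.execute(ACCUMULATE_SQL).fetchall()
    conn.close()
    return result


for row in run_accumulate(SAMPLE_ROWS):
    print(row)
# ('A', '2015-01', 33, 33)
# ('A', '2015-02', 10, 43)
# ('B', '2015-01', 30, 30)
# ('B', '2015-02', 15, 45)
```

The four rows match the output table above, which is a quick way to confirm that the `where B.month <= A.month` filter and the `max(A.salary)` trick behave as described.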