A Detailed Walkthrough of Cumulative Reports in Hive

Business scenario:

Given the following table t_access_times, which records the amount of each visit per visitor:

Visitor    Month      Amount
A          2015-01    5
A          2015-01    15
B          2015-01    5
A          2015-01    8
B          2015-01    25
A          2015-01    5
A          2015-02    4
A          2015-02    6
B          2015-02    10
B          2015-02    5
……         ……         ……

Required output report: t_access_times_accumulate

Visitor    Month      Monthly total    Cumulative amount
A          2015-01    33               33
A          2015-02    10               43
…….        …….        …….              …….
B          2015-01    30               30
B          2015-02    15               45
…….        …….        …….              …….

Implementation steps:

1. Create the table and load the data

create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
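
The `load data` statement expects a text file whose fields are separated by `,`, matching the `fields terminated by ','` clause. As a minimal sketch, the sample rows from the scenario could be written out as below. The temporary directory is an assumption made for illustration (the original loads from `/home/hadoop/t_access_times.dat`), and only the ten rows actually shown in the table are included.

```python
import os
import tempfile

# The ten sample rows listed in the scenario table (visitor, month, amount).
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

# Hypothetical output location; copy the file to the path used by LOAD DATA.
path = os.path.join(tempfile.mkdtemp(), "t_access_times.dat")

# One record per line, fields separated by ',' to match the table definition.
with open(path, "w", encoding="utf-8") as f:
    for username, month, salary in rows:
        f.write(f"{username},{month},{salary}\n")
```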

2. Step one: compute each user's monthly total amount

select username,month,sum(salary) as salary from t_access_times group by username,month;

The result is as follows:

+-----------+----------+---------+--+
| username  |  month   | salary  |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+
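
To sanity-check these totals without a Hive cluster, the same grouping can be reproduced in a few lines of plain Python. This is only an illustrative sketch over the ten sample rows from the scenario, not part of the Hive workflow itself.

```python
from collections import defaultdict

# The ten sample rows listed in the scenario table (visitor, month, amount).
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

# Equivalent of: sum(salary) ... group by username, month
monthly = defaultdict(int)
for username, month, salary in rows:
    monthly[(username, month)] += salary

for (username, month), total in sorted(monthly.items()):
    print(username, month, total)
# A 2015-01 33
# A 2015-02 10
# B 2015-01 30
# B 2015-02 15
```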

Step two: join the monthly-total table with itself

select A.*,B.* FROM
(select username,month,sum(salary) as salary from t_access_times group by username,month) A  
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B 
on
A.username=B.username;

The result is as follows:

+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+

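Step three: run a grouped query over the result of the previous step.
The grouping fields are a.username and a.month.
To get the cumulative value for each month, sum every b.salary whose b.month <= a.month.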
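
The filter-then-sum logic of this step can be sketched in plain Python, starting from the monthly totals of step one. Note one assumption the comparison relies on: the month is stored as a zero-padded `YYYY-MM` string, so string order matches chronological order. The variable names below are chosen for illustration only.

```python
# Monthly totals from step one: (username, month) -> total.
monthly = {
    ("A", "2015-01"): 33, ("A", "2015-02"): 10,
    ("B", "2015-01"): 30, ("B", "2015-02"): 15,
}

accumulate = {}
for (a_user, a_month), a_total in monthly.items():
    # Keep only the joined rows with the same user and b.month <= a.month.
    # 'YYYY-MM' strings are zero-padded, so string order equals date order.
    running = sum(
        b_total
        for (b_user, b_month), b_total in monthly.items()
        if b_user == a_user and b_month <= a_month
    )
    # Store (monthly total, cumulative amount) for this user and month.
    accumulate[(a_user, a_month)] = (a_total, running)
```

For user A, the 2015-02 row keeps both of A's months (33 + 10 = 43), while the 2015-01 row keeps only itself (33).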

Combining the three steps, the SQL is as follows:

select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from 
(select username,month,sum(salary) as salary from t_access_times group by username,month) A 
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;

Output:

Visitor    Month      Monthly total    Cumulative amount
A          2015-01    33               33
A          2015-02    10               43
…….        …….        …….              …….
B          2015-01    30               30
B          2015-02    15               45
…….        …….        …….              …….
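
The combined query can be tried locally with Python's built-in `sqlite3` module standing in for Hive. This is a sketch under assumptions: SQLite is not Hive, the column types are simplified, and only the ten sample rows are loaded. The second query is not from the original walkthrough. It is a possible alternative that uses a window function (`sum(...) over (partition by ... order by ...)`), which Hive's windowing functions also support, and it assumes an SQLite build of version 3.25 or later.

```python
import sqlite3

# The ten sample rows listed in the scenario table (visitor, month, amount).
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t_access_times(username text, month text, salary int)")
conn.executemany("insert into t_access_times values (?, ?, ?)", rows)

# The self-join query from the walkthrough, unchanged in logic.
self_join_sql = """
select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate
from
(select username, month, sum(salary) as salary from t_access_times group by username, month) A
inner join
(select username, month, sum(salary) as salary from t_access_times group by username, month) B
on A.username = B.username
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month
"""

# Alternative (an assumption, not the original method): a running sum
# computed with a window function over the monthly totals.
window_sql = """
select username, month, salary,
       sum(salary) over (partition by username order by month) as accumulate
from (select username, month, sum(salary) as salary
      from t_access_times group by username, month) t
order by username, month
"""

self_join_result = conn.execute(self_join_sql).fetchall()
window_result = conn.execute(window_sql).fetchall()

for row in self_join_result:
    print(row)
```

The window variant avoids joining the monthly totals against themselves, so the intermediate result does not grow with the square of the number of months per user.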
