hive:级联报表查询

级联报表查询

假如,有如下数据(文件名:accumulate.dat):

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
C,2015-01,10
C,2015-01,20
A,2015-02,4
A,2015-02,6
C,2015-02,30
C,2015-02,10
B,2015-02,10
B,2015-02,5
A,2015-03,14
A,2015-03,6
B,2015-03,20
B,2015-03,25
C,2015-03,10
C,2015-03,20

需求:统计每个用户的月总额、累计到当月的总额:

-----统计结果如下累计报表

用户

月份

月总额

累计到当月的总额

A

2015-01

33

33

A

2015-02

10

43

A

2015-03

30

73

B

2015-01

30

30

B

2015-02

15

45

...

...

...

...

 

解决方法1: 复杂sql--累计报表

建表映射:

create table t_access_times(username string,month string,counts int)
row format delimited fields terminated by ',';

导入数据(accumulate.dat):

load data local inpath '/root/accumulate.dat' into table t_access_times;

1、第一步,先求个用户的月总金额

select username,month,sum(salary) as salary from t_access_times group by username,month

  输出结果

+-----------+----------+---------+--+

| username  |  month   | salary  |

+-----------+----------+---------+--+

| A         | 2015-01  | 33      |

| A         | 2015-02  | 10      |

| B         | 2015-01  | 30      |

| B         | 2015-02  | 15      |

+-----------+----------+---------+--+

2、第二步,将月总金额表自己连接 自己连接

select A.*,B.* FROM

(select username,month,sum(salary) as salary from t_access_times group by username,month) A 

inner join 

(select username,month,sum(salary) as salary from t_access_times group by username,month) B

on

A.username=B.username

where B.month <= A.month

  输出结果

+-------------+----------+-----------+-------------+----------+-----------+--+

| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |

+-------------+----------+-----------+-------------+----------+-----------+--+

| A           | 2015-01  | 33        | A           | 2015-01  | 33        |

| A           | 2015-01  | 33        | A           | 2015-02  | 10        |

| A           | 2015-02  | 10        | A           | 2015-01  | 33        |

| A           | 2015-02  | 10        | A           | 2015-02  | 10        |

| B           | 2015-01  | 30        | B           | 2015-01  | 30        |

| B           | 2015-01  | 30        | B           | 2015-02  | 15        |

| B           | 2015-02  | 15        | B           | 2015-01  | 30        |

| B           | 2015-02  | 15        | B           | 2015-02  | 15        |

+-------------+----------+-----------+-------------+----------+-----------+--+

第3步:得到最终结果

select auname,amonth,acnts,sum(bcnts)

from t_tmp2

group by auname,amonth,acnts;

 

当然,也可以把整个逻辑过程写成一个SQL语句:

select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate

from 

(select username,month,sum(salary) as salary from t_access_times group by username,month) A 

inner join 

(select username,month,sum(salary) as salary from t_access_times group by username,month) B

on

A.username=B.username

where B.month <= A.month

group by A.username,A.month

order by A.username,A.month;

 

解决2:使用窗口分析函数--累计报表

-- 窗口分析函数  sum() over()  :可以实现在窗口中进行逐行累加

 

0: jdbc:hive2://localhost:10000> select * from  t_access_amount;

+----------------------+------------------------+-------------------------+--+

| t_access_amount.uid  | t_access_amount.month  | t_access_amount.amount  |

+----------------------+------------------------+-------------------------+--+

| A                    | 2015-01                | 33                      |

| A                    | 2015-02                | 10                      |

| A                    | 2015-03                | 20                      |

| B                    | 2015-01                | 30                      |

| B                    | 2015-02                | 15                      |

| B                    | 2015-03                | 45                      |

| C                    | 2015-01                | 30                      |

| C                    | 2015-02                | 40                      |

| C                    | 2015-03                | 30                      |

+----------------------+------------------------+-------------------------+--+

求出每个人截止到每个月的总额

select uid,month,amount,

sum(amount) over(partition by uid order by month rows between unbounded preceding and current row) as accumulate

from t_access_amount;

 

你可能感兴趣的:(Hadoop)