A Top-Tier Hive SQL Case Study

Hive in Practice: Cumulative (Cascading) Sum
Requirement:
Given the following visitor access statistics table, t_access_times:

Visitor (username)  Month (month)  Visits (salary)
A                   2015-01        5
A                   2015-01        15
B                   2015-01        5
A                   2015-01        8
B                   2015-01        25
A                   2015-01        5
A                   2015-02        4
A                   2015-02        6
B                   2015-02        10
B                   2015-02        5
……                  ……             ……
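
To follow along, the sample rows above can be loaded into a small test table. This is only a sketch: the column types (string, string, int) are assumptions, since the original post does not show the table definition, and the INSERT ... VALUES form assumes Hive 0.14 or later. Only the ten rows listed above are inserted; the elided rows are left out.

```sql
-- Hypothetical DDL: column names come from the query in this post,
-- the types are assumed for illustration.
create table if not exists t_access_times (
  username string,  -- visitor
  month    string,  -- month in yyyy-MM form, so string comparison sorts correctly
  salary   int      -- visit count for this record (the post names it "salary")
);

-- Load the sample rows shown in the table above.
insert into table t_access_times values
  ('A', '2015-01', 5),
  ('A', '2015-01', 15),
  ('B', '2015-01', 5),
  ('A', '2015-01', 8),
  ('B', '2015-01', 25),
  ('A', '2015-01', 5),
  ('A', '2015-02', 4),
  ('A', '2015-02', 6),
  ('B', '2015-02', 10),
  ('B', '2015-02', 5);
```

Storing the month as a zero-padded yyyy-MM string matters here, because the solution compares months with <= and relies on lexical order matching chronological order.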

The required output report is t_access_times_accumulate:

Visitor  Month    Monthly total  Cumulative total
A        2015-01  33             33
A        2015-02  10             43
……       ……       ……             ……
B        2015-01  30             30
B        2015-02  15             45
……       ……       ……             ……

Implementation
This can be done with a single HQL statement:

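-- Note: the salary column holds the visit count of each record.
select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate
from
-- A: monthly total per visitor (the month being reported)
(select username, month, sum(salary) as salary from t_access_times group by username, month) A
inner join
-- B: the same monthly totals, used to collect the current and earlier months
(select username, month, sum(salary) as salary from t_access_times group by username, month) B
on
A.username = B.username
-- keep only B rows for months up to and including A's month
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month;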

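The idea is to build two derived tables, A and B, from the same monthly aggregation and join them to each other (a self-join). Joining on username alone, before the where filter is applied, pairs every month of a visitor with every month of that same visitor: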

+-------------+----------+-----------+-------------+----------+-----------+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+
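
The intermediate rows shown above can be reproduced, as a sketch, by running the self-join on its own, without the where filter or the final group by. The query below reuses only the table and column names from the statement in this post; the order of the returned rows is not guaranteed.

```sql
-- Self-join of the monthly totals on username only.
-- No where filter and no outer group by, so every month of a visitor
-- is paired with every month of the same visitor.
select A.username, A.month, A.salary,
       B.username, B.month, B.salary
from
  (select username, month, sum(salary) as salary
   from t_access_times
   group by username, month) A
inner join
  (select username, month, sum(salary) as salary
   from t_access_times
   group by username, month) B
on A.username = B.username;
```

Inspecting this intermediate result is a convenient way to check the join before adding the filter and aggregation.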

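The condition B.month <= A.month then drops the rows in which b.month is later than a.month (for example, a.month = 2015-01 paired with b.month = 2015-02). Grouping the remaining rows by a.username and a.month and summing b.salary yields the cumulative total. Because a.salary is identical in every row of a group, max(a.salary) simply returns that month's own total.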
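
For comparison, the same report can also be sketched with a window function instead of a self-join. This is an alternative technique, not the method used in this post, and it assumes a Hive version with windowing support (0.11 or later). The alias t is introduced here for illustration.

```sql
-- Alternative sketch: running total with a window function.
-- The inner query computes the monthly total per visitor;
-- the window sum then accumulates it in month order for each visitor.
select username,
       month,
       salary,
       sum(salary) over (
         partition by username
         order by month
         rows between unbounded preceding and current row
       ) as accumulate
from
  (select username, month, sum(salary) as salary
   from t_access_times
   group by username, month) t
order by username, month;
```

This form aggregates the table once rather than twice and avoids the join, whose intermediate result grows with the square of the number of months per visitor.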
