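Hive's query language (HQL) is similar to SQL, with a few differences. This post uses mock data and HQL to compute visit statistics:
For each user and month, compute the visits in that month, the largest monthly visit count up to and including that month, and the user's cumulative visit count up to and including that month.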
The source table t_access has the columns uname, umonth, ucount and contains the following rows:
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11
Solution 1:
-- (1) First compute each user's total visits per month
create table record2 as
select uname, umonth, sum(ucount) as current_month_cnt
from t_access
group by uname, umonth;

-- Contents of record2:
-- A 2015-01 33
-- A 2015-02 10
-- A 2015-03 38
-- B 2015-01 30
-- B 2015-02 15
-- B 2015-03 44

-- (2) Self-join: pair each month with every month of the same user up to and including it
select t1.uname, t1.umonth, t1.current_month_cnt,
       max(t2.current_month_cnt) as max_cnt,
       sum(t2.current_month_cnt) as sum_cnt
from record2 t1
join record2 t2
on t1.uname = t2.uname
where t1.umonth >= t2.umonth
group by t1.uname, t1.umonth, t1.current_month_cnt;

-- Final result (uname, umonth, current_month_cnt, max_cnt, sum_cnt):
-- A 2015-01 33 33 33
-- A 2015-02 10 33 43
-- A 2015-03 38 38 81
-- B 2015-01 30 30 30
-- B 2015-02 15 30 45
-- B 2015-03 44 44 89
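To try the self-join without a Hive cluster, the sketch below runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here, the function name cumulative_by_self_join is hypothetical, and an order by clause is added purely so the output order is deterministic. The input is the set of monthly totals listed for record2 above.

```python
import sqlite3

# Monthly totals exactly as listed for record2 in the text.
RECORD2 = [
    ("A", "2015-01", 33), ("A", "2015-02", 10), ("A", "2015-03", 38),
    ("B", "2015-01", 30), ("B", "2015-02", 15), ("B", "2015-03", 44),
]


def cumulative_by_self_join(rows):
    """Return (uname, umonth, current_month_cnt, max_cnt, sum_cnt) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table record2 "
        "(uname text, umonth text, current_month_cnt integer)"
    )
    conn.executemany("insert into record2 values (?, ?, ?)", rows)
    # Same self-join as in the text; 'order by' is an addition for stable output.
    query = """
        select t1.uname, t1.umonth, t1.current_month_cnt,
               max(t2.current_month_cnt) as max_cnt,
               sum(t2.current_month_cnt) as sum_cnt
        from record2 t1
        join record2 t2 on t1.uname = t2.uname
        where t1.umonth >= t2.umonth
        group by t1.uname, t1.umonth, t1.current_month_cnt
        order by t1.uname, t1.umonth
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cumulative_by_self_join(RECORD2):
        print(*row)
```

Each output row follows the select order: current_month_cnt, then max_cnt, then sum_cnt. The comparison t1.umonth >= t2.umonth works on strings because the YYYY-MM format sorts chronologically.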
Solution 2:
-- (1) First compute each user's total visits per month
create table record2 as
select uname, umonth, sum(ucount) as current_month_cnt
from t_access
group by uname, umonth;

-- Contents of record2:
-- A 2015-01 33
-- A 2015-02 10
-- A 2015-03 38
-- B 2015-01 30
-- B 2015-02 15
-- B 2015-03 44

-- (2) Window functions: running max and running sum per user, ordered by month
select uname, umonth, current_month_cnt,
       max(current_month_cnt) over (partition by uname order by umonth) as max_cnt,
       sum(current_month_cnt) over (partition by uname order by umonth) as sum_cnt
from record2;

-- Result (uname, umonth, current_month_cnt, max_cnt, sum_cnt):
-- A 2015-01 33 33 33
-- A 2015-02 10 33 43
-- A 2015-03 38 38 81
-- B 2015-01 30 30 30
-- B 2015-02 15 30 45
-- B 2015-03 44 44 89
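What the two window functions compute can be illustrated outside Hive as a running maximum and a running sum within each user's rows, ordered by month. The Python sketch below is only an illustration of that semantics, not how Hive evaluates the query; the function name running_stats is hypothetical, and the input is the record2 content listed above.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Monthly totals exactly as listed for record2 in the text.
RECORD2 = [
    ("A", "2015-01", 33), ("A", "2015-02", 10), ("A", "2015-03", 38),
    ("B", "2015-01", 30), ("B", "2015-02", 15), ("B", "2015-03", 44),
]


def running_stats(rows):
    """Emulate max(...) over / sum(...) over (partition by uname order by umonth)."""
    # Sorting by (uname, umonth) plays the role of 'partition by' + 'order by'.
    ordered = sorted(rows, key=itemgetter(0, 1))
    out = []
    for uname, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        counts = [cnt for _, _, cnt in group]
        running_max = accumulate(counts, max)  # max of all counts so far
        running_sum = accumulate(counts)       # sum of all counts so far
        for (_, umonth, cnt), mx, sm in zip(group, running_max, running_sum):
            out.append((uname, umonth, cnt, mx, sm))
    return out


if __name__ == "__main__":
    for row in running_stats(RECORD2):
        print(*row)
```

Because each (uname, umonth) pair occurs only once in record2, the running values here match what the self-join in Solution 1 produces, while needing only a single pass over each user's rows.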
Code reference: https://www.jianshu.com/p/cdde7125bc77