Hive: Consecutive Login Days Problem

Login table: login_table

Columns: user_id, login_date

Sample data (consecutive logins):
user_id login_date
a 2020-06-01
a 2020-06-02
a 2020-06-03
b 2020-06-01
b 2020-06-02

Create the table:

create table if not exists login_table(
     user_id               string  comment 'user id'
    ,login_date            string  comment 'login date, yyyy-MM-dd'
) STORED as orc;

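Approach (using row_number):

1. Deduplicate (user_id, login_date) pairs so that each user has at most one row per day, then number each user's rows in ascending login_date order with row_number() to get rn.

2. Within a run of consecutive dates, the date and rn both increase by 1 from row to row, so subtracting rn days from login_date yields the same date for every row in the run. A gap in the dates changes that value.

3. Group by user_id and this normalized date (unify_login_date): each group is one streak, and count(login_date) is its length in days.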
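To check why step 2 works without a Hive cluster, here is a minimal Python sketch of steps 1 and 2 for a single user's dates. The function name `streak_anchors` and the extra login on 2020-06-05 are hypothetical, added only to show what a gap does to the normalized date.

```python
from datetime import date, timedelta


def streak_anchors(login_dates):
    """Hypothetical helper mirroring steps 1-2 for one user.

    Returns (login_date, rn, login_date - rn days) tuples.
    """
    # Step 1: drop duplicate days and sort ascending, like row_number() order by login_date
    ordered = sorted(set(login_dates))
    result = []
    for rn, d in enumerate(ordered, start=1):
        # Step 2: dates in one consecutive run all map to the same anchor date
        result.append((d, rn, d - timedelta(days=rn)))
    return result


# User a from the sample data, plus a hypothetical login on 2020-06-05 (a gap)
dates = [date(2020, 6, 1), date(2020, 6, 2), date(2020, 6, 3), date(2020, 6, 5)]
for d, rn, anchor in streak_anchors(dates):
    print(d, rn, anchor)
```

The first three rows share the anchor 2020-05-31, while 2020-06-05 (rn = 4) maps to 2020-06-01 and therefore starts a new group.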

SQL:

select user_id
     ,unify_login_date
     ,min(login_date)   as start_date
     ,max(login_date)   as end_date
     ,count(login_date) as login_days
from
    (
        select a.user_id
             ,a.login_date
             ,date_sub(a.login_date, a.rn) as unify_login_date -- consecutive dates all map to the same unify_login_date
        from
            (
                select t.user_id
                     ,t.login_date
                     ,row_number() over(partition by t.user_id order by t.login_date asc) as rn
                from
                    (
                        -- deduplicate: a user may log in several times on the same day
                        select distinct user_id
                             ,login_date
                        from login_table
                    ) as t
            ) as a
    ) as b
group by user_id, unify_login_date
having count(login_date) >= 3; -- example threshold: streaks of 3 or more days; adjust as needed
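The whole query can also be mirrored in plain Python to sanity-check the logic on small inputs; this is only a sketch, not a substitute for running the HiveQL. The function `consecutive_logins` and its `min_days` parameter (standing in for the HAVING threshold) are hypothetical names.

```python
from collections import defaultdict
from datetime import date, timedelta


def consecutive_logins(rows, min_days):
    """Hypothetical mirror of the Hive query.

    rows: iterable of (user_id, login_date) pairs.
    Returns (user_id, unify_login_date, start_date, end_date, login_days) tuples.
    """
    # select distinct user_id, login_date
    per_user = defaultdict(set)
    for user_id, login_date in rows:
        per_user[user_id].add(login_date)

    # (user_id, unify_login_date) -> dates in that streak
    groups = defaultdict(list)
    for user_id, dates in per_user.items():
        # row_number() over (partition by user_id order by login_date asc)
        for rn, d in enumerate(sorted(dates), start=1):
            # date_sub(login_date, rn)
            groups[(user_id, d - timedelta(days=rn))].append(d)

    result = []
    # group by user_id, unify_login_date
    for (user_id, anchor), dates in sorted(groups.items()):
        # having count(login_date) >= min_days
        if len(dates) >= min_days:
            result.append((user_id, anchor, min(dates), max(dates), len(dates)))
    return result


rows = [
    ("a", date(2020, 6, 1)), ("a", date(2020, 6, 2)), ("a", date(2020, 6, 3)),
    ("b", date(2020, 6, 1)), ("b", date(2020, 6, 2)),
]
# Only user a has a streak of 3 or more days in the sample data
for row in consecutive_logins(rows, min_days=3):
    print(row)
```

Deduplicating before numbering matters: without it, two logins on the same day would get different rn values and split one streak into two groups.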
