获取每个用户连续登录的最长天数
drop table if EXISTS login_log;
create table login_log ( times timestamp, user_id string);
truncate table login_log;
insert into login_log VALUES
(from_unixtime(unix_timestamp('2022-03-16 12:00:00','yyyy-MM-dd HH:mm:ss')), '100001'),
(from_unixtime(unix_timestamp('2022-03-16 12:01:00','yyyy-MM-dd HH:mm:ss')), '100001'),
(from_unixtime(unix_timestamp('2022-03-13 12:01:00','yyyy-MM-dd HH:mm:ss')), '100011',
(from_unixtime(unix_timestamp('2022-03-13 12:01:00','yyyy-MM-dd HH:mm:ss')), '100011'),
(from_unixtime(unix_timestamp('2022-03-13 02:01:00','yyyy-MM-dd HH:mm:ss')), '100009'),
(from_unixtime(unix_timestamp('2022-03-12 12:01:00','yyyy-MM-dd HH:mm:ss')), '100004'),
(from_unixtime(unix_timestamp('2022-03-12 02:01:00','yyyy-MM-dd HH:mm:ss')), '100009'),
(from_unixtime(unix_timestamp('2022-03-11 02:01:00','yyyy-MM-dd HH:mm:ss')), '100009'),
(from_unixtime(unix_timestamp('2022-03-11 11:01:00','yyyy-MM-dd HH:mm:ss')), '100012'),
(from_unixtime(unix_timestamp('2022-03-10 11:18:00','yyyy-MM-dd HH:mm:ss')), '100012'),
(from_unixtime(unix_timestamp('2022-03-09 11:01:00','yyyy-MM-dd HH:mm:ss')), '100009'),
(from_unixtime(unix_timestamp('2022-03-08 11:18:00','yyyy-MM-dd HH:mm:ss')), '100009');
with tmp_user_serial as (
select b.user_id,
case when b.last_exists = 0 and b.next_exists = 0 then 1
when b.next_exists = 1 then datediff(lead(b.date_date,1) over(partition by b.user_id order by b.date_date), b.date_date)+1
else 0 end serial_day
from (select a.user_id, a.date_date,
case when lag(a.date_date,1) over(partition by a.user_id order by a.date_date) = last_date then 1 else 0 end last_exists,
case when lead(a.date_date,1) over(partition by a.user_id order by a.date_date) = next_date then 1 else 0 end next_exists
from (select user_id,to_date(times) date_date,date_sub(times,1) last_date, date_add(times,1) next_date
from login_log
group by user_id,to_date(times), date_sub(times,1),date_add(times,1)) a
) b
where last_exists = 0 or next_exists = 0)
select user_id, max(serial_day) max_seriral_day
from tmp_user_serial
group by user_id
实现考虑如下场景:
1、某用户就只登录过一次;
2、某用户登录过多次,但都不连续;
3、某用户连续登录过多次;
4、某用户一天内多登录;
实现思路:
1、对于一天多次登录的情况,用user_id ,登录日期 两个数据分组,得到某人某天的唯一数据;
并获取该日期的前一天及后一天数值,便于后续对比取用;
对应sql中的最内层
select user_id,to_date(times) date_date,date_sub(times,1) last_date, date_add(times,1) next_date
from login_log
group by user_id,to_date(times), date_sub(times,1),date_add(times,1)
2、通过对数据的lead lag处理,判断按照用户、日期分组排序后的数据前一条日期是否是当前登录日期的前一天 last_exists ,后一条是否是当前登录日期的后一天 next_exists
对应sql 中的第二层
select a.user_id, a.date_date,
case when lag(a.date_date,1) over(partition by a.user_id order by a.date_date) = last_date then 1 else 0 end last_exists,
case when lead(a.date_date,1) over(partition by a.user_id order by a.date_date) = next_date then 1 else 0 end next_exists
from (select user_id,to_date(times) date_date,date_sub(times,1) last_date, date_add(times,1) next_date
from login_log
group by user_id,to_date(times), date_sub(times,1),date_add(times,1)) a
3、根据登录日期的前后一天数据是否存在判断是否连续
(1)若都存在,则说明该数据是连续日期中的中间数据,last_exists = 1 and next_exists =1
(2)若仅前一天数据存在,则说明该数据是连续日期中的最后一天,last_exists = 1 and next_exists =0
(3)若仅后一天数据存在,则说明该数据是连续日期中的第一天, last_exists = 0 and next_exists =1
(4)若前后一天都不存在,则说明该数据前后不连续,连续天数为1, last_exists = 0 and next_exists =0
该处计算思路:
1、对于连续的日期数据(两条及以上),排除掉连续日期的中间数据后,按照时间顺序排列,前一天和最后一天相连,lead一下可以直接获取到其日期差,从而得到连续登录天数;
2、对于不连续的日期数据,直接给值1 即可;
对应sql中的第三层
select b.user_id,
case when b.last_exists = 0 and b.next_exists = 0 then 1
when b.next_exists = 1 then datediff(lead(b.date_date,1) over(partition by b.user_id order by b.date_date), b.date_date)+1
else 0 end serial_day
from (select a.user_id, a.date_date,
case when lag(a.date_date,1) over(partition by a.user_id order by a.date_date) = last_date then 1 else 0 end last_exists,
case when lead(a.date_date,1) over(partition by a.user_id order by a.date_date) = next_date then 1 else 0 end next_exists
from (select user_id,to_date(times) date_date,date_sub(times,1) last_date, date_add(times,1) next_date
from login_log
group by user_id,to_date(times), date_sub(times,1),date_add(times,1)) a
) b
where last_exists = 0 or next_exists = 0
4、根据用户获取其最大连续登录天数,以用户分组,获取连续天数的最大值即可
对应sql
select user_id, max(serial_day) max_seriral_day
from tmp_user_serial
group by user_id