Data: the user login table user_log
user ID   login date
userid login_data
yh001 2019-11-25
yh001 2019-11-26
yh001 2019-11-27
yh001 2019-12-01
yh001 2019-12-02
yh001 2019-12-26
yh001 2019-12-27
yh002 2019-12-29
yh002 2019-12-30
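Key point: how to tell whether logins fall on consecutive days
Approach: window functions
Partition the rows by user and number each user's logins in ascending date order. Then subtract that row number, as a number of days, from the login date. Within a run of consecutive days, the date and the row number both go up by 1 per row, so the difference is the same date for every row in the run. A gap in the dates makes the difference jump to a later date.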
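A minimal Python sketch of this idea follows. It uses yh001's first five logins from the table above and assumes the dates are unique per user and already sorted. Subtracting the 1-based rank in days maps every day of a consecutive run to the same anchor date.

```python
from datetime import date, timedelta

# yh001's first five logins from user_log, sorted ascending (assumed one row per day)
logins = [date(2019, 11, 25), date(2019, 11, 26), date(2019, 11, 27),
          date(2019, 12, 1), date(2019, 12, 2)]

# rk starts at 1 like row_number(); login date minus rk days is constant within a run
anchors = [d - timedelta(days=rk) for rk, d in enumerate(logins, start=1)]

for d, anchor in zip(logins, anchors):
    print(d, anchor)
```

The first three logins all map to 2019-11-24, and the two December logins both map to 2019-11-27. These values match dsub in step (2).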
(1) First, number each user's logins in ascending order of login date
select userid
,login_data
,row_number() over(partition by userid order by login_data) as rk
from user_log
Result:
userid login_data rk
yh001 2019-11-25 1
yh001 2019-11-26 2
yh001 2019-11-27 3
yh001 2019-12-01 4
yh001 2019-12-02 5
yh001 2019-12-26 6
yh001 2019-12-27 7
yh002 2019-12-29 1
yh002 2019-12-30 2
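To make explicit what row_number() over(partition by userid order by login_data) computes, here is a small Python emulation of step (1) on the sample rows. The helper name row_number_by_user is hypothetical, and ties on the same date are not handled specially.

```python
from itertools import groupby

# user_log rows from the table above: (userid, login_data) as ISO date strings
user_log = [
    ("yh001", "2019-11-25"), ("yh001", "2019-11-26"), ("yh001", "2019-11-27"),
    ("yh001", "2019-12-01"), ("yh001", "2019-12-02"), ("yh001", "2019-12-26"),
    ("yh001", "2019-12-27"), ("yh002", "2019-12-29"), ("yh002", "2019-12-30"),
]

def row_number_by_user(rows):
    """Emulate row_number() over(partition by userid order by login_data)."""
    ranked = []
    # Sorting the tuples orders by userid, then by date (ISO strings sort chronologically)
    for user, group in groupby(sorted(rows), key=lambda r: r[0]):
        # Numbering restarts at 1 for each user, like the partition clause
        for rk, (_, day) in enumerate(group, start=1):
            ranked.append((user, day, rk))
    return ranked

for row in row_number_by_user(user_log):
    print(*row)
```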
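(2) Subtract each row's rank, in days, from its login date
DATE_SUB subtracts a time interval from a date (MySQL syntax: DATE_SUB(date, INTERVAL n DAY))
select userid
,date_sub(login_data,interval rk day) dsub
from
(select userid
,login_data
,row_number() over(partition by userid order by login_data) as rk
from user_log) a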
Result:
userid dsub
yh001 2019-11-24
yh001 2019-11-24
yh001 2019-11-24
yh001 2019-11-27
yh001 2019-11-27
yh001 2019-12-20
yh001 2019-12-20
yh002 2019-12-28
yh002 2019-12-28
(3) For each user, count the rows that share the same dsub value (group by userid and dsub, then count)
select userid,dsub,count(1) as num
from
(select userid
,date_sub(login_data,interval rk day) dsub
from
(select userid
,login_data
,row_number() over(partition by userid order by login_data) as rk
from user_log) a
) b
group by userid,dsub
Result:
userid dsub num
yh001 2019-11-24 3
yh001 2019-11-27 2
yh001 2019-12-20 2
yh002 2019-12-28 2
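Steps (1) to (3) can be mirrored in a few lines of Python as a sanity check. It is a sketch that assumes one row per user per day. The hypothetical helper run_counts ranks each user's dates, subtracts rk days, and counts rows per (userid, dsub), much like the group by above.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import groupby

# Sample rows from user_log: (userid, login_data)
user_log = [
    ("yh001", "2019-11-25"), ("yh001", "2019-11-26"), ("yh001", "2019-11-27"),
    ("yh001", "2019-12-01"), ("yh001", "2019-12-02"), ("yh001", "2019-12-26"),
    ("yh001", "2019-12-27"), ("yh002", "2019-12-29"), ("yh002", "2019-12-30"),
]

def run_counts(rows):
    """Steps (1)-(3): rank per user, subtract rk days, count rows per (userid, dsub)."""
    counts = Counter()
    for user, group in groupby(sorted(rows), key=lambda r: r[0]):
        for rk, (_, day) in enumerate(group, start=1):
            dsub = date.fromisoformat(day) - timedelta(days=rk)
            counts[(user, dsub.isoformat())] += 1
    return counts

for (user, dsub), num in run_counts(user_log).items():
    print(user, dsub, num)
```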
(4) Select the users who logged in on at least N consecutive days (N is up to you; N = 3 here)
select distinct userid
from
(select userid,dsub,count(1) as num
from
(select userid
,date_sub(login_data,interval rk day) dsub
from
(select userid
,login_data
,row_number() over(partition by userid order by login_data) as rk
from user_log) a
) b
group by userid,dsub
) c
where num>=3
Result:
userid
yh001
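Step (4) can be checked end to end with Python's built-in sqlite3 module. This sketch rests on a few assumptions. SQLite has no DATE_SUB, so date(login_data, '-N days') stands in for it. Window functions require SQLite 3.25 or later. The helper users_with_streak is hypothetical.

```python
import sqlite3

# Rebuild user_log in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("create table user_log (userid text, login_data text)")
conn.executemany("insert into user_log values (?, ?)", [
    ("yh001", "2019-11-25"), ("yh001", "2019-11-26"), ("yh001", "2019-11-27"),
    ("yh001", "2019-12-01"), ("yh001", "2019-12-02"), ("yh001", "2019-12-26"),
    ("yh001", "2019-12-27"), ("yh002", "2019-12-29"), ("yh002", "2019-12-30"),
])

# Same structure as the MySQL query; date(x, '-rk days') replaces date_sub(x, interval rk day)
QUERY = """
select distinct userid
from (
    select userid, dsub, count(1) as num
    from (
        select userid, date(login_data, '-' || rk || ' days') as dsub
        from (
            select userid, login_data,
                   row_number() over (partition by userid order by login_data) as rk
            from user_log
        ) a
    ) b
    group by userid, dsub
) c
where num >= ?
order by userid
"""

def users_with_streak(n):
    """Users with at least n consecutive login days (n is the N from step (4))."""
    return [row[0] for row in conn.execute(QUERY, (n,))]

print(users_with_streak(3))  # ['yh001']
print(users_with_streak(2))  # ['yh001', 'yh002']
```

Passing N as a bound parameter rather than hard-coding 3 makes it easy to try different thresholds against the same data.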
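Problem: find each user's maximum number of consecutive login days.
Approach: steps (1) to (3) are the same as in the first problem. Only step (4) changes: instead of filtering on num, take each user's maximum num.
select userid,max(num) num_max
from
(select userid,dsub,count(1) as num
from
(select userid
,date_sub(login_data,interval rk day) dsub
from
(select userid
,login_data
,row_number() over(partition by userid order by login_data) as rk
from user_log) a
) b
group by userid,dsub
) c
group by userid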
Result:
userid num_max
yh001 3
yh002 2
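A plain-Python sketch of the same computation follows, using a hypothetical helper max_consecutive_days. The SQL above assumes at most one row per user per day. If the log could contain repeated logins on the same day, they would need to be collapsed first, for example with select distinct before ranking. Here, set() does that job.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from user_log: (userid, login_data)
user_log = [
    ("yh001", "2019-11-25"), ("yh001", "2019-11-26"), ("yh001", "2019-11-27"),
    ("yh001", "2019-12-01"), ("yh001", "2019-12-02"), ("yh001", "2019-12-26"),
    ("yh001", "2019-12-27"), ("yh002", "2019-12-29"), ("yh002", "2019-12-30"),
]

def max_consecutive_days(rows):
    """Longest run of consecutive login days per user."""
    result = {}
    # set() drops repeated (userid, day) pairs, which would otherwise break the rk arithmetic
    for user, group in groupby(sorted(set(rows)), key=lambda r: r[0]):
        days = [date.fromisoformat(day) for _, day in group]
        anchors = [d - timedelta(days=rk) for rk, d in enumerate(days, start=1)]
        # Anchors never decrease, so equal anchors (one run) are adjacent
        result[user] = max(len(list(run)) for _, run in groupby(anchors))
    return result

print(max_consecutive_days(user_log))  # {'yh001': 3, 'yh002': 2}
```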
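Problem: find the users who clicked at least N times in a row, with no other user clicking in between.
Table a records the click stream: the user ID and click time of each click.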
user_id click_time
a t1
a t2
b t3
a t4
a t5
a t6
a t7
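Approach: this is a variant of the consecutive-login-days problem.
1. "Clicking N times in a row" means consecutive in click order, not on consecutive calendar days, so subtracting dates does not work.
2. Number all clicks by click time, without partitioning. This gives rk_1 = 1 2 3 4 5 6 7:
row_number() over(order by click_time) as rk_1
3. Partition by user and number each user's clicks by click time. This gives rk_2 = 1 2 1 3 4 5 6:
row_number() over(partition by user_id order by click_time) as rk_2
4. Compute diff = rk_1 - rk_2, which gives 0 0 2 1 1 1 1. diff counts the other users' clicks that came before the current click. It therefore stays the same while one user keeps clicking, and changes as soon as someone else clicks in between.
So we only need to group by user_id and diff and keep the groups with at least N rows. Those are exactly the users who clicked at least N times in a row with no other user's click in between.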
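The four steps can be traced in Python on the sample click stream. As an assumption for illustration, t1 to t7 are encoded as the integers 1 to 7, so they sort in click order. The helper click_diffs is hypothetical. A running per-user counter plays the role of rk_2.

```python
from collections import Counter

# Click stream from table a; t1..t7 encoded as 1..7 (assumed to sort in click order)
clicks = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("a", 5), ("a", 6), ("a", 7)]

def click_diffs(rows):
    """Return (user_id, rk_1 - rk_2) for each click, in click_time order."""
    per_user = Counter()  # running rk_2 for each user
    diffs = []
    for rk_1, (user, _) in enumerate(sorted(rows, key=lambda r: r[1]), start=1):
        per_user[user] += 1
        diffs.append((user, rk_1 - per_user[user]))
    return diffs

diffs = click_diffs(clicks)
print([d for _, d in diffs])  # [0, 0, 2, 1, 1, 1, 1]

# Group by (user_id, diff): each group is one uninterrupted run of clicks
runs = Counter(diffs)
print(sorted({user for (user, _), n in runs.items() if n >= 4}))  # ['a']
```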
Full query (N = 4 here):
select distinct user_id
from
(select user_id,rk_1-rk_2 diff
from
(select user_id
,row_number() over(order by click_time) rk_1
,row_number() over(partition by user_id order by click_time) rk_2
from a
) b
) c
group by user_id,diff
having count(diff)>=4
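The full query can be run as written, apart from the bound parameter, in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). As in the previous sketch, click_time values t1 to t7 are stored as the integers 1 to 7. The helper users_with_click_run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a (user_id text, click_time integer)")
conn.executemany("insert into a values (?, ?)",
                 [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("a", 5), ("a", 6), ("a", 7)])

# Same query as above, with N passed as a parameter instead of the literal 4
QUERY = """
select distinct user_id
from (
    select user_id, rk_1 - rk_2 as diff
    from (
        select user_id,
               row_number() over (order by click_time) as rk_1,
               row_number() over (partition by user_id order by click_time) as rk_2
        from a
    ) b
) c
group by user_id, diff
having count(diff) >= ?
order by user_id
"""

def users_with_click_run(n):
    """Users who clicked at least n times in a row with nobody else clicking in between."""
    return [row[0] for row in conn.execute(QUERY, (n,))]

print(users_with_click_run(4))  # ['a']
print(users_with_click_run(5))  # []
```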