解题思路:对于每一个用户,首先求出它的日期排名ranking
, 然后再求出今天距离 登陆日期的日期天数 dates
,拿这个天数 减去 - 它的排名, 得到一个辅助列 diff
,然后求出diff的最大计数,即为用户的最大连续登陆天数。是不是看的很迷糊?最大计数?没事,下面对这个思路进行详细的证明。
思路证明:现在有三个相邻(不一定连续)的日期d1——>d2——>d3,那么肯定有
ranking_d2 - rankingd1 = 1 (1)
ranking_d3 - rankingd2 = 1 (2)
假设d1和d2是连续的, 即
dates_d2 - date2_d1 = 1 (3)
而d2和d3是不连续的(不妨假设相隔了2天),即
dates_d3 - date2_d2 = 2 (4)
那么(1)-(3)有:
(ranking_d2 - dates_d2) - (ranking_d1 - dates_d1)= 0
即
ranking_d2 - dates_d2 = ranking_d1 - dates_d1,即diff1 = diff2
(2)-(4)有:
(ranking_d3 - dates_d3) - (ranking_d2 - dates_d2)= -1 ≠0
即
ranking_d2 - dates_d2 ≠ ranking_d1 - dates_d1,即 diff1 ≠ diff2
这就说明,如果两个日期d1和d2是连续的,那么它们的diff相同,如果不连续,那么它们的diff不同!
因此可以通过计数count(diff)来判断每一段连续的登陆天数持续了多久(这主要是因为存在用户a,一段时间连续登陆了2天,然后中间断了,后面又连续登陆了3天的情况), 然后再求每个计数的最大值,即为最大连续登陆天数,也就是最大计数。
下面以一个列子来说明这种算法,用到的数据表如下所示:
解题步骤:
(1)求排名ranking,间隔天数dates,以及排名-天数得到diff
select *, datediff(now(), sigindate)-1267 as dates,
row_number() over(partition by userid order by sigindate desc) as ranking,
(datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff
from t;
可以看到,对于每一个用户而言,如果它的登陆日期sigindate是连续的,那么diff就会相同(具体是多少不重要)。
(2)对diff进行计数,即求出用户的所有连续活跃天数
select userid,diff, count(diff) as 'diff计数'
from (
select *, datediff(now(), sigindate)-1267 as dates,
row_number() over(partition by userid order by sigindate desc) as ranking,
(datediff(now(), sigindate)-1267 -
row_number() over(partition by userid order by sigindate desc) ) as diff
from t
) as t1
group by userid, diff;
(3)求diff计数的最大值,即求用户的最大活跃天数
select userid, max(diff计数) as '最大活跃天数'
from (
select userid,diff, count(diff) as 'diff计数'
from (
select *, datediff(now(), sigindate)-1267 as dates,
row_number() over(partition by userid order by sigindate desc) as ranking,
(datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff
from t
) as t1
group by userid, diff
) as t2
group by userid;
整个过程可以简化为:
from (
select userid,diff, count(diff) as 'diff计数'
from (
select *, (datediff(now(), sigindate)-1267 -
row_number() over(partition by userid order by sigindate desc) ) as diff
from t
) as t1
group by userid, diff
) as t2
group by userid;