数据分析/运营——SQL面试题:如何求用户的最大连续登陆天数

本文介绍一个经典的面试题:如何求用户的最大连续登陆天数。

题目描述:有一个用户登陆表,表中有两列,一列是userid,代表用户的id;另一列是sigindate,代表用户的登陆日期。现在要求每个用户的最大连续登陆天数。

解题思路:对于每一个用户,首先求出它的日期排名ranking, 然后再求出今天距离 登陆日期的日期天数 dates ,拿这个天数 减去 - 它的排名, 得到一个辅助列 diff,然后求出diff的最大计数,即为用户的最大连续登陆天数。是不是看的很迷糊?最大计数?没事,下面对这个思路进行详细的证明。
思路证明:现在有三个相邻(不一定连续)的日期d1——>d2——>d3,那么肯定有

						ranking_d2 - rankingd1 = 1 (1)
						ranking_d3 - rankingd2 = 1 (2)

假设d1和d2是连续的, 即

						dates_d2 - date2_d1 = 1  (3)

而d2和d3是不连续的(不妨假设相隔了2天),即

						dates_d3 - date2_d2 = 2  (4)

那么(1)-(3)有:

		(ranking_d2 - dates_d2) - (ranking_d1 - dates_d1)= 0

		ranking_d2 - dates_d2 = ranking_d1 - dates_d1,即diff1 = diff2

(2)-(4)有:

		(ranking_d3 - dates_d3) - (ranking_d2 - dates_d2)= -1 ≠0

		ranking_d2 - dates_d2 ≠ ranking_d1 - dates_d1,即 diff1 ≠ diff2

这就说明,如果两个日期d1和d2是连续的,那么它们的diff相同,如果不连续,那么它们的diff不同!
因此可以通过计数count(diff)来判断每一段连续的登陆天数持续了多久(这主要是因为存在用户a,一段时间连续登陆了2天,然后中间断了,后面又连续登陆了3天的情况), 然后再求每个计数的最大值,即为最大连续登陆天数,也就是最大计数。

下面以一个列子来说明这种算法,用到的数据表如下所示:

数据分析/运营——SQL面试题:如何求用户的最大连续登陆天数_第1张图片

解题步骤:

(1)求排名ranking,间隔天数dates,以及排名-天数得到diff

select *, datediff(now(), sigindate)-1267 as dates, 
row_number() over(partition by userid order by sigindate desc) as ranking, 
(datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff
from t;
数据分析/运营——SQL面试题:如何求用户的最大连续登陆天数_第2张图片

可以看到,对于每一个用户而言,如果它的登陆日期sigindate是连续的,那么diff就会相同(具体是多少不重要)。

(2)对diff进行计数,即求出用户的所有连续活跃天数

select userid,diff, count(diff) as 'diff计数'
from (
	select *, datediff(now(), sigindate)-1267 as dates, 
	row_number() over(partition by userid order by sigindate desc) as ranking, 
	(datediff(now(), sigindate)-1267 - 
	row_number() over(partition by userid order by sigindate desc) ) as diff
	from t
) as t1
group by userid, diff;
数据分析/运营——SQL面试题:如何求用户的最大连续登陆天数_第3张图片

(3)求diff计数的最大值,即求用户的最大活跃天数

select userid, max(diff计数) as '最大活跃天数'
from (
select userid,diff, count(diff) as 'diff计数'
from (
select *, datediff(now(), sigindate)-1267 as dates, 
row_number() over(partition by userid order by sigindate desc) as ranking, 
(datediff(now(), sigindate)-1267 - row_number() over(partition by userid order by sigindate desc) ) as diff
from t
) as t1
group by userid, diff
) as t2
group by userid;
数据分析/运营——SQL面试题:如何求用户的最大连续登陆天数_第4张图片

整个过程可以简化为:

from (
	select userid,diff, count(diff) as 'diff计数'
	from (
		select *, (datediff(now(), sigindate)-1267 - 
		row_number() over(partition by userid order by sigindate desc) ) as diff
		from t
		) as t1
		group by userid, diff
	) as t2
group by userid;

你可能感兴趣的:(数据分析/运营)