N Sets of SQL Interview Questions: Row-to-Column Pivots, Retention, Daily Active Users, and More

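Contents

Set 2 [Window Functions: Top N per Group]

Set 3 [DAU and Retention: Row-to-Column Pivot + datediff()]

Set 6 [Window Function sum() over()]

Set 7 [Creating a Temporary Table]

Set 8 [Row/Column Conversion: Splitting One Column into Multiple Rows (Better Solution), String Handling]

Set 9 [DAU in Practice] (Important)

Set 10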


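The questions come from the post "n套SQL面试题--行转列、留存、日活等". The answers in that post contain errors, so the queries here are written strictly against the requirements of each question.

(A clearer version of the questions can be found in "数据分析SQL面试题目9套汇总"; its answers contain errors as well.)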


Set 2 [Window Functions: Top N per Group]

[Image 1 from the original post is not reproduced here]

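Approach:

(1) Handle repeated scenes first by building the subquery a, which keeps one row per (id, scene).

(2) Add a row_n column that numbers the rows within each id.

(3) Keep the top two rows of each group, then group by id and concatenate the scenes within each group into one string.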

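select concat(temp.id, '-', group_concat(temp.scene order by temp.row_n separator '-')) as id_scenes
from
    (select id, scene, time,
    -- rank each user's distinct scenes by first visit time (assumed intent; scene only breaks ties)
    row_number() over(partition by id order by time, scene) as row_n
    from
        -- one row per (id, scene), keeping the earliest visit
        (select id, scene, min(time) as time
        from tb
        group by id, scene) a
    ) temp
where row_n<=2
group by id;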
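
To try the approach above without a MySQL server, here is a minimal sketch that runs the same de-duplicate, rank, and filter steps in SQLite through Python's sqlite3 module. The sample rows and the helper name top2_scenes are made up for illustration, the ranking is assumed to follow first visit time, and the final string is assembled in Python instead of with group_concat. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample log: (id, scene, time). User 1 visits scene 1003 twice.
SAMPLE_ROWS = [
    (1, "1003", "2019-01-01 10:00"),
    (1, "1001", "2019-01-01 11:00"),
    (1, "1003", "2019-01-01 12:00"),
    (1, "1002", "2019-01-01 13:00"),
    (2, "1001", "2019-01-01 09:00"),
]

def top2_scenes(rows):
    """Return strings like '1-1003-1001': each id with its first two distinct scenes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tb (id integer, scene text, time text)")
    conn.executemany("insert into tb values (?, ?, ?)", rows)
    query = """
        select id, scene
        from (select id, scene,
                     row_number() over (partition by id order by time, scene) as row_n
              from (select id, scene, min(time) as time   -- earliest visit per scene
                    from tb
                    group by id, scene) a
             ) temp
        where row_n <= 2
        order by id, row_n
    """
    scenes_by_id = {}
    for uid, scene in conn.execute(query):
        scenes_by_id.setdefault(uid, []).append(scene)
    conn.close()
    # Join in Python so the order of scenes is explicit.
    return ["-".join([str(uid)] + scenes) for uid, scenes in scenes_by_id.items()]

if __name__ == "__main__":
    print(top2_scenes(SAMPLE_ROWS))
```

With this sample data, user 1's second visit to scene 1003 is ignored and scene 1002 is dropped because it ranks third.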

 

Set 3 [DAU and Retention: Row-to-Column Pivot + datediff()]

[Image 2 from the original post is not reproduced here]

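The retention definition used in this question is unusual, but it keeps the calculation simple. Normally, retention measures the share of users who registered on a given day and still log in on day N.

Approach:

(1) Self-join the table (aliases a and b) on uid.

(2) Use datediff() to pick out the rows where b.dayno is 1, 3, or 7 days after a.dayno, and pivot them into columns with count(distinct case when ...).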

-- Normalize dayno to a date so that datediff() can be applied
update userinfo set dayno=str_to_date(dayno,'%Y-%m-%d');

select
    a.dayno as dayno, count(distinct a.uid) as active_users,
    count(distinct case when datediff(b.dayno,a.dayno)=1 then a.uid end) as day1_retained,
    count(distinct case when datediff(b.dayno,a.dayno)=3 then a.uid end) as day3_retained,
    count(distinct case when datediff(b.dayno,a.dayno)=7 then a.uid end) as day7_retained,
    concat(count(distinct case when datediff(b.dayno,a.dayno)=1 then a.uid end)/count(distinct a.uid)*100,'%') as day1_retention_rate,
    concat(count(distinct case when datediff(b.dayno,a.dayno)=3 then a.uid end)/count(distinct a.uid)*100,'%') as day3_retention_rate,
    concat(count(distinct case when datediff(b.dayno,a.dayno)=7 then a.uid end)/count(distinct a.uid)*100,'%') as day7_retention_rate
from userinfo a
left join userinfo b
-- the filter on b belongs in the ON clause; in WHERE it would turn the left join into an inner join
on a.uid=b.uid and b.app_name='相机'
-- '相机' ("Camera") is the app name as stored in the data
where a.app_name='相机'
group by a.dayno;
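
The self-join pivot can be checked on a few rows with the sketch below, which uses SQLite through Python's sqlite3 module. SQLite has no datediff(), so the difference of two julianday() values stands in for it. The sample rows, the app name 'camera', and the helper name retention_counts are assumptions made for illustration.

```python
import sqlite3

# Hypothetical activity log: (uid, app_name, dayno)
SAMPLE_ROWS = [
    (1, "camera", "2020-01-01"), (1, "camera", "2020-01-02"), (1, "camera", "2020-01-04"),
    (2, "camera", "2020-01-01"), (2, "camera", "2020-01-02"),
    (3, "camera", "2020-01-01"),
    (4, "other", "2020-01-01"), (4, "other", "2020-01-02"),
]

def retention_counts(rows, app_name):
    """Per day: (dayno, active users, users back after 1 day, users back after 3 days)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table userinfo (uid integer, app_name text, dayno text)")
    conn.executemany("insert into userinfo values (?, ?, ?)", rows)
    query = """
        select a.dayno,
               count(distinct a.uid) as active_users,
               count(distinct case when julianday(b.dayno) - julianday(a.dayno) = 1
                                   then a.uid end) as day1_retained,
               count(distinct case when julianday(b.dayno) - julianday(a.dayno) = 3
                                   then a.uid end) as day3_retained
        from userinfo a
        left join userinfo b
          on a.uid = b.uid and b.app_name = a.app_name   -- self-join on the same user and app
        where a.app_name = ?
        group by a.dayno
        order by a.dayno
    """
    result = conn.execute(query, (app_name,)).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in retention_counts(SAMPLE_ROWS, "camera"):
        print(row)
```

On 2020-01-01 three camera users are active; two of them return the next day and one returns three days later. User 4 only uses another app and is excluded.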

 

Set 6 [Window Function sum() over()]

[Image 3 from the original post is not reproduced here]

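select fyear, fmonth, `value`,
-- running total within each year
sum(`value`) over(partition by fyear order by fmonth) as ysum,
-- running total across all months
sum(`value`) over(order by fyear, fmonth) as `sum`
from
    (select year(fdate) as fyear,
    month(fdate) as fmonth,
    sum(`value`) as `value`
    from a2
    group by year(fdate), month(fdate)) b
-- sort the final result; an order by inside the derived table is not guaranteed to carry through
order by fyear, fmonth;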

 

Set 7 [Creating a Temporary Table]

[Image 4 from the original post is not reproduced here]

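-- Latest logon time and number of distinct logon days per user
select ID, Name, EmailAddress, max(LastLogon) as latestlogon, count(distinct date(LastLogon)) as countlogon
from tb
group by ID, Name, EmailAddress;

-- Temporary table that numbers each logon and each logon day per user
create temporary table temptb as
select Name, LastLogon,
row_number() over(partition by ID order by LastLogon) as num_logontime,
dense_rank() over(partition by ID order by date(LastLogon)) as num_logonday
from tb;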

 

Set 8 [Row/Column Conversion: Splitting One Column into Multiple Rows (Better Solution), String Handling]

[Image 5 from the original post is not reproduced here]

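-- Table a to table b: concatenate each qq's games into one string

select qq, group_concat(game separator '_') as game
from a
group by qq;

-- Table b back to table a: first create a temporary sequence table seq

create temporary table seq (
id int auto_increment not null,
primary key(id));

-- Insert at least as many rows as the largest number of items in any game string
insert into seq values(),(),(),(),();


select b.qq,
-- take the first seq.id items, then keep the last of them; works for items of any length
substring_index(substring_index(b.game,'_',seq.id),'_',-1) as game
from seq inner join
(select b.*,
((length(game)-length(replace(game,'_','')))/length('_'))+1 as size
from b) b
on seq.id<=b.size;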

 

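There is another way to turn table b back into table a that needs no sequence table. It comes from the original Zhihu post and uses mysql.help_topic, a table that ships with MySQL, as a ready-made number sequence. With large data sets this avoids creating a sequence table, so it is the recommended approach, provided the account can read the mysql schema and no string holds more items than help_topic has rows.

Note that the help_topic_id column of help_topic starts at 0: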

select qq,
substring_index(substring_index(game,'_',h.help_topic_id+1),'_',-1) as game
-- read from table b (the concatenated strings); help_topic gets its own alias h
from b
left join mysql.help_topic as h
on h.help_topic_id < (length(game)-length(replace(game,'_',''))+1);
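
To see why the nested substring_index call picks out exactly one item per sequence number, the sketch below re-implements the logic in plain Python. The function substring_index here is an illustrative imitation of the MySQL function for a non-empty delimiter, not a library call, and split_rows and the sample values are hypothetical.

```python
def substring_index(s, delim, count):
    """Imitate MySQL SUBSTRING_INDEX(s, delim, count) for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])    # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])    # everything after the count-th delimiter from the right
    return ""

def split_rows(qq, game, delim="_"):
    """Expand one (qq, 'a_b_c') row into one row per item."""
    # Number of items = number of delimiters + 1, as in the SQL join condition.
    size = (len(game) - len(game.replace(delim, ""))) // len(delim) + 1
    rows = []
    for n in range(size):                   # n plays the role of help_topic_id, starting at 0
        first_n_items = substring_index(game, delim, n + 1)
        rows.append((qq, substring_index(first_n_items, delim, -1)))
    return rows

if __name__ == "__main__":
    print(split_rows(10000, "lol_wzry_cf"))
```

The inner call keeps the first n + 1 items and the outer call keeps the last of those, so items of any length are handled.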

 

Set 9 [DAU in Practice] (Important)

3.1 From 2019-06-01 to date, compute the daily DAU (daily active users, i.e. users who logged in).

select imp_date, count(distinct qimei) DAU
from tmp_liujg_dau_based d
where imp_date>=20190601
group by imp_date

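3.2 From 2019-06-01 to date, compute for each day the number of new users and old users who claimed a red packet, the average amount claimed per user, and the average number of claims per user.

The pitfall here: some users claimed a red packet without logging in. After the left join their is_new is NULL, so they would be silently left out of both the new and the old counts. The query below puts them into a separate bucket (is_new = 2).

select a.imp_date,
count(distinct case when a.is_new=1 then a.qimei else null end) as new_users,
count(distinct case when a.is_new=0 then a.qimei else null end) as old_users,
count(distinct case when a.is_new=2 then a.qimei else null end) as not_logged_in_users,
format(sum(a.add_money)/count(distinct a.qimei),2) as avg_amount_per_user,
format(count(a.qimei)/count(distinct a.qimei),0) as avg_claims_per_user
from
(
	select p.imp_date,p.qimei,p.add_money,
	(case 
		when d.is_new=1 then 1
		when d.is_new=0 then 0
		else 2
	end) as is_new
	from tmp_liujg_packed_based p
	left join tmp_liujg_dau_based d
	on p.imp_date=d.imp_date and p.qimei=d.qimei
	where p.imp_date>=20190601
) a
group by a.imp_date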
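
The NULL pitfall is easy to reproduce on a few rows. The sketch below uses SQLite through Python's sqlite3 module; the short table names dau and packed, the sample rows, and the helper name claimers_by_type are made up for illustration.

```python
import sqlite3

# Hypothetical data: user c claims twice on 20190601 but has no DAU (login) row.
DAU_ROWS = [("20190601", "a", 1), ("20190601", "b", 0)]
PACKED_ROWS = [("20190601", "a"), ("20190601", "b"), ("20190601", "c"), ("20190601", "c")]

def claimers_by_type(dau_rows, packed_rows):
    """Per day: (imp_date, new users, old users, users with no login row, all claimers)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dau (imp_date text, qimei text, is_new integer)")
    conn.execute("create table packed (imp_date text, qimei text)")
    conn.executemany("insert into dau values (?, ?, ?)", dau_rows)
    conn.executemany("insert into packed values (?, ?)", packed_rows)
    query = """
        select p.imp_date,
               count(distinct case when d.is_new = 1 then p.qimei end) as new_users,
               count(distinct case when d.is_new = 0 then p.qimei end) as old_users,
               -- no matching DAU row: is_new is NULL after the left join
               count(distinct case when d.is_new is null then p.qimei end) as not_logged_in,
               count(distinct p.qimei) as all_claimers
        from packed p
        left join dau d
          on d.imp_date = p.imp_date and d.qimei = p.qimei
        group by p.imp_date
        order by p.imp_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(claimers_by_type(DAU_ROWS, PACKED_ROWS))
```

Here new plus old users is 2 while 3 distinct users claimed, which is the gap the extra bucket makes visible.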

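3.3 Since March 2019, for each month, broken down by the number of days on which red packets were claimed (1, 2, 3, ..., 30, 31 days), compute the number of claim days in the month, the number of users who claimed a red packet, the average amount claimed per user, and the average number of claims per user.

select substring(imp_date,1,6) as month_id,
count(distinct imp_date) as claim_days,
count(distinct qimei) as claim_users,
format(sum(add_money)/count(distinct qimei),2) as avg_amount_per_user,
format(count(qimei)/count(distinct qimei),0) as avg_claims_per_user
from tmp_liujg_packed_based
where imp_date>=20190301
group by substring(imp_date,1,6)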

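3.4 Since March 2019, compute for each month the number of users who claimed a red packet and the number who did not, together with their average active days in the month.

select month_id,
is_packed,
count(distinct qimei) as user_count,
round(count(*)/count(distinct qimei)) as avg_active_days
from(
select left(d.imp_date,6) as month_id, d.imp_date, d.qimei,
-- classify per user and month, so a user is never counted in both groups of one month
(case
	when exists (select 1
		from tmp_liujg_packed_based p
		where p.qimei=d.qimei and left(p.imp_date,6)=left(d.imp_date,6)) then 'claimed'
	else 'not claimed'
end) as is_packed
from tmp_liujg_dau_based d
-- the question only covers March 2019 onward
where d.imp_date>=20190301) a
group by month_id, is_packed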

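3.5 Since March 2019, list each month's active users with their registration dates; leave the date blank for users who registered before 2019-03-01.

-- is_new=1 marks the registration day; registrations before 2019-03-01 are left NULL
select distinct left(a.imp_date,6) as month_id, a.qimei as active_user, d.imp_date as register_date
from tmp_liujg_dau_based d
right join
(select *
from tmp_liujg_dau_based
where imp_date>=20190301) a
on a.qimei=d.qimei and d.is_new=1 and d.imp_date>=20190301
-- order by the alias itself; a quoted name would be a constant string and sort nothing
order by month_id, active_user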

3.6 Since March 2019, compute for each day the next-day retention rate of users, of users who claimed a red packet, and of users who did not claim one.

select imp_date,
count(distinct case when datediff(liu_date,imp_date)=1 and is_new=1 then qimei else null end)/count(distinct case when is_new=1 then qimei else null end) as next_day_retention,
count(distinct case when datediff(liu_date,imp_date)=1 and is_new=1 and is_packed is not null then qimei else null end)/count(distinct case when is_packed is not null and is_new=1 then qimei else null end) as claimed_next_day_retention,
count(distinct case when datediff(liu_date,imp_date)=1 and is_new=1 and is_packed is null then qimei else null end)/count(distinct case when is_packed is null and is_new=1 then qimei else null end) as unclaimed_next_day_retention

from
		(select a.*, p.qimei as is_packed
		from
				(select d1.*, d2.imp_date as liu_date, d2.qimei as liu_qimei
				from tmp_liujg_dau_based d1
				left join tmp_liujg_dau_based d2
				on d1.qimei=d2.qimei
				-- filter in WHERE: in the ON clause of a left join, earlier dates would still be returned
				where d1.imp_date>=20190301) a
		left join tmp_liujg_packed_based p
		on p.imp_date = a.imp_date and p.qimei=a.qimei
		) tmp
group by imp_date

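Approach:

(1) Whenever next-day retention is needed, left join the table to itself to get every pair of login dates for the same id, then pick the pairs that are exactly one day apart.

(2) Left join the red packet table to find out whether the id claimed a red packet that day: is_packed holds the id if it did and NULL otherwise.

(3) Combine these with count(distinct case when ...), and do not forget the is_new=1 condition. Next-day retention is normally defined as (new users of the day who log in the next day) / (new users of the day), and the next-day retention of red-packet users as (new users of the day who claimed a red packet and log in the next day) / (new users of the day who claimed a red packet). [The question seems intended to analyze how claiming a red packet affects the next-day retention of new users.]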
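
The three steps above can be sketched on a handful of rows as follows, using SQLite through Python's sqlite3 module. The short table names dau and packed, the ISO date strings, the sample rows, and the helper name are assumptions for illustration, and julianday() stands in for MySQL's datediff(). The sketch returns raw counts, from which each rate is a simple ratio.

```python
import sqlite3

# Hypothetical data: five new users on 2019-03-01; a and b claim a red packet that day
# (a claims twice); a, c and f log in again on 2019-03-02. User e is an old user.
DAU_ROWS = [
    ("2019-03-01", "a", 1), ("2019-03-01", "b", 1), ("2019-03-01", "c", 1),
    ("2019-03-01", "d", 1), ("2019-03-01", "f", 1), ("2019-03-01", "e", 0),
    ("2019-03-02", "a", 0), ("2019-03-02", "c", 0), ("2019-03-02", "f", 0),
    ("2019-03-02", "e", 0),
]
PACKED_ROWS = [("2019-03-01", "a"), ("2019-03-01", "a"),
               ("2019-03-01", "b"), ("2019-03-01", "e")]

def new_user_next_day_counts(dau_rows, packed_rows):
    """Per day: (imp_date, new users, retained, new users who claimed, retained among those)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dau (imp_date text, qimei text, is_new integer)")
    conn.execute("create table packed (imp_date text, qimei text)")
    conn.executemany("insert into dau values (?, ?, ?)", dau_rows)
    conn.executemany("insert into packed values (?, ?)", packed_rows)
    query = """
        select d1.imp_date,
               count(distinct d1.qimei) as new_users,
               count(distinct d2.qimei) as retained,
               count(distinct case when p.qimei is not null then d1.qimei end) as new_claimed,
               count(distinct case when p.qimei is not null then d2.qimei end) as retained_claimed
        from dau d1
        left join dau d2                       -- step 1: same user, exactly one day later
          on d2.qimei = d1.qimei
         and julianday(d2.imp_date) - julianday(d1.imp_date) = 1
        left join packed p                     -- step 2: did the user claim on the first day?
          on p.qimei = d1.qimei and p.imp_date = d1.imp_date
        where d1.is_new = 1                    -- step 3: new users only
        group by d1.imp_date
        order by d1.imp_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for day, new, kept, new_c, kept_c in new_user_next_day_counts(DAU_ROWS, PACKED_ROWS):
        print(day, kept / new, kept_c / new_c, (kept - kept_c) / (new - new_c))
```

With this data, 3 of 5 new users return, 1 of the 2 who claimed returns, and 2 of the 3 who did not claim return. The printing loop assumes every day has both claiming and non-claiming new users; a real report would guard against zero denominators.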
 

3.7 From 2019-06-01 to date, compute the amount of the first red packet claimed by each day's new users.

select imp_date, qimei,add_money
from
		(select b.*, 
		row_number() over(partition by imp_date,qimei order by report_time) as seq
		from(
				select a.imp_date,a.qimei, p.report_time, p.add_money
				from
						(select *
						from tmp_liujg_dau_based 
						where is_new=1 and imp_date>=20190601) a
				inner join tmp_liujg_packed_based p
				on a.qimei=p.qimei) b 
		) tmp
where seq=1


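3.8 From 2019-03-01 to date, compute for each new user the time between the first and the second red packet claimed (only for users who claimed a red packet on their registration day; new users are those flagged with is_new=1 in the DAU table on or after their registration day).

select imp_date,qimei,first_date,second_date,TIMESTAMPDIFF(minute,first_date,second_date) as minutes_between
from
(# pivot rows to columns so the difference is easy to compute
select imp_date,qimei,
max(case when seq=1 then report_time else null end) as first_date,
max(case when seq=2 then report_time else null end) as second_date
from
		(# add a per-group row number in order to pick the top 2
		select a.*,row_number() over(partition by imp_date,qimei order by report_time) as seq
				from
						(# fetch every red packet record for these ids
						select d.imp_date, d.qimei, p.report_time
						from tmp_liujg_dau_based d
						inner join tmp_liujg_packed_based p
						on d.qimei=p.qimei
						where d.qimei in
								(# ids of users who claimed a red packet on their registration day
								select distinct d.qimei
								from tmp_liujg_dau_based d
								left join tmp_liujg_packed_based p
								on d.imp_date = p.imp_date and d.qimei=p.qimei
								where d.is_new=1 and p.report_time is not null) 
						and d.is_new=1
						# the question only covers registrations from 2019-03-01 onward
						and d.imp_date>=20190301) a
		) tmp
group by imp_date,qimei) b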

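Approach:

(1) First, select the ids of users who claimed a red packet on their registration day.

(2) Join the red packet table to get every claim record for those ids.

(3) Partition by date and id, and order each group by claim time.

(4) Pivot the first and second records into columns so the difference can be computed.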
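
Steps (3) and (4) can be sketched in isolation as follows, using SQLite through Python's sqlite3 module. The table name packed, the sample rows, and the helper name first_two_claims are made up for illustration. The minute difference is computed in Python and counts whole minutes, which is how TIMESTAMPDIFF(minute, ...) is assumed to behave here. Window functions need SQLite 3.25 or later.

```python
import sqlite3
from datetime import datetime

# Hypothetical claim log: (qimei, report_time)
PACKED_ROWS = [
    ("a", "2019-03-01 10:00:00"),
    ("a", "2019-03-01 10:45:30"),
    ("a", "2019-03-02 08:00:00"),
    ("b", "2019-03-01 09:00:00"),
]

def first_two_claims(packed_rows):
    """Per user: (qimei, first claim time, second claim time, whole minutes between them)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table packed (qimei text, report_time text)")
    conn.executemany("insert into packed values (?, ?)", packed_rows)
    query = """
        select qimei,
               max(case when seq = 1 then report_time end) as first_time,   -- pivot row 1
               max(case when seq = 2 then report_time end) as second_time   -- pivot row 2
        from (select qimei, report_time,
                     row_number() over (partition by qimei order by report_time) as seq
              from packed) t
        where seq <= 2
        group by qimei
        order by qimei
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    result = []
    for qimei, first, second in conn.execute(query):
        minutes = None                      # stays None when there is no second claim
        if second is not None:
            delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
            minutes = int(delta.total_seconds() // 60)
        result.append((qimei, first, second, minutes))
    conn.close()
    return result

if __name__ == "__main__":
    for row in first_two_claims(PACKED_ROWS):
        print(row)
```

User a's third claim is ignored, and user b, who claimed only once, keeps an empty second time and an empty difference.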

 

Set 10

[Image 6 from the original post is not reproduced here]

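select g.department as department, g.game_name as game_name, sum(i.income_money) as sum_income_money
from game g
-- the where clause on i already drops games without income, so this is effectively an inner join
inner join income i
on g.game_id=i.game_id
-- half-open range covers all of 2020-03-31 even if income_time has a time part
where i.income_time >= '2020-01-01' and i.income_time < '2020-04-01'
group by g.department, g.game_id, g.game_name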

 
