Problem 511. Game Play Analysis I, II, III & IV
SQL schema:
Create table If Not Exists Activity (player_id int, device_id int, event_date date, games_played int)
Truncate table Activity
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-01', '5')
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-05-02', '6')
insert into Activity (player_id, device_id, event_date, games_played) values ('2', '3', '2017-06-25', '1')
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '1', '2016-03-02', '0')
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '4', '2018-07-03', '5')
Activity table:
Column Name | Type |
---|---|
player_id | int |
device_id | int |
event_date | date |
games_played | int |
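The primary key of this table is (player_id, event_date).
This table records the activity of players on a gaming platform.
Each row records that a player logged in with one device on a given day and opened some number of games (possibly 0) before logging out.
Problem I: Write a SQL query that reports the first login date of each player.
The query result format is shown in the following example:
Activity table: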
player_id | device_id | event_date | games_played |
---|---|---|---|
1 | 2 | 2016-03-01 | 5 |
1 | 2 | 2016-05-02 | 6 |
2 | 3 | 2017-06-25 | 1 |
3 | 1 | 2016-03-02 | 0 |
3 | 4 | 2018-07-03 | 5 |
Result table:
player_id | first_login |
---|---|
1 | 2016-03-01 |
2 | 2017-06-25 |
3 | 2016-03-02 |
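Solution
-- Aggregate version: the earliest event_date per player is the first login.
select player_id, min(event_date) as first_login from Activity
group by player_id;
Or
-- Window version: minimum over each player's partition, de-duplicated with distinct.
select distinct player_id, min(event_date) over(partition by player_id) as first_login from Activity;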
Problem II: Write a SQL query that reports the device each player first logged in with.
The query result format is shown in the following example:
Activity table:
player_id | device_id | event_date | games_played |
---|---|---|---|
1 | 2 | 2016-03-01 | 5 |
1 | 2 | 2016-05-02 | 6 |
2 | 3 | 2017-06-25 | 1 |
3 | 1 | 2016-03-02 | 0 |
3 | 4 | 2018-07-03 | 5 |
Result table:
player_id | device_id |
---|---|
1 | 2 |
2 | 3 |
3 | 1 |
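Solution
-- Keep the rows whose (player_id, event_date) pair is that player's earliest login.
select player_id, device_id from Activity
where (player_id, event_date) in (
select player_id, min(event_date) from Activity
group by player_id
);
Or
-- Keep the row whose date is no later than any other date of the same player.
select player_id, device_id from Activity a1
where a1.event_date <= all(
select a2.event_date from Activity a2 where a1.player_id = a2.player_id
);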
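A third way to express Problem II is to rank each player's logins by date and keep the first one. This is only a sketch: it assumes MySQL 8.0 or later (window functions), and the aliases t and rn are illustrative names that do not come from the original solutions.

```sql
-- Sketch, assuming MySQL 8.0+; t and rn are illustrative aliases.
select player_id, device_id
from (
    select player_id, device_id,
           -- number each player's logins from earliest to latest
           row_number() over (partition by player_id order by event_date) as rn
    from Activity
) t
where rn = 1;  -- rn = 1 is the first login of each player
```

Because (player_id, event_date) is the primary key, no player has two rows on the same date, so exactly one row per player gets rn = 1.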
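Problem III: Write a SQL query that reports, for each player and date, how many games the player has played so far, that is, the total number of games played by the player up to and including that date. See the example for details.
The query result format is shown in the following example: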
Activity table:
player_id | device_id | event_date | games_played |
---|---|---|---|
1 | 2 | 2016-03-01 | 5 |
1 | 2 | 2016-05-02 | 6 |
1 | 3 | 2017-06-25 | 1 |
3 | 1 | 2016-03-02 | 0 |
3 | 4 | 2018-07-03 | 5 |
Result table:
player_id | event_date | games_played_so_far |
---|---|---|
1 | 2016-03-01 | 5 |
1 | 2016-05-02 | 11 |
1 | 2017-06-25 | 12 |
3 | 2016-03-02 | 0 |
3 | 2018-07-03 | 5 |
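For player 1, the total is 5 + 6 = 11 games by 2016-05-02 and 5 + 6 + 1 = 12 games by 2017-06-25.
For player 3, the total is 0 + 5 = 5 games by 2018-07-03.
Note that for each player we only care about the days on which the player logged in.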
Solution
-- Self-join: pair each row (a1) with the same player's rows up to that date (a2), then sum a2.
select a1.player_id, a1.event_date, sum(a2.games_played) as games_played_so_far
from Activity a1, Activity a2
where a1.event_date >= a2.event_date and a1.player_id = a2.player_id
group by a1.player_id, a1.event_date;
Or
-- Window version: running total per player in event_date order.
select player_id, event_date,
sum(games_played) over(partition by player_id order by event_date) as games_played_so_far
from Activity;
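Notes:
Method 1: a1 is the row being reported and a2 supplies that player's rows on or before a1.event_date, so the sum must be taken over a2.games_played. I first summed a1.games_played, which gives the wrong answer.
Method 2: sum(col1) over(partition by col2 order by col3 rows between unbounded preceding and current row) as new_col
How it works:
sum(col) adds up the rows in the window defined by the over() clause that follows it.
partition by col1 order by col2 inside over() ----- sorts each col1 group by col2.
rows between unbounded preceding and current row ----- unbounded means no limit, preceding means before, and current row is the row being evaluated, so the frame runs from the first row of the partition to the current row and the sum covers the current row and every row before it. The query above, sum(games_played) over(partition by player_id order by event_date), leaves the frame out; with an order by, the default frame is range between unbounded preceding and current row, which gives the same result here because (player_id, event_date) is unique. For player 1 at event_date = 2016-05-02 the sum covers 2016-03-01 and 2016-05-02; at 2017-06-25 it covers 2016-03-01, 2016-05-02 and 2017-06-25.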
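For comparison, here is the Method 2 query with the frame clause from the template written out explicitly. It is a sketch that assumes MySQL 8.0 or later; on this table it should return the same rows as the shorter version, since no player has two rows with the same event_date.

```sql
-- Sketch, assuming MySQL 8.0+: running total with an explicit frame.
select player_id, event_date,
       sum(games_played) over (
           partition by player_id   -- one window per player
           order by event_date      -- oldest login first
           rows between unbounded preceding and current row
       ) as games_played_so_far
from Activity;
```

Spelling out the frame matters when the order by column can contain ties: a rows frame stops at the current row, while the default range frame also includes every row that ties with it.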
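Problem IV: Write a SQL query that reports the fraction of players who logged in again on the day after their first login, rounded to 2 decimal places. In other words, count the players who logged in on at least two consecutive days starting from their first login date, then divide by the total number of players.
The query result format is shown in the following example: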
Activity table:
player_id | device_id | event_date | games_played |
---|---|---|---|
1 | 2 | 2016-03-01 | 5 |
1 | 2 | 2016-03-02 | 6 |
2 | 3 | 2017-06-25 | 1 |
3 | 1 | 2016-03-02 | 0 |
3 | 4 | 2018-07-03 | 5 |
Result table:
fraction |
---|
0.33 |
Only player 1 logged in again on the day after their first login, so the answer is 1/3 = 0.33.
Solution
select round(count(distinct a1.player_id)/(select count(distinct player_id) from Activity),2) as fraction
from Activity a1,(select player_id,min(event_date) as first_login from Activity group by player_id ) a2
where a1.player_id = a2.player_id and datediff(a1.event_date,a2.first_login) = 1;
Or
with temp as(
select player_id,event_date,
dateDiff(event_date,min(event_date) over(partition by player_id order by event_date asc) ) as diff
from Activity
)
select
round( sum( case when diff = 1 then 1 else 0 end ) / count(distinct player_id),2) as fraction
from temp;
Or
select round(avg(a.event_date is not null), 2) as fraction
from (select player_id, min(event_date) as first_login from Activity group by player_id) b
left join Activity a on b.player_id = a.player_id and datediff(a.event_date, b.first_login) = 1;
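Method 2 (based on user Gump's solution):
Step 1: define a temporary table temp that holds, for every row, the difference in days between that login and the player's first login. The temporary table is mainly there to make the reasoning easier to follow.
Step 2: use case when on temp to count the logins that fall exactly one day after the first login.
Method 3 (based on user luanhz's solution), well worth studying even though it did not make sense to me at first:
1. avg() can be used without a group by, in which case the whole result set is one group; a group by, by contrast, is normally paired with an aggregate.
2. a.event_date is not null evaluates to 1 when the left join found a next-day login and to 0 when it did not, so avg() here is the number of matching players divided by the total number of players.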
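The avg() trick in point 2 can be tried on its own with a small inline table. The sketch below assumes MySQL, where a comparison such as is not null evaluates to 1 or 0; the column x and the alias t are made up for the illustration and stand in for a.event_date after the left join.

```sql
-- Sketch, assuming MySQL: three rows, only one of them non-null.
select round(avg(x is not null), 2) as fraction
from (
    select 1 as x           -- a player with a next-day login
    union all select null   -- a player without one
    union all select null   -- another player without one
) t;
```

One of the three rows is non-null, so the average is 1/3, which rounds to 0.33, the same fraction as in the Problem IV example.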
Source: LeetCode
Link: https://leetcode-cn.com/problems/game-play-analysis-i