User information table user_info (uid: user ID, nick_name: nickname, achievement: achievement points, level: user level, job: career track, register_time: registration time):

| id | uid | nick_name | achievement | level | job | register_time |
|---|---|---|---|---|---|---|
| 1 | 1001 | 牛客1号 | 3200 | 7 | algorithm | 2020-01-01 10:00:00 |
| 2 | 1002 | 牛客2号 | 2500 | 6 | algorithm | 2020-01-01 10:00:00 |
| 3 | 1003 | 牛客3号♂ | 2200 | 5 | algorithm | 2020-01-01 10:00:00 |

Exam information table examination_info (exam_id: exam ID, tag: exam category, difficulty: exam difficulty, duration: exam duration, release_time: release time):

| id | exam_id | tag | difficulty | duration | release_time |
|---|---|---|---|---|---|
| 1 | 9001 | SQL | hard | 60 | 2020-01-01 10:00:00 |
| 2 | 9002 | SQL | hard | 80 | 2020-01-01 10:00:00 |
| 3 | 9003 | algorithm | hard | 80 | 2020-01-01 10:00:00 |
| 4 | 9004 | PYTHON | medium | 70 | 2020-01-01 10:00:00 |

Exam attempt table exam_record (uid: user ID, exam_id: exam ID, start_time: time the attempt started, submit_time: submission time, score: score); an empty submit_time and score mean the attempt was never submitted:

| id | uid | exam_id | start_time | submit_time | score |
|---|---|---|---|---|---|
| 1 | 1001 | 9001 | 2020-01-01 09:01:01 | 2020-01-01 09:21:59 | 90 |
| 15 | 1002 | 9001 | 2020-01-01 18:01:01 | 2020-01-01 18:59:02 | 90 |
| 13 | 1001 | 9001 | 2020-01-02 10:01:01 | 2020-01-02 10:31:01 | 89 |
| 2 | 1002 | 9001 | 2020-01-20 10:01:01 | | |
| 3 | 1002 | 9001 | 2020-02-01 12:11:01 | | |
| 5 | 1001 | 9001 | 2020-03-01 12:01:01 | | |
| 6 | 1002 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:41:01 | 90 |
| 4 | 1003 | 9001 | 2020-03-01 19:01:01 | | |
| 7 | 1002 | 9001 | 2020-05-02 19:01:01 | 2020-05-02 19:32:00 | 90 |
| 14 | 1001 | 9002 | 2020-01-01 12:11:01 | | |
| 8 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:59:01 | 69 |
| 9 | 1001 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:20:01 | 99 |
| 10 | 1002 | 9002 | 2020-02-02 12:01:01 | | |
| 11 | 1002 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:43:01 | 81 |
| 12 | 1002 | 9002 | 2020-03-02 12:11:01 | | |
| 17 | 1001 | 9002 | 2020-05-05 18:01:01 | | |
| 16 | 1002 | 9003 | 2020-05-06 12:01:01 | | |

Task: among the users whose incomplete rate on SQL exams (unsubmitted attempts divided by all attempts) is in the top 50%, take those at level 6 or 7. For each such user, report the number of attempts started (total_cnt) and the number completed (complete_cnt) in each of the three most recent months in which that user has any exam attempt (on any exam, not only SQL exams). Sort by user ID, then by month, both ascending.
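For the sample data, the expected output is:

| uid | start_month | total_cnt | complete_cnt |
|---|---|---|---|
| 1002 | 202002 | 3 | 1 |
| 1002 | 202003 | 2 | 1 |
| 1002 | 202005 | 2 | 1 |

Explanation: each user's incomplete count, total attempts and incomplete rate on SQL exams are: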

| uid | incomplete_cnt | total_cnt | incomplete_rate |
|---|---|---|---|
| 1001 | 3 | 7 | 0.4286 |
| 1002 | 4 | 8 | 0.5000 |
| 1003 | 1 | 1 | 1.0000 |
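
The per-user figures in this table can be recomputed with a short sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so two dialect changes are assumed: `CASE WHEN` replaces MySQL's `IF()`, and the numerator is multiplied by `1.0` because SQLite's `/` performs integer division on integers (MySQL's `/` does not). Only the columns the calculation needs are loaded, and the function name `incomplete_rates` is hypothetical.

```python
import sqlite3

# Sample data copied from the tables above; scores are omitted and
# None marks an attempt that was never submitted.
EXAMS = [(9001, "SQL"), (9002, "SQL"), (9003, "algorithm"), (9004, "PYTHON")]
RECORDS = [  # (uid, exam_id, start_time, submit_time)
    (1001, 9001, "2020-01-01 09:01:01", "2020-01-01 09:21:59"),
    (1002, 9001, "2020-01-01 18:01:01", "2020-01-01 18:59:02"),
    (1001, 9001, "2020-01-02 10:01:01", "2020-01-02 10:31:01"),
    (1002, 9001, "2020-01-20 10:01:01", None),
    (1002, 9001, "2020-02-01 12:11:01", None),
    (1001, 9001, "2020-03-01 12:01:01", None),
    (1002, 9001, "2020-03-01 12:01:01", "2020-03-01 12:41:01"),
    (1003, 9001, "2020-03-01 19:01:01", None),
    (1002, 9001, "2020-05-02 19:01:01", "2020-05-02 19:32:00"),
    (1001, 9002, "2020-01-01 12:11:01", None),
    (1001, 9002, "2020-01-02 19:01:01", "2020-01-02 19:59:01"),
    (1001, 9002, "2020-02-02 12:01:01", "2020-02-02 12:20:01"),
    (1002, 9002, "2020-02-02 12:01:01", None),
    (1002, 9002, "2020-02-02 12:01:01", "2020-02-02 12:43:01"),
    (1002, 9002, "2020-03-02 12:11:01", None),
    (1001, 9002, "2020-05-05 18:01:01", None),
    (1002, 9003, "2020-05-06 12:01:01", None),
]


def incomplete_rates():
    """Return (uid, incomplete_cnt, total_cnt, incomplete_rate) per user, SQL exams only."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE examination_info (exam_id INTEGER, tag TEXT)")
    con.execute("CREATE TABLE exam_record "
                "(uid INTEGER, exam_id INTEGER, start_time TEXT, submit_time TEXT)")
    con.executemany("INSERT INTO examination_info VALUES (?, ?)", EXAMS)
    con.executemany("INSERT INTO exam_record VALUES (?, ?, ?, ?)", RECORDS)
    rows = con.execute("""
        SELECT uid,
               SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END) AS incomplete_cnt,
               COUNT(start_time) AS total_cnt,
               ROUND(1.0 * SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END)
                     / COUNT(start_time), 4) AS incomplete_rate
        FROM exam_record
        WHERE exam_id IN (SELECT exam_id FROM examination_info WHERE tag = 'SQL')
        GROUP BY uid
        ORDER BY uid
    """).fetchall()
    con.close()
    return rows


for row in incomplete_rates():
    print(row)
```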
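
Ranked by incomplete rate in descending order, 1001, 1002 and 1003 sit at percentile positions 1.0, 0.5 and 0.0 respectively, so the top 50% of users (position <= 0.5) are 1002 and 1003;
1003 is at level 5, not level 6 or 7;
the three most recent months in which 1002 has exam attempts are 202005, 202003 and 202002;
in 202002, 202003 and 202005, 1002 started 3, 2 and 2 attempts and completed 1, 1 and 1 of them, respectively.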
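
The percentile positions quoted above come from `PERCENT_RANK()`, which returns (rank - 1) / (rows - 1) within the window. The explanation ranks rates in descending order and keeps positions <= 0.5, whereas both solutions below rank in ascending order and keep `rnk1 >= 0.5`. The sketch below, again using Python's `sqlite3` module as a stand-in for MySQL (SQLite 3.25 or newer assumed, for window functions) with rates and levels copied from the tables above, checks that both conventions select 1002 and 1003 here and that the level filter then leaves only 1002. The function name `percentile_positions` is hypothetical.

```python
import sqlite3

# (uid, incomplete_rate, level), copied from the explanation and user_info tables.
USERS = [(1001, 0.4286, 7), (1002, 0.5, 6), (1003, 1.0, 5)]


def percentile_positions():
    """Return (uid, ascending percent_rank, descending percent_rank, level) per user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE rates (uid INTEGER, incomplete_rate REAL, level INTEGER)")
    con.executemany("INSERT INTO rates VALUES (?, ?, ?)", USERS)
    rows = con.execute("""
        SELECT uid,
               PERCENT_RANK() OVER (ORDER BY incomplete_rate)      AS pr_asc,
               PERCENT_RANK() OVER (ORDER BY incomplete_rate DESC) AS pr_desc,
               level
        FROM rates
        ORDER BY uid
    """).fetchall()
    con.close()
    return rows


rows = percentile_positions()
top_half_asc = {uid for uid, pr_asc, _, _ in rows if pr_asc >= 0.5}     # solutions' convention
top_half_desc = {uid for uid, _, pr_desc, _ in rows if pr_desc <= 0.5}  # explanation's convention
cohort = {uid for uid, pr_asc, _, level in rows if pr_asc >= 0.5 and level in (6, 7)}
print(rows)                                   # [(1001, 0.0, 1.0, 7), (1002, 0.5, 0.5, 6), (1003, 1.0, 0.0, 5)]
print(top_half_asc == top_half_desc, cohort)  # True {1002}
```

With ties at the 50% boundary the two conventions can disagree, because tied rows share the rank of the first row in their group in either direction, so the equivalence is only demonstrated for this sample.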
Method 1: using DATE_FORMAT
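```sql
-- Step 1: users whose SQL-exam incomplete rate is in the top 50% and who are at level 6 or 7
with cte as (
    select uid
    from (
        -- percent_rank() ascending: the upper half of incomplete rates has rnk1 >= 0.5
        select uid, incomplete_rate,
               percent_rank() over (order by incomplete_rate) as rnk1
        from (
            -- incomplete rate = unsubmitted attempts / all attempts, SQL exams only
            select uid,
                   sum(if(submit_time is null, 1, 0)) as incomplete_cnt,
                   count(start_time) as total_cnt,
                   sum(if(submit_time is null, 1, 0)) / count(start_time) as incomplete_rate
            from exam_record
            where exam_id in (select exam_id from examination_info where tag = 'SQL')
            group by uid
        ) t
    ) t2
    join user_info using (uid)
    where rnk1 >= 0.5 and level in (6, 7)
)
-- Step 2: per-month attempts on any exam, keeping each user's three most recent active months
select uid, start_month, total_cnt, complete_cnt
from (
    select uid,
           date_format(start_time, '%Y%m') as start_month,
           count(start_time) as total_cnt,
           sum(if(submit_time is null, 0, 1)) as complete_cnt,
           -- dense_rank() over months, newest first: rnk2 = 1 is the latest month with an attempt
           dense_rank() over (partition by uid order by date_format(start_time, '%Y%m') desc) as rnk2
    from exam_record
    where uid in (select uid from cte)
    group by uid, date_format(start_time, '%Y%m')
) t
where rnk2 <= 3
order by uid, start_month;
```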
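
As an end-to-end check of Method 1, the sketch below runs the same logic against the sample data with Python's built-in `sqlite3` module. It is a SQLite translation, not the MySQL query itself: `strftime('%Y%m', ...)` stands in for `DATE_FORMAT`, `CASE WHEN` for `IF()`, `1.0 *` forces real division, and the monthly aggregation and the `DENSE_RANK()` step are split into separate CTEs. It assumes SQLite 3.25 or newer for window functions; the names `QUERY` and `monthly_activity` are hypothetical.

```python
import sqlite3

# Sample data copied from the tables above (only the columns the query needs).
USERS = [(1001, 7), (1002, 6), (1003, 5)]  # (uid, level)
EXAMS = [(9001, "SQL"), (9002, "SQL"), (9003, "algorithm"), (9004, "PYTHON")]
RECORDS = [  # (uid, exam_id, start_time, submit_time); None = not submitted
    (1001, 9001, "2020-01-01 09:01:01", "2020-01-01 09:21:59"),
    (1002, 9001, "2020-01-01 18:01:01", "2020-01-01 18:59:02"),
    (1001, 9001, "2020-01-02 10:01:01", "2020-01-02 10:31:01"),
    (1002, 9001, "2020-01-20 10:01:01", None),
    (1002, 9001, "2020-02-01 12:11:01", None),
    (1001, 9001, "2020-03-01 12:01:01", None),
    (1002, 9001, "2020-03-01 12:01:01", "2020-03-01 12:41:01"),
    (1003, 9001, "2020-03-01 19:01:01", None),
    (1002, 9001, "2020-05-02 19:01:01", "2020-05-02 19:32:00"),
    (1001, 9002, "2020-01-01 12:11:01", None),
    (1001, 9002, "2020-01-02 19:01:01", "2020-01-02 19:59:01"),
    (1001, 9002, "2020-02-02 12:01:01", "2020-02-02 12:20:01"),
    (1002, 9002, "2020-02-02 12:01:01", None),
    (1002, 9002, "2020-02-02 12:01:01", "2020-02-02 12:43:01"),
    (1002, 9002, "2020-03-02 12:11:01", None),
    (1001, 9002, "2020-05-05 18:01:01", None),
    (1002, 9003, "2020-05-06 12:01:01", None),
]

QUERY = """
WITH rates AS (      -- incomplete rate per user on SQL exams
    SELECT uid,
           1.0 * SUM(CASE WHEN submit_time IS NULL THEN 1 ELSE 0 END)
               / COUNT(start_time) AS incomplete_rate
    FROM exam_record
    WHERE exam_id IN (SELECT exam_id FROM examination_info WHERE tag = 'SQL')
    GROUP BY uid
),
cohort AS (          -- upper 50% by rate (ascending PERCENT_RANK >= 0.5), level 6 or 7
    SELECT r.uid
    FROM (SELECT uid, PERCENT_RANK() OVER (ORDER BY incomplete_rate) AS rnk1
          FROM rates) AS r
    JOIN user_info AS u ON u.uid = r.uid
    WHERE r.rnk1 >= 0.5 AND u.level IN (6, 7)
),
monthly AS (         -- attempts on any exam, per user and month
    SELECT uid,
           strftime('%Y%m', start_time) AS start_month,
           COUNT(start_time) AS total_cnt,
           SUM(CASE WHEN submit_time IS NULL THEN 0 ELSE 1 END) AS complete_cnt
    FROM exam_record
    WHERE uid IN (SELECT uid FROM cohort)
    GROUP BY uid, strftime('%Y%m', start_time)
),
ranked AS (          -- rnk2 = 1 is the user's most recent month with any attempt
    SELECT *, DENSE_RANK() OVER (PARTITION BY uid ORDER BY start_month DESC) AS rnk2
    FROM monthly
)
SELECT uid, start_month, total_cnt, complete_cnt
FROM ranked
WHERE rnk2 <= 3
ORDER BY uid, start_month
"""


def monthly_activity():
    """Load the sample data into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_info (uid INTEGER, level INTEGER)")
    con.execute("CREATE TABLE examination_info (exam_id INTEGER, tag TEXT)")
    con.execute("CREATE TABLE exam_record "
                "(uid INTEGER, exam_id INTEGER, start_time TEXT, submit_time TEXT)")
    con.executemany("INSERT INTO user_info VALUES (?, ?)", USERS)
    con.executemany("INSERT INTO examination_info VALUES (?, ?)", EXAMS)
    con.executemany("INSERT INTO exam_record VALUES (?, ?, ?, ?)", RECORDS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in monthly_activity():
    print(row)
```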
Method 2: using LEFT + REPLACE
```sql
-- Step 1 (same as Method 1): top-50% incomplete-rate users at level 6 or 7
with cte as (
    select uid
    from (
        -- percent_rank() ascending: the upper half of incomplete rates has rnk1 >= 0.5
        select uid, incomplete_rate,
               percent_rank() over (order by incomplete_rate) as rnk1
        from (
            -- incomplete rate = unsubmitted attempts / all attempts, SQL exams only
            select uid,
                   sum(if(submit_time is null, 1, 0)) as incomplete_cnt,
                   count(start_time) as total_cnt,
                   sum(if(submit_time is null, 1, 0)) / count(start_time) as incomplete_rate
            from exam_record
            where exam_id in (select exam_id from examination_info where tag = 'SQL')
            group by uid
        ) t
    ) t2
    join user_info using (uid)
    where rnk1 >= 0.5 and level in (6, 7)
)
-- Step 2: month key 'YYYY-MM' from LEFT(start_time, 7); REPLACE strips the '-' for output
select uid,
       replace(act_time, '-', '') as start_month,
       total_cnt, complete_cnt
from (
    select uid,
           left(start_time, 7) as act_time,
           count(start_time) as total_cnt,
           sum(if(submit_time is null, 0, 1)) as complete_cnt,
           -- rank by the same expression as GROUP BY (safer under ONLY_FULL_GROUP_BY)
           dense_rank() over (partition by uid order by left(start_time, 7) desc) as rnk2
    from exam_record
    where uid in (select uid from cte)
    group by uid, left(start_time, 7)
) t
where rnk2 <= 3
order by uid, act_time;
```
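
Method 2 differs only in how the month key is built: `LEFT(start_time, 7)` yields `'YYYY-MM'`, which still sorts chronologically, and `REPLACE` removes the hyphen only for output. A quick check with Python's `sqlite3` module, where `substr(x, 1, 7)` plays the role of MySQL's `LEFT(x, 7)` and `start_time` is assumed to be stored as `'YYYY-MM-DD HH:MM:SS'` text, compares this key with the `DATE_FORMAT`-style `strftime('%Y%m', ...)` on a few `start_time` values from the sample data. The function name `month_keys` is hypothetical.

```python
import sqlite3

# A few start_time values copied from the exam_record table above.
START_TIMES = ["2020-01-20 10:01:01", "2020-02-01 12:11:01",
               "2020-03-02 12:11:01", "2020-05-06 12:01:01"]


def month_keys():
    """Return (LEFT-style key, hyphen-stripped key, strftime key) for each timestamp."""
    con = sqlite3.connect(":memory:")
    keys = [
        con.execute(
            # ?1 is reused three times, so a single bound value suffices.
            "SELECT substr(?1, 1, 7), replace(substr(?1, 1, 7), '-', ''), "
            "strftime('%Y%m', ?1)",
            (ts,),
        ).fetchone()
        for ts in START_TIMES
    ]
    con.close()
    return keys


for left_key, stripped, formatted in month_keys():
    print(left_key, stripped, formatted, stripped == formatted)
```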