MySQL Study Notes (5): Aggregate Window Functions

Problem review: aggregate window functions, part 1

SQL33 Min-max normalization of exam scores


Problem analysis

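(1) Hard exams only: exam_record left join examination_info on ... where difficulty='hard'
(2) Min-max normalize the scores within each exam's attempt records, then scale to [0,100]:
Use aggregate window functions to find the minimum and maximum score within each exam_id:
select uid,exam_id,score,min(score)over(partition by exam_id) exam_min_score,max(score)over(partition by exam_id) exam_max_score from exam_record
Normalize and scale to [0,100]: ((score-exam_min_score)/(exam_max_score-exam_min_score))*100;
(3) Output the user ID, the exam ID and the average normalized score:
Group by uid and exam_id: group by uid,exam_id;
Average: avg(((score-exam_min_score)/(exam_max_score-exam_min_score))*100);
Round to an integer: round(avg(((score-exam_min_score)/(exam_max_score-exam_min_score))*100),0);
(4) If an exam has only one score in its attempt records, the formula is not needed:
A single score means max(score)=min(score), so the denominator is 0 and the formula cannot be applied;
Check with if: if(exam_min_score=exam_max_score,score,((score-exam_min_score)/(exam_max_score-exam_min_score))*100)
(5) Sorting: order by uid asc,avg_new_score desc; the problem statement is a little loose here. It asks for the normalized score in descending order, which I first read as the score before averaging, but both the accepted answer and the sample table show that it means the averaged score, descending;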
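
To check steps (2) and (4) on a small case, here is a minimal sketch for MySQL 8. The CTE name demo_scores and its rows are made up for illustration and are not the real exam_record data; the sketch also skips the join on difficulty and the final averaging.

```sql
-- Hypothetical sample data (assumption): exam 9001 has three scores, exam 9002 only one.
WITH demo_scores (uid, exam_id, score) AS (
    SELECT 1001, 9001, 60 UNION ALL
    SELECT 1002, 9001, 80 UNION ALL
    SELECT 1003, 9001, 100 UNION ALL
    SELECT 1001, 9002, 75
)
SELECT uid, exam_id, score,
       IF(exam_min_score = exam_max_score,
          score,  -- min = max: the denominator would be 0, so keep the raw score
          (score - exam_min_score) / (exam_max_score - exam_min_score) * 100) AS new_score
FROM (
    SELECT uid, exam_id, score,
           MIN(score) OVER (PARTITION BY exam_id) AS exam_min_score,
           MAX(score) OVER (PARTITION BY exam_id) AS exam_max_score
    FROM demo_scores
) t
ORDER BY exam_id, uid;
-- Exam 9001: 60 -> 0, 80 -> 50, 100 -> 100. Exam 9002: 75 stays 75.
```

Without the IF branch, the exam 9002 row would divide by zero, which MySQL evaluates to NULL in a SELECT, so the single score would be lost instead of kept.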

Rewritten answer

SELECT uid,exam_id,
round(avg(
if(exam_min_score=exam_max_score,score,((score-exam_min_score)/(exam_max_score-exam_min_score))*100)
),0) avg_new_score
FROM
(SELECT e_r.uid,e_r.exam_id,score,
min(score)over(partition by e_r.exam_id) exam_min_score,
max(score)over(partition by e_r.exam_id) exam_max_score
FROM exam_record e_r LEFT JOIN examination_info e_i
ON e_r.exam_id=e_i.exam_id
WHERE difficulty='hard'
AND score IS NOT NULL) t1
GROUP BY uid,exam_id
ORDER BY uid asc,avg_new_score desc;

Details

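Don't forget to write score is not null.
MIN()OVER(): computes the minimum while keeping every row (the rows are not collapsed the way GROUP BY collapses them)
MAX()OVER(): computes the maximum while keeping every row
COUNT()OVER(): counts while keeping every row
SUM()OVER(): sums while keeping every row
AVG()OVER(): averages while keeping every row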
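
A small sketch of what "keeping every row" means, assuming MySQL 8. The CTE demo_scores and its three rows are hypothetical.

```sql
-- Hypothetical sample data (assumption): two scores for exam 9001, one for exam 9002.
WITH demo_scores (uid, exam_id, score) AS (
    SELECT 1001, 9001, 60 UNION ALL
    SELECT 1002, 9001, 80 UNION ALL
    SELECT 1003, 9002, 90
)
SELECT uid, exam_id, score,
       MIN(score)   OVER (PARTITION BY exam_id) AS min_score,
       MAX(score)   OVER (PARTITION BY exam_id) AS max_score,
       COUNT(score) OVER (PARTITION BY exam_id) AS cnt,
       SUM(score)   OVER (PARTITION BY exam_id) AS total,
       AVG(score)   OVER (PARTITION BY exam_id) AS avg_score
FROM demo_scores;
-- Three rows go in and three rows come out; each row carries its exam's aggregates.
-- The same aggregates with GROUP BY exam_id would return only two rows, one per exam.
```

This is why the window form fits SQL33: each score still needs to sit next to its exam's minimum and maximum so the formula can be applied row by row.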

SQL34 Attempts per exam per month, and total attempts up to each month


Problem analysis

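(1) Attempts up to the current month: a running total needs the aggregate window function sum()over(), applied to count(start_time)
(2) Month conversion: date_format(start_time,'%Y%m')
(3) Attempts per month: count(start_time); group by exam_id,start_month
(4) Sorting: order by exam_id,start_month;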

Rewritten answer

SELECT exam_id,date_format(start_time,'%Y%m') start_month,
count(start_time) month_cnt,
sum(count(start_time))over(partition by exam_id order by date_format(start_time,'%Y%m')) cum_exam_cnt
FROM exam_record
GROUP BY exam_id,start_month
ORDER BY exam_id,start_month;

Details

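(1) The query already has GROUP BY, so why does the window function still need PARTITION BY?
Answer 1: Without partition by, the window is ordered by month only, so the running total accumulates across all exams month by month instead of restarting for each exam.
Answer 2: The one-statement version skips a step. GROUP BY runs first and produces one row per exam and month; the window function then runs over those grouped rows. Written out with a subquery: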

SELECT *,SUM(month_cnt)OVER(PARTITION BY exam_id ORDER BY start_month) cum_exam_cnt
FROM (
	SELECT exam_id,DATE_FORMAT(start_time,'%Y%m')start_month, COUNT(start_time) month_cnt
	FROM  exam_record
	GROUP BY exam_id,start_month
)t1
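
To see what PARTITION BY changes, the sketch below (MySQL 8) runs the same running sum with and without it. The CTE monthly stands in for the grouped subquery t1 above; its rows are hypothetical.

```sql
-- Hypothetical grouped result (assumption): one row per exam and month.
WITH monthly (exam_id, start_month, month_cnt) AS (
    SELECT 9001, '202101', 2 UNION ALL
    SELECT 9001, '202102', 1 UNION ALL
    SELECT 9002, '202101', 3 UNION ALL
    SELECT 9002, '202103', 1
)
SELECT exam_id, start_month, month_cnt,
       -- restarts for every exam_id
       SUM(month_cnt) OVER (PARTITION BY exam_id ORDER BY start_month) AS cum_per_exam,
       -- one running total over all exams, ordered by month only
       SUM(month_cnt) OVER (ORDER BY start_month) AS cum_all_exams
FROM monthly
ORDER BY exam_id, start_month;
-- exam_id  start_month  month_cnt  cum_per_exam  cum_all_exams
-- 9001     202101       2          2             5
-- 9001     202102       1          3             6
-- 9002     202101       3          3             5
-- 9002     202103       1          4             7
```

In cum_all_exams both 202101 rows show 5, because with ORDER BY and no explicit frame the default frame includes all rows that tie on the ordering value.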

SQL35 Answering statistics per month and up to each month


Problem analysis

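(1) Monthly active users: date_format(start_time,'%Y%m') start_month; count(distinct uid) mau; group by start_month;
(2) New users: on my first attempt I could not find a concise way to express this. Seen from the other side, the first month in which a user has any attempt record is the month that user counts as new. Compute each user's first month as sign_time, count the users per sign_time, then compare sign_time with start_month to choose between month_add_uv and 0:
if(sign_time=start_month,month_add_uv,0)
select sign_time,count(uid) month_add_uv from
(select uid,min(date_format(start_time,'%Y%m')) sign_time from exam_record group by uid) t1 group by sign_time;
(3) Largest single-month new-user count up to the current month: aggregate window function max(month_add_uv)over(order by start_month);
(4) Cumulative users up to the current month: aggregate window function sum(month_add_uv)over(order by start_month);
(5) Sorting: order by start_month;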
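
The "first month per user" idea in step (2) can be tried on its own. This is a minimal sketch for MySQL 8; the CTE demo_record and its four rows are hypothetical, not the real exam_record table.

```sql
-- Hypothetical sample data (assumption): user 1001 is active in two months.
WITH demo_record (uid, start_time) AS (
    SELECT 1001, '2021-01-05 10:00:00' UNION ALL
    SELECT 1002, '2021-01-20 09:00:00' UNION ALL
    SELECT 1001, '2021-02-03 14:00:00' UNION ALL
    SELECT 1003, '2021-03-11 08:00:00'
),
first_month AS (
    -- each user's earliest active month
    SELECT uid, MIN(DATE_FORMAT(start_time, '%Y%m')) AS sign_time
    FROM demo_record
    GROUP BY uid
)
SELECT sign_time, COUNT(uid) AS month_add_uv
FROM first_month
GROUP BY sign_time
ORDER BY sign_time;
-- 202101 -> 2 new users (1001 and 1002); 202103 -> 1 new user (1003).
-- 202102 has activity but no new user, so it has no row in this result.
```

The missing 202102 row is the reason the full query joins the monthly statistics to this result with LEFT JOIN and falls back to 0.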

Rewritten answer

SELECT start_month,mau,
if(t2.sign_time=t3.start_month,t2.month_add_uv,0) month_add_uv,
max(t2.month_add_uv)over(order by start_month) max_month_add_uv,
sum(t2.month_add_uv)over(order by start_month) cum_sum_uv
FROM
(SELECT date_format(start_time,'%Y%m') start_month,
count(distinct uid) mau
FROM exam_record
GROUP BY start_month) t3 LEFT JOIN
(SELECT sign_time,count(uid) month_add_uv
FROM
(SELECT uid,min(date_format(start_time,'%Y%m')) sign_time
FROM exam_record
GROUP BY uid) t1
GROUP BY sign_time) t2
ON t3.start_month=t2.sign_time
ORDER BY start_month;

Details

You have to think flexibly here; otherwise it is hard to see how to express a definition like "new user" in SQL.
