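MySQL has supported window functions since version 8.0. Window functions let you solve many query problems in a new, simpler way, often with better performance.
Window function syntax:
The general syntax for calling a window function is: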
window_function_name(expression)
OVER (
[partition_definition]
[order_definition]
[frame_definition]
)
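The parts are:
1> partition_definition: the PARTITION BY clause divides the rows into groups called partitions. Two adjacent partitions are separated by a partition boundary. The window function is evaluated within a partition and is re-initialized whenever it crosses a partition boundary.
Syntax: PARTITION BY expression [, expression] ...
You can specify one or more expressions in the PARTITION BY clause. Multiple expressions are separated by commas.
2> order_definition: specifies how the rows within a partition are ordered.
Syntax: ORDER BY expression [ASC | DESC] [, expression [ASC | DESC]] ...
3> frame_definition: a frame is a subset of the current partition. To define that subset, use the frame clause.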
Syntax: frame_unit {frame_start | frame_between}, where frame_unit is either ROWS or RANGE.
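As a rough illustration of how the frame clause narrows the rows a function sees, the sketch below builds a small throwaway data set in a common table expression and computes two sums over different frames. The names t, grp, and n are made up for this example and are not part of the tutorial's tables.

```sql
-- Hypothetical sample data: t, grp and n are illustrative names only.
WITH t (grp, n) AS (
    SELECT 'a', 1 UNION ALL
    SELECT 'a', 2 UNION ALL
    SELECT 'a', 3 UNION ALL
    SELECT 'b', 10 UNION ALL
    SELECT 'b', 20
)
SELECT
    grp,
    n,
    -- Frame: from the first row of the partition up to the current row (running sum).
    SUM(n) OVER (PARTITION BY grp ORDER BY n
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
    -- Frame: the previous row, the current row, and the next row.
    SUM(n) OVER (PARTITION BY grp ORDER BY n
                 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbor_sum
FROM t
ORDER BY grp, n;
-- Expected rows (grp, n, running_sum, neighbor_sum):
-- a  1   1   3
-- a  2   3   6
-- a  3   6   5
-- b  10  10  30
-- b  20  30  30
```

Both sums restart at the partition boundary between 'a' and 'b', which is the re-initialization described for PARTITION BY.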
Data preparation:
-- 1. Case study: suppose we have a sales table that stores sales by employee and fiscal year:
-- Create the sales table
CREATE TABLE sales(
sales_employee VARCHAR(50) NOT NULL,
fiscal_year INT NOT NULL,
sale DECIMAL(14,2) NOT NULL,
PRIMARY KEY(sales_employee,fiscal_year)
);
-- Insert data
INSERT INTO sales(sales_employee,fiscal_year,sale)
VALUES('Bob',2016,100),
('Bob',2017,150),
('Bob',2018,200),
('Alice',2016,150),
('Alice',2017,100),
('Alice',2018,200),
('John',2016,200),
('John',2017,150),
('John',2018,250);
-- Verify the data
SELECT *
FROM sales;
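Window functions are easiest to understand by starting from aggregate functions. An aggregate function summarizes data from multiple rows into a single result row. For example, the following SUM() call returns the total sales of all employees across all recorded years: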
SELECT
SUM(sale)
FROM
sales;
The GROUP BY clause lets you apply an aggregate function to subsets of rows. For example, you may want the total sales per fiscal year:
SELECT
fiscal_year,
SUM(sale)
FROM
sales
GROUP BY
fiscal_year;
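In both examples, the aggregate function reduces the number of rows the query returns. Like an aggregate function with a GROUP BY clause, a window function also operates on a subset of rows, but it does not reduce the number of rows returned. For example, the following query returns each employee's sales together with the total sales of all employees in the same fiscal year: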
SELECT
fiscal_year,
sales_employee,
sale,
SUM(sale) OVER (PARTITION BY fiscal_year) total_sales
FROM
sales;
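In this query, SUM() is used as a window function: it operates on the set of rows defined by the contents of the OVER clause. The query still returns one row for each row of sales, and the computed total is attached to every row of its partition.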
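Adding ORDER BY inside OVER changes what the function sums. When ORDER BY is present and no frame is given, the default frame runs from the start of the partition to the current row, so SUM() becomes a running total. A minimal sketch against the sales table above (the alias running_total is chosen for this example):

```sql
-- Running total of each employee's sales across fiscal years.
SELECT
    sales_employee,
    fiscal_year,
    sale,
    SUM(sale) OVER (PARTITION BY sales_employee ORDER BY fiscal_year) AS running_total
FROM sales
ORDER BY sales_employee, fiscal_year;
-- Expected running_total values with the sample data:
-- Alice: 150.00, 250.00, 450.00
-- Bob:   100.00, 250.00, 450.00
-- John:  200.00, 350.00, 600.00
```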
Data preparation:
-- 2. Case study: phone profit by manufacturer:
CREATE TABLE phone(
manufacturer VARCHAR(50),
product VARCHAR(50),
profit DECIMAL(14,2),
PRIMARY KEY(manufacturer,product)
);
-- Insert data
INSERT INTO phone
VALUES('huawei','huawei mate 50',500),
('huawei','huawei P50',650),
('huawei','huawei P40',300),
('huawei','nova 10',560),
('vivo','vivo X80',300),
('vivo','vivo X70',480),
('xiaomi','xiaomi 12',350),
('xiaomi','redmi 50',560),
('xiaomi','redmi 40',440);
Run the following query to compare a global total with a per-manufacturer total:
SELECT
manufacturer, product, profit,
SUM(profit) OVER() AS total_profit,
SUM(profit) OVER(PARTITION BY manufacturer) AS manufacturer_profit
FROM phone;
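Analysis:
1> The first OVER clause is empty, so the entire set of query rows is treated as a single partition. The window function therefore produces a global sum, and does so for every row.
2> The second OVER clause partitions the rows by manufacturer, producing a sum per partition (per manufacturer). The function returns that sum on every row of the partition.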
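Because the totals sit next to each row's own value, they can be used directly in expressions. The following sketch, assuming the phone table above, derives each product's share of its manufacturer's profit and of the overall profit. The aliases pct_of_manufacturer and pct_of_total are invented for this example.

```sql
SELECT
    manufacturer,
    product,
    profit,
    SUM(profit) OVER (PARTITION BY manufacturer) AS manufacturer_profit,
    -- Share of the manufacturer's profit, as a percentage with one decimal.
    ROUND(100 * profit / SUM(profit) OVER (PARTITION BY manufacturer), 1) AS pct_of_manufacturer,
    -- Share of the profit across all manufacturers.
    ROUND(100 * profit / SUM(profit) OVER (), 1) AS pct_of_total
FROM phone
ORDER BY manufacturer, profit DESC;
-- With the sample data the totals are:
-- huawei 2010.00, vivo 780.00, xiaomi 1350.00, overall 4140.00
```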
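Common window functions:
ROW_NUMBER(): produces the number of each row within its partition. By default, partition rows are unordered and the row numbering is nondeterministic. To sort the partition rows, include an ORDER BY clause in the window definition. The following query uses an unordered partition and an ordered partition (columns row_num1 and row_num2) to show the difference between omitting and including ORDER BY: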
SELECT
manufacturer, product, profit,
ROW_NUMBER() OVER(PARTITION BY manufacturer) AS row_num1,
ROW_NUMBER() OVER(PARTITION BY manufacturer ORDER BY profit) AS row_num2
FROM phone;
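A common use of ROW_NUMBER() is picking one row per partition. A window function cannot appear in a WHERE clause, so the numbering is computed in a derived table and filtered outside it. This is a sketch assuming the phone table above. The alias rn, the derived-table name ranked, and the tiebreaker on product are choices made for this example.

```sql
-- Most profitable product per manufacturer.
SELECT manufacturer, product, profit
FROM (
    SELECT
        manufacturer, product, profit,
        -- product is a tiebreaker so the numbering stays deterministic if profits tie.
        ROW_NUMBER() OVER (PARTITION BY manufacturer ORDER BY profit DESC, product) AS rn
    FROM phone
) AS ranked
WHERE rn = 1;
-- With the sample data: huawei P50 (650.00), vivo X70 (480.00), redmi 50 (560.00)
```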
-- 3. Case study: student scores:
CREATE TABLE stu_score(
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50),
subject VARCHAR(50),
score DECIMAL(10,2)
);
-- Insert data
INSERT INTO stu_score(name,subject,score)
VALUES('祝兰星','语文',97),
('祝兰星','数学',62),
('祝兰星','英语',79),
('冯宝宝','语文',99),
('冯宝宝','数学',100),
('冯宝宝','英语',0),
('晓晓庆','语文',82),
('晓晓庆','数学',80),
('晓晓庆','英语',81),
('姗姗来迟','语文',62),
('姗姗来迟','数学',91),
('姗姗来迟','英语',79),
('银凤','语文',99),
('银凤','数学',97),
('银凤','英语',98),
('艺伟','语文',99),
('艺伟','数学',100),
('艺伟','英语',60);
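row_number(): the sequence number (row number) of the current row within its partition; always unique, even for ties
rank(): the rank of the current row within its partition; tied rows share a rank, and the ranks that follow a tie are skipped (gaps)
dense_rank(): the rank of the current row within its partition; tied rows share a rank, and no ranks are skipped (no gaps)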
select
name, subject, score,
rank() over w as 'rank',
dense_rank() over w as 'dense_rank',
row_number() over w as 'row_number'
from stu_score
window w as (partition by subject order by score desc);
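To see the three functions diverge, it helps to look at one subject that contains ties. The sketch below restricts the same comparison to '语文', where three students scored 99. The aliases rank_num, dense_rank_num, and row_num are chosen for this example to avoid quoting reserved words.

```sql
SELECT
    name, score,
    RANK()       OVER w AS rank_num,
    DENSE_RANK() OVER w AS dense_rank_num,
    ROW_NUMBER() OVER w AS row_num
FROM stu_score
WHERE subject = '语文'
WINDOW w AS (ORDER BY score DESC)
ORDER BY row_num;
-- score  rank_num  dense_rank_num  row_num
-- 99     1         1               1
-- 99     1         1               2
-- 99     1         1               3
-- 97     4         2               4
-- 82     5         3               5
-- 62     6         4               6
-- Which of the three students with 99 gets row_num 1, 2 or 3 is not defined.
```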
Exercises:
-- Top scorer in each subject (DENSE_RANK keeps every student tied for first place)
select *
from (
select name,
subject,
score,
dense_rank() over(partition by subject order by score desc) as 'rn'
from stu_score
) tmp where tmp.rn = 1;
-- Top three rows in each subject (ROW_NUMBER breaks ties arbitrarily)
select *
from (
select
name,
subject,
score,
row_number() over(partition by subject order by score desc) as 'rn'
from stu_score
) tmp where tmp.rn <= 3;
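The ROW_NUMBER() version always returns exactly three rows per subject, so when two students tie for third place one of them is dropped arbitrarily. If tied students should all be kept, one option is to rank with RANK() instead. This is a sketch, and the alias rk is chosen for this example.

```sql
-- Top three ranks in each subject, keeping every student involved in a tie.
SELECT name, subject, score, rk
FROM (
    SELECT
        name, subject, score,
        RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS rk
    FROM stu_score
) AS tmp
WHERE rk <= 3
ORDER BY subject, rk, name;
-- With the sample data, '英语' returns four rows:
-- 98 (rank 1), 81 (rank 2), and two rows of 79 (both rank 3).
```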
-- Scores above the subject average (approach 1: window function)
select *
from (
select
name,
subject,
score,
avg(score) over(partition by subject) as 'avg_score'
from stu_score
) tmp where tmp.score > tmp.avg_score;
-- Approach 2: without a window function (correlated subquery)
select
name,
subject,
score
from stu_score s
where s.score > (select avg(score) from stu_score s2 where s2.subject = s.subject)
order by s.subject asc;
-- Per row: top score in the subject (via FIRST_VALUE), subject max, subject min, subject average,
-- running total within the subject (highest score first), the student's total, and the number of subjects the student took
select
name,
subject,
score,
first_value(score) over(partition by subject order by score desc) as top_score_in_subject,
max(score) over(partition by subject) as subject_max,
min(score) over(partition by subject) as subject_min,
avg(score) over(partition by subject) as subject_avg,
sum(score) over(partition by subject order by score desc rows between unbounded preceding and current row) as running_total,
sum(score) over(partition by name) as student_total,
count(subject) over (partition by name) as subjects_taken
from stu_score order by subject;
-- 4. Case study: product prices
-- Create the table
CREATE TABLE goods(
id INT PRIMARY KEY AUTO_INCREMENT,
category_id INT,
category VARCHAR(15),
NAME VARCHAR(30),
price DECIMAL(10,2),
stock INT,
upper_time DATETIME
);
-- Insert data
INSERT INTO goods(category_id,category,NAME,price,stock,upper_time)
VALUES
(1, '女装/女士精品', 'T恤', 39.90, 1000, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '连衣裙', 79.90, 2500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '卫衣', 89.90, 1500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '牛仔裤', 89.90, 3500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '百褶裙', 29.90, 500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '呢绒外套', 399.90, 1200, '2020-11-10 00:00:00'),
(2, '户外运动', '自行车', 399.90, 1000, '2020-11-10 00:00:00'),
(2, '户外运动', '山地自行车', 1399.90, 2500, '2020-11-10 00:00:00'),
(2, '户外运动', '登山杖', 59.90, 1500, '2020-11-10 00:00:00'),
(2, '户外运动', '骑行装备', 399.90, 3500, '2020-11-10 00:00:00'),
(2, '户外运动', '运动外套', 799.90, 500, '2020-11-10 00:00:00'),
(2, '户外运动', '滑板', 499.90, 1200, '2020-11-10 00:00:00');
Exercises:
-- 3.1 List the products in goods within each category, ordered by price descending.
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num
FROM goods;
-- 3.2 Find the 3 highest-priced products in each category of goods.
SELECT *
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS top3Price
FROM goods
) AS t
WHERE top3Price <= 3;
-- 3.3 Use RANK() to list the products in each category of goods, ranked by price from high to low.
SELECT *,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank_num
FROM goods;
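-- 3.4 Use RANK() to find the 4 highest-priced products in the category '女装/女士精品'.
-- Conventional approach
SELECT *
FROM goods
WHERE category = '女装/女士精品'
ORDER BY price DESC
LIMIT 4;
-- Window function approach with RANK(): tied prices share a rank.
-- The rank is filtered in an outer query because a window function cannot appear in WHERE.
SELECT *
FROM (
SELECT *,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS top4Price
FROM goods
WHERE category = '女装/女士精品'
) AS t
WHERE top4Price <= 4;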
-- 3.5 Use DENSE_RANK() to list the products in each category of goods, ranked by price from high to low.
SELECT *,
DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank_num
FROM goods;
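-- 3.6 Use DENSE_RANK() to find the products at the 4 highest price levels in the category '女装/女士精品'.
-- DENSE_RANK() leaves no gaps after ties, so this can return more than 4 rows.
SELECT *
FROM (
SELECT *,
DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank_num
FROM goods
WHERE category = '女装/女士精品'
) AS t
WHERE rank_num <= 4;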