MySQL 8.0 开始支持窗口函数,对于查询中的每一行,它使用与该行相关的行执行计算。 以下部分讨论如何使用窗口函数,包括 OVER 和 WINDOW 子句的描述。第一部分提供非聚合窗口函数的描述。
window_function_name(expression)
OVER (
[partition_defintion]
[order_definition]
[frame_definition]
)
在这个语法中:
OVER子句后面的开括号和右括号是强制性的,若OVER子句是空的,它将整个查询行集视为一个分区。因此窗口函数会对每一行产生一个全局和,例如:
window_function_name(expression) OVER()
名称 | 描述 |
---|---|
CUME_DIST() | 计算一组值中值的累积分布 |
DENSE_RANK() | 根据ORDER BY子句为其分区中的每一行分配一个排名。它为具有相同值的行分配相同的排名。如果两行或更多行具有相同的等级,则排序值序列中将没有间隙。 |
FIRST_VALUE() | 返回指定表达式相对于窗口框架中第一行的值。 |
LAG() | 返回分区中当前行之前的第N行的值。 如果不存在前一行,则返回NULL。 |
LAST_VALUE() | 返回指定表达式相对于窗口框架中最后一行的值。 |
LEAD() | 返回分区中当前行之后的第N行的值。 如果不存在后续行,则返回NULL。 |
NTH_VALUE() | 返回窗口框架第N行的参数值 |
NTILE() | 将每个窗口分区的行分配到指定数量的已排名组中。 |
PERCENT_RANK() | 计算分区或结果集中行的百分位数 |
RANK() | 与DENSE_RANK()函数类似,只是当两行或更多行具有相同的排名时,排序值序列中存在间隙 |
ROW_NUMBER() | 为其分区中的每一行分配一个连续整数 |
名称 | 描述 |
---|---|
AVG() | 计算一组值或表达式的平均值。 |
COUNT() | 计算表中的行数。 |
INSTR() | 返回字符串中第一次出现的子字符串的位置。 |
SUM() | 计算一组值或表达式的总和。 |
MIN() | 在一组值中找到最小值 |
MAX() | 在一组值中找到最大值 |
… | … |
over_clause:
{OVER (window_spec) | OVER window_name}
对于OVER (window_spec)语法,窗口规范有几个部分,都是可选的:
window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]
window_name:由查询中其他地方的子句定义的窗口的名称。
## 允许的语法 使用窗口 w + order_clause
OVER (w ORDER BY country)
...WINDOW w AS (PARTITION BY country)
## 不允许的语法 使用窗口 w + partition_clause
OVER (w PARTITION BY year)
...WINDOW w AS (PARTITION BY country)
## 允许的语法
WINDOW w1 AS (w2), w2 AS (), w3 AS (w1)
## 错误的语法
WINDOW w1 AS (w2), w2 AS (w3), w3 AS (w1)
PARTITION BY子句指示如何将查询行划分为组。给定行的窗口函数结果基于包含该行的分区的行。如果省略PARTITION BY,则由所有查询行组成的一个分区。partition_clause语法如下所示:
partition_clause:
PARTITION BY expr [, expr] ...
标准 SQL 要求PARTITION BY后跟列名。MySQL扩展是允许表达式,而不仅仅是列名。例如,如果一个表包含一个名为 ts 的TIMESTAMP列,则标准 SQL 允许PARTITION BY ts,但不允许PARTITION BY HOUR(ts),而 MySQL 允许两者。
ORDER BY子句指定行在分区中的排序方式。可以在多个键上的分区内对数据进行排序,每个键由表达式指定。多个表达式用逗号分隔。根据 ORDER BY 子句,相等的分区行被视为对等项。如果省略 ORDER BY,则分区行是无序的,不隐含任何处理顺序,并且所有分区行都是对等的。order_clause具有以下语法:
order_clause:
ORDER BY expr [ASC|DESC] [, expr [ASC|DESC]] ...
frame_clause是当前分区的子集。frame是相对于当前行确定的,这使得frame能够根据当前行在其分区中的位置在分区内移动。比如:
frame_clause(如果给定)具有以下语法:
frame_clause:
frame_units frame_extent
frame_units:
{ROWS | RANGE}
frame_units值指示当前行和框架行之间的关系类型:
frame_extent值指示帧的起点和终点。可以仅指定帧的开头(在这种情况下,当前行隐式指定结尾),也可以使用 BETWEEN 指定两个帧端点:
frame_extent:
{frame_start | frame_between}
frame_between:
BETWEEN frame_start AND frame_end
frame_start, frame_end: {
CURRENT ROW
| UNBOUNDED PRECEDING
| UNBOUNDED FOLLOWING
| expr PRECEDING
| expr FOLLOWING
}
使用 BETWEEN 语法时,frame_start不得晚于frame_end.
允许frame_start和frame_end值具有以下含义:
在没有frame_clause子句的情况下,默认框架取决于是否存在 ORDER BY 子句:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
由于默认frame_claus因是否存在 ORDER BY 而异,因此将 ORDER BY 添加到查询可能会更改最终的结果。
当当前行值为 NULL 时,frame_claus的含义可能不同。假设是这种情况,这些示例说明了各种框架规范如何应用:
1. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND 15 FOLLOWING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
2. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾结束。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。
3. ORDER BY X DESC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾停止。由于 DESC 排序将NULL值放在最后,因此帧只是NULL值。
4. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾停止。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。
5. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
6. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 1 PRECEDING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
7. ORDER BY X ASC RANGE BETWEEN UNBOUNDED PRECEDING AND 10 FOLLOWING
帧从分区的开头开始,在值为 NULL 的行处停止。由于 ASC 排序将NULL值放在首位,因此帧只是NULL值。
select goods_name,
goods_id,
pay_price,
-- 根据pay_price为每个商品的每一行分配一个连续整数
row_number() over (partition by goods_id order by pay_price desc) as goods_row_number,
-- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中存在间隙
rank() over (w1 order by pay_price desc ) as goods_rank,
-- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中不存在间隙
dense_rank() over (w2) as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0
WINDOW w1 AS (partition by goods_id)
, w2 as (w1 order by pay_price desc);
-- 根据商品总销量为每个商品分配一个排名
select goods_id,
sum(pay_price) goods_sales,
row_number() over (w3) as goods_row_number,
rank() over (w3) as goods_rank,
dense_rank() over (w3) as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0
group by goods_id
WINDOW w3 as (order by sum(pay_price) desc rows between unbounded preceding and unbounded following);
-- 根据商品总销量为每一行分配一个排名
with temp_goods_sales as (select id,
goods_name,
goods_id,
pay_price,
sum(pay_price)
over (partition by goods_id order by pay_price desc rows between unbounded preceding and unbounded following) goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
row_number() over (order by goods_sales desc) as goods_row_number,
rank() over (order by goods_sales desc) as goods_rank,
dense_rank() over (order by goods_sales desc) as goods_dense_rank
from temp_goods_sales
order by goods_row_number;
-- sum()
select id,
pay_price,
goods_id,
goods_name,
-- 统计截止到当前,商品的销售额
sum(pay_price) over (partition by goods_id order by id rows unbounded preceding) as goods_current_sales,
-- 统计商品的总销售额
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
-- avg()
select id,
pay_price,
market_id,
market_name,
-- 统计截止到当前,商户的平均销售额
avg(pay_price) over (partition by market_id order by id rows unbounded preceding) as market_current_avg_sales,
-- 统计商户的平均销售额
avg(pay_price)
over (partition by market_id rows between unbounded preceding and unbounded following) as market_total_avg_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
-- max() min() 获取每个商户 商品最高的销量额和商品最低的销量额
with temp_goods_sale as (select id,
pay_price,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id order by id rows unbounded preceding) as goods_current_sales,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select id,
pay_price,
goods_id,
goods_name,
market_id,
goods_current_sales,
goods_total_sale,
max(goods_current_sales)
over (partition by market_id order by goods_id, id rows unbounded preceding) as shop_current_max_price,
max(goods_total_sale)
over (partition by market_id rows between unbounded preceding and unbounded following) as shop_total_max_price,
min(goods_current_sales)
over (partition by market_id order by goods_id, id rows unbounded preceding) as shop_current_min_price,
min(goods_total_sale)
over (partition by market_id rows between unbounded preceding and unbounded following) as shop_current_min_price
from temp_goods_sale
order by market_id, id;
分组内小于、等于当前rank值的行数 / 分组内总行数
select id,
pay_price,
goods_id,
goods_name,
rank() over (partition by goods_id order by id ) as rn,
CUME_DIST() over (partition by goods_id order by id ) as rn2
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
返回位于当前行的前n行(LAG(expr,n))或后n行(LEAD(expr,n))的expr的值
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
lag(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id) as last_goods_sales,
lead(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id) as next_goods_sales
from temp_goods_sale;
返回第一个(FIRST_VALUE(expr))或最后一个(LAST_VALUE(expr))expr的值
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
first_value(goods_sales)
over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as min_goods_sales,
last_value(goods_sales)
over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as max_goods_sales
from temp_goods_sale;
将分区中的有序数据分为n个等级,记录等级数
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
ntile(3) over (partition by market_id order by goods_sales desc) as ntile
from temp_goods_sale;