mysql-窗口函数 快速学习

窗口函数

MySQL 8.0 开始支持窗口函数,对于查询中的每一行,它使用与该行相关的行执行计算。 以下部分讨论如何使用窗口函数,包括 OVER 和 WINDOW 子句的描述。第一部分提供非聚合窗口函数的描述。

窗口函数语法

window_function_name(expression)
                    OVER (
                             [partition_defintion]
                             [order_definition]
                             [frame_definition]
                         ) 

在这个语法中:

  • 首先,指定窗口函数名称,后跟表达式。
  • 其次,指定OVER具有三个可能元素的子句:分区定义,排序定义和帧定义。

OVER子句后面的开括号和右括号是强制性的,若OVER子句是空的,它将整个查询行集视为一个分区。因此窗口函数会对每一行产生一个全局和,例如:

window_function_name(expression) OVER() 

窗口函数

非聚合窗口函数

名称 描述
CUME_DIST() 计算一组值中值的累积分布
DENSE_RANK() 根据ORDER BY子句为其分区中的每一行分配一个排名。它为具有相同值的行分配相同的排名。如果两行或更多行具有相同的等级,则排序值序列中将没有间隙。
FIRST_VALUE() 返回指定表达式相对于窗口框架中第一行的值。
LAG() 返回分区中当前行之前的第N行的值。 如果不存在前一行,则返回NULL。
LAST_VALUE() 返回指定表达式相对于窗口框架中最后一行的值。
LEAD() 返回分区中当前行之后的第N行的值。 如果不存在后续行,则返回NULL。
NTH_VALUE() 返回窗口框架第N行的参数值
NTILE() 将每个窗口分区的行分配到指定数量的已排名组中。
PERCENT_RANK() 计算分区或结果集中行的百分位数
RANK() 与DENSE_RANK()函数类似,只是当两行或更多行具有相同的排名时,排序值序列中存在间隙
ROW_NUMBER() 为其分区中的每一行分配一个连续整数

聚合函数

名称 描述
AVG() 计算一组值或表达式的平均值。
COUNT() 计算表中的行数。
INSTR() 返回字符串中第一次出现的子字符串的位置。
SUM() 计算一组值或表达式的总和。
MIN() 在一组值中找到最小值
MAX() 在一组值中找到最大值

句法

over_clause 句法

over_clause:
    {OVER (window_spec) | OVER window_name}
  • 在第一种情况下,window_spec窗口规范直接出现在OVER子句中的括号之间。
  • 在第二种情况下,是由查询中其他地方的子句窗口规范定义的window_name窗口名称。

对于OVER (window_spec)语法,窗口规范有几个部分,都是可选的:

window_spec:
    [window_name] [partition_clause] [order_clause] [frame_clause]

window_name

window_name:由查询中其他地方的子句定义的窗口的名称。

  • 如果命名窗口定义包含partition_clause、order_clause或frame_clause属性,则OVER引用窗口名称的子句后续也不能包含相同类型的属性。
## 允许的语法 使用窗口 w + order_clause
OVER (w ORDER BY country)
...WINDOW w AS (PARTITION BY country)

## 不允许的语法 使用窗口 w + partition_clause
OVER (w PARTITION BY year)
...WINDOW w AS (PARTITION BY country)
  • 命名窗口的定义本身可以以 window_name 开头。在这种情况下,允许向前和向后引用,但不允许循环。
## 允许的语法
WINDOW w1 AS (w2), w2 AS (), w3 AS (w1)

## 错误的语法
WINDOW w1 AS (w2), w2 AS (w3), w3 AS (w1)

partition_clause 句法

PARTITION BY子句指示如何将查询行划分为组。给定行的窗口函数结果基于包含该行的分区的行。如果省略PARTITION BY,则由所有查询行组成的一个分区。partition_clause语法如下所示:

partition_clause:
    PARTITION BY expr [, expr] ...

标准 SQL 要求PARTITION BY后跟列名。MySQL扩展是允许表达式,而不仅仅是列名。例如,如果一个表包含一个名为 ts 的TIMESTAMP列,则标准 SQL 允许PARTITION BY ts,但不允许PARTITION BY HOUR(ts),而 MySQL 允许两者。

order_clause 句法

ORDER BY子句指定行在分区中的排序方式。可以在多个键上的分区内对数据进行排序,每个键由表达式指定。多个表达式用逗号分隔。根据 ORDER BY 子句,相等的分区行被视为对等项。如果省略 ORDER BY,则分区行是无序的,不隐含任何处理顺序,并且所有分区行都是对等的。order_clause具有以下语法:

order_clause:
    ORDER BY expr [ASC|DESC] [, expr [ASC|DESC]] ...

frame_clause 句法

frame_clause是当前分区的子集。frame是相对于当前行确定的,这使得frame能够根据当前行在其分区中的位置在分区内移动。比如:

  • 通过将frame定义为从分区开始到当前行的所有行,可以计算每行的运行总计。
  • 通过将frame定义为在当前行的任一侧扩展 N 行,可以计算滚动平均值。

frame_clause(如果给定)具有以下语法:

frame_clause:
    frame_units frame_extent

frame_units:
    {ROWS | RANGE} 
  1. frame_units值指示当前行和框架行之间的关系类型:

    • ROWS:框架由开始行和结束行位置定义。偏移量是行号与当前行号的差异。
    • RANGE:框架由值范围内的行定义。偏移量是行值与当前行值的差异。
  2. frame_extent值指示帧的起点和终点。可以仅指定帧的开头(在这种情况下,当前行隐式指定结尾),也可以使用 BETWEEN 指定两个帧端点:

frame_extent:
    {frame_start | frame_between}

frame_between:
    BETWEEN frame_start AND frame_end

frame_start, frame_end: {
    CURRENT ROW
  | UNBOUNDED PRECEDING
  | UNBOUNDED FOLLOWING
  | expr PRECEDING
  | expr FOLLOWING
}

使用 BETWEEN 语法时,frame_start不得晚于frame_end.
允许frame_start和frame_end值具有以下含义:

  • CURRENT ROW: 当前计算的行。
  • UNBOUNDED PRECEDING: 开始于分区的第一行。
  • UNBOUNDED FOLLOWING: 结束于分区的最后一行。
  • expr PRECEDING: 对于ROWS,当前行之前的expr行。expr可以是文字数字或计算结果的表达式。对于 RANGE,值等于当前行值减去 expr 的行;如果当
    前行值为 NULL,则是该行的对等项。对于 expr PRECEDING(和 expr FOLLOWING),expr 可以是 ? 参数标记(用于预编译语句)、非负数字文本或INTERVAL val unit的时间间隔。对于 INTERVAL 表达式,val 指定非负间隔值,unit 是指示应在其中解释值的单位的关键字。
  • expr FOLLOWING: 对于ROWS,是当前行之后的 expr 行。对于 RANGE,值等于当前行值加 expr 的行;如果当前行值为 NULL,则是该行的对等项。

在没有frame_clause子句的情况下,默认框架取决于是否存在 ORDER BY 子句:

  • 使用 ORDER BY: 默认frame_clause包括从分区开始到当前行的行,包括当前行的所有对等项。默认值等效于:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • 不使用ORDER BY: 默认frame_clause包括所有分区行(因为如果没有ORDER BY则所有分区行都是对等的)。默认值等效于:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

由于默认frame_claus因是否存在 ORDER BY 而异,因此将 ORDER BY 添加到查询可能会更改最终的结果。

当当前行值为 NULL 时,frame_claus的含义可能不同。假设是这种情况,这些示例说明了各种框架规范如何应用:

1. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND 15 FOLLOWING
   帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.

2. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
   帧从 NULL 开始,在分区的末尾结束。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。

3. ORDER BY X DESC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
   帧从 NULL 开始,在分区的末尾停止。由于 DESC 排序将NULL值放在最后,因此帧只是NULL值。

4. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND UNBOUNDED FOLLOWING
   帧从 NULL 开始,在分区的末尾停止。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。

5. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING
   帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.

6. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 1 PRECEDING
   帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.

7. ORDER BY X ASC RANGE BETWEEN UNBOUNDED PRECEDING AND 10 FOLLOWING
   帧从分区的开头开始,在值为 NULL 的行处停止。由于 ASC 排序将NULL值放在首位,因此帧只是NULL值。

示例

序号函数- ROW_NUMBER、RANK、DENSE_RANK

select goods_name,
       goods_id,
       pay_price,
       -- 根据pay_price为每个商品的每一行分配一个连续整数
       row_number() over (partition by goods_id order by pay_price desc) as goods_row_number,
       -- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中存在间隙
       rank() over (w1 order by pay_price desc )                         as goods_rank,
       -- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中不存在间隙
       dense_rank() over (w2)                                            as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
  and pay_price > 0
    WINDOW w1 AS (partition by goods_id)
        , w2 as (w1 order by pay_price desc);

-- 根据商品总销量为每个商品分配一个排名
select goods_id,
       sum(pay_price)            goods_sales,
       row_number() over (w3) as goods_row_number,
       rank() over (w3)       as goods_rank,
       dense_rank() over (w3) as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
  and pay_price > 0
group by goods_id
    WINDOW w3 as (order by sum(pay_price) desc rows between unbounded preceding and unbounded following);

-- 根据商品总销量为每一行分配一个排名
with temp_goods_sales as (select id,
                                 goods_name,
                                 goods_id,
                                 pay_price,
                                 sum(pay_price)
                                     over (partition by goods_id order by pay_price desc rows between unbounded preceding and unbounded following) goods_sales
                          from ods_goods_order
                          where order_status between 0 and 2
                            and pay_price > 0)
select *,
       row_number() over (order by goods_sales desc) as goods_row_number,
       rank() over (order by goods_sales desc)       as goods_rank,
       dense_rank() over (order by goods_sales desc) as goods_dense_rank
from temp_goods_sales
order by goods_row_number;

聚合函数- SUM,AVG,MIN,MAX

-- sum()
select id,
       pay_price,
       goods_id,
       goods_name,
-- 统计截止到当前,商品的销售额
       sum(pay_price) over (partition by goods_id order by id rows unbounded preceding)          as goods_current_sales,
-- 统计商品的总销售额
       sum(pay_price)
           over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
from ods_goods_order
where order_status between 0 and 2
  and pay_price > 0;

-- avg()
select id,
       pay_price,
       market_id,
       market_name,
-- 统计截止到当前,商户的平均销售额
       avg(pay_price) over (partition by market_id order by id rows unbounded preceding)          as market_current_avg_sales,
-- 统计商户的平均销售额
       avg(pay_price)
           over (partition by market_id rows between unbounded preceding and unbounded following) as market_total_avg_sales
from ods_goods_order
where order_status between 0 and 2
  and pay_price > 0;

-- max() min() 获取每个商户 商品最高的销量额和商品最低的销量额
with temp_goods_sale as (select id,
                                pay_price,
                                goods_id,
                                goods_name,
                                market_id,
                                sum(pay_price)
                                    over (partition by goods_id order by id rows unbounded preceding)                     as goods_current_sales,
                                sum(pay_price)
                                    over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
                         from ods_goods_order
                         where order_status between 0 and 2
                           and pay_price > 0)
select id,
       pay_price,
       goods_id,
       goods_name,
       market_id,
       goods_current_sales,
       goods_total_sale,
       max(goods_current_sales)
           over (partition by market_id order by goods_id, id rows unbounded preceding)           as shop_current_max_price,
       max(goods_total_sale)
           over (partition by market_id rows between unbounded preceding and unbounded following) as shop_total_max_price,
       min(goods_current_sales)
           over (partition by market_id order by goods_id, id rows unbounded preceding)           as shop_current_min_price,
       min(goods_total_sale)
           over (partition by market_id rows between unbounded preceding and unbounded following) as shop_current_min_price
from temp_goods_sale
order by market_id, id;

分布函数- CUME_DIST

分组内小于、等于当前rank值的行数 / 分组内总行数

select id,
       pay_price,
       goods_id,
       goods_name,
       rank() over (partition by goods_id order by id )      as rn,
       CUME_DIST() over (partition by goods_id order by id ) as rn2
from ods_goods_order
where order_status between 0 and 2
  and pay_price > 0;

前后函数-LAG和LEAD

返回位于当前行的前n行(LAG(expr,n))或后n行(LEAD(expr,n))的expr的值

with temp_goods_sale as (select id,
                                goods_id,
                                goods_name,
                                market_id,
                                sum(pay_price)
                                    over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
                         from ods_goods_order
                         where order_status between 0 and 2
                           and pay_price > 0)
select *,
       lag(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id)  as last_goods_sales,
       lead(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id) as next_goods_sales
from temp_goods_sale;

头尾函数-FIRST_VALUE和LAST_VALUE

返回第一个(FIRST_VALUE(expr))或最后一个(LAST_VALUE(expr))expr的值

with temp_goods_sale as (select id,
                                goods_id,
                                goods_name,
                                market_id,
                                sum(pay_price)
                                    over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
                         from ods_goods_order
                         where order_status between 0 and 2
                           and pay_price > 0)
select *,
       first_value(goods_sales)
                   over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as min_goods_sales,
       last_value(goods_sales)
                  over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)  as max_goods_sales
from temp_goods_sale;

其他函数-NTH_VALUE(expr, n)、NTILE(n)

将分区中的有序数据分为n个等级,记录等级数

with temp_goods_sale as (select id,
                                goods_id,
                                goods_name,
                                market_id,
                                sum(pay_price)
                                    over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
                         from ods_goods_order
                         where order_status between 0 and 2
                           and pay_price > 0)
select *,
       ntile(3) over (partition by market_id order by goods_sales desc) as ntile
from temp_goods_sale;

你可能感兴趣的:(大数据,mysql,学习,数据库)