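Before starting on window functions, it may help to review the earlier post: MySQL common functions and basic queries. A few functions that post left out are covered here first:
Conversion function: CAST(expression AS data_type)
Explicitly converts an expression of one data type to another. The argument to CAST() is a single expression made up of the source value and the target data type, separated by the AS keyword.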
select cast('9.0' AS decimal);
| cast('9.0' AS decimal) |
| 9 |
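The fractional part disappears above because DECIMAL with no precision or scale defaults to DECIMAL(10,0) in MySQL. A minimal sketch of keeping the decimals by giving an explicit scale (the literal values and column aliases are chosen only for illustration):

```sql
-- DECIMAL without arguments means DECIMAL(10,0), so the fraction is lost
select cast('9.25' as decimal) as no_scale;          -- 9
-- An explicit precision and scale keeps the fractional digits
select cast('9.25' as decimal(10, 2)) as with_scale; -- 9.25
-- CAST also converts numbers to strings and strings to dates
select cast(123 as char) as as_text
     , cast('2024-01-31' as date) as as_date;
```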
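NULL handling: IFNULL(expr1, expr2)
Checks whether the first expression is NULL. If it is NULL, the value of the second argument is returned; otherwise the value of the first argument is returned.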
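The behaviour described above matches MySQL's IFNULL(expr1, expr2). A small sketch with literal values (the values and aliases are made up for illustration):

```sql
select ifnull(null, 'fallback')    as when_null      -- fallback
     , ifnull('value', 'fallback') as when_not_null  -- value
     , ifnull(null, 0) + 10        as null_safe_sum; -- 10
```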
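window_function([<column>]) over([partition by <partition columns>] [order by <sort columns> [desc]] [<window frame>])
A key concept for window functions is the current row. The current row belongs to a window, and the over keyword specifies the window over which the function is evaluated. If the parentheses after over are empty, the window contains every row that satisfies the where condition, and the function is computed over all of those rows. If they are not empty, up to three parts define the window: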
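partition by <partition columns>: the columns used to split the rows into partitions; the window function is evaluated separately within each partition.
order by <sort columns>: the columns used to sort the rows; the window function processes the rows in that sorted order (for example, when numbering them). It can be combined with partition by <partition columns> or used on its own.
<window frame>: an extension applied after sorting that identifies a range of the sorted rows, written as rows|range between start_expr and end_expr.
rows: a physical range. After sorting by the order by clause, the frame takes the N rows before and the N rows after the current row. This depends only on row position after sorting, not on the current row's value.
range: a logical range. After sorting by the order by clause, the frame covers the rows whose sort value falls within a given distance of the current row's value. The number of rows is not fixed: every row whose value is in range is included.
between start_expr and end_expr: the start and end boundaries of the frame, each of which can be one of the following.
current row: the boundary is the current row.
unbounded preceding: the frame starts at the first row of the partition, i.e. the first row after sorting.
unbounded following: the frame ends at the last row of the partition after sorting.
n preceding: the frame starts n rows before the current row.
n following: the frame ends n rows after the current row.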
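rows between 1 preceding and 1 following: the frame is the current row, the row before it, and the row after it, three rows in total.
rows unbounded preceding: the frame runs from the first row of the partition to the current row.
rows between unbounded preceding and unbounded following: the frame is every row in the current partition. This matches omitting the frame only when there is no order by; with order by, the default frame is range between unbounded preceding and current row.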
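To make the frame defaults concrete, here is a small sketch on an inline three-row data set. The CTE name t and its values are made up for illustration, and the syntax assumes MySQL 8.0 or later. It compares the default frame under order by with an explicit whole-partition frame:

```sql
with t(val) as (
    select 1 union all select 2 union all select 3
)
select val
     , sum(val) over (order by val) as running_sum   -- default frame with order by: 1, 3, 6
     , sum(val) over (order by val
                      rows between unbounded preceding and unbounded following) as full_frame -- 6, 6, 6
     , sum(val) over () as no_order                   -- no order by: whole partition, 6, 6, 6
from t;
```

Only the first column accumulates, which shows that the whole-partition frame and an omitted frame coincide only when order by is absent.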
create table test(id int,name varchar(10),sale int);
insert into test values(1,'aaa',100);
insert into test values(1,'bbb',200);
insert into test values(1,'ccc',200);
insert into test values(1,'ddd',300);
insert into test values(2,'eee',400);
insert into test values(2,'fff',200);
create table test2( id int,val int);
insert into test2
values (1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 5),
(2, 6),
(2, 7),
(2, 8),
(2, 9),
(1, 3),
(1, 5);
select t.id
, t.name
, t.sale
, row_number() over (order by sale) as row_number1 -- order by only, no partition: all rows are sorted as a single partition
, row_number() over (partition by id order by sale) as row_number2 -- sorted within each id partition
, rank() over (order by sale) as rank1
, rank() over (partition by id order by sale) as rank2
, dense_rank() over (order by sale) as dense_rank1
, dense_rank() over (partition by id order by sale) as dense_rank2
from test as t;
| id | name | sale | row_number1 | row_number2 | rank1 | rank2 | dense_rank1 | dense_rank2 |
| 1 | aaa | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | bbb | 200 | 2 | 2 | 2 | 2 | 2 | 2 |
| 1 | ccc | 200 | 3 | 3 | 2 | 2 | 2 | 2 |
| 1 | ddd | 300 | 5 | 4 | 5 | 4 | 3 | 3 |
| 2 | fff | 200 | 4 | 1 | 2 | 1 | 2 | 1 |
| 2 | eee | 400 | 6 | 2 | 6 | 2 | 4 | 2 |
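In the rank1 column, sale=200 appears three times and all three rows get the same rank 2, but the ranks that follow are not consecutive: the next rank jumps straight to 5. rank2 behaves the same way within each partition.
In the dense_rank1 column, the three sale=200 rows also share rank 2, but the ranks that follow stay consecutive (3, then 4). dense_rank2 behaves the same way within each partition.
For a deeper look at ranking, see: MySQL common ranking implementations.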
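A common use of these ranking functions is picking the top row per group. Because a window function cannot appear directly in a where clause, the sketch below computes the row number in a derived table first (the aliases ranked and rn are illustrative names, not part of the original post):

```sql
-- Highest sale per id from the test table defined above
select id, name, sale
from (
    select t.id
         , t.name
         , t.sale
         , row_number() over (partition by id order by sale desc) as rn
    from test as t
) as ranked
where rn = 1;
-- id=1 -> ddd, 300; id=2 -> eee, 400
```

Swapping row_number() for rank() would return every row tied for first place instead of exactly one row per id.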
lead(expr,<offset>,<default>) over(partition by col1 order by col2)
lag(expr,<offset>,<default>) over(partition by col1 order by col2)
select t.id
, t.name
, t.sale
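, lead(sale) over (order by sale) as lead1 -- order by only, no partition: all rows are sorted as a single partition
, lead(sale) over (partition by id order by sale) as lead2 -- sorted within each id partition
, lead(sale, 2, 'empty') over (order by sale) as lead3 -- order by only; look 2 rows ahead and return 'empty' when no such row exists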
, lag(sale) over (order by sale) as lag1
, lag(sale) over (partition by id order by sale) as lag2
, lag(sale, 2, 'empty') over (order by sale) as lag3
from test as t;
| id | name | sale | lead1 | lead2 | lead3 | lag1 | lag2 | lag3 |
| 1 | aaa | 100 | 200 | 200 | 200 | NULL | NULL | empty |
| 1 | bbb | 200 | 200 | 200 | 200 | 100 | 100 | empty |
| 1 | ccc | 200 | 200 | 300 | 300 | 200 | 200 | 100 |
| 1 | ddd | 300 | 400 | NULL | empty | 200 | 200 | 200 |
| 2 | fff | 200 | 300 | 400 | 400 | 200 | NULL | 200 |
| 2 | eee | 400 | NULL | NULL | empty | 300 | 200 | 200 |
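A typical application of lag is comparing each row with the one before it. The following is a sketch against the same test table; the aliases and the extra name tiebreaker in order by are my own additions, used so that rows with equal sale values have a stable order:

```sql
select t.id
     , t.name
     , t.sale
     , lag(sale) over (partition by id order by sale, name) as prev_sale
     -- difference from the previous row; NULL on the first row of each partition
     , sale - lag(sale) over (partition by id order by sale, name) as diff_from_prev
     -- a numeric default keeps the result column numeric, unlike the 'empty' string above
     , lag(sale, 1, 0) over (partition by id order by sale, name) as prev_sale_or_zero
from test as t;
```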
first_value(expr) over(partition by col1 order by col2)
last_value(expr) over(partition by col1 order by col2)
select t.id
, t.name
, t.sale
, first_value(sale) over () as first_value1 -- no partition, no order by
, first_value(sale) over (order by sale) as first_value2 -- order by only, no partition
, first_value(sale) over (partition by id order by sale) as first_value3 -- partitioned and sorted: first value in each partition
, last_value(sale) over () as last_value1
, last_value(sale) over (order by sale) as last_value2
, last_value(sale) over (partition by id order by sale) as last_value3
from test as t;
| id | name | sale | first_value1 | first_value2 | first_value3 | last_value1 | last_value2 | last_value3 |
| 1 | aaa | 100 | 100 | 100 | 100 | 400 | 100 | 100 |
| 1 | bbb | 200 | 100 | 100 | 100 | 400 | 200 | 200 |
| 1 | ccc | 200 | 100 | 100 | 100 | 400 | 200 | 200 |
| 1 | ddd | 300 | 100 | 100 | 100 | 400 | 300 | 300 |
| 2 | fff | 200 | 100 | 100 | 200 | 400 | 200 | 200 |
| 2 | eee | 400 | 100 | 100 | 200 | 400 | 400 | 400 |
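In the result above, last_value2 and last_value3 simply echo the current row's sale. That follows from the default frame under order by (range between unbounded preceding and current row): the frame ends at the current row and its peers, so the "last" value is the current one. A sketch of getting the true last value of each partition by widening the frame (the aliases are illustrative):

```sql
select t.id
     , t.name
     , t.sale
     , last_value(sale) over (partition by id order by sale) as last_default_frame
     , last_value(sale) over (partition by id order by sale
                              rows between unbounded preceding and unbounded following) as last_in_partition
from test as t;
-- last_in_partition is 300 for every id=1 row and 400 for every id=2 row
```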
ntile(ntile_num) over(partition by col1 order by col2)
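ntile_num is an integer giving the number of "buckets" (groups) to create; it must be greater than 0. Note also that the over clause should normally include an order by clause, because bucket assignment without one is not deterministic.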
select t.id
, t.name
, t.sale
, ntile(4) over (order by sale) as ntile1
, ntile(4) over (partition by id order by sale) as ntile2
from test as t;
| id | name | sale | ntile1 | ntile2 |
| 1 | aaa | 100 | 1 | 1 |
| 1 | bbb | 200 | 1 | 2 |
| 1 | ccc | 200 | 2 | 3 |
| 1 | ddd | 300 | 3 | 4 |
| 2 | fff | 200 | 2 | 1 |
| 2 | eee | 400 | 4 | 2 |
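When the row count does not divide evenly by the bucket count, the earlier buckets receive the extra rows. A quick way to see this is to count the rows per bucket; the sketch below does so for ntile1 (the aliases b, bucket and rows_in_bucket are illustrative):

```sql
select bucket
     , count(*) as rows_in_bucket
from (
    select ntile(4) over (order by sale) as bucket
    from test
) as b
group by bucket
order by bucket;
-- 6 rows into 4 buckets: bucket sizes are 2, 2, 1, 1
```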
Without a <window frame>
select id
, val
, max(val) over () as max1 -- no partition, no order by
, max(val) over (partition by id) as max2 -- partition only, no order by
, max(val) over (partition by id order by val) as max3 -- both partition and order by
, min(val) over () as min1
, min(val) over (partition by id) as min2
, min(val) over (partition by id order by val) as min3
, sum(val) over () as sum1
, sum(val) over (partition by id) as sum2
, sum(val) over (partition by id order by val) as sum3
, avg(val) over () as avg1
, avg(val) over (partition by id) as avg2
, avg(val) over (partition by id order by val) as avg3
, count(val) over () as count1
, count(val) over (partition by id) as count2
, count(val) over (partition by id order by val) as count3
from test2;
| id | val | max1 | max2 | max3 | min1 | min2 | min3 | sum1 | sum2 | sum3 | avg1 | avg2 | avg3 | count1 | count2 | count3 |
| 1 | 1 | 9 | 5 | 1 | 1 | 1 | 1 | 53 | 23 | 1 | 4.8182 | 3.2857 | 1.0000 | 11 | 7 | 1 |
| 1 | 2 | 9 | 5 | 2 | 1 | 1 | 1 | 53 | 23 | 3 | 4.8182 | 3.2857 | 1.5000 | 11 | 7 | 2 |
| 1 | 3 | 9 | 5 | 3 | 1 | 1 | 1 | 53 | 23 | 9 | 4.8182 | 3.2857 | 2.2500 | 11 | 7 | 4 |
| 1 | 3 | 9 | 5 | 3 | 1 | 1 | 1 | 53 | 23 | 9 | 4.8182 | 3.2857 | 2.2500 | 11 | 7 | 4 |
| 1 | 4 | 9 | 5 | 4 | 1 | 1 | 1 | 53 | 23 | 13 | 4.8182 | 3.2857 | 2.6000 | 11 | 7 | 5 |
| 1 | 5 | 9 | 5 | 5 | 1 | 1 | 1 | 53 | 23 | 23 | 4.8182 | 3.2857 | 3.2857 | 11 | 7 | 7 |
| 1 | 5 | 9 | 5 | 5 | 1 | 1 | 1 | 53 | 23 | 23 | 4.8182 | 3.2857 | 3.2857 | 11 | 7 | 7 |
| 2 | 6 | 9 | 9 | 6 | 1 | 6 | 6 | 53 | 30 | 6 | 4.8182 | 7.5000 | 6.0000 | 11 | 4 | 1 |
| 2 | 7 | 9 | 9 | 7 | 1 | 6 | 6 | 53 | 30 | 13 | 4.8182 | 7.5000 | 6.5000 | 11 | 4 | 2 |
| 2 | 8 | 9 | 9 | 8 | 1 | 6 | 6 | 53 | 30 | 21 | 4.8182 | 7.5000 | 7.0000 | 11 | 4 | 3 |
| 2 | 9 | 9 | 9 | 9 | 1 | 6 | 6 | 53 | 30 | 30 | 4.8182 | 7.5000 | 7.5000 | 11 | 4 | 4 |
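max1: with no partition and no order by, the maximum of all rows.
max2: with a partition but no order by, the maximum within the partition.
max3: with both a partition and an order by, the running maximum within the sorted partition, i.e. the maximum over the rows from the partition's first row through the current row (including rows that tie with the current row on val).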
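Because an aggregate used as a window function keeps every input row, the partition-level value can be combined with the row's own value in the same query. A minimal sketch on test2 (the aliases are illustrative):

```sql
select id
     , val
     , sum(val) over (partition by id) as id_total
     -- each row's share of its partition total
     , val / sum(val) over (partition by id) as share_of_id
     -- distance of each row from its partition average
     , val - avg(val) over (partition by id) as diff_from_id_avg
from test2;
```

With a plain group by, the same comparison would need a join back to the detail rows.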
Using a <window frame>
rows|range between start_expr and end_expr
select id
, val
, sum(val) over (partition by id order by val) as sum1 -- no frame given: the default applies, i.e. range between unbounded preceding and current row
, sum(val) over (partition by id order by val rows between unbounded preceding and current row) as sum2 -- rows frame: physical rows from the first row to the current row
, sum(val) over (partition by id order by val range between unbounded preceding and current row) as sum3 -- range frame: from the first value through every row whose val equals the current row's val
, sum(val) over (partition by id order by val rows between 1 preceding and 1 following) as sum4 -- rows frame: from the previous physical row to the next physical row
, sum(val) over (partition by id order by val range between 1 preceding and 1 following) as sum5 -- range frame: every row whose val lies between the current val - 1 and the current val + 1
from test2;
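-- The same query written with named windows (window clause)
select id
, val
, sum(val) over w1 as sum1 -- no frame given: the default applies, i.e. range between unbounded preceding and current row
, sum(val) over w2 as sum2 -- rows frame: physical rows from the first row to the current row
, sum(val) over w3 as sum3 -- range frame: from the first value through every row whose val equals the current row's val
, sum(val) over w4 as sum4 -- rows frame: from the previous physical row to the next physical row
, sum(val) over w5 as sum5 -- range frame: every row whose val lies between the current val - 1 and the current val + 1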
from test2
window w1 as (partition by id order by val)
,w2 as (partition by id order by val rows between unbounded preceding and current row)
,w3 as (partition by id order by val range between unbounded preceding and current row)
,w4 as (partition by id order by val rows between 1 preceding and 1 following)
,w5 as (partition by id order by val range between 1 preceding and 1 following);
| id | val | sum1 | sum2 | sum3 | sum4 | sum5 |
| 1 | 1 | 1 | 1 | 1 | 3 | 3 |
| 1 | 2 | 3 | 3 | 3 | 6 | 9 |
| 1 | 3 | 9 | 6 | 9 | 8 | 12 |
| 1 | 3 | 9 | 9 | 9 | 10 | 12 |
| 1 | 4 | 13 | 13 | 13 | 12 | 20 |
| 1 | 5 | 23 | 18 | 23 | 14 | 14 |
| 1 | 5 | 23 | 23 | 23 | 10 | 14 |
| 2 | 6 | 6 | 6 | 6 | 13 | 13 |
| 2 | 7 | 13 | 13 | 13 | 21 | 21 |
| 2 | 8 | 21 | 21 | 21 | 24 | 24 |
| 2 | 9 | 30 | 30 | 30 | 17 | 17 |
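The same frame syntax covers moving aggregates. As a sketch, the query below computes a moving average over the current row and the two rows before it, reusing one named window for two functions (the window name w and the aliases are illustrative):

```sql
select id
     , val
     , avg(val) over w as moving_avg_3   -- average of up to 3 physical rows
     , count(*) over w as rows_in_frame  -- 1, 2, then 3 once enough rows precede
from test2
window w as (partition by id order by val rows between 2 preceding and current row);
```

A rows frame is used here because a moving average normally counts physical rows; a range frame would instead widen or shrink with duplicate val values, as sum5 shows above.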