1. 窗口函数定义与分类
2. 各类别窗口函数用法详解
2.1 序号函数:ROW_NUMBER()、RANK()、DENSE_RANK()
2.2 分布函数:PERCENT_RANK()、CUME_DIST()
2.3 前后函数:LAG(expr,n)、LEAD(expr,n)
2.4 头尾函数:FIRST_VALUE(expr)、LAST_VALUE(expr)
2.5 其它函数:NTH_VALUE(expr, n)、NTILE(n)
2.6 聚合函数作为窗口函数:SUM()、AVG()、MAX()、MIN()、COUNT()
2.7 Frame子句
3. 窗口函数实战
3.1 基础数据准备
3.2 窗口函数用法实战
4. 相关业务问题
4.1 解决方法
4.2 TOPN问题模板
4.3 聚合函数作为窗口函数
5. 总结
(1)定义
窗口函数也称OLAP函数(Online Anallytical Processing,联机分析处理),可对数据进行实时分析处理。
窗口表示记录集合;窗口函数表示在满足某些条件的记录集合上执行的特殊函数,对于每条记录都要在此窗口内执行函数;有的函数随着记录的不同,窗口大小都是固定的,称为
静态窗口
;有的函数则相反,不同的记录对应着不同的窗口,称为滑动窗口
。
(2)窗口函数与普通函数的区别与联系
①聚合函数是将多条记录聚合为一条;窗口函数是每条记录都会执行,有几条记录执行完还是几条。
②聚合函数也可以用于窗口函数。
(3)基本用法
<窗口函数> over ([partition by <列清单>]
order by <排序用列清单>)
over关键字用来指定函数执行的窗口范围,若后面括号什么都不写,则意味着窗口包含满足WHERE条件的所有行,窗口函数基于所有行进行计算;如不为空,则支持以下四种语法来设置窗口。
①window_name:给窗口指定一个别名。如果SQL中涉及的窗口较多,采用别名可以看起来更清晰易读;
②PARTITION BY 子句:窗口按照哪些字段进行分组,窗口函数在不同的分组上分别执行;
③ORDER BY子句:按照哪些字段进行排序,窗口函数将按照排序后的记录顺序进行编号;
④FRAME子句:FRAME是当前分区的一个子集,子句用来定义子集的规则,通常用来作为滑动窗口使用。
(4)窗口函数分类
ROW_NUMBER()、RANK()
、DENSE_RANK()
① ROW_NUMBER()
SELECT * FROM (
SELECT stu_id,
ROW_NUMBER() OVER (PARTITION BY stu_id ORDER BY score DESC) AS score_order,
lesson_id, score
FROM t_score) t
WHERE score_order <= 3;
+--------+-------------+-----------+-------+
| stu_id | score_order | lesson_id | score |
+--------+-------------+-----------+-------+
| 1 | 1 | L005 | 98 |
| 1 | 2 | L001 | 98 |
| 1 | 3 | L004 | 88 |
| 2 | 1 | L002 | 90 |
| 2 | 2 | L003 | 86 |
| 2 | 3 | L001 | 84 |
| 3 | 1 | L001 | 100 |
| 3 | 2 | L002 | 91 |
| 3 | 3 | L003 | 85 |
| 4 | 1 | L001 | 99 |
| 4 | 2 | L005 | 98 |
| 4 | 3 | L002 | 88 |
+--------+-------------+-----------+-------+
② RANK()
、DENSE_RANK()
rank()/dense_rank():与ROW_NUMBER()
非常类似,只是在出现重复值时处理逻辑有所不同。下面我们查看学号stu_id=1使用不同序号函数的排序结果。
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY stu_id ORDER BY score DESC) AS score_row_number,
RANK() OVER (PARTITION BY stu_id ORDER BY score DESC) AS score_rank,
DENSE_RANK() OVER (PARTITION BY stu_id ORDER BY score DESC) AS score_dense_rank,
stu_id,
lesson_lid,
score
FROM t_score) t
WHERE stu_id = 1
AND score_order1 <= 3
AND score_order2 <= 3
AND score_order3 <= 3
;
+------------------+------------+------------------+--------+-----------+-------+
| score_row_number | score_rank | score_dense_rank | stu_id | lesson_id | score |
+------------------+------------+------------------+--------+-----------+-------+
| 1 | 1 | 1 | 1 | L005 | 98 |
| 2 | 1 | 1 | 1 | L001 | 98 |
| 3 | 3 | 2 | 1 | L004 | 88 |
+------------------+------------+------------------+--------+-----------+-------+
当成绩相同时,会存在并列情况,主要区别是三个函数如何处理并列情况:
PERCENT_RANK()
、CUME_DIST()
① PERCENT_RANK()
(rank-1) / (rows-1)
进行计算。其中,rank
为RANK()函数
产生的序号,rows
为当前窗口的记录总行数SELECT RANK() OVER w AS rk,
PERCENT_RANK() OVER w AS prk,
stu_id,
lesson_id,
score
FROM t_score
WHERE stu_id = 1
WINDOW w AS (PARTITION BY stu_id ORDER BY score)
;
+----+------+--------+-----------+-------+
| rk | prk | stu_id | lesson_id | score |
+----+------+--------+-----------+-------+
| 1 | 0 | 1 | L003 | 79 |
| 2 | 0.25 | 1 | L002 | 86 |
| 3 | 0.5 | 1 | L004 | 88 |
| 4 | 0.75 | 1 | L005 | 98 |
| 4 | 0.75 | 1 | L001 | 98 |
+----+------+--------+-----------+-------+
② CUME_DIST()
SELECT stu_id,
lesson_id,
score,
CUME_DIST() OVER (ORDER BY score) AS cd1,
CUME_DIST() OVER (PARTITION BY lesson_id ORDER BY score) AS cd2
FROM t_score
WHERE lesson_id IN ('L001', 'L002')
;
+--------+-----------+-------+-------+------+
| stu_id | lesson_id | score | cd1 | cd2 |
+--------+-----------+-------+-------+------+
| 2 | L001 | 84 | 0.125 | 0.25 |
| 1 | L001 | 98 | 0.75 | 0.5 |
| 4 | L001 | 99 | 0.875 | 0.75 |
| 3 | L001 | 100 | 1 | 1 |
| 1 | L002 | 86 | 0.25 | 0.25 |
| 4 | L002 | 88 | 0.375 | 0.5 |
| 2 | L002 | 90 | 0.5 | 0.75 |
| 3 | L002 | 91 | 0.625 | 1 |
+--------+-----------+-------+-------+------+
其中,cd1将整张表作为窗口,按成绩排序(默认升序),排序后cd1列的值依次为0-1;然后cd2按照lesson_id分组score排序,打乱了cd1的排列顺序,得到上图结果。
LAG(expr,n)
、LEAD(expr,n)
LAG(expr,n)
)或后n行(LEAD(expr,n)
)的expr的值内层SQL先通过
LAG()函数
得到前1名同学的成绩,外层SQL再将当前同学和前1名同学的成绩做差得到成绩差值diff
。
SELECT stu_id, lesson_id, score, pre_score,score-pre_score AS diff
FROM(
SELECT stu_id, lesson_id, score,
LAG(score,1) OVER w AS pre_score
FROM t_score
WHERE lesson_id IN ('L001','L002')
WINDOW w AS (PARTITION BY lesson_id ORDER BY score)) t;
+--------+-----------+-------+-----------+------+
| stu_id | lesson_id | score | pre_score | diff |
+--------+-----------+-------+-----------+------+
| 2 | L001 | 84 | NULL | NULL |
| 1 | L001 | 98 | 84 | 14 |
| 4 | L001 | 99 | 98 | 1 |
| 3 | L001 | 100 | 99 | 1 |
| 1 | L002 | 86 | NULL | NULL |
| 4 | L002 | 88 | 86 | 2 |
| 2 | L002 | 90 | 88 | 2 |
| 3 | L002 | 91 | 90 | 1 |
+--------+-----------+-------+-----------+------+
FIRST_VALUE(expr)
、LAST_VALUE(expr)
FIRST_VALUE(expr)
)或最后一个(LAST_VALUE(expr)
)expr的值- 添加新列:
ALTER TABLE t_score ADD create_time DATE;
SELECT stu_id, lesson_id, score, create_time,
FIRST_VALUE(score) OVER w AS first_score,
LAST_VALUE(score) OVER w AS last_score
FROM t_score
WHERE lesson_id IN ('L001','L002')
WINDOW w AS (PARTITION BY lesson_id ORDER BY create_time);
+--------+-----------+-------+-------------+-------------+------------+
| stu_id | lesson_id | score | create_time | first_score | last_score |
+--------+-----------+-------+-------------+-------------+------------+
| 3 | L001 | 100 | 2018-08-07 | 100 | 100 |
| 1 | L001 | 98 | 2018-08-08 | 100 | 98 |
| 2 | L001 | 84 | 2018-08-09 | 100 | 84 |
| 4 | L001 | 99 | 2018-08-09 | 100 | 99 |
| 3 | L002 | 91 | 2018-08-07 | 91 | 91 |
| 1 | L002 | 86 | 2018-08-08 | 91 | 86 |
| 2 | L002 | 90 | 2018-08-09 | 91 | 90 |
| 4 | L002 | 88 | 2018-08-10 | 91 | 88 |
+--------+-----------+-------+-------------+-------------+------------+
其中,FIRST_VALUE取得的是当前窗口的第一行,LAST_VALUE取得的是当前窗口到此条记录的最后一行,即当前记录;若想取得当前窗口的最后一行记录,需添加frame子句,
rows between unbounded preceding and unbounded following。
NTH_VALUE(expr, n)
、NTILE(n)
① NTH_VALUE(expr,n)
expr
的值。expr
可以是表达式,也可以是列名SELECT stu_id, lesson_id, score,
NTH_VALUE(score,2) OVER w AS second_score,
NTH_VALUE(score,3) OVER w AS third_score
FROM t_score
WHERE stu_id IN (1,2)
WINDOW w AS (PARTITION BY stu_id ORDER BY score);
+--------+-----------+-------+--------------+-------------+
| stu_id | lesson_id | score | second_score | third_score |
+--------+-----------+-------+--------------+-------------+
| 1 | L003 | 79 | NULL | NULL |
| 1 | L002 | 86 | 86 | NULL |
| 1 | L004 | 88 | 86 | 88 |
| 1 | L001 | 98 | 86 | 88 |
| 1 | L005 | 98 | 86 | 88 |
| 2 | L004 | 75 | NULL | NULL |
| 2 | L005 | 77 | 77 | NULL |
| 2 | L001 | 84 | 77 | 84 |
| 2 | L003 | 86 | 77 | 84 |
| 2 | L002 | 90 | 77 | 84 |
+--------+-----------+-------+--------------+-------------+
② NTILE(n):分桶
SELECT
NTILE(3) OVER w AS nf,
stu_id, lesson_id, score
FROM t_score
WHERE lesson_id IN ('L001','L002')
WINDOW w AS (PARTITION BY lesson_id ORDER BY score);
+------+--------+-----------+-------+
| nf | stu_id | lesson_id | score |
+------+--------+-----------+-------+
| 1 | 2 | L001 | 84 |
| 1 | 1 | L001 | 98 |
| 2 | 4 | L001 | 99 |
| 3 | 3 | L001 | 100 |
| 1 | 1 | L002 | 86 |
| 1 | 4 | L002 | 88 |
| 2 | 2 | L002 | 90 |
| 3 | 3 | L002 | 91 |
+------+--------+-----------+-------+
NTILE(n)函数
在数据分析中应用较多,比如由于数据量大,需要将数据平均分配到n个并行的进程分别计算,此时就可以用NTILE(n)
对数据进行分组(由于记录数不一定被n整除,所以数据不一定完全平均),然后将不同桶号的数据再分配。
SUM()
、AVG()
、MAX()
、MIN()
、COUNT()
SUM()
、AVG()
、MAX()
、MIN()
、COUNT()
),可以动态计算在指定的窗口内的各种聚合函数值stu_id=1
的累计分数/平均分数/最高分数/最低分数/成绩记录条数SELECT stu_id, lesson_id, score, create_time,
SUM(score) OVER w AS score_sum,
MAX(score) OVER w AS score_max,
MIN(score) OVER w AS score_min
FROM t_score
WHERE stu_id = 1
WINDOW w AS (PARTITION BY stu_id ORDER BY create_time);
+--------+-----------+-------+-------------+-----------+-----------+-----------+
| stu_id | lesson_id | score | create_time | score_sum | score_max | score_min |
+--------+-----------+-------+-------------+-----------+-----------+-----------+
| 1 | L001 | 98 | 2018-08-08 | 184 | 98 | 86 |
| 1 | L002 | 86 | 2018-08-08 | 184 | 98 | 86 |
| 1 | L003 | 79 | 2018-08-09 | 263 | 98 | 79 |
| 1 | L004 | 88 | 2018-08-10 | 449 | 98 | 79 |
| 1 | L005 | 98 | 2018-08-10 | 449 | 98 | 79 |
+--------+-----------+-------+-------------+-----------+-----------+-----------+
注意:sum作为窗口函数,当求和字段与排序字段相同时,如果有重复数据,会直接把相同的数据相加,并不是逐步相加。为什么呢?
frame是当前分区的一个子集,子句用来定义子集的规则,通常用来作为滑动窗口使用。比如要根据每个订单动态计算包括本订单和按时间顺序前后两个订单的平均订单金额,则可以设置如下frame子句来创建滑动窗口:
select * from(
select order_id,user_no,amount,
avg(amount) over w as avg_num,
create_date
from order_tabl
window w as (partition by user_no order by create_date desc rows between 1 preceding and 1 following)
)t;
+----------+---------+--------+-----------+---------------------+
| order_id | user_no | amount | avg_num | create_date |
+----------+-----------+-------+-------- -+---------------------+
| 5 | 001 | 900 | 850.0000 | 2018-01-04 00:00:00 |
| 4 | 001 | 800 | 666.6667 | 2018-01-03 00:00:00 |
| 2 | 001 | 300 | 533.3333 | 2018-01-02 00:00:00 |
| 3 | 001 | 500 | 300.0000 | 2018-01-02 00:00:00 |
| 1 | 001 | 100 | 300.0000 | 2018-01-01 00:00:00 |
| 10 | 002 | 800 | 800.0000 | 2018-01-22 00:00:00 |
| 9 | 002 | 800 | 633.3333 | 2018-01-16 00:00:00 |
| 8 | 002 | 300 | 566.6667 | 2018-01-10 00:00:00 |
| 7 | 002 | 600 | 466.6667 | 2018-01-04 00:00:00 |
| 6 | 002 | 500 | 550.0000 | 2018-01-03 00:00:00 |
+----------+---------+--------+-----------+---------------------+
从结果可以看出,order_id为5订单属于边界值,没有前一行,因此平均订单金额为(900+800)/2=850;order_id为4的订单前后都有订单,所以平均订单金额为(900+800+300)/3=666.6667,以此类推就可以得到一个基于滑动窗口的动态平均订单值。
对于滑动窗口的范围指定,有两种方式,基于行和基于范围,具体区别如下:
① 基于行:通常使用BETWEEN frame_start AND frame_end语法来表示行范围,frame_start和frame_end可以支持如下关键字,来确定不同的动态行记录:
- rows BETWEEN 1 PRECEDING AND 1 FOLLOWING 窗口范围是当前行、前一行、后一行一共三行记录。
- rows UNBOUNDED FOLLOWING 窗口范围是当前行到分区中的最后一行。
- rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 窗口范围是当前分区中所有行,等同于不写。(last_value的窗口范围是窗口首行-当前行!即rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
② 基于范围:有些范围不可直接用行数来表示,比如窗口范围是一周前订单开始,截止到当前行,无法使用rows来表示,此时就可以使用范围来表示窗口:INTERVAL 7 DAY PRECEDING。Linux中常见的最近1分钟、5分钟负载是一个典型的应用场景。有的函数不管有没有frame子句,它的窗口都是固定的,也就是前面介绍的静态窗口,这些函数包括如下(排序/前后/分布等函数):
- DENSE_RANK()、RANK()、ROW_NUMBER()
- LAG()、LEAD()
- PERCENT_RANK()、CUME_DIST()
- NTILE()
接下来我们以会员订单表dm_order_2018为例,来介绍每个函数的使用方法。表里包含了如下字段:订单id,订单金额,会员id,订单创建时间,订单状态等。
(1)建表
CREATE TABLE dm_order_2018
(
id varchar(100) DEFAULT NULL COMMENT '订单id',
order_money float DEFAULT NULL COMMENT '订单金额',
member_id int(11) DEFAULT NULL COMMENT '会员id',
create_time timestamp DEFAULT NULL COMMENT '创建时间',
status int(11) DEFAULT NULL COMMENT '订单状态|0:取消,1:待支付,2:付款成功'
);
(2)插入数据
INSERT INTO dm_order_2018 VALUES ('A001', '100.15', '1', '2018-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A023', '100.15', '1', '2018-01-01 15:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A002', '100', '2', '2018-04-01 14:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A003', '12.1', '3', '2018-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A004', '200.15', '4', '2018-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A005', '1200.15', '5', '2018-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A006', '0.15', '11', '2018-01-01 17:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A007', '1215.1', '31', '2018-01-01 10:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A008', '100.75', '15', '2018-01-01 14:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A009', '100.15', '8', '2018-01-01 15:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A010', '20.15', '9', '2018-01-01 16:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A011', '110.15', '21', '2018-01-01 14:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A012', '1.15', '13', '2018-01-01 13:03:12', '2');
INSERT INTO dm_order_2018 VALUES ('A013', '20.15', '14', '2018-02-01 13:53:12', '2');
INSERT INTO dm_order_2018 VALUES ('A014', '30.15', '15', '2018-03-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A015', '40.5', '13', '2018-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A016', '65', '1', '2017-01-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A017', '78.15', '1', '2017-03-01 09:23:12', '1');
INSERT INTO dm_order_2018 VALUES ('A018', '88.15', '1', '2017-11-01 20:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A019', '99', '12', '2018-02-11 13:23:12', '1');
INSERT INTO dm_order_2018 VALUES ('A020', '10.3', '13', '2018-04-01 13:23:12', '2');
INSERT INTO dm_order_2018 VALUES ('A021', '600', '5', '2018-11-01 08:58:31', '2');
INSERT INTO dm_order_2018 VALUES ('A022', '500', '31', '2018-11-11 08:59:02', '2');
(3)查看数据
select * FROM dm_order_2018;
+----+-----------+---------+-------------------+------+
|id |order_money|member_id|create_time |status|
+----+-----------+---------+-------------------+------+
|A022|500 |31 |2018-11-11 08:59:02|2 |
|A001|100.15 |1 |2018-01-01 13:23:12|2 |
|A023|100.15 |1 |2018-01-01 15:23:12|2 |
|A002|100 |2 |2018-04-01 14:23:12|2 |
|A003|12.1 |3 |2018-01-01 13:23:12|2 |
|A004|200.15 |4 |2018-01-01 13:23:12|2 |
|A005|1200.15 |5 |2018-01-01 13:23:12|2 |
|A006|0.15 |11 |2018-01-01 17:23:12|2 |
|A007|1215.1 |31 |2018-01-01 10:23:12|2 |
|A008|100.75 |15 |2018-01-01 14:23:12|2 |
|A009|100.15 |8 |2018-01-01 15:23:12|2 |
|A010|20.15 |9 |2018-01-01 16:23:12|2 |
|A011|110.15 |21 |2018-01-01 14:23:12|2 |
|A012|1.15 |13 |2018-01-01 13:03:12|2 |
|A013|20.15 |14 |2018-02-01 13:53:12|2 |
|A014|30.15 |15 |2018-03-01 13:23:12|2 |
|A015|40.5 |13 |2018-01-01 13:23:12|2 |
|A016|65 |1 |2017-01-01 13:23:12|2 |
|A017|78.15 |1 |2017-03-01 09:23:12|1 |
|A018|88.15 |1 |2017-11-01 20:23:12|2 |
|A019|99 |12 |2018-02-11 13:23:12|1 |
|A020|10.3 |13 |2018-04-01 13:23:12|2 |
|A021|600 |5 |2018-11-01 08:58:31|2 |
+----+-----------+---------+-------------------+------+
(1)序号函数:row_number() / rank() / dense_rank()
查询排序小于等于3的订单:
select *
FROM (
select id,
order_money,
member_id,
create_time,
row_number() over (partition by member_id ORDER BY order_money desc) as row_num
from dm_order_2018
) t
where row_num <= 3;
+----+-----------+---------+-------------------+-------+
|id |order_money|member_id|create_time |row_num|
+----+-----------+---------+-------------------+-------+
|A001|100.15 |1 |2018-01-01 13:23:12|1 |
|A023|100.15 |1 |2018-01-01 15:23:12|2 |
|A018|88.15 |1 |2017-11-01 20:23:12|3 |
|A002|100 |2 |2018-04-01 14:23:12|1 |
|A003|12.1 |3 |2018-01-01 13:23:12|1 |
|A004|200.15 |4 |2018-01-01 13:23:12|1 |
|A005|1200.15 |5 |2018-01-01 13:23:12|1 |
|A021|600 |5 |2018-11-01 08:58:31|2 |
|A009|100.15 |8 |2018-01-01 15:23:12|1 |
|A010|20.15 |9 |2018-01-01 16:23:12|1 |
|A006|0.15 |11 |2018-01-01 17:23:12|1 |
|A019|99 |12 |2018-02-11 13:23:12|1 |
|A015|40.5 |13 |2018-01-01 13:23:12|1 |
|A020|10.3 |13 |2018-04-01 13:23:12|2 |
|A012|1.15 |13 |2018-01-01 13:03:12|3 |
|A013|20.15 |14 |2018-02-01 13:53:12|1 |
|A008|100.75 |15 |2018-01-01 14:23:12|1 |
|A014|30.15 |15 |2018-03-01 13:23:12|2 |
|A011|110.15 |21 |2018-01-01 14:23:12|1 |
|A007|1215.1 |31 |2018-01-01 10:23:12|1 |
|A022|500 |31 |2018-11-11 08:59:02|2 |
+----+-----------+---------+-------------------+-------+
多种排序方式:
select *
FROM (select id,
order_money,
member_id,
create_time,
row_number() over (partition by member_id ORDER BY order_money desc) as row_num1,
rank() over (partition by member_id ORDER BY order_money desc) as row_num2,
dense_rank() over (partition by member_id ORDER BY order_money desc) as row_num3
from dm_order_2018
) t;
+----+-----------+---------+-------------------+--------+--------+--------+
|id |order_money|member_id|create_time |row_num1|row_num2|row_num3|
+----+-----------+---------+-------------------+--------+--------+--------+
|A001|100.15 |1 |2018-01-01 13:23:12|1 |1 |1 |
|A023|100.15 |1 |2018-01-01 15:23:12|2 |1 |1 |
|A018|88.15 |1 |2017-11-01 20:23:12|3 |3 |2 |
|A017|78.15 |1 |2017-03-01 09:23:12|4 |4 |3 |
|A016|65 |1 |2017-01-01 13:23:12|5 |5 |4 |
|A002|100 |2 |2018-04-01 14:23:12|1 |1 |1 |
|A003|12.1 |3 |2018-01-01 13:23:12|1 |1 |1 |
|A004|200.15 |4 |2018-01-01 13:23:12|1 |1 |1 |
|A005|1200.15 |5 |2018-01-01 13:23:12|1 |1 |1 |
|A021|600 |5 |2018-11-01 08:58:31|2 |2 |2 |
|A009|100.15 |8 |2018-01-01 15:23:12|1 |1 |1 |
|A010|20.15 |9 |2018-01-01 16:23:12|1 |1 |1 |
|A006|0.15 |11 |2018-01-01 17:23:12|1 |1 |1 |
|A019|99 |12 |2018-02-11 13:23:12|1 |1 |1 |
|A015|40.5 |13 |2018-01-01 13:23:12|1 |1 |1 |
|A020|10.3 |13 |2018-04-01 13:23:12|2 |2 |2 |
|A012|1.15 |13 |2018-01-01 13:03:12|3 |3 |3 |
|A013|20.15 |14 |2018-02-01 13:53:12|1 |1 |1 |
|A008|100.75 |15 |2018-01-01 14:23:12|1 |1 |1 |
|A014|30.15 |15 |2018-03-01 13:23:12|2 |2 |2 |
|A011|110.15 |21 |2018-01-01 14:23:12|1 |1 |1 |
|A007|1215.1 |31 |2018-01-01 10:23:12|1 |1 |1 |
|A022|500 |31 |2018-11-11 08:59:02|2 |2 |2 |
+----+-----------+---------+-------------------+--------+--------+--------+
rank()会产生序号相同的记录,同时可能产生序号间隙,如1,1,3;而dense_rank()也会产生序号相同的记录,但不会产生序号间隙,如1,1,2。
(2)分布函数:percent_rank() / cume_dist()
查看数据的分位数:
select *
FROM (select id,
order_money,
member_id,
create_time,
row_number() over w as row_num,
percent_rank() over w as percent
from dm_order_2018
window w as (partition by member_id ORDER BY create_time DESC)
) t;
+----+-----------+---------+-------------------+-------+-------+
|id |order_money|member_id|create_time |row_num|percent|
+----+-----------+---------+-------------------+-------+-------+
|A023|100.15 |1 |2018-01-01 15:23:12|1 |0 |
|A001|100.15 |1 |2018-01-01 13:23:12|2 |0.25 |
|A018|88.15 |1 |2017-11-01 20:23:12|3 |0.5 |
|A017|78.15 |1 |2017-03-01 09:23:12|4 |0.75 |
|A016|65 |1 |2017-01-01 13:23:12|5 |1 |
|A002|100 |2 |2018-04-01 14:23:12|1 |0 |
|A003|12.1 |3 |2018-01-01 13:23:12|1 |0 |
|A004|200.15 |4 |2018-01-01 13:23:12|1 |0 |
|A021|600 |5 |2018-11-01 08:58:31|1 |0 |
|A005|1200.15 |5 |2018-01-01 13:23:12|2 |1 |
|A009|100.15 |8 |2018-01-01 15:23:12|1 |0 |
|A010|20.15 |9 |2018-01-01 16:23:12|1 |0 |
|A006|0.15 |11 |2018-01-01 17:23:12|1 |0 |
|A019|99 |12 |2018-02-11 13:23:12|1 |0 |
|A020|10.3 |13 |2018-04-01 13:23:12|1 |0 |
|A015|40.5 |13 |2018-01-01 13:23:12|2 |0.5 |
|A012|1.15 |13 |2018-01-01 13:03:12|3 |1 |
|A013|20.15 |14 |2018-02-01 13:53:12|1 |0 |
|A014|30.15 |15 |2018-03-01 13:23:12|1 |0 |
|A008|100.75 |15 |2018-01-01 14:23:12|2 |1 |
|A011|110.15 |21 |2018-01-01 14:23:12|1 |0 |
|A022|500 |31 |2018-11-11 08:59:02|1 |0 |
|A007|1215.1 |31 |2018-01-01 10:23:12|2 |1 |
+----+-----------+---------+-------------------+-------+-------+
大于等于当前订单金额的订单比例有多少:
select *
FROM (select id,
order_money,
member_id,
create_time,
row_number() over w as row_num,
percent_rank() over w as percent
from dm_order_2018
window w as (partition by member_id ORDER BY create_time DESC)
) t;
+----+-----------+---------+-------------------+-------+-------+
|id |order_money|member_id|create_time |row_num|percent|
+----+-----------+---------+-------------------+-------+-------+
|A023|100.15 |1 |2018-01-01 15:23:12|1 |0 |
|A001|100.15 |1 |2018-01-01 13:23:12|2 |0.25 |
|A018|88.15 |1 |2017-11-01 20:23:12|3 |0.5 |
|A017|78.15 |1 |2017-03-01 09:23:12|4 |0.75 |
|A016|65 |1 |2017-01-01 13:23:12|5 |1 |
|A002|100 |2 |2018-04-01 14:23:12|1 |0 |
|A003|12.1 |3 |2018-01-01 13:23:12|1 |0 |
|A004|200.15 |4 |2018-01-01 13:23:12|1 |0 |
|A021|600 |5 |2018-11-01 08:58:31|1 |0 |
|A005|1200.15 |5 |2018-01-01 13:23:12|2 |1 |
|A009|100.15 |8 |2018-01-01 15:23:12|1 |0 |
|A010|20.15 |9 |2018-01-01 16:23:12|1 |0 |
|A006|0.15 |11 |2018-01-01 17:23:12|1 |0 |
|A019|99 |12 |2018-02-11 13:23:12|1 |0 |
|A020|10.3 |13 |2018-04-01 13:23:12|1 |0 |
|A015|40.5 |13 |2018-01-01 13:23:12|2 |0.5 |
|A012|1.15 |13 |2018-01-01 13:03:12|3 |1 |
|A013|20.15 |14 |2018-02-01 13:53:12|1 |0 |
|A014|30.15 |15 |2018-03-01 13:23:12|1 |0 |
|A008|100.75 |15 |2018-01-01 14:23:12|2 |1 |
|A011|110.15 |21 |2018-01-01 14:23:12|1 |0 |
|A022|500 |31 |2018-11-11 08:59:02|1 |0 |
|A007|1215.1 |31 |2018-01-01 10:23:12|2 |1 |
+----+-----------+---------+-------------------+-------+-------+
(3)前后函数:lag() / lead()
-- 查询上一个订单距离当前订单的时间间隔。
select id, order_money, member_id, create_time, last_time, datediff(create_time, last_time) as diff
FROM (select id,
order_money,
member_id,
create_time,
lag(create_time, 1) over w as last_time
from dm_order_2018
window w as (partition by member_id ORDER BY create_time ASC)
) t;
+----+-----------+---------+-------------------+-------------------+----+
|id |order_money|member_id|create_time |last_time |diff|
+----+-----------+---------+-------------------+-------------------+----+
|A016|65 |1 |2017-01-01 13:23:12|NULL |NULL|
|A017|78.15 |1 |2017-03-01 09:23:12|2017-01-01 13:23:12|59 |
|A018|88.15 |1 |2017-11-01 20:23:12|2017-03-01 09:23:12|245 |
|A001|100.15 |1 |2018-01-01 13:23:12|2017-11-01 20:23:12|61 |
|A023|100.15 |1 |2018-01-01 15:23:12|2018-01-01 13:23:12|0 |
|A002|100 |2 |2018-04-01 14:23:12|NULL |NULL|
|A003|12.1 |3 |2018-01-01 13:23:12|NULL |NULL|
|A004|200.15 |4 |2018-01-01 13:23:12|NULL |NULL|
|A005|1200.15 |5 |2018-01-01 13:23:12|NULL |NULL|
|A021|600 |5 |2018-11-01 08:58:31|2018-01-01 13:23:12|304 |
|A009|100.15 |8 |2018-01-01 15:23:12|NULL |NULL|
|A010|20.15 |9 |2018-01-01 16:23:12|NULL |NULL|
|A006|0.15 |11 |2018-01-01 17:23:12|NULL |NULL|
|A019|99 |12 |2018-02-11 13:23:12|NULL |NULL|
|A012|1.15 |13 |2018-01-01 13:03:12|NULL |NULL|
|A015|40.5 |13 |2018-01-01 13:23:12|2018-01-01 13:03:12|0 |
|A020|10.3 |13 |2018-04-01 13:23:12|2018-01-01 13:23:12|90 |
|A013|20.15 |14 |2018-02-01 13:53:12|NULL |NULL|
|A008|100.75 |15 |2018-01-01 14:23:12|NULL |NULL|
|A014|30.15 |15 |2018-03-01 13:23:12|2018-01-01 14:23:12|59 |
|A011|110.15 |21 |2018-01-01 14:23:12|NULL |NULL|
|A007|1215.1 |31 |2018-01-01 10:23:12|NULL |NULL|
|A022|500 |31 |2018-11-11 08:59:02|2018-01-01 10:23:12|314 |
+----+-----------+---------+-------------------+-------------------+----+
可以看到,没有匹配到的数值默认以null填充。
(4)头尾函数:first_val() / last_val()
-- 查询截止到当前订单,按照日期排序第一个订单和最后一个订单的订单金额。
select *
FROM (select id,
order_money,
member_id,
create_time,
first_value(order_money) over w as first_money,
last_value(order_money) over w as last_money
from dm_order_2018
window w as (partition by member_id ORDER BY create_time ASC)
) t;
+----+-----------+---------+-------------------+-------------------+-------------------+
|id |order_money|member_id|create_time |first_money |last_money |
+----+-----------+---------+-------------------+-------------------+-------------------+
|A016|65 |1 |2017-01-01 13:23:12|65 |65 |
|A017|78.15 |1 |2017-03-01 09:23:12|65 |78.1500015258789 |
|A018|88.15 |1 |2017-11-01 20:23:12|65 |88.1500015258789 |
|A001|100.15 |1 |2018-01-01 13:23:12|65 |100.1500015258789 |
|A023|100.15 |1 |2018-01-01 15:23:12|65 |100.1500015258789 |
|A002|100 |2 |2018-04-01 14:23:12|100 |100 |
|A003|12.1 |3 |2018-01-01 13:23:12|12.100000381469727 |12.100000381469727 |
|A004|200.15 |4 |2018-01-01 13:23:12|200.14999389648438 |200.14999389648438 |
|A005|1200.15 |5 |2018-01-01 13:23:12|1200.1500244140625 |1200.1500244140625 |
|A021|600 |5 |2018-11-01 08:58:31|1200.1500244140625 |600 |
|A009|100.15 |8 |2018-01-01 15:23:12|100.1500015258789 |100.1500015258789 |
|A010|20.15 |9 |2018-01-01 16:23:12|20.149999618530273 |20.149999618530273 |
|A006|0.15 |11 |2018-01-01 17:23:12|0.15000000596046448|0.15000000596046448|
|A019|99 |12 |2018-02-11 13:23:12|99 |99 |
|A012|1.15 |13 |2018-01-01 13:03:12|1.149999976158142 |1.149999976158142 |
|A015|40.5 |13 |2018-01-01 13:23:12|1.149999976158142 |40.5 |
|A020|10.3 |13 |2018-04-01 13:23:12|1.149999976158142 |10.300000190734863 |
|A013|20.15 |14 |2018-02-01 13:53:12|20.149999618530273 |20.149999618530273 |
|A008|100.75 |15 |2018-01-01 14:23:12|100.75 |100.75 |
|A014|30.15 |15 |2018-03-01 13:23:12|100.75 |30.149999618530273 |
|A011|110.15 |21 |2018-01-01 14:23:12|110.1500015258789 |110.1500015258789 |
|A007|1215.1 |31 |2018-01-01 10:23:12|1215.0999755859375 |1215.0999755859375 |
|A022|500 |31 |2018-11-11 08:59:02|1215.0999755859375 |500 |
+----+-----------+---------+-------------------+-------------------+-------------------+
查询截止到当前订单,按照日期排序第一个订单和最后一个订单的订单金额。
(5)滑动窗口示例
select id,
order_money,
member_id,
create_time,
avg(order_money) over w as avg_num,
sum(order_money) over w as sum_num
from dm_order_2018
window w as (partition by member_id order by create_time desc rows between 1 preceding and 1 following)
+----+-----------+---------+-------------------+-------------------+-------------------+
|id |order_money|member_id|create_time |avg_num |sum_num |
+----+-----------+---------+-------------------+-------------------+-------------------+
|A023|100.15 |1 |2018-01-01 15:23:12|100.1500015258789 |200.3000030517578 |
|A001|100.15 |1 |2018-01-01 13:23:12|96.1500015258789 |288.4500045776367 |
|A018|88.15 |1 |2017-11-01 20:23:12|88.81666819254558 |266.4500045776367 |
|A017|78.15 |1 |2017-03-01 09:23:12|77.10000101725261 |231.3000030517578 |
|A016|65 |1 |2017-01-01 13:23:12|71.57500076293945 |143.1500015258789 |
|A002|100 |2 |2018-04-01 14:23:12|100 |100 |
|A003|12.1 |3 |2018-01-01 13:23:12|12.100000381469727 |12.100000381469727 |
|A004|200.15 |4 |2018-01-01 13:23:12|200.14999389648438 |200.14999389648438 |
|A021|600 |5 |2018-11-01 08:58:31|900.0750122070312 |1800.1500244140625 |
|A005|1200.15 |5 |2018-01-01 13:23:12|900.0750122070312 |1800.1500244140625 |
|A009|100.15 |8 |2018-01-01 15:23:12|100.1500015258789 |100.1500015258789 |
|A010|20.15 |9 |2018-01-01 16:23:12|20.149999618530273 |20.149999618530273 |
|A006|0.15 |11 |2018-01-01 17:23:12|0.15000000596046448|0.15000000596046448|
|A019|99 |12 |2018-02-11 13:23:12|99 |99 |
|A020|10.3 |13 |2018-04-01 13:23:12|25.40000009536743 |50.80000019073486 |
|A015|40.5 |13 |2018-01-01 13:23:12|17.31666672229767 |51.950000166893005 |
|A012|1.15 |13 |2018-01-01 13:03:12|20.82499998807907 |41.64999997615814 |
|A013|20.15 |14 |2018-02-01 13:53:12|20.149999618530273 |20.149999618530273 |
|A014|30.15 |15 |2018-03-01 13:23:12|65.44999980926514 |130.89999961853027 |
|A008|100.75 |15 |2018-01-01 14:23:12|65.44999980926514 |130.89999961853027 |
|A011|110.15 |21 |2018-01-01 14:23:12|110.1500015258789 |110.1500015258789 |
|A022|500 |31 |2018-11-11 08:59:02|857.5499877929688 |1715.0999755859375 |
|A007|1215.1 |31 |2018-01-01 10:23:12|857.5499877929688 |1715.0999755859375 |
+----+-----------+---------+-------------------+-------------------+-------------------+
从结果可以看出,id为A001订单属于边界值,没有前一行,因此平均订单金额为(100.15+100.15)/2=100.15;第二行id为A001的订单前后都有订单,所以平均订单金额为(100.15+100.15+88.15/3=96.15;以此类推就可以得到一个基于滑动窗口的动态平均订单值。此例中,窗口函数用到了传统的聚合函数avg(),用来计算动态的平均值。
SELECT* FROM score AS a WHERE 成绩=( SELECT MAX(成绩) FROM score AS b WHERE a.`学号`=b.`学号`);
SELECT* FROM score AS a WHERE 成绩=( SELECT MIN(成绩) FROM score AS b WHERE a.`学号`=b.`学号`);
SELECT * FROM (SELECT*,row_number() over (PARTITION BY 姓名 ORDER BY 成绩 DESC) AS ranking FROM test1) AS newtest WHERE ranking<=2;
说明:
SELECT * FROM (SELECT*,row_number() over (PARTITION BY 姓名 ORDER BY 成绩 DESC) AS ranking FROM test1) AS newtest WHERE ranking<=N;
聚合函数作为窗口函数,是起到"累加/累计"的效果,比如截止到本行,最大值,最小值是多少。与专用窗口函数的区别:括号中需要有指定列,不能为空
举例1:查询成绩的累加
SELECT*, SUM(成绩) over (ORDER BY 成绩 DESC) AS '求和' FROM class;
举例2:查找单科成绩高于该科目平均成绩的学生名单
(1)解法一:单科成绩需按科目分组,分组有两种:group by子句和窗口函数的partition by;要求平均值,通过avg()实现,并且找到大于平均值的学生,那么不能减少行数,则用partition by;
#错误解法:不能用where 成绩>分组平均值,因为where子句在select子句之后执行
select * ,avg(成绩) over (partition by 科目) as 分组平均值 from test3 where 成绩>分组平均值
#正确解法:
select * from (select * ,avg(成绩) over (partition by 科目) as 分组平均值 from test3)as a where 成绩>分组平均值;
(2)解法二:关联子查询;
SELECT* FROM test3 AS a WHERE 成绩>(SELECT AVG(成绩) AS 平均值 FROM test3 AS b WHERE a.`科目`=b.`科目`);
1. 语法:窗口函数 over (partition by 用于分列的列名 order by 用于排序的列名)
2. 功能:既能分组又可以排序,且不改变行数
3. 分类:序号函数、分布函数、前后函数、头尾函数、聚合函数、其他函数等
4. 注意事项:一般写在select子句中
今天分享下MySQL中的窗口函数的实际操作,这对分析查询很有帮助,希望对大家有帮助。谢谢支持
欢迎关注微信公众号,访问更多精彩:数据星球DataStar。
如需转载,请联系授权,谢谢合作。
引文链接:
[1]MySQL操作实战(二):窗口函数
https://blog.csdn.net/weixin_39010770/article/details/87862407
[2]数据分析系列 18/32 | MySQL中的数据分析函数实战
https://zhuanlan.zhihu.com/p/137711813
[3]MySQL 8.0窗口函数
https://www.cnblogs.com/DataArt/p/9961676.html