MySQL8.0开始像Oracle一样支持窗口函数了,先来跑个SQL,感受一下窗口函数的魅力。
mysql> create table numbers(val int);
mysql> insert into numbers values(1),(1),(2),(3),(3),(3),(4),(4),(5);
mysql> SELECT
-> val,
-> ROW_NUMBER() OVER w AS 'row_number',
-> RANK() OVER w AS 'rank',
-> DENSE_RANK() OVER w AS 'dense_rank',
-> CUME_DIST() OVER w AS 'cume_dist',
-> PERCENT_RANK() OVER w AS 'percent_rank'
-> FROM numbers
-> WINDOW w AS (ORDER BY val);
+------+------------+------+------------+--------------------+--------------+
| val | row_number | rank | dense_rank | cume_dist | percent_rank |
+------+------------+------+------------+--------------------+--------------+
| 1 | 1 | 1 | 1 | 0.2222222222222222 | 0 |
| 1 | 2 | 1 | 1 | 0.2222222222222222 | 0 |
| 2 | 3 | 3 | 2 | 0.3333333333333333 | 0.25 |
| 3 | 4 | 4 | 3 | 0.6666666666666666 | 0.375 |
| 3 | 5 | 4 | 3 | 0.6666666666666666 | 0.375 |
| 3 | 6 | 4 | 3 | 0.6666666666666666 | 0.375 |
| 4 | 7 | 7 | 4 | 0.8888888888888888 | 0.75 |
| 4 | 8 | 7 | 4 | 0.8888888888888888 | 0.75 |
| 5 | 9 | 9 | 5 | 1 | 1 |
+------+------------+------+------------+--------------------+--------------+
9 rows in set (0.00 sec)
有了窗口函数,以前一些复制的SQL,可以很简单的实现了,下面分别介绍几种常用的窗口函数。
row_number()
用来求满足特定分组的唯一行数据,它跟group by不同,group by不能返回重复的分组。
例:求每个部门dept_no员工号emp_no最大的员工数据
(1)方法1:使用窗口函数
第一步:
mysql> select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date | to_date | rn |
+--------+---------+------------+------------+----+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 40983 | d005 | 1986-12-01 | 9999-01-01 | 2 |
| 30970 | d005 | 1986-12-01 | 2017-03-29 | 3 |
| 24007 | d005 | 1986-12-01 | 9999-01-01 | 4 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 |
| 46554 | d008 | 1986-12-01 | 1992-05-27 | 2 |
+--------+---------+------------+------------+----+
10 rows in set (0.01 sec)
第二步:
mysql> select a.* from (select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t)a where a.rn=1;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date | to_date | rn |
+--------+---------+------------+------------+----+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 |
+--------+---------+------------+------------+----+
6 rows in set (0.00 sec)
(2)方法2:使用传统变量@方法
第一步:
mysql> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b;
+--------+---------+------------+------------+------+--------------+
| emp_no | dept_no | from_date | to_date | rn | calc_dept_no |
+--------+---------+------------+------------+------+--------------+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | d002 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | d004 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | d005 |
| 40983 | d005 | 1986-12-01 | 9999-01-01 | 2 | d005 |
| 30970 | d005 | 1986-12-01 | 2017-03-29 | 3 | d005 |
| 24007 | d005 | 1986-12-01 | 9999-01-01 | 4 | d005 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | d006 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | d007 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | d008 |
| 46554 | d008 | 1986-12-01 | 1992-05-27 | 2 | d008 |
+--------+---------+------------+------------+------+--------------+
10 rows in set, 5 warnings (0.00 sec)
第二步:
mysql> select * from (
-> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b)c where c.rn=1;
+--------+---------+------------+------------+------+--------------+
| emp_no | dept_no | from_date | to_date | rn | calc_dept_no |
+--------+---------+------------+------------+------+--------------+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | d002 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | d004 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | d005 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | d006 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | d007 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | d008 |
+--------+---------+------------+------------+------+--------------+
6 rows in set, 5 warnings (0.00 sec)
rank()
同一个分组中,如果排序的列值相同,编号相同;
列值不同,编号跳跃增加,如两个并列第1,那么下一个名次的编号就是3;
如下按照dept_no分组,dept_no='d005'列对应to_date='9999-01-01'有三个1,那么to_date='2017-03-29'跳跃增加的编号就是4;
mysql> select t.*, rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date | to_date | rn |
+--------+---------+------------+------------+----+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 |
| 24007 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 40983 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 30970 | d005 | 1986-12-01 | 2017-03-29 | 4 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 |
| 46554 | d008 | 1986-12-01 | 1992-05-27 | 1 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 2 |
+--------+---------+------------+------------+----+
10 rows in set (0.00 sec)
dense_rank()
同一个分组中,如果排序的列值相同,编号相同;
列值不同,编号顺序增加,如两个并列第1,那么下一个名次的编号就是2;
mysql> select t.*, dense_rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t;
+--------+---------+------------+------------+----+
| emp_no | dept_no | from_date | to_date | rn |
+--------+---------+------------+------------+----+
| 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 |
| 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 |
| 24007 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 40983 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 |
| 30970 | d005 | 1986-12-01 | 2017-03-29 | 2 |
| 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 |
| 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 |
| 46554 | d008 | 1986-12-01 | 1992-05-27 | 1 |
| 48317 | d008 | 1986-12-01 | 1989-01-11 | 2 |
+--------+---------+------------+------------+----+
10 rows in set (0.00 sec)
lag(expr,offset,default) over() 返回当前行列对应的上一行数据
lead(expr,offset,default) over() 返回当前行列对应的下一行数据
mysql> select s.*, lag(s.salary,1,null) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+
| emp_no | salary | from_date | to_date | rn |
+--------+--------+------------+------------+-------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 | NULL |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 | 60117 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 | 62102 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 | 66074 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 | 66596 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 | 66961 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 | 71046 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 | 74333 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 | 75286 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 | 75994 |
| 10001 | 80013 | 1996-06-23 | 1997-06-23 | 76884 |
| 10001 | 81025 | 1997-06-23 | 1998-06-23 | 80013 |
| 10001 | 81097 | 1998-06-23 | 1999-06-23 | 81025 |
| 10001 | 84917 | 1999-06-23 | 2000-06-22 | 81097 |
| 10001 | 85112 | 2000-06-22 | 2001-06-22 | 84917 |
| 10001 | 85097 | 2001-06-22 | 2002-06-22 | 85112 |
| 10001 | 88958 | 2002-06-22 | 9999-01-01 | 85097 |
+--------+--------+------------+------------+-------+
17 rows in set (0.00 sec)
mysql> select s.*, lead(s.salary,1,0) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+
| emp_no | salary | from_date | to_date | rn |
+--------+--------+------------+------------+-------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 | 62102 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 | 66074 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 | 66596 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 | 66961 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 | 71046 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 | 74333 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 | 75286 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 | 75994 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 | 76884 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 | 80013 |
| 10001 | 80013 | 1996-06-23 | 1997-06-23 | 81025 |
| 10001 | 81025 | 1997-06-23 | 1998-06-23 | 81097 |
| 10001 | 81097 | 1998-06-23 | 1999-06-23 | 84917 |
| 10001 | 84917 | 1999-06-23 | 2000-06-22 | 85112 |
| 10001 | 85112 | 2000-06-22 | 2001-06-22 | 85097 |
| 10001 | 85097 | 2001-06-22 | 2002-06-22 | 88958 |
| 10001 | 88958 | 2002-06-22 | 9999-01-01 | 0 |
+--------+--------+------------+------------+-------+
17 rows in set (0.00 sec)
sum()、min()、max()增强
rows between unbounded preceding and unbounded following 从开始到最后一行
rows between unbounded preceding and current row 从开始到目前行
mysql> select s.*,
-> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) s1,
-> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) s2
-> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+---------+---------+
| emp_no | salary | from_date | to_date | s1 | s2 |
+--------+--------+------------+------------+---------+---------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 | 1281612 | 60117 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 | 1281612 | 122219 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 | 1281612 | 188293 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 | 1281612 | 254889 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 | 1281612 | 321850 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 | 1281612 | 392896 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 | 1281612 | 467229 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 | 1281612 | 542515 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 | 1281612 | 618509 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 | 1281612 | 695393 |
| 10001 | 80013 | 1996-06-23 | 1997-06-23 | 1281612 | 775406 |
| 10001 | 81025 | 1997-06-23 | 1998-06-23 | 1281612 | 856431 |
| 10001 | 81097 | 1998-06-23 | 1999-06-23 | 1281612 | 937528 |
| 10001 | 84917 | 1999-06-23 | 2000-06-22 | 1281612 | 1022445 |
| 10001 | 85112 | 2000-06-22 | 2001-06-22 | 1281612 | 1107557 |
| 10001 | 85097 | 2001-06-22 | 2002-06-22 | 1281612 | 1192654 |
| 10001 | 88958 | 2002-06-22 | 9999-01-01 | 1281612 | 1281612 |
+--------+--------+------------+------------+---------+---------+
17 rows in set (0.00 sec)
mysql> select s.*,
-> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and unbounded following) m1,
-> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and current row) m2,
-> min(s.salary) over(partition by s.emp_no) m3
-> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+-------+-------+
| emp_no | salary | from_date | to_date | m1 | m2 | m3 |
+--------+--------+------------+------------+-------+-------+-------+
| 10001 | 88958 | 2002-06-22 | 9999-01-01 | 60117 | 88958 | 60117 |
| 10001 | 85097 | 2001-06-22 | 2002-06-22 | 60117 | 85097 | 60117 |
| 10001 | 85112 | 2000-06-22 | 2001-06-22 | 60117 | 85097 | 60117 |
| 10001 | 84917 | 1999-06-23 | 2000-06-22 | 60117 | 84917 | 60117 |
| 10001 | 81097 | 1998-06-23 | 1999-06-23 | 60117 | 81097 | 60117 |
| 10001 | 81025 | 1997-06-23 | 1998-06-23 | 60117 | 81025 | 60117 |
| 10001 | 80013 | 1996-06-23 | 1997-06-23 | 60117 | 80013 | 60117 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 | 60117 | 76884 | 60117 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 | 60117 | 75994 | 60117 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 | 60117 | 75286 | 60117 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 | 60117 | 74333 | 60117 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 | 60117 | 71046 | 60117 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 | 60117 | 66961 | 60117 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 | 60117 | 66596 | 60117 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 | 60117 | 66074 | 60117 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 | 60117 | 62102 | 60117 |
| 10001 | 60117 | 1986-06-26 | 1987-06-26 | 60117 | 60117 | 60117 |
+--------+--------+------------+------------+-------+-------+-------+
17 rows in set (0.00 sec)
mysql> select s.*,
-> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) m1,
-> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) m2,
-> max(s.salary) over(partition by s.emp_no) m3
-> from salaries s where s.emp_no=10001;
+--------+--------+------------+------------+-------+-------+-------+
| emp_no | salary | from_date | to_date | m1 | m2 | m3 |
+--------+--------+------------+------------+-------+-------+-------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 | 88958 | 60117 | 88958 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 | 88958 | 62102 | 88958 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 | 88958 | 66074 | 88958 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 | 88958 | 66596 | 88958 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 | 88958 | 66961 | 88958 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 | 88958 | 71046 | 88958 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 | 88958 | 74333 | 88958 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 | 88958 | 75286 | 88958 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 | 88958 | 75994 | 88958 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 | 88958 | 76884 | 88958 |
| 10001 | 80013 | 1996-06-23 | 1997-06-23 | 88958 | 80013 | 88958 |
| 10001 | 81025 | 1997-06-23 | 1998-06-23 | 88958 | 81025 | 88958 |
| 10001 | 81097 | 1998-06-23 | 1999-06-23 | 88958 | 81097 | 88958 |
| 10001 | 84917 | 1999-06-23 | 2000-06-22 | 88958 | 84917 | 88958 |
| 10001 | 85112 | 2000-06-22 | 2001-06-22 | 88958 | 85112 | 88958 |
| 10001 | 85097 | 2001-06-22 | 2002-06-22 | 88958 | 85112 | 88958 |
| 10001 | 88958 | 2002-06-22 | 9999-01-01 | 88958 | 88958 | 88958 |
+--------+--------+------------+------------+-------+-------+-------+
17 rows in set (0.00 sec)
还有很多没有提到的窗口函数,可以参考手册。
CUME_DIST()
FIRST_VALUE()
LAST_VALUE()
NTH_VALUE()
NTILE()
参考链接
12.21.1 Window Function Descriptions