备注:测试数据库版本为MySQL 8.0
这个blog我们来聊聊MySQL高级窗口函数
窗口函数在复杂查询以及数据仓库中应用得比较频繁
与sql打交道比较多的技术人员都需要掌握
如需要scott用户下建表及录入数据语句,可参考:
scott建表及录入数据sql脚本
分析函数有3个基本组成部分:
1.分区子句
2.排序子句
3.开窗子句
function1 (argument1,argument2,..argumentN)
over w
window w as ([partition-by-clause] [order-by-clause] [windowing-clause])
窗口说明子句的语法:
默认的窗口子句是rows between unbounded preceding and current row。如果你没有显示声明窗口,就将会使用默认窗口。
并不是所有的分析函数都支持开窗子句
[rows | range] between and [end expr]
whereas
is [unbounded preceding | current row | n preceding | n following]
is [unbounded following | current row | n preceding | n following]
row_number语法:
row_number() over w
window w as (partition-clause order-by-clause)
row_number不支持开窗子句
rank、dense_rank语法同row_number语法
现在需要对分不同部门来看部门内的工资排名,且从大到小排列:
-- 可以看到deptno为30的员工工资有重复的,重复的工资为1250
-- row_number() 不关注重复的,直接排名,1-2-3-4-5-6
-- rank() 重复排名,会跳过,1-2-3-4-4-6
-- dense_rank() 重复排名,不跳过,1-2-3-4-4-5
select a.empno,
a.ename,
a.deptno,
a.sal,
row_number() over w as 'num',
rank() over w as 'rank',
dense_rank() over w as 'dense_rank'
from emp a
window w as (partition by a.deptno order by a.sal desc)
;
mysql> select a.empno,
-> a.ename,
-> a.deptno,
-> a.sal,
-> row_number() over w as 'num',
-> rank() over w as 'rank',
-> dense_rank() over w as 'dense_rank'
-> from emp a
-> window w as (partition by a.deptno order by a.sal desc)
-> ;
+-------+--------+--------+---------+-----+------+------------+
| empno | ename | deptno | sal | num | rank | dense_rank |
+-------+--------+--------+---------+-----+------+------------+
| 7839 | KING | 10 | 5000.00 | 1 | 1 | 1 |
| 7782 | CLARK | 10 | 2450.00 | 2 | 2 | 2 |
| 7934 | MILLER | 10 | 1300.00 | 3 | 3 | 3 |
| 7788 | SCOTT | 20 | 3000.00 | 1 | 1 | 1 |
| 7902 | FORD | 20 | 3000.00 | 2 | 1 | 1 |
| 7566 | JONES | 20 | 2975.00 | 3 | 3 | 2 |
| 7876 | ADAMS | 20 | 1100.00 | 4 | 4 | 3 |
| 7369 | SMITH | 20 | 800.00 | 5 | 5 | 4 |
| 7698 | BLAKE | 30 | 2850.00 | 1 | 1 | 1 |
| 7499 | ALLEN | 30 | 1600.00 | 2 | 2 | 2 |
| 7844 | TURNER | 30 | 1500.00 | 3 | 3 | 3 |
| 7521 | WARD | 30 | 1250.00 | 4 | 4 | 4 |
| 7654 | MARTIN | 30 | 1250.00 | 5 | 4 | 4 |
| 7900 | JAMES | 30 | 950.00 | 6 | 6 | 5 |
+-------+--------+--------+---------+-----+------+------------+
14 rows in set (0.01 sec)
lag语法:
lag (expression, offset, default) over w
window w as (partition-clause order-by-clause)
lag不支持开窗子句
lead同lag语法
-- 根据分组,取值上n条和下n条 如果是第一条或最后一条,就给个默认值
SELECT a.empno,
a.deptno,
a.hiredate,
a.sal,
lag(sal, 1, 0) over w as 'pre_sal',
lead(sal, 1, 0) over w as 'next_sal',
lag(sal, 2, 0) over w as 'pre2_sal',
lead(sal, 2, 0) over w as 'next_2sal'
FROM emp a
window w as (PARTITION BY a.deptno ORDER BY hiredate ASC)
;
mysql> SELECT a.empno,
-> a.deptno,
-> a.hiredate,
-> a.sal,
-> lag(sal, 1, 0) over w as 'pre_sal',
-> lead(sal, 1, 0) over w as 'next_sal',
-> lag(sal, 2, 0) over w as 'pre2_sal',
-> lead(sal, 2, 0) over w as 'next_2sal'
-> FROM emp a
-> window w as (PARTITION BY a.deptno ORDER BY hiredate ASC)
-> ;
+-------+--------+------------+---------+---------+----------+----------+-----------+
| empno | deptno | hiredate | sal | pre_sal | next_sal | pre2_sal | next_2sal |
+-------+--------+------------+---------+---------+----------+----------+-----------+
| 7782 | 10 | 1981-06-09 | 2450.00 | 0.00 | 5000.00 | 0.00 | 1300.00 |
| 7839 | 10 | 1981-11-17 | 5000.00 | 2450.00 | 1300.00 | 0.00 | 0.00 |
| 7934 | 10 | 1982-01-23 | 1300.00 | 5000.00 | 0.00 | 2450.00 | 0.00 |
| 7369 | 20 | 1980-12-17 | 800.00 | 0.00 | 2975.00 | 0.00 | 3000.00 |
| 7566 | 20 | 1981-04-02 | 2975.00 | 800.00 | 3000.00 | 0.00 | 3000.00 |
| 7902 | 20 | 1981-12-03 | 3000.00 | 2975.00 | 3000.00 | 800.00 | 1100.00 |
| 7788 | 20 | 1987-06-13 | 3000.00 | 3000.00 | 1100.00 | 2975.00 | 0.00 |
| 7876 | 20 | 1987-06-13 | 1100.00 | 3000.00 | 0.00 | 3000.00 | 0.00 |
| 7499 | 30 | 1981-02-20 | 1600.00 | 0.00 | 1250.00 | 0.00 | 2850.00 |
| 7521 | 30 | 1981-02-22 | 1250.00 | 1600.00 | 2850.00 | 0.00 | 1500.00 |
| 7698 | 30 | 1981-05-01 | 2850.00 | 1250.00 | 1500.00 | 1600.00 | 1250.00 |
| 7844 | 30 | 1981-09-08 | 1500.00 | 2850.00 | 1250.00 | 1250.00 | 950.00 |
| 7654 | 30 | 1981-09-28 | 1250.00 | 1500.00 | 950.00 | 2850.00 | 0.00 |
| 7900 | 30 | 1981-12-03 | 950.00 | 1250.00 | 0.00 | 1500.00 | 0.00 |
+-------+--------+------------+---------+---------+----------+----------+-----------+
14 rows in set (0.00 sec)
-- 没有比自己小我的我们设为AAA,没有比自己大的,我们设置为ZZZ
select deptno,
ename,
lag(ename, 1, 'AAA') over w as 'lower_name',
lead(ename, 1, 'ZZZ') over w as 'higher_name'
from emp
window w as(PARTITION BY deptno ORDER BY ename)
;
-- 部门重复的话值输出第一行的部门编号
select (case when deptno= lag(deptno,1)over w then null else deptno end) as 'deptno',
ename,
lag(ename, 1, 'AAA') over w as 'lower_name',
lead(ename, 1, 'ZZZ') over w as 'higher_name'
from emp
window w as (PARTITION BY deptno ORDER BY ename)
;
mysql> -- 没有比自己小我的我们设为AAA,没有比自己大的,我们设置为ZZZ
mysql> select deptno,
-> ename,
-> lag(ename, 1, 'AAA') over w as 'lower_name',
-> lead(ename, 1, 'ZZZ') over w as 'higher_name'
-> from emp
-> window w as(PARTITION BY deptno ORDER BY ename)
-> ;
+--------+--------+------------+-------------+
| deptno | ename | lower_name | higher_name |
+--------+--------+------------+-------------+
| 10 | CLARK | AAA | KING |
| 10 | KING | CLARK | MILLER |
| 10 | MILLER | KING | ZZZ |
| 20 | ADAMS | AAA | FORD |
| 20 | FORD | ADAMS | JONES |
| 20 | JONES | FORD | SCOTT |
| 20 | SCOTT | JONES | SMITH |
| 20 | SMITH | SCOTT | ZZZ |
| 30 | ALLEN | AAA | BLAKE |
| 30 | BLAKE | ALLEN | JAMES |
| 30 | JAMES | BLAKE | MARTIN |
| 30 | MARTIN | JAMES | TURNER |
| 30 | TURNER | MARTIN | WARD |
| 30 | WARD | TURNER | ZZZ |
+--------+--------+------------+-------------+
14 rows in set (0.00 sec)
mysql>
mysql> -- 部门重复的话值输出第一行的部门编号
mysql> select (case when deptno= lag(deptno,1)over w then null else deptno end) as 'deptno',
-> ename,
-> lag(ename, 1, 'AAA') over w as 'lower_name',
-> lead(ename, 1, 'ZZZ') over w as 'higher_name'
-> from emp
-> window w as (PARTITION BY deptno ORDER BY ename)
-> ;
+--------+--------+------------+-------------+
| deptno | ename | lower_name | higher_name |
+--------+--------+------------+-------------+
| 10 | CLARK | AAA | KING |
| NULL | KING | CLARK | MILLER |
| NULL | MILLER | KING | ZZZ |
| 20 | ADAMS | AAA | FORD |
| NULL | FORD | ADAMS | JONES |
| NULL | JONES | FORD | SCOTT |
| NULL | SCOTT | JONES | SMITH |
| NULL | SMITH | SCOTT | ZZZ |
| 30 | ALLEN | AAA | BLAKE |
| NULL | BLAKE | ALLEN | JAMES |
| NULL | JAMES | BLAKE | MARTIN |
| NULL | MARTIN | JAMES | TURNER |
| NULL | TURNER | MARTIN | WARD |
| NULL | WARD | TURNER | ZZZ |
+--------+--------+------------+-------------+
14 rows in set (0.00 sec)
first_value、last_value语法:
first_value(expression) over w
window w as (partition-clause order-by-clause windowing-clause)
last_value(expression) over w
window w as (partition-clause order-by-clause windowing-clause)
nth_value语法:
nth_value (measure, n) [ from first | from last ] [ respect nulls | ignore nulls ]
over w
window w as (partitioning-clause order-by-clause windowing-clause)
/*
需求:求每个部门工资最高的和工资最低的以及工资第二高的
*/
-- 默认不带开窗子句,从第一行到当前行
select a.empno,a.deptno,a.sal,
first_value(a.sal) over w as 'first',
last_value(a.sal) over w as 'last',
nth_value(a.sal,2) over w as 'top_2'
from emp a
window w as (partition by a.deptno order by sal)
;
-- rows between unbounded preceding and current row 从第一行到当前行
select a.empno,a.deptno,a.sal,
first_value(a.sal) over w as 'first',
last_value(a.sal) over w as 'last',
nth_value(a.sal,2) over w as 'top_2'
from emp a
window w as (partition by a.deptno order by sal rows between unbounded preceding and current row)
;
-- rows between unbounded preceding and unbounded following 从第一行到最后一行
select a.empno,a.deptno,a.sal,
first_value(a.sal) over w as 'first',
last_value(a.sal) over w as 'last',
nth_value(a.sal,2) over w as 'top_2'
from emp a
window w as (partition by a.deptno order by sal rows between unbounded preceding and unbounded following)
;
-- 1 preceding and 1 following 当前行的前一行到当前行的后一行
select a.empno,a.deptno,a.sal,
first_value(a.sal) over w as 'first',
last_value(a.sal) over w as 'last',
nth_value(a.sal,2) over w as 'top_2'
from emp a
window w as (partition by a.deptno order by sal rows between 1 preceding and 1 following)
;
mysql> -- 默认不带开窗子句,从第一行到当前行
mysql> select a.empno,a.deptno,a.sal,
-> first_value(a.sal) over w as 'first',
-> last_value(a.sal) over w as 'last',
-> nth_value(a.sal,2) over w as 'top_2'
-> from emp a
-> window w as (partition by a.deptno order by sal)
-> ;
+-------+--------+---------+---------+---------+---------+
| empno | deptno | sal | first | last | top_2 |
+-------+--------+---------+---------+---------+---------+
| 7934 | 10 | 1300.00 | 1300.00 | 1300.00 | NULL |
| 7782 | 10 | 2450.00 | 1300.00 | 2450.00 | 2450.00 |
| 7839 | 10 | 5000.00 | 1300.00 | 5000.00 | 2450.00 |
| 7369 | 20 | 800.00 | 800.00 | 800.00 | NULL |
| 7876 | 20 | 1100.00 | 800.00 | 1100.00 | 1100.00 |
| 7566 | 20 | 2975.00 | 800.00 | 2975.00 | 1100.00 |
| 7788 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7902 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7900 | 30 | 950.00 | 950.00 | 950.00 | NULL |
| 7521 | 30 | 1250.00 | 950.00 | 1250.00 | 1250.00 |
| 7654 | 30 | 1250.00 | 950.00 | 1250.00 | 1250.00 |
| 7844 | 30 | 1500.00 | 950.00 | 1500.00 | 1250.00 |
| 7499 | 30 | 1600.00 | 950.00 | 1600.00 | 1250.00 |
| 7698 | 30 | 2850.00 | 950.00 | 2850.00 | 1250.00 |
+-------+--------+---------+---------+---------+---------+
14 rows in set (0.00 sec)
mysql>
mysql> -- rows between unbounded preceding and current row 从第一行到当前行
mysql> select a.empno,a.deptno,a.sal,
-> first_value(a.sal) over w as 'first',
-> last_value(a.sal) over w as 'last',
-> nth_value(a.sal,2) over w as 'top_2'
-> from emp a
-> window w as (partition by a.deptno order by sal rows between unbounded preceding and current row)
-> ;
+-------+--------+---------+---------+---------+---------+
| empno | deptno | sal | first | last | top_2 |
+-------+--------+---------+---------+---------+---------+
| 7934 | 10 | 1300.00 | 1300.00 | 1300.00 | NULL |
| 7782 | 10 | 2450.00 | 1300.00 | 2450.00 | 2450.00 |
| 7839 | 10 | 5000.00 | 1300.00 | 5000.00 | 2450.00 |
| 7369 | 20 | 800.00 | 800.00 | 800.00 | NULL |
| 7876 | 20 | 1100.00 | 800.00 | 1100.00 | 1100.00 |
| 7566 | 20 | 2975.00 | 800.00 | 2975.00 | 1100.00 |
| 7788 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7902 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7900 | 30 | 950.00 | 950.00 | 950.00 | NULL |
| 7521 | 30 | 1250.00 | 950.00 | 1250.00 | 1250.00 |
| 7654 | 30 | 1250.00 | 950.00 | 1250.00 | 1250.00 |
| 7844 | 30 | 1500.00 | 950.00 | 1500.00 | 1250.00 |
| 7499 | 30 | 1600.00 | 950.00 | 1600.00 | 1250.00 |
| 7698 | 30 | 2850.00 | 950.00 | 2850.00 | 1250.00 |
+-------+--------+---------+---------+---------+---------+
14 rows in set (0.00 sec)
mysql>
mysql>
mysql> -- rows between unbounded preceding and unbounded following 从第一行到最后一行
mysql> select a.empno,a.deptno,a.sal,
-> first_value(a.sal) over w as 'first',
-> last_value(a.sal) over w as 'last',
-> nth_value(a.sal,2) over w as 'top_2'
-> from emp a
-> window w as (partition by a.deptno order by sal rows between unbounded preceding and unbounded following)
-> ;
+-------+--------+---------+---------+---------+---------+
| empno | deptno | sal | first | last | top_2 |
+-------+--------+---------+---------+---------+---------+
| 7934 | 10 | 1300.00 | 1300.00 | 5000.00 | 2450.00 |
| 7782 | 10 | 2450.00 | 1300.00 | 5000.00 | 2450.00 |
| 7839 | 10 | 5000.00 | 1300.00 | 5000.00 | 2450.00 |
| 7369 | 20 | 800.00 | 800.00 | 3000.00 | 1100.00 |
| 7876 | 20 | 1100.00 | 800.00 | 3000.00 | 1100.00 |
| 7566 | 20 | 2975.00 | 800.00 | 3000.00 | 1100.00 |
| 7788 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7902 | 20 | 3000.00 | 800.00 | 3000.00 | 1100.00 |
| 7900 | 30 | 950.00 | 950.00 | 2850.00 | 1250.00 |
| 7521 | 30 | 1250.00 | 950.00 | 2850.00 | 1250.00 |
| 7654 | 30 | 1250.00 | 950.00 | 2850.00 | 1250.00 |
| 7844 | 30 | 1500.00 | 950.00 | 2850.00 | 1250.00 |
| 7499 | 30 | 1600.00 | 950.00 | 2850.00 | 1250.00 |
| 7698 | 30 | 2850.00 | 950.00 | 2850.00 | 1250.00 |
+-------+--------+---------+---------+---------+---------+
14 rows in set (0.00 sec)
mysql>
mysql> -- 1 preceding and 1 following 当前行的前一行到当前行的后一行
mysql> select a.empno,a.deptno,a.sal,
-> first_value(a.sal) over w as 'first',
-> last_value(a.sal) over w as 'last',
-> nth_value(a.sal,2) over w as 'top_2'
-> from emp a
-> window w as (partition by a.deptno order by sal rows between 1 preceding and 1 following)
-> ;
+-------+--------+---------+---------+---------+---------+
| empno | deptno | sal | first | last | top_2 |
+-------+--------+---------+---------+---------+---------+
| 7934 | 10 | 1300.00 | 1300.00 | 2450.00 | 2450.00 |
| 7782 | 10 | 2450.00 | 1300.00 | 5000.00 | 2450.00 |
| 7839 | 10 | 5000.00 | 2450.00 | 5000.00 | 5000.00 |
| 7369 | 20 | 800.00 | 800.00 | 1100.00 | 1100.00 |
| 7876 | 20 | 1100.00 | 800.00 | 2975.00 | 1100.00 |
| 7566 | 20 | 2975.00 | 1100.00 | 3000.00 | 2975.00 |
| 7788 | 20 | 3000.00 | 2975.00 | 3000.00 | 3000.00 |
| 7902 | 20 | 3000.00 | 3000.00 | 3000.00 | 3000.00 |
| 7900 | 30 | 950.00 | 950.00 | 1250.00 | 1250.00 |
| 7521 | 30 | 1250.00 | 950.00 | 1250.00 | 1250.00 |
| 7654 | 30 | 1250.00 | 1250.00 | 1500.00 | 1250.00 |
| 7844 | 30 | 1500.00 | 1250.00 | 1600.00 | 1500.00 |
| 7499 | 30 | 1600.00 | 1500.00 | 2850.00 | 1600.00 |
| 7698 | 30 | 2850.00 | 1600.00 | 2850.00 | 2850.00 |
+-------+--------+---------+---------+---------+---------+
14 rows in set (0.00 sec)
percent_rank语法:
percent_rank() over w
window w as ([partition-by-clause] [order-by-clause] )
CUME_DIST语法
cume_dist() over w
window w as ([partition-by-clause] [order-by-clause] )
percent_rank:
– percent_rank函数以0到1之间的分数形式返回某个值在数据分区中的排名
– percent_rank的计算公式为(rank-1)/(n-1)
CUME_DIST:
–一个5行的组中,返回的累计分布值为0.2,0.4,0.6,0.8,1.0;
–注意对于重复行,计算时取重复行中的最后一行的位置。
SELECT a.empno,
a.ename,
a.deptno,
a.sal,
percent_rank() over w as 'num',
cume_dist() over w as 'cume'
FROM emp a
window w as (PARTITION BY a.deptno ORDER BY a.sal DESC)
;
mysql> SELECT a.empno,
-> a.ename,
-> a.deptno,
-> a.sal,
-> percent_rank() over w as 'num',
-> cume_dist() over w as 'cume'
-> FROM emp a
-> window w as (PARTITION BY a.deptno ORDER BY a.sal DESC);
+-------+--------+--------+---------+------+---------------------+
| empno | ename | deptno | sal | num | cume |
+-------+--------+--------+---------+------+---------------------+
| 7839 | KING | 10 | 5000.00 | 0 | 0.3333333333333333 |
| 7782 | CLARK | 10 | 2450.00 | 0.5 | 0.6666666666666666 |
| 7934 | MILLER | 10 | 1300.00 | 1 | 1 |
| 7788 | SCOTT | 20 | 3000.00 | 0 | 0.4 |
| 7902 | FORD | 20 | 3000.00 | 0 | 0.4 |
| 7566 | JONES | 20 | 2975.00 | 0.5 | 0.6 |
| 7876 | ADAMS | 20 | 1100.00 | 0.75 | 0.8 |
| 7369 | SMITH | 20 | 800.00 | 1 | 1 |
| 7698 | BLAKE | 30 | 2850.00 | 0 | 0.16666666666666666 |
| 7499 | ALLEN | 30 | 1600.00 | 0.2 | 0.3333333333333333 |
| 7844 | TURNER | 30 | 1500.00 | 0.4 | 0.5 |
| 7521 | WARD | 30 | 1250.00 | 0.6 | 0.8333333333333334 |
| 7654 | MARTIN | 30 | 1250.00 | 0.6 | 0.8333333333333334 |
| 7900 | JAMES | 30 | 950.00 | 1 | 1 |
+-------+--------+--------+---------+------+---------------------+
14 rows in set (0.00 sec)
Ntile语法:
Ntile(expr) OVER w
window w as ([ query_partition_clause ] order_by_clause)
Ntile 把数据行分成N个桶。每个桶会有相同的行数,正负误差为1
将员工表emp按照工资分为2、3个桶
-- 分成2个桶
SELECT ENAME, SAL, NTILE(2) OVER w as 'n' FROM EMP
window w as (ORDER BY SAL ASC)
;
-- 分成3个桶
SELECT ENAME, SAL, NTILE(3) OVER w as 'n' FROM EMP
window w as (ORDER BY SAL ASC)
;
mysql> -- 分成2个桶
mysql> SELECT ENAME, SAL, NTILE(2) OVER w as 'n' FROM EMP
-> window w as (ORDER BY SAL ASC)
-> ;
+--------+---------+------+
| ENAME | SAL | n |
+--------+---------+------+
| SMITH | 800.00 | 1 |
| JAMES | 950.00 | 1 |
| ADAMS | 1100.00 | 1 |
| WARD | 1250.00 | 1 |
| MARTIN | 1250.00 | 1 |
| MILLER | 1300.00 | 1 |
| TURNER | 1500.00 | 1 |
| ALLEN | 1600.00 | 2 |
| CLARK | 2450.00 | 2 |
| BLAKE | 2850.00 | 2 |
| JONES | 2975.00 | 2 |
| SCOTT | 3000.00 | 2 |
| FORD | 3000.00 | 2 |
| KING | 5000.00 | 2 |
+--------+---------+------+
14 rows in set (0.00 sec)
mysql>
mysql> -- 分成3个桶
mysql> SELECT ENAME, SAL, NTILE(3) OVER w as 'n' FROM EMP
-> window w as (ORDER BY SAL ASC)
-> ;
+--------+---------+------+
| ENAME | SAL | n |
+--------+---------+------+
| SMITH | 800.00 | 1 |
| JAMES | 950.00 | 1 |
| ADAMS | 1100.00 | 1 |
| WARD | 1250.00 | 1 |
| MARTIN | 1250.00 | 1 |
| MILLER | 1300.00 | 2 |
| TURNER | 1500.00 | 2 |
| ALLEN | 1600.00 | 2 |
| CLARK | 2450.00 | 2 |
| BLAKE | 2850.00 | 2 |
| JONES | 2975.00 | 3 |
| SCOTT | 3000.00 | 3 |
| FORD | 3000.00 | 3 |
| KING | 5000.00 | 3 |
+--------+---------+------+
14 rows in set (0.00 sec)