MySQL进阶排序(窗口函数)----分组并对组内排序后每个组取前N个数据(ORDER BY、ROW_NUMBER)

基本的排序–ORDER BY

Mysql中对数据的排序的基础方法是使用ORDER BY升序为ASC逆序为DESC

mysql> select * from student order by sid asc;
		//按sid 升序排列
+-------------+-----------+----------+-------+-----+
| sid         | sname     | major    | tid   | sex |
+-------------+-----------+----------+-------+-----+
| 16451082123 | ZH        | math     | 17531 ||
| 16451082124 | L         | computer | 17532 ||
| 16451082125 | Z         | math     | 17533 ||
| 16451082126 | M         | English  | 17532 ||
| 16451082127 | ZT        | Chinese  | 17534 ||
| 16451082128 | 王力宏    | math     | 17531 ||
+-------------+-----------+----------+-------+-----+
6 rows in set (0.01 sec)

mysql> select * from student order by sid desc;
		//按sid 逆序排列
+-------------+-----------+----------+-------+-----+
| sid         | sname     | major    | tid   | sex |
+-------------+-----------+----------+-------+-----+
| 16451082128 | 王力宏    | math     | 17531 ||
| 16451082127 | ZT        | Chinese  | 17534 ||
| 16451082126 | M         | English  | 17532 ||
| 16451082125 | Z         | math     | 17533 ||
| 16451082124 | L         | computer | 17532 ||
| 16451082123 | ZH        | math     | 17531 ||
+-------------+-----------+----------+-------+-----+
6 rows in set (0.00 sec)

组内排序

面对数据,有时候我们需要得到数据分组后的排序。

第一反应使用ORDER BY。但是这就会遇到一个问题,如果简单的使用ORDER BY,那么是对全局排序。

故需要对两个字段ORDER BY

mysql> select * from student order by major asc, sid desc;
+-------------+-----------+----------+-------+-----+
| sid         | sname     | major    | tid   | sex |
+-------------+-----------+----------+-------+-----+
| 16451082127 | ZT        | Chinese  | 17534 ||
| 16451082124 | L         | computer | 17532 ||
| 16451082126 | M         | English  | 17532 ||
| 16451082128 | 王力宏     | math     | 17531 ||
| 16451082125 | Z         | math     | 17533 ||
| 16451082123 | ZH        | math     | 17531 ||
+-------------+-----------+----------+-------+-----+
6 rows in set (0.00 sec)

这样我们可以得到最简单的组内排序。

但是现实情况往往更加复杂,比如我需要得到组内排名前10的学生,或者每门选修课程中学生学号最大的前两名。这时候简单的使用GROUP BY、ORDER BY、LIMIT就比较难以实现。

这里我们可以使用一个小技巧,如果是需要对组内数据进行逆序排序取前N个,可以添加一个辅助列,将表进行复制比较,按照分组进行拼接,同时在进行目标行匹配时,比较同个组中的数值大小。也就是说在join on 时有两个条件,通过值的比较,获得组内的大小排名,<=得到的是从大到小1,2,3…的顺序,>=得到的是从小到大1,2,3…的顺序。
示例:

mysql> select a.sid,a.major,b.major,b.sid 
		from student a 
		left join 
		student b 
		on 
		a.major = b.major and a.sid<= b.sid;
+-------------+----------+----------+-------------+
| sid         | major    | major    | sid         |
+-------------+----------+----------+-------------+
| 16451082123 | math     | math     | 16451082123 |
| 16451082123 | math     | math     | 16451082125 |
| 16451082123 | math     | math     | 16451082128 |
| 16451082124 | computer | computer | 16451082124 |
| 16451082125 | math     | math     | 16451082125 |
| 16451082125 | math     | math     | 16451082128 |
| 16451082126 | English  | English  | 16451082126 |
| 16451082127 | Chinese  | Chinese  | 16451082127 |
| 16451082128 | math     | math     | 16451082128 |
+-------------+----------+----------+-------------+
9 rows in set (0.01 sec)

故有以上的中间表我们可以做进一步的工作,即按major进行分组后对sid进行计数COUNT若计数值小于等于2 ,那么就是组内学号排名前2的学生。

mysql> select a.sid,a.major,count(*) 
		from student a 
		left join 
		student b 
		on a.major =b.major and a.sid<=b.sid   #关键在于a.sid<=b.sid
		group by a.major ,a.sid 
		having count(*)<=2 
		order by major asc,sid desc;
+-------------+----------+----------+
| sid         | major    | count(*) |
+-------------+----------+----------+
| 16451082127 | Chinese  |        1 |
| 16451082124 | computer |        1 |
| 16451082126 | English  |        1 |
| 16451082128 | math     |        1 |
| 16451082125 | math     |        2 |
+-------------+----------+----------+
5 rows in set (0.01 sec)

进阶排序方法 – ROW_NUMBER() + OVER()

这里介绍一个更加简便的方法。使用ROW_NUMBER() + OVER( partition by )

row_number()是对行的排序编号:

mysql> select *, row_number() over(order by sid desc) as row_num from student ; 
+-------------+-----------+----------+-------+-----+---------+
| sid         | sname     | major    | tid   | sex | row_num |
+-------------+-----------+----------+-------+-----+---------+
| 16451082128 | 王力宏     | math     | 17531 ||       1 |
| 16451082127 | ZT        | Chinese  | 17534 ||       2 |
| 16451082126 | M         | English  | 17532 ||       3 |
| 16451082125 | Z         | math     | 17533 ||       4 |
| 16451082124 | L         | computer | 17532 ||       5 |
| 16451082123 | ZH        | math     | 17531 ||       6 |
+-------------+-----------+----------+-------+-----+---------+
6 rows in set (0.00 sec)


row number() + over()中添加partition by 则表示按照对象分组后排序编号

mysql> select *, 
		row_number() over(partition by major order by sid desc) row_num 
		from student ;
+-------------+-----------+----------+-------+-----+---------+
| sid         | sname     | major    | tid   | sex | row_num |
+-------------+-----------+----------+-------+-----+---------+
| 16451082127 | ZT        | Chinese  | 17534 ||       1 |
| 16451082124 | L         | computer | 17532 ||       1 |
| 16451082126 | M         | English  | 17532 ||       1 |
| 16451082128 | 王力宏     | math     | 17531 ||       1 |
| 16451082125 | Z         | math     | 17533 ||       2 |
| 16451082123 | ZH        | math     | 17531 ||       3 |
+-------------+-----------+----------+-------+-----+---------+
6 rows in set (0.00 sec)

除了row_number函数之外,还有两个分组排序函数,分别是rank() 和dense_rank()

row_number() 在排序相同时不重复,会根据顺序排序。

rank()排序相同时会重复,总数不会变,意思是会出现1、1、3这样的排序结果;

mysql> select *,
		rank() over(partition by major order by tid asc) row_num 
		from student;
+-------------+-----------+----------+-------+-----+---------+
| sid         | sname     | major    | tid   | sex | row_num |
+-------------+-----------+----------+-------+-----+---------+
| 16451082127 | ZT        | Chinese  | 17534 ||       1 |
| 16451082124 | L         | computer | 17532 ||       1 |
| 16451082126 | M         | English  | 17532 ||       1 |
| 16451082123 | ZH        | math     | 17531 ||       1 |
| 16451082128 | 王力宏     | math     | 17531 ||       1 |
| 16451082125 | Z         | math     | 17533 ||       3 |
+-------------+-----------+----------+-------+-----+---------+
6 rows in set (0.00 sec)

dense_rank()排序相同时会重复,总数会减少,意思是会出现1、1、2这样的排序结果。

mysql> select *, dense_rank() over(partition by major order by tid asc) row_num from student;
+-------------+-----------+----------+-------+-----+---------+
| sid         | sname     | major    | tid   | sex | row_num |
+-------------+-----------+----------+-------+-----+---------+
| 16451082127 | ZT        | Chinese  | 17534 ||       1 |
| 16451082124 | L         | computer | 17532 ||       1 |
| 16451082126 | M         | English  | 17532 ||       1 |
| 16451082123 | ZH        | math     | 17531 ||       1 |
| 16451082128 | 王力宏     | math     | 17531 ||       1 |
| 16451082125 | Z         | math     | 17533 ||       2 |
+-------------+-----------+----------+-------+-----+---------+

你可能感兴趣的:(机器学习)