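[MySQL] Sorting Within Groups

1. Background

A few days ago I ran into a problem: group the rows of a query result, then sort the rows inside each group.
I went through a lot of material and found that very few of the suggested approaches actually work, so I have collected the whole story here.

2. The table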

a1  a2  a3 
-----------
a   1   x
a   2   y
b   3   z

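The table has three columns: a1, a2, and a3.
We want to group the rows by a1, sort the rows within each group,
and pick the row with the largest a2 from each group.

In this example, the rows fall into two groups: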

a1  a2  a3 
-----------
a   1   x
a   2   y    <- this row has the largest a2 in its group

a1  a2  a3 
-----------
b   3   z

So the result we expect is:

a1  a2  a3 
-----------
a   2   y
b   3   z
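
To make the goal concrete, here is a minimal sketch that rebuilds the sample table and computes the expected rows in plain Python. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MySQL (an assumption made so the snippet is self-contained). The column types and the helper name expected_rows are illustrative and not part of the original setup.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (a1 text, a2 integer, a3 text)")
conn.executemany(
    "insert into t1 (a1, a2, a3) values (?, ?, ?)",
    [("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z")],
)


def expected_rows(rows):
    """For each a1, keep the row whose a2 is largest (hypothetical helper)."""
    best = {}
    for a1, a2, a3 in rows:
        if a1 not in best or a2 > best[a1][1]:
            best[a1] = (a1, a2, a3)
    # Return one row per group, ordered by a1 for a stable result.
    return [best[key] for key in sorted(best)]


rows = conn.execute("select a1, a2, a3 from t1").fetchall()
print(expected_rows(rows))
```

Any query that claims to solve the problem should return the same two rows for this table.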

3. group by

Let's first try a plain group by to see the default behavior (the query is only accepted when ONLY_FULL_GROUP_BY is disabled):

select * from t1 group by a1
a1  a2  a3 
-----------
a   1   x    <- which row is picked is undefined; the engine decides
b   3   z

The behavior of group by here is nondeterministic. As the MySQL manual puts it:

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.

Adding order by at the end does not help either, because sorting happens after grouping.

Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses.

select * from t1 group by a1 order by a2 desc
a1  a2  a3 
-----------
b   3   z
a   1   x

As we can see, MySQL sorts the rows only after one row per group has already been chosen.

3. max + group by

Some articles claim that max can be combined with group by to sort within groups.

select *, max(a2) as max_a2 from t1 group by a1
a1  a2  a3  max_a2
------------------
a   1   x   2
b   3   z   3

a1 and max_a2 are indeed correct, but the other columns, a2 and a3, are still wrong.

4. Sort first, then group

select * from (
    select * from t1 order by a2 desc
) t
group by t.a1
a1  a2  a3 
-----------
a   1   x
b   3   z

Surprisingly, this does not work either.

Why not?
The MariaDB article "Why is ORDER BY in a FROM Subquery Ignored?" offers some clues:

A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, the SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).
You need to treat the subquery in the FROM clause, as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

The article also mentions that adding limit to the subquery works around the problem:

this cause the optimizer to create a temporary table, and use filesort to order the query
the limit number is a 64bit unsigned -1 (2^64-1), this is a big number and can work with 99.999% of queries i know

select * from (
    select * from t1 order by a2 desc limit 18446744073709551615
) t
group by t.a1

Here, 18446744073709551615 is the maximum value of an unsigned 64-bit integer,
that is, 2^64 - 1.

a1  a2  a3 
-----------
b   3   z
a   2   y

5. group by + having

We can also use having to keep only the groups that satisfy a condition.

select * from t1 u group by a1, a2
having a2 = (
  select max(a2) from t1 where a1 = u.a1
)

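We group by a1 and a2, so two rows belong to the same group only when both columns match.
Then we require a2 to satisfy the following condition: for the given a1, a2 must equal the largest a2.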

a1  a2  a3 
-----------
a   2   y
b   3   z
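
The having query above still selects a3, which is not in the group by list, so it depends on the same MySQL extension described in the manual quote earlier. One hedged alternative is to write the same correlated condition in a where clause with no grouping at all. The sketch below runs that variant against an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL, assumed here only to keep the example runnable). The alias v and the final order by are additions for clarity and stable output.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (a1 text, a2 integer, a3 text)")
conn.executemany(
    "insert into t1 values (?, ?, ?)",
    [("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z")],
)

# Keep a row only if its a2 equals the largest a2 among rows with the same a1.
query = """
select u.a1, u.a2, u.a3
from t1 u
where u.a2 = (
    select max(v.a2) from t1 v where v.a1 = u.a1
)
order by u.a1
"""
result = conn.execute(query).fetchall()
print(result)
```

If several rows in a group tie on the largest a2, this form returns all of them.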

6. inner join

Besides that, we can use inner join to intersect two result sets.

select u.* from t1 u
inner join (
    select a1, max(a2) as max_a2 from t1 group by a1
) v on u.a1 = v.a1 and u.a2 = v.max_a2

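First, the subquery select a1, max(a2) as max_a2 from t1 group by a1
finds the (a1, max_a2) pairs we are looking for.
(Note that any other columns selected in such a subquery would not be reliable; see the max + group by example above.)

Then we use inner join to intersect the original table with the subquery result,
keeping the rows whose a1 matches and whose a2 equals max_a2.
This is exactly the result we want.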

a1  a2  a3 
-----------
a   2   y
b   3   z
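
As a rough check of the join approach, the sketch below runs the same query through Python's sqlite3 module against an in-memory SQLite database (a stand-in for MySQL, which is an assumption). It then adds one hypothetical row that ties on a2 to show how ties behave. The explicit column list and the order by are additions to make the output stable.

```python
import sqlite3

JOIN_QUERY = """
select u.a1, u.a2, u.a3
from t1 u
inner join (
    select a1, max(a2) as max_a2 from t1 group by a1
) v on u.a1 = v.a1 and u.a2 = v.max_a2
order by u.a1, u.a3
"""

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (a1 text, a2 integer, a3 text)")
conn.executemany(
    "insert into t1 values (?, ?, ?)",
    [("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z")],
)

before = conn.execute(JOIN_QUERY).fetchall()

# Hypothetical extra row that ties with ('a', 2, 'y') on a2.
conn.execute("insert into t1 values ('a', 2, 'w')")
after = conn.execute(JOIN_QUERY).fetchall()

print(before)
print(after)
```

With the original three rows the join returns one row per group. Once a tie exists, both tied rows come back, so an extra tie-breaking rule is needed if exactly one row per group is required.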

7. Summary

So far I have found only three approaches to sorting within groups that actually work.

(1) Sort first, then group, with limit in the subquery

select * from (
    select * from t1 order by a2 desc limit 18446744073709551615
) t
group by t.a1

(2) group by + having

select * from t1 u group by a1, a2
having a2 = (
  select max(a2) from t1 where a1 = u.a1
)

(3) inner join

select u.* from t1 u
inner join (
    select a1, max(a2) as max_a2 from t1 group by a1
) v on u.a1 = v.a1 and u.a2 = v.max_a2

References

MySQL 8.0 Reference Manual: MySQL Handling of GROUP BY
MariaDB: Why is ORDER BY in a FROM Subquery Ignored?
