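Selecting the extreme row of each group, or the top N rows of each group, is a problem that comes up often. The simple examples below show how to write this kind of SQL.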
Example table
| type | variety | price |
| apple | gala | 2.79 |
| apple | fuji | 0.24 |
| apple | limbertwig | 2.87 |
| orange | valencia | 3.59 |
| orange | navel | 9.36 |
| pear | bradford | 6.05 |
| pear | bartlett | 2.14 |
| cherry | bing | 2.55 |
| cherry | chelan | 6.33 |
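To try the queries in this post without a MySQL server, the example table can be loaded into an in-memory database. This is only a sketch: it assumes SQLite through Python's standard `sqlite3` module, the column types are my guess, and `make_fruits_db` is a hypothetical helper name.

```python
import sqlite3

# Rows copied from the example table: (type, variety, price).
FRUITS = [
    ("apple", "gala", 2.79),
    ("apple", "fuji", 0.24),
    ("apple", "limbertwig", 2.87),
    ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36),
    ("pear", "bradford", 6.05),
    ("pear", "bartlett", 2.14),
    ("cherry", "bing", 2.55),
    ("cherry", "chelan", 6.33),
]


def make_fruits_db():
    """Return an in-memory SQLite connection holding the fruits table."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the post does not give the schema.
    conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price REAL)")
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", FRUITS)
    return conn


conn = make_fruits_db()
print(conn.execute("SELECT COUNT(*) FROM fruits").fetchone()[0])
```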
Selecting the one minimum row from each group
Expected result
| type | variety | price |
| apple | fuji | 0.24 |
| orange | valencia | 3.59 |
| pear | bartlett | 2.14 |
| cherry | bing | 2.55 |
Method 1: a grouped subquery joined back to the table
select f.type, f.variety, f.price
from (
select type, min(price) as minprice
from fruits group by type
) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;
Result:
| type | variety | price |
| apple | fuji | 0.24 |
| cherry | bing | 2.55 |
| orange | valencia | 3.59 |
| pear | bartlett | 2.14 |
Method 2: a correlated subquery
select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type);
Result:
| type | variety | price |
| apple | fuji | 0.24 |
| orange | valencia | 3.59 |
| pear | bartlett | 2.14 |
| cherry | bing | 2.55 |
The two queries above are equivalent: they return the same rows, though not necessarily in the same order.
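The equivalence is easy to check by running both queries and comparing the sorted rows. The sketch below assumes SQLite through Python's `sqlite3` module rather than MySQL, with column types that are my guess. Both queries return more than one row for a type when several varieties tie for the minimum price.

```python
import sqlite3

# Rows copied from the example table: (type, variety, price).
FRUITS = [
    ("apple", "gala", 2.79), ("apple", "fuji", 0.24),
    ("apple", "limbertwig", 2.87), ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36), ("pear", "bradford", 6.05),
    ("pear", "bartlett", 2.14), ("cherry", "bing", 2.55),
    ("cherry", "chelan", 6.33),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price REAL)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", FRUITS)

# Method 1: join the table to a derived table of per-type minimum prices.
join_query = """
    select f.type, f.variety, f.price
    from (
        select type, min(price) as minprice
        from fruits group by type
    ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice
"""

# Method 2: compare each row's price with the minimum of its own type.
correlated_query = """
    select type, variety, price
    from fruits
    where price = (select min(price) from fruits as f where f.type = fruits.type)
"""

# Sort both results so that row order does not affect the comparison.
join_rows = sorted(conn.execute(join_query).fetchall())
correlated_rows = sorted(conn.execute(correlated_query).fetchall())
print(join_rows == correlated_rows)
```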
Select the top N rows from each group
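Selecting the top N rows of each group is more painful: an aggregate function returns only one value per group, so a plain aggregate query with GROUP BY cannot return the first few rows of each group.
Say I want the two cheapest fruits of each type.
This can be done by rewriting the SQL.
Method 1: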
select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
or price = (select min(price) from fruits as f where f.type = fruits.type
and price > (select min(price) from fruits as f2 where f2.type = fruits.type));
Result:
| type | variety | price |
| apple | gala | 2.79 |
| apple | fuji | 0.24 |
| orange | valencia | 3.59 |
| orange | navel | 9.36 |
| pear | bradford | 6.05 |
| pear | bartlett | 2.14 |
| cherry | bing | 2.55 |
| cherry | chelan | 6.33 |
This pile of subqueries performs poorly, and extending it to the top 3, top 4, and so on quickly becomes impractical.
Method 2: a correlated subquery that, for each type, selects the varieties no more expensive than the second cheapest
select type, variety, price
from fruits
where (
select count(1) from fruits as f
where f.type = fruits.type and f.price <= fruits.price
) <= 2;
This second method does not perform well when the fruits table is large.
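The counting idea extends to any N. Here is a minimal sketch, assuming SQLite through Python's `sqlite3` module instead of MySQL; `cheapest_n` is a hypothetical helper, not something from the original post. A row is kept when at most n rows of the same type are priced at or below it. The inner comparison is `<=`, so every row counts itself.

```python
import sqlite3

# Rows copied from the example table: (type, variety, price).
FRUITS = [
    ("apple", "gala", 2.79), ("apple", "fuji", 0.24),
    ("apple", "limbertwig", 2.87), ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36), ("pear", "bradford", 6.05),
    ("pear", "bartlett", 2.14), ("cherry", "bing", 2.55),
    ("cherry", "chelan", 6.33),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price REAL)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", FRUITS)


def cheapest_n(conn, n):
    """Return the n cheapest rows of each type, ordered by type and price."""
    query = """
        select type, variety, price
        from fruits
        where (
            select count(1) from fruits as f
            where f.type = fruits.type and f.price <= fruits.price
        ) <= ?
        order by type, price
    """
    return conn.execute(query, (n,)).fetchall()


for row in cheapest_n(conn, 2):
    print(row)
```

If several rows of a type share the same price, this count can exclude all of the tied rows at the cutoff, so a group may come back with fewer than n rows.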
Method 3
UNION ALL also works (unlike UNION, UNION ALL does not sort the rows to eliminate duplicates).
(select * from fruits where type = 'apple' order by price limit 2)
union all
(select * from fruits where type = 'orange' order by price limit 2)
union all
(select * from fruits where type = 'pear' order by price limit 2)
union all
(select * from fruits where type = 'cherry' order by price limit 2)
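When the number of groups (here, the fruit types) is small, or when the results are paginated, UNION can be used to cut the data into several segments that are queried separately. With good index support this is very efficient.
(In my tests, if the parentheses around each segment are removed, LIMIT restricts the number of rows returned by the whole result set rather than by each segment.)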
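When the list of groups is only known at run time, the UNION ALL statement can be assembled from one segment per group. The sketch below makes several assumptions: it uses SQLite through Python's `sqlite3` module, `top_n_union` is a hypothetical helper, and because SQLite does not accept the parenthesized MySQL form, each segment is wrapped in a derived table so that its ORDER BY and LIMIT still apply per segment.

```python
import sqlite3

# Rows copied from the example table: (type, variety, price).
FRUITS = [
    ("apple", "gala", 2.79), ("apple", "fuji", 0.24),
    ("apple", "limbertwig", 2.87), ("orange", "valencia", 3.59),
    ("orange", "navel", 9.36), ("pear", "bradford", 6.05),
    ("pear", "bartlett", 2.14), ("cherry", "bing", 2.55),
    ("cherry", "chelan", 6.33),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price REAL)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", FRUITS)


def top_n_union(conn, n):
    """Fetch the n cheapest rows of every type with one UNION ALL statement."""
    # In a real page the group keys would usually be known already;
    # here they are read from the table to keep the sketch self-contained.
    types = [row[0] for row in conn.execute(
        "select distinct type from fruits order by type")]
    # One segment per type; the derived table keeps ORDER BY/LIMIT local.
    segment = ("select * from (select * from fruits "
               "where type = ? order by price limit ?)")
    query = " union all ".join([segment] * len(types))
    # Bind (type, n) for each segment, in segment order.
    params = [value for t in types for value in (t, n)]
    return conn.execute(query, params).fetchall()


for row in top_n_union(conn, 2):
    print(row)
```

Values are bound as parameters rather than concatenated into the SQL text, so only the number of segments varies between calls.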
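A UNION ALL query is a powerful tool for solving the N+1 problem (especially when the "n" side of the 1-to-n relation holds a lot of data), although the JPA framework has a bug when handling UNION ALL queries (sadly).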
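h3. Use in a real project:
On the merchant page of our CRM, merchants and their branches are a typical N+1 query problem. Some merchants have many branches but the page shows only the first 3 of each, so a UNION ALL query can be used as an optimization: a single SQL statement returns the first 3 branches of every merchant.
The examples are adapted from Baron Schwartz's blog: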
http://www.xaprb.com/blog