These notes are compiled from the column 《MySQL 必知必会》 by 朱晓峰; for the full content, purchase the column on the 极客时间 website.
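When querying, we often need to filter the results by some condition, and that is what the conditional clauses WHERE and HAVING are for.
WHERE filters the results by restricting the columns of a table directly;
HAVING is used together with the grouping keyword GROUP BY and filters the results by restricting the grouping columns or the aggregate functions computed over each group.
Both restrict a query, but each has its own characteristics and use cases. In many situations either one would work, and picking the wrong one can easily lead to slow execution, wrong results, or even a query that fails to run.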
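Suppose the supermarket's manager asks for the products that had a single sale amount above 50 yuan. Analyzing the requirement: we need to return a set of product records, and the condition is that the amount of a single sale exceeds 50 yuan.
Assume we have a product information table (demo.goodsmaster) that holds 2 products: a book (书) and a pen (笔).
Data preparation: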
show databases;
use demo;
create table demo.goodsmaster
(
itemnumber int primary key auto_increment,
barcode text,
goodsname text,
specification text,
unit text,
saleprice decimal(5,2)
);
describe goodsmaster;
insert into demo.goodsmaster (itemnumber, barcode, goodsname, specification, unit, saleprice) values (1, '0001', '书', '', '本', 89);
insert into demo.goodsmaster (itemnumber, barcode, goodsname, specification, unit, saleprice) values (2, '0002', '笔', '', '支', 5);
select * from demo.goodsmaster;
create table demo.transactiondetails
(
transactionid int,
itemnumber int,
quantity decimal(10,3),
price decimal(5,2),
salesvalue decimal(5,2)
);
describe transactiondetails;
insert into demo.transactiondetails (transactionid, itemnumber, quantity, price, salesvalue) values (1, 1, 1, 89, 89);
insert into demo.transactiondetails (transactionid, itemnumber, quantity, price, salesvalue) values (1, 2, 2, 5, 10);
insert into demo.transactiondetails (transactionid, itemnumber, quantity, price, salesvalue) values (2, 1, 2, 89, 178);
insert into demo.transactiondetails (transactionid, itemnumber, quantity, price, salesvalue) values (3, 2, 10, 5, 50);
select * from demo.transactiondetails;
Product information table:
mysql> select * from demo.goodsmaster;
+------------+---------+-----------+---------------+------+-----------+
| itemnumber | barcode | goodsname | specification | unit | saleprice |
+------------+---------+-----------+---------------+------+-----------+
| 1 | 0001 | 书 | | 本 | 89.00 |
| 2 | 0002 | 笔 | | 支 | 5.00 |
+------------+---------+-----------+---------------+------+-----------+
2 rows in set (0.01 sec)
Sales details table:
mysql> select * from demo.transactiondetails;
+---------------+------------+----------+-------+------------+
| transactionid | itemnumber | quantity | price | salesvalue |
+---------------+------------+----------+-------+------------+
| 1 | 1 | 1.000 | 89.00 | 89.00 |
| 1 | 2 | 2.000 | 5.00 | 10.00 |
| 2 | 1 | 2.000 | 89.00 | 178.00 |
| 3 | 2 | 10.000 | 5.00 | 50.00 |
+---------------+------------+----------+-------+------------+
4 rows in set (0.00 sec)
Query using WHERE:
select distinct b.goodsname
from demo.transactiondetails as a
join demo.goodsmaster as b
on (a.itemnumber = b.itemnumber)
where a.salesvalue > 50;
Query using HAVING:
select b.goodsname
from demo.transactiondetails as a
join demo.goodsmaster as b
on (a.itemnumber = b.itemnumber)
group by b.goodsname
having max(a.salesvalue) > 50;
Both return the same result:
+-----------+
| goodsname |
+-----------+
| 书 |
+-----------+
1 row in set (0.02 sec)
Let's analyze how the query with the WHERE condition is executed.
First, MySQL extracts from the table demo.transactiondetails the records that satisfy the condition a.salesvalue > 50:
mysql> select * from demo.transactiondetails as a where a.salesvalue > 50;
+---------------+------------+----------+-------+------------+
| transactionid | itemnumber | quantity | price | salesvalue |
+---------------+------------+----------+-------+------------+
| 1 | 1 | 1.000 | 89.00 | 89.00 |
| 2 | 1 | 2.000 | 89.00 | 178.00 |
+---------------+------------+----------+-------+------------+
2 rows in set (0.00 sec)
To get the product name for each sales record, the result is joined with the table demo.goodsmaster through the shared column itemnumber, and the product name is read from demo.goodsmaster:
mysql> select a.*, b.goodsname
-> from demo.transactiondetails a
-> join demo.goodsmaster b
-> on (a.itemnumber = b.itemnumber)
-> where a.salesvalue > 50;
+---------------+------------+----------+-------+------------+-----------+
| transactionid | itemnumber | quantity | price | salesvalue | goodsname |
+---------------+------------+----------+-------+------------+-----------+
| 1 | 1 | 1.000 | 89.00 | 89.00 | 书 |
| 2 | 1 | 2.000 | 89.00 | 178.00 | 书 |
+---------------+------------+----------+-------+------------+-----------+
2 rows in set (0.00 sec)
Selecting only the product name then returns two duplicate records:
mysql> select b.goodsname from demo.transactiondetails a join demo.goodsmaster b on (a.itemnumber = b.itemnumber) where a.salesvalue > 50;
+-----------+
| goodsname |
+-----------+
| 书 |
| 书 |
+-----------+
2 rows in set (0.00 sec)
To eliminate the duplicate records, we need the keyword DISTINCT, which returns only the unique values. For example, DISTINCT column1 returns all the different values of column1:
mysql> select distinct b.goodsname from demo.transactiondetails a join demo.goodsmaster b on (a.itemnumber = b.itemnumber) where a.salesvalue > 50;
+-----------+
| goodsname |
+-----------+
| 书 |
+-----------+
1 row in set (0.00 sec)
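DISTINCT is a keyword that applies to the whole selected row rather than a function that wraps one column. A minimal sketch against the sample demo.transactiondetails table; the GROUP BY form is included only for comparison and is not part of the original notes:

```sql
-- DISTINCT removes duplicate rows, judged on every selected column together.
-- With the sample data, the 4 detail rows collapse to 2 distinct
-- (itemnumber, price) pairs.
select distinct a.itemnumber, a.price
from demo.transactiondetails as a;

-- The same set of rows can be obtained by grouping on all selected columns.
select a.itemnumber, a.price
from demo.transactiondetails as a
group by a.itemnumber, a.price;
```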
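The defining trait of the WHERE keyword is that it filters the data set directly by the columns of a table.
If the query has to fetch information from other tables through a join, the WHERE condition is still applied first, and the smaller, filtered data set is what gets joined.
As a result, the join uses fewer resources and the query runs more efficiently.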
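One way to check how MySQL actually plans such a query is EXPLAIN. The sketch below reuses the WHERE query from above; the plan it prints depends on the MySQL version, the available indexes, and the table sizes, so no output is shown here:

```sql
-- Show the execution plan without running the query.
explain
select distinct b.goodsname
from demo.transactiondetails as a
join demo.goodsmaster as b
    on (a.itemnumber = b.itemnumber)
where a.salesvalue > 50;

-- In MySQL 8.0.18 and later, EXPLAIN ANALYZE also runs the query and
-- reports the actual row counts and timings of each step.
explain analyze
select distinct b.goodsname
from demo.transactiondetails as a
join demo.goodsmaster as b
    on (a.itemnumber = b.itemnumber)
where a.salesvalue > 50;
```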
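In practice, HAVING is used together with GROUP BY rather than on its own. Think of GROUP BY as splitting the data into groups so that statistics can be computed within each group.
Let's first look at how GROUP BY is used and how to compute statistics within groups.
Suppose we have a set of sales data and need to query the sales quantity and sales amount per day and per cashier.
Data preparation: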
create table demo.transactionhead
(
transactionid int primary key auto_increment,
transactionno text,
operatorid int,
transdate datetime
);
insert into demo.transactionhead (transactionid, transactionno, operatorid, transdate) values (1, '0120201201000001', 1, '2020-12-10 00:00:00');
insert into demo.transactionhead (transactionid, transactionno, operatorid, transdate) values (2, '0120201202000001', 2, '2020-12-11 00:00:00');
insert into demo.transactionhead (transactionid, transactionno, operatorid, transdate) values (3, '0120201202000002', 2, '2020-12-12 00:00:00');
select * from demo.transactionhead;
Current data:
mysql> SELECT * FROM demo.transactionhead;
+---------------+------------------+------------+---------------------+
| transactionid | transactionno | operatorid | transdate |
+---------------+------------------+------------+---------------------+
| 1 | 0120201201000001 | 1 | 2020-12-10 00:00:00 |
| 2 | 0120201202000001 | 2 | 2020-12-11 00:00:00 |
| 3 | 0120201202000002 | 2 | 2020-12-12 00:00:00 |
+---------------+------------------+------------+---------------------+
mysql> SELECT * FROM demo.transactiondetails;
+---------------+------------+----------+-------+------------+
| transactionid | itemnumber | quantity | price | salesvalue |
+---------------+------------+----------+-------+------------+
| 1 | 1 | 1.000 | 89.00 | 89.00 |
| 1 | 2 | 2.000 | 5.00 | 10.00 |
| 2 | 1 | 2.000 | 89.00 | 178.00 |
| 3 | 2 | 10.000 | 5.00 | 50.00 |
+---------------+------------+----------+-------+------------+
mysql> SELECT * FROM demo.operator;
+------------+----------+--------+--------------+-------------+---------+--------------------+--------+
| operatorid | branchid | workno | operatorname | phone | address | pid | duty |
+------------+----------+--------+--------------+-------------+---------+--------------------+--------+
| 1 | 1 | 001 | 张静 | 18612345678 | 北京 | 110392197501012332 | 店长 |
| 2 | 1 | 002 | 李强 | 13312345678 | 北京 | 110222199501012332 | 收银员 |
+------------+----------+--------+--------------+-------------+---------+--------------------+--------+
mysql> SELECT * FROM demo.goodsmaster;
+------------+---------+-----------+---------------+------+-----------+
| itemnumber | barcode | goodsname | specification | unit | saleprice |
+------------+---------+-----------+---------------+------+-----------+
| 1 | 0001 | 书 | | 本 | 89.00 |
| 2 | 0002 | 笔 | | 支 | 5.00 |
+------------+---------+-----------+---------------+------+-----------+
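The demo.operator table appears in the output above, but its setup script is not included. A minimal sketch for recreating it: the column names and the two rows are taken from the displayed output, while the column types and the primary key are assumptions.

```sql
-- Column types and the primary key are assumptions inferred from the
-- displayed values; only the column names and rows come from the output.
create table demo.operator
(
    operatorid   int primary key,
    branchid     int,
    workno       text,
    operatorname text,
    phone        text,
    address      text,
    pid          text,
    duty         text
);

insert into demo.operator (operatorid, branchid, workno, operatorname, phone, address, pid, duty)
values (1, 1, '001', '张静', '18612345678', '北京', '110392197501012332', '店长');
insert into demo.operator (operatorid, branchid, workno, operatorname, phone, address, pid, duty)
values (2, 1, '002', '李强', '13312345678', '北京', '110222199501012332', '收银员');

select * from demo.operator;
```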
Query:
mysql> SELECT a.transdate, c.operatorname, d.goodsname, b.quantity, b.price, b.salesvalue
-> FROM demo.transactionhead AS a
-> JOIN demo.transactiondetails AS b ON (a.transactionid = b.transactionid)
-> JOIN demo.operator AS c ON (a.operatorid = c.operatorid)
-> JOIN demo.goodsmaster AS d ON (b.itemnumber = d.itemnumber);
+---------------------+--------------+-----------+----------+-------+------------+
| transdate | operatorname | goodsname | quantity | price | salesvalue |
+---------------------+--------------+-----------+----------+-------+------------+
| 2020-12-10 00:00:00 | 张静 | 书 | 1.000 | 89.00 | 89.00 |
| 2020-12-10 00:00:00 | 张静 | 笔 | 2.000 | 5.00 | 10.00 |
| 2020-12-11 00:00:00 | 李强 | 书 | 2.000 | 89.00 | 178.00 |
| 2020-12-12 00:00:00 | 李强 | 笔 | 10.000 | 5.00 | 50.00 |
+---------------------+--------------+-----------+----------+-------+------------+
To see the sales quantity and sales amount for each day, group and aggregate the data by the single column transdate:
mysql> select a.transdate, sum(b.quantity), sum(b.salesvalue)
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (a.transactionid = b.transactionid)
-> group by a.transdate;
+---------------------+-----------------+-------------------+
| transdate | sum(b.quantity) | sum(b.salesvalue) |
+---------------------+-----------------+-------------------+
| 2020-12-10 00:00:00 | 3.000 | 99.00 |
| 2020-12-11 00:00:00 | 2.000 | 178.00 |
| 2020-12-12 00:00:00 | 10.000 | 50.00 |
+---------------------+-----------------+-------------------+
To see the sales quantity and sales amount per day and per cashier, group and aggregate by 2 columns, transdate and operatorname:
mysql> select a.transdate, c.operatorname, sum(b.quantity), sum(b.salesvalue)
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (a.transactionid = b.transactionid)
-> join demo.operator as c on (a.operatorid = c.operatorid)
-> group by a.transdate, c.operatorname;
+---------------------+--------------+-----------------+-------------------+
| transdate | operatorname | sum(b.quantity) | sum(b.salesvalue) |
+---------------------+--------------+-----------------+-------------------+
| 2020-12-10 00:00:00 | 张静 | 3.000 | 99.00 |
| 2020-12-11 00:00:00 | 李强 | 2.000 | 178.00 |
| 2020-12-12 00:00:00 | 李强 | 10.000 | 50.00 |
+---------------------+--------------+-----------------+-------------------+
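By grouping the sales data by transaction date and cashier and then summing within each group, we get the sales quantity and sales amount per day and per cashier.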
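The grouped query can be made easier to read with column aliases and a few more aggregates. The following is a sketch only: the alias names (saledate, detailrows, totalquantity, totalsalesvalue) are made up for illustration, and DATE() is used on the assumption that real transdate values may carry a time of day, which would otherwise split one day into several groups.

```sql
select date(a.transdate)  as saledate,        -- strip the time part
       c.operatorname,
       count(*)           as detailrows,      -- number of detail rows per group
       sum(b.quantity)    as totalquantity,
       sum(b.salesvalue)  as totalsalesvalue
from demo.transactionhead as a
join demo.transactiondetails as b on (a.transactionid = b.transactionid)
join demo.operator as c on (a.operatorid = c.operatorid)
group by date(a.transdate), c.operatorname
order by saledate, c.operatorname;
```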
Back to the manager's request from the beginning, finding the products with a single sale amount above 50 yuan:
mysql> select a.goodsname
-> from demo.goodsmaster as a
-> join demo.transactiondetails as b on (a.itemnumber = b.itemnumber)
-> group by a.goodsname
-> having max(b.salesvalue) > 50;
+-----------+
| goodsname |
+-----------+
| 书 |
+-----------+
1 row in set (0.00 sec)
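Conceptually, MySQL carries out this query in four steps:
Step 1: join the sales details table and the product information table on the shared column itemnumber, fetching data from both tables: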
mysql> select a.*, b.*
-> from demo.transactiondetails as a
-> join demo.goodsmaster as b on (a.itemnumber = b.itemnumber);
+---------------+------------+----------+-------+------------+------------+---------+-----------+---------------+------+-----------+
| transactionid | itemnumber | quantity | price | salesvalue | itemnumber | barcode | goodsname | specification | unit | saleprice |
+---------------+------------+----------+-------+------------+------------+---------+-----------+---------------+------+-----------+
| 1 | 1 | 1.000 | 89.00 | 89.00 | 1 | 0001 | 书 | | 本 | 89.00 |
| 1 | 2 | 2.000 | 5.00 | 10.00 | 2 | 0002 | 笔 | | 支 | 5.00 |
| 2 | 1 | 2.000 | 89.00 | 178.00 | 1 | 0001 | 书 | | 本 | 89.00 |
| 3 | 2 | 10.000 | 5.00 | 50.00 | 2 | 0002 | 笔 | | 支 | 5.00 |
+---------------+------------+----------+-------+------------+------------+---------+-----------+---------------+------+-----------+
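Step 2: group the result set by product name (goodsname):
Group 1 (书): the rows with salesvalue 89.00 and 178.00.
Group 2 (笔): the rows with salesvalue 10.00 and 50.00.
Step 3: filter the groups, keeping those whose maximum salesvalue is greater than 50. Group 1 has a maximum of 178.00 and is kept; group 2 has a maximum of 50.00, which does not exceed 50, so it is dropped.
Step 4: return the product name. This gives the result we need: the product with a single sale amount above 50 yuan is "书".
To summarize, a query that uses HAVING goes through these stages: join the tables, group the rows, filter the groups, and return the result.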
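The four steps can be made visible by splitting the query in two. This is an illustrative sketch, not a description of how MySQL executes the statement internally; the alias names g and maxsalesvalue are made up for the example.

```sql
-- Steps 1 and 2: join the two tables and group by goodsname,
-- exposing the per-group maximum that HAVING would test.
-- With the sample data: 书 -> 178.00, 笔 -> 50.00
select b.goodsname, max(a.salesvalue) as maxsalesvalue
from demo.transactiondetails as a
join demo.goodsmaster as b on (a.itemnumber = b.itemnumber)
group by b.goodsname;

-- Steps 3 and 4: filter the groups and return only the product name.
-- Wrapping the grouped result in a derived table gives the same answer
-- as "having max(a.salesvalue) > 50".
select g.goodsname
from (
    select b.goodsname, max(a.salesvalue) as maxsalesvalue
    from demo.transactiondetails as a
    join demo.goodsmaster as b on (a.itemnumber = b.itemnumber)
    group by b.goodsname
) as g
where g.maxsalesvalue > 50;
```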
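The differences between the two:
First, when the data has to be fetched from related tables through a join, WHERE filters first and joins afterwards, whereas HAVING joins first and filters afterwards.
This is why WHERE is more efficient than HAVING in join queries. WHERE can filter first and join the smaller, filtered data set with the related table, which uses fewer resources and runs faster. HAVING has to prepare the result set first, that is, join the unfiltered data sets and then filter the large joined set, which uses more resources and runs slower.
Second, WHERE can use the columns of a table directly as filter conditions but cannot use aggregate functions computed over groups; HAVING is used together with GROUP BY and can use both aggregate functions and grouping columns as filter conditions.
This means that when the data has to be grouped and aggregated, HAVING can do what WHERE cannot. In the query syntax, WHERE comes before GROUP BY, so it cannot filter the grouped results. HAVING comes after GROUP BY, so it can use the grouping columns and the aggregate functions to filter the grouped result set.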
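The clause order described above can be seen directly in a query. In this sketch the invalid statement is left commented out so that the block still runs as a script:

```sql
-- Not valid: WHERE is evaluated before grouping, so an aggregate function
-- cannot appear in it. MySQL rejects the statement with
-- ERROR 1111 (HY000): Invalid use of group function
--
-- select a.transdate, sum(b.salesvalue)
-- from demo.transactionhead as a
-- join demo.transactiondetails as b on (a.transactionid = b.transactionid)
-- where sum(b.salesvalue) > 100
-- group by a.transdate;

-- Valid: the aggregate condition goes in HAVING, after GROUP BY.
select a.transdate, sum(b.salesvalue)
from demo.transactionhead as a
join demo.transactiondetails as b on (a.transactionid = b.transactionid)
group by a.transdate
having sum(b.salesvalue) > 100;
```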
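Suppose the manager wants to know which cashier recorded 2 sales on which day (here, exactly 2 rows in the sales details table). This kind of query, which must group before it can filter, is hard to write with WHERE alone: we would probably need several steps and have to store intermediate results. With HAVING it is easy: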
mysql> select a.transdate, c.operatorname
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (a.transactionid = b.transactionid)
-> join demo.operator as c on (a.operatorid = c.operatorid)
-> group by a.transdate, c.operatorname
-> having count(*) = 2;
+---------------------+--------------+
| transdate | operatorname |
+---------------------+--------------+
| 2020-12-10 00:00:00 | 张静 |
+---------------------+--------------+
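For comparison, here is a sketch of the multi-step route that relies only on WHERE: the grouped counts are stored as an intermediate result and then filtered. The temporary table name tmp_dailyoperatorcount and the alias detailrows are made up for the example.

```sql
-- Step 1: store the grouped counts as an intermediate result.
create temporary table demo.tmp_dailyoperatorcount as
select a.transdate, c.operatorname, count(*) as detailrows
from demo.transactionhead as a
join demo.transactiondetails as b on (a.transactionid = b.transactionid)
join demo.operator as c on (a.operatorid = c.operatorid)
group by a.transdate, c.operatorname;

-- Step 2: the count is now an ordinary column, so WHERE can filter on it.
select transdate, operatorname
from demo.tmp_dailyoperatorcount
where detailrows = 2;

-- Step 3: clean up the intermediate result.
drop temporary table demo.tmp_dailyoperatorcount;
```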
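Advantages and disadvantages of WHERE and HAVING:
| | Advantage | Disadvantage |
|---|---|---|
| WHERE | Filters the data first and joins afterwards, so it runs efficiently | Cannot filter on aggregate functions computed over groups |
| HAVING | Can filter on aggregate functions computed over groups | Filters the final result set, so it runs less efficiently |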
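Note that WHERE and HAVING are not mutually exclusive: a single query can use both.
For example, suppose we have sales data that includes the transaction time, cashier, product name, sales quantity, price, and sales amount, and the manager wants, for the two days "2020-12-10" and "2020-12-11", the sales date, cashier name, sales quantity, and sales amount wherever the amount taken exceeds 100 yuan.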
mysql> select a.transdate, c.operatorname, d.goodsname, b.quantity, b.price, b.salesvalue
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (b.transactionid = a.transactionid)
-> join demo.operator as c on (c.operatorid = a.operatorid)
-> join demo.goodsmaster as d on (d.itemnumber = b.itemnumber);
+---------------------+--------------+-----------+----------+-------+------------+
| transdate | operatorname | goodsname | quantity | price | salesvalue |
+---------------------+--------------+-----------+----------+-------+------------+
| 2020-12-10 00:00:00 | 张静 | 书 | 1.000 | 89.00 | 89.00 |
| 2020-12-10 00:00:00 | 张静 | 笔 | 2.000 | 5.00 | 10.00 |
| 2020-12-11 00:00:00 | 李强 | 书 | 2.000 | 89.00 | 178.00 |
| 2020-12-12 00:00:00 | 李强 | 笔 | 10.000 | 5.00 | 50.00 |
+---------------------+--------------+-----------+----------+-------+------------+
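Analyzing the requirement: since the statistics are per sales date and per cashier, the data must be grouped by sales date and cashier, so we can write the query with GROUP BY and HAVING: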
mysql> select a.transdate, c.operatorname, sum(b.quantity), sum(b.salesvalue)
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (b.transactionid = a.transactionid)
-> join demo.operator as c on (c.operatorid = a.operatorid)
-> group by a.transdate, c.operatorname
-> having a.transdate in ('2020-12-10' , '2020-12-11') and sum(b.salesvalue) > 100;
+---------------------+--------------+-----------------+-------------------+
| transdate | operatorname | sum(b.quantity) | sum(b.salesvalue) |
+---------------------+--------------+-----------------+-------------------+
| 2020-12-11 00:00:00 | 李强 | 2.000 | 178.00 |
+---------------------+--------------+-----------------+-------------------+
Looking closely at the conditions after HAVING, we can see that the condition a.transdate IN ('2020-12-10' , '2020-12-11') can in fact be expressed with WHERE:
mysql> select a.transdate, c.operatorname, sum(b.quantity), sum(b.salesvalue)
-> from demo.transactionhead as a
-> join demo.transactiondetails as b on (b.transactionid = a.transactionid)
-> join demo.operator as c on (c.operatorid = a.operatorid)
-> where a.transdate in ('2020-12-10' , '2020-12-11') -- filter by date first
-> group by a.transdate, c.operatorname
-> having sum(b.salesvalue) > 100; -- then filter by amount
+---------------------+--------------+-----------------+-------------------+
| transdate | operatorname | sum(b.quantity) | sum(b.salesvalue) |
+---------------------+--------------+-----------------+-------------------+
| 2020-12-11 00:00:00 | 李强 | 2.000 | 178.00 |
+---------------------+--------------+-----------------+-------------------+
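This returns the result we need. It works because the conditions are split: the condition that contains an aggregate function goes in HAVING, and the ordinary condition goes in WHERE. This takes advantage of the speed of WHERE while still using HAVING for conditions that involve aggregate functions. When the data volume is very large, the difference in running time can be substantial.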
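One caveat about the date condition: transdate is a DATETIME column, so comparing it with '2020-12-10' matches only rows stamped exactly 2020-12-10 00:00:00. That holds for the sample data, but it may not hold for real sales records. A sketch of the same WHERE plus HAVING query with a range condition and per-day grouping, assuming transdate can carry a time of day (the alias saledate is made up for the example):

```sql
select date(a.transdate) as saledate,
       c.operatorname,
       sum(b.quantity),
       sum(b.salesvalue)
from demo.transactionhead as a
join demo.transactiondetails as b on (b.transactionid = a.transactionid)
join demo.operator as c on (c.operatorid = a.operatorid)
where a.transdate >= '2020-12-10'      -- start of the first day
  and a.transdate <  '2020-12-12'      -- up to, not including, the day after the last
group by date(a.transdate), c.operatorname
having sum(b.salesvalue) > 100;        -- aggregate condition stays in HAVING
```

With the sample data this returns the same single row as the query above (李强, 2.000, 178.00), with the date shown without its time part.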