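You can download the database used in this post, along with the optimization tools mysqldumpslow and pt-query-digest, here:
https://blog.csdn.net/IT_TIfarmer/article/details/92224646
That post also explains the result columns of the explain statement in detail.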
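Before optimizing, we first need to know which SQL statements are worth optimizing:
1. SQL that runs often and takes a long time on each run
These are usually the first few queries in the pt-query-digest report.
2. SQL with heavy IO (IO here means the amount of data scanned in memory and on disk; the more data scanned, the longer the response time, so less IO is better)
Watch the Rows examine item in the pt-query-digest report.
3. SQL that does not hit an index
Compare Rows examine with Rows sent in the pt-query-digest report; a query that examines far more rows than it sends is a likely candidate.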
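The examined-versus-sent comparison can also be done by hand on a raw slow query log. The following is a minimal sketch, assuming the usual "# Query_time: ... Rows_sent: ... Rows_examined: ..." header line that MySQL writes to the slow log; the function name examined_per_sent is made up for illustration and is not part of pt-query-digest.

```python
import re

# Matches the row counters in a slow-log header line.
STATS = re.compile(r"Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)")

def examined_per_sent(header_line):
    """Return Rows_examined / Rows_sent, or None if the line has no counters."""
    m = STATS.search(header_line)
    if m is None:
        return None
    sent, examined = int(m.group(1)), int(m.group(2))
    # Treat 0 rows sent as 1 to avoid dividing by zero.
    return examined / max(sent, 1)

line = "# Query_time: 0.007953 Lock_time: 0.000121 Rows_sent: 489 Rows_examined: 10924"
ratio = examined_per_sent(line)
print(round(ratio, 1))
```

A ratio close to 1 suggests the query reads little more than it returns; a large ratio is a hint, not proof, that an index is missing.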
How do we optimize?
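I. Insert optimization: when inserting a large number of rows from the same client at the same time, prefer an INSERT statement with multiple value lists. This greatly reduces the round trips between the client and the database server and the per-statement overhead, so it runs faster than separate single-row INSERT statements (in most cases several times faster). For example, the statement below inserts several rows at once: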
INSERT INTO test VALUES ('yayun',23),('tom',26),('atlas',32),('david',25).......
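From application code, such a statement is usually built with placeholders rather than by pasting values into the string. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a MySQL connection; the column names name and age are assumptions, since the original statement does not name the columns.

```python
import sqlite3

rows = [("yayun", 23), ("tom", 26), ("atlas", 32), ("david", 25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, age INTEGER)")  # assumed schema

# One "(?, ?)" group per row, so all rows travel in a single statement.
placeholders = ", ".join(["(?, ?)"] * len(rows))
sql = "INSERT INTO test VALUES " + placeholders
params = [value for row in rows for value in row]
conn.execute(sql, params)

inserted = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(sql)
print(inserted)
```

With very large batches it is common to split the rows into chunks, because servers limit the size of a single statement.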
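II. Subquery optimization:
select id from t where t.id in (select tid from t1);
can be rewritten as the join below, which is often much faster, especially on older MySQL versions that evaluate the IN subquery once per outer row:
select t.id from t join t1 on t.id = t1.tid;
The join can return duplicate values, because it emits one row for every matching row in t1. Adding distinct (select distinct t.id ...) fixes that.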
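The duplicate issue is easy to reproduce. The sketch below uses Python's sqlite3 module with made-up sample rows (not MySQL, and not the data from this post) only to compare the result sets of the three forms; it says nothing about their speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER PRIMARY KEY);
    CREATE TABLE t1 (tid INTEGER);
    INSERT INTO t  VALUES (1), (2), (3);
    INSERT INTO t1 VALUES (1), (1), (3);   -- id 1 is referenced twice
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

in_ids = ids("SELECT id FROM t WHERE t.id IN (SELECT tid FROM t1)")
join_ids = ids("SELECT t.id FROM t JOIN t1 ON t.id = t1.tid")
distinct_ids = ids("SELECT DISTINCT t.id FROM t JOIN t1 ON t.id = t1.tid")

print(in_ids, join_ids, distinct_ids)
```

The plain join repeats id 1 because t1 holds it twice; the distinct version matches the IN version again.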
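III. OR optimization:
SELECT * FROM table_1 where name = 'jack' or id = '4';
With this form, if either name or id has no index, the database engine gives up index scans and falls back to a full table scan. Make sure both columns are indexed; if the optimizer still does not use the indexes, split the query so that each branch can use its own index:
SELECT * FROM table_1 where name = 'jack'
UNION
SELECT * FROM table_1 where id = '4'
union combines the result sets of the two queries. union removes duplicate rows, union all keeps them.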
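To check that splitting an OR into a UNION keeps the result unchanged, here is a small sketch with Python's sqlite3 module. The rows are invented for illustration, and the second branch uses the id = '4' condition from the OR query above; the point is only the difference between UNION and UNION ALL, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id TEXT PRIMARY KEY, name TEXT);
    CREATE INDEX idx_name ON table_1 (name);
    INSERT INTO table_1 VALUES ('1', 'jack'), ('4', 'jack'), ('7', 'tom');
""")

def rows(sql):
    """Run a query and return all rows as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())

or_rows = rows("SELECT * FROM table_1 WHERE name = 'jack' OR id = '4'")
union_rows = rows(
    "SELECT * FROM table_1 WHERE name = 'jack' "
    "UNION SELECT * FROM table_1 WHERE id = '4'"
)
union_all_rows = rows(
    "SELECT * FROM table_1 WHERE name = 'jack' "
    "UNION ALL SELECT * FROM table_1 WHERE id = '4'"
)

print(or_rows, union_rows, union_all_rows)
```

The row ('4', 'jack') satisfies both conditions, so UNION ALL returns it twice while UNION and the OR query return it once.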
IV. group by optimization:
We use the sakila database again. Take a look at this query:
explain select actor.first_name, actor.last_name, count(*)
from sakila.film_actor
inner join sakila.actor using(actor_id) # ps: using(actor_id) is shorthand for on (film_actor.actor_id = actor.actor_id); it requires the join column to have the same name in both tables, which is the case here (actor_id).
group by film_actor.actor_id;
The result looks like this:
+----+-------------+------------+------------+------+-----------------------------+---------+---------+-----------------------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+-----------------------------+---------+---------+-----------------------+------+----------+-----------------+
| 1 | SIMPLE | actor | NULL | ALL | PRIMARY | NULL | NULL | NULL | 200 | 100.00 | Using temporary |
| 1 | SIMPLE | film_actor | NULL | ref | PRIMARY,faid,idx_fk_film_id | PRIMARY | 2 | sakila.actor.actor_id | 27 | 100.00 | Using index |
+----+-------------+------------+------------+------+-----------------------------+---------+---------+-----------------------+------+----------+-----------------+
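We can see that it uses a temporary table to hold the data, which wastes space and resources. How can we improve it? The answer is to use a subquery and move the group by clause into it: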
explain select actor.first_name, actor.last_name, c.cnt
from sakila.actor inner join(
select actor_id, count(*) as cnt from sakila.film_actor group by actor_id
) as c using(actor_id);
Result:
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+-----------------------+------+----------+-------------+
| 1 | PRIMARY | actor | NULL | ALL | PRIMARY | NULL | NULL | NULL | 200 | 100.00 | NULL |
| 1 | PRIMARY | | NULL | ref | | | 2 | sakila.actor.actor_id | 27 | 100.00 | NULL |
| 2 | DERIVED | film_actor | NULL | index | PRIMARY,faid,idx_fk_film_id | PRIMARY | 4 | NULL | 5462 | 100.00 | Using index |
+----+-------------+------------+------------+-------+-----------------------------+-------------+---------+-----------------------+------+----------+-------------+
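As you can see, Using temporary is gone from the Extra column. We still do one full table scan, but compared with using a temporary table it is well worth it.
group by can also be optimized with a loose index scan, which requires following certain rules. If you are interested, see: https://www.cnblogs.com/wingsless/p/5040620.html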
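Before swapping one form for the other, it helps to confirm that both return the same rows. The sketch below does that with Python's sqlite3 module on a tiny stand-in for the actor and film_actor tables (a handful of rows, not the real sakila data set). It only checks the results; the execution plans in SQLite are unrelated to the MySQL plans shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO actor VALUES (1, 'PENELOPE', 'GUINESS'), (2, 'NICK', 'WAHLBERG');
    INSERT INTO film_actor VALUES (1, 1), (1, 23), (1, 25), (2, 3), (2, 31);
""")

# Form 1: join first, then group the joined rows.
grouped_after_join = sorted(conn.execute("""
    SELECT actor.first_name, actor.last_name, COUNT(*)
    FROM film_actor
    INNER JOIN actor USING (actor_id)
    GROUP BY film_actor.actor_id
""").fetchall())

# Form 2: group inside a derived table, then join the per-actor counts.
grouped_in_subquery = sorted(conn.execute("""
    SELECT actor.first_name, actor.last_name, c.cnt
    FROM actor INNER JOIN (
        SELECT actor_id, COUNT(*) AS cnt FROM film_actor GROUP BY actor_id
    ) AS c USING (actor_id)
""").fetchall())

print(grouped_after_join)
print(grouped_in_subquery)
```

Note that both forms use an inner join, so an actor with no rows in film_actor is missing from both results.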
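V. null query optimization:
select id from t where num is not null;
Even if the num column is indexed, the database engine may give up the index lookup here and do a full table scan instead. One way around it is to store a default value in place of NULL, for example 0 (pick a value that cannot occur as real data), and then query against that value:
select id from t where num <> 0;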
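Once 0 stands in for NULL, the filter that matches the old "is not null" is a comparison against 0. The sketch below, using Python's sqlite3 module with invented rows, shows the migration step and checks that the same ids come back. It assumes 0 never appears as a genuine value in num; if it can, a different sentinel is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER);
    INSERT INTO t VALUES (1, 5), (2, NULL), (3, 7), (4, NULL);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

before = ids("SELECT id FROM t WHERE num IS NOT NULL")

# Replace NULL with the default value 0 (assumed not to be real data).
conn.execute("UPDATE t SET num = 0 WHERE num IS NULL")

after = ids("SELECT id FROM t WHERE num <> 0")
remaining_nulls = conn.execute("SELECT COUNT(*) FROM t WHERE num IS NULL").fetchone()[0]

print(before, after, remaining_nulls)
```

Declaring the column NOT NULL DEFAULT 0 afterwards keeps new rows consistent with this convention.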
VI. limit optimization:
In many cases limit is used together with order by:
mysql> explain select film_id, description from sakila.film order by title limit 50, 5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1 | SIMPLE | film | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
order by usually makes the database engine do an extra sort (Using filesort), which costs resources. Without order by the plan looks like this:
mysql> explain select film_id, description from sakila.film limit 50, 5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | SIMPLE | film | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
The extra sort is gone, but it is still a full table scan. The fix is to use the table's primary key as the order by column:
mysql> explain select film_id, description from sakila.film order by film_id limit 50, 5;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
| 1 | SIMPLE | film | NULL | index | NULL | PRIMARY | 2 | NULL | 55 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
As shown above, switching the order by column to the primary key solves the problem.
But is that perfect? Not quite. When the limit offset gets large, it becomes:
mysql> explain select film_id, description from sakila.film order by film_id limit 500, 5;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
| 1 | SIMPLE | film | NULL | index | NULL | PRIMARY | 2 | NULL | 505 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------+
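Look at the rows column, the number of rows the database engine has to scan. It grows with the offset passed to limit. With a very large offset the statement becomes IO-heavy and has to read a lot of data from disk, which clearly wastes resources and time. How can we improve it? Add a where clause: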
mysql> explain select film_id, description from sakila.film where film_id > 600 and film_id < 605 order by film_id limit 4;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | film | NULL | range | PRIMARY | PRIMARY | 2 | NULL | 4 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
We can see that type has changed from index to range, and the number of rows scanned drops straight to 4, which is exactly what we wanted.
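A common variant of this where trick is to remember the last primary key of the previous page and continue from there (often called keyset pagination), so the range does not depend on the ids being contiguous. The sketch below uses Python's sqlite3 module and a generated film table of 1000 rows as a stand-in for sakila.film; last_seen is a hypothetical variable the application would carry between pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [(i, f"film {i}") for i in range(1, 1001)],
)

# Offset form: the engine walks past the first 500 rows before returning 5.
offset_page = [r[0] for r in conn.execute(
    "SELECT film_id FROM film ORDER BY film_id LIMIT 500, 5"
)]

# Keyset form: start right after the last id of the previous page.
last_seen = 500
keyset_page = [r[0] for r in conn.execute(
    "SELECT film_id FROM film WHERE film_id > ? ORDER BY film_id LIMIT 5",
    (last_seen,),
)]

print(offset_page)
print(keyset_page)
```

The trade-off is that this form can only step to the next page; jumping straight to an arbitrary page number still needs an offset.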
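Other optimization tips (collected from various bloggers' posts):
1. Avoid starting a like pattern with the percent sign %. A leading % makes the database engine give up the index on that column and do a full table scan; a trailing % does not have this problem.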
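2. When several conditions are joined with and, put the columns compared with "=" leftmost in the composite index and the range column last. The order of the conditions inside the where clause itself does not matter, because the optimizer reorders them. Take the condition:
where a=1 and b=2 and c>0
b cannot be used to narrow the lookup with this index:
alter table test add index(a,c,b);
b can be used with this index:
alter table test add index(a,b,c);
This is because MySQL matches index columns from left to right and stops at the first range condition (>, <, between, like); the index columns after it are no longer used to narrow the search. So keep the "=" columns in front and the range column at the end.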
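3. When using a composite index, the query must include its first column (the leftmost prefix of the index matters).
For example:
alter table test add index(a,b,c);
Cases where the index is not used:
where b=1 and c=2
where b=1
where c=2
Cases where the index is used:
where a=1 and b=1 and c=2
where a=1 and b=1
where a=1 and c=2
In the last case only the a part of the index narrows the lookup, because b is skipped.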
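The leftmost-prefix rule can be observed in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in: SQLite is a different engine from MySQL and its plan text differs from MySQL's explain output, but its B-tree indexes follow the same left-to-right matching. The extra column d and the index name idx_abc are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (a INTEGER, b INTEGER, c INTEGER, d TEXT);
    CREATE INDEX idx_abc ON test (a, b, c);
""")

def plan(where):
    """Return the plan description SQLite reports for the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM test WHERE " + where
    ).fetchall()
    return rows[0][-1]  # last column holds the human-readable detail

for where in ("b = 1 AND c = 2", "b = 1", "c = 2",
              "a = 1 AND b = 1 AND c = 2", "a = 1 AND b = 1", "a = 1 AND c = 2"):
    print(where, "->", plan(where))
```

Conditions that skip column a should be reported as a scan of the table, while those that start with a should be reported as a search using idx_abc.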
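4. Outer table large and inner table small: use in. Outer table small and inner table large: use exists (do not mix them up).
Let's take an example (sakila database; the outer table film_actor has about 5,400 rows and the inner table actor has 200 rows, so the outer table is large and the inner table is small):
# Time: 2019-06-15T13:44:15.730229Z
# User@Host: root[root] @ localhost [::1] Id: 22
# Query_time: 0.007953 Lock_time: 0.000121 Rows_sent: 489 Rows_examined: 10924
SET timestamp=1560606255;
select * from film_actor where exists (
select actor_id from sakila.actor where actor.actor_id < 20 and actor.actor_id = film_actor.actor_id
);
# Time: 2019-06-15T13:44:58.065964Z
# User@Host: root[root] @ localhost [::1] Id: 22
# Query_time: 0.001128 Lock_time: 0.000133 Rows_sent: 489 Rows_examined: 508
SET timestamp=1560606298;
select * from film_actor where actor_id in (
select actor_id from sakila.actor where actor.actor_id < 20
);
The above is taken from the database slow query log. With a large outer table and a small inner table, in took 0.001128 s in this run against 0.007953 s for exists, about 7 times faster, and examined 508 rows instead of 10,924. In the opposite case, outer small and inner large, exists is generally the better choice. For the detailed differences between exists and in, see this post: https://www.cnblogs.com/emilyyoucan/p/7833769.html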
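Both queries in the log sent 489 rows, which is expected: the two forms are interchangeable in what they return and differ only in how they are executed. The sketch below checks that equivalence with Python's sqlite3 module on a few invented rows (not the sakila data), so either form can be picked purely on speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1), (2), (30);
    INSERT INTO film_actor VALUES (1, 10), (1, 11), (2, 12), (30, 13);
""")

exists_rows = sorted(conn.execute("""
    SELECT * FROM film_actor WHERE EXISTS (
        SELECT actor_id FROM actor
        WHERE actor.actor_id < 20 AND actor.actor_id = film_actor.actor_id
    )
""").fetchall())

in_rows = sorted(conn.execute("""
    SELECT * FROM film_actor WHERE actor_id IN (
        SELECT actor_id FROM actor WHERE actor.actor_id < 20
    )
""").fetchall())

print(exists_rows)
print(in_rows)
```

The equivalence does not carry over to not in versus not exists when the subquery can return NULL, so those two need a separate check.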
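5. Use where instead of having whenever you can. having filters the result set only after all the rows have been retrieved, and that step involves sorting, aggregating and so on. Restricting the number of rows with a where clause cuts this overhead.
Inefficient:
select * from user group by id having id > 40;
Efficient:
select * from user where id > 40 group by id;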
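The two statements return the same rows, so the rewrite is safe whenever the condition refers to a plain column rather than an aggregate. Here is a small check using Python's sqlite3 module; the user table and its name column are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(39, "a"), (40, "b"), (41, "c"), (42, "d")],
)

# Filter applied after grouping.
having_rows = sorted(conn.execute(
    "SELECT * FROM user GROUP BY id HAVING id > 40"
).fetchall())

# Filter applied before grouping, so fewer rows reach the group by step.
where_rows = sorted(conn.execute(
    "SELECT * FROM user WHERE id > 40 GROUP BY id"
).fetchall())

print(having_rows)
print(where_rows)
```

Conditions on aggregates, such as having count(*) > 1, cannot be moved into where and have to stay in having.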