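This hands-on tuning exercise uses a table with 500,000 rows. MySQL runs on a bare-bones, entry-level Alibaba Cloud server, so performance is slow. Message me privately if you want the table data.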
Full-value matching with an index is more efficient
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30;
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30 AND classId=4;
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age=30 AND classId=4 AND NAME = 'abcd';
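Without indexes:
All three queries are full table scans taking about one second, and the more conditions there are, the longer they take.
Create indexes: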
CREATE INDEX idx_age ON student(age);
CREATE INDEX idx_age_classid ON student(age,classId);
CREATE INDEX idx_age_classid_name ON student(age,classId,NAME);
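The query time drops noticeably (a simple, informal test).
Analysis using the three indexes created above. Leftmost prefix rule: the columns of a composite index can only be used in order, from left to right; once a column is skipped, the columns after it cannot use the index.
# Can use the age index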
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.name = 'abcd' ;
# Full table scan only, because no index starts with classid
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.classid=1 AND student.name = 'abcd';
# Can use the three-column index; the order of the conditions in the SQL does not matter
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE classid=4 AND student.age=30 AND student.name = 'abcd';
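The leftmost prefix rule for equality conditions can be sketched as a small Python function. This is a simplified model, not MySQL's optimizer, and usable_prefix is a hypothetical helper: walk the index columns from the left and stop at the first one the WHERE clause does not constrain.

```python
def usable_prefix(index_cols, eq_cols):
    """Return the leading columns of a composite index usable by '=' conditions.

    index_cols: index columns in definition order.
    eq_cols: columns compared with '=' in WHERE; their order in the SQL text
    does not matter, so they are treated as a set.
    """
    wanted = {c.lower() for c in eq_cols}  # MySQL column names are case-insensitive
    used = []
    for col in index_cols:
        if col.lower() not in wanted:
            break  # a skipped column: nothing to its right can use the index
        used.append(col)
    return used


idx_age_classid_name = ["age", "classId", "NAME"]
print(usable_prefix(idx_age_classid_name, ["age", "name"]))             # ['age']
print(usable_prefix(idx_age_classid_name, ["classid", "name"]))         # []
print(usable_prefix(idx_age_classid_name, ["classid", "age", "name"]))  # ['age', 'classId', 'NAME']
```

The three calls mirror the three EXPLAIN statements above.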
If rows are not inserted in increasing (primary key) order, page splits can occur.
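To see why insertion order matters, here is a toy model of clustered-index leaf pages. It assumes 4 keys per page and a simplified split policy (count_mid_page_splits is a hypothetical helper, not InnoDB's actual algorithm): increasing keys fill a page and then append a new one, while out-of-order keys land in full pages in the middle and force them to split.

```python
import bisect
import random


def count_mid_page_splits(keys, capacity=4):
    """Toy model of clustered-index leaf pages (not InnoDB's real algorithm).

    Pages are kept in key order. When a page overflows:
    - if it is the last page and the new key is its largest key, the key moves
      to a fresh page (sequential append, nothing else moves);
    - otherwise the page is split in half and the upper half moves (a page split).
    Returns (number_of_mid_page_splits, pages).
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Target page: the last page whose smallest key is <= key (else the first).
        target = 0
        for i, page in enumerate(pages):
            if page and page[0] <= key:
                target = i
        page = pages[target]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if target == len(pages) - 1 and page[-1] == key:
            pages.append([page.pop()])  # append-only growth at the right edge
        else:
            half = len(page) // 2
            pages.insert(target + 1, page[half:])  # move the upper half out
            del page[half:]
            splits += 1
    return splits, pages


sequential = list(range(1, 101))  # hypothetical increasing ids (AUTO_INCREMENT-like)
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)  # hypothetical random-order keys

seq_splits, seq_pages = count_mid_page_splits(sequential)
rnd_splits, rnd_pages = count_mid_page_splits(shuffled)
print(seq_splits, len(seq_pages))  # 0 25: no splits, every page is full
print(rnd_splits, len(rnd_pages))  # mid-page splits occur
```

Each mid-page split in the model corresponds to moving records and leaving two partly empty pages, which is the cost the sentence above warns about.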
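Applying calculations, functions, or type conversions to a column in WHERE causes a full table scan, because every row must first be transformed before it can be checked against the condition.
# Create an index on name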
CREATE INDEX idx_name ON student(NAME);
# Can use the index: a range query that scans only part of the index
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name LIKE 'abc%';
# A function on the WHERE column invalidates the index
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE LEFT(student.name,3) = 'abc';
# A calculation on the WHERE column invalidates the index
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age + 1 = 30;
# Type conversion invalidates the index: NAME is a string but is compared with a number, so it is converted to a number
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE NAME = 123;
Summary: avoid calculations, functions, and type conversions on columns in the WHERE clause.
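A toy illustration of why an expression on the column defeats the index. It assumes the index is modelled as a sorted Python list of (age, id) pairs over hypothetical data; the helper names are made up for the sketch. The list is ordered by the raw age value, so age + 1 = 30 has to be evaluated row by row, while the equivalent age = 29 can be located by binary search.

```python
import bisect


def build_age_index(rows):
    """Toy secondary index on age: a sorted list of (age, id) pairs."""
    return sorted((age, row_id) for row_id, age in rows.items())


def index_lookup(index, age):
    """age = const: binary-search the sorted index, read only matching entries."""
    lo = bisect.bisect_left(index, (age, float("-inf")))
    hi = bisect.bisect_right(index, (age, float("inf")))
    return [row_id for _, row_id in index[lo:hi]], hi - lo  # (ids, entries read)


def full_scan(rows, predicate):
    """Expression on the column: it must be evaluated for every row."""
    return [row_id for row_id, age in rows.items() if predicate(age)], len(rows)


rows = {row_id: 20 + row_id % 20 for row_id in range(1, 1001)}  # hypothetical id -> age
index = build_age_index(rows)

ids_scan, read_scan = full_scan(rows, lambda age: age + 1 == 30)  # WHERE age + 1 = 30
ids_idx, read_idx = index_lookup(index, 29)                      # WHERE age = 29
print(ids_scan == ids_idx, read_scan, read_idx)  # True 1000 50
```

Both forms return the same rows; only the rewritten condition lets the sorted structure be searched instead of scanned.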
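When a column uses a range condition (>, <, <>, BETWEEN), the index columns to its right can no longer use the index. Changing > and < to >= and <= may let the following index columns be used (check key_len).
# Create the three-column composite index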
CREATE INDEX idx_age_classId_name ON student(age,classId,NAME);
# A > range condition on the middle column means only the first two columns use the index (visible from key_len)
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.age=30 AND student.classId>20 AND student.name = 'abc' ;
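A sketch of how key_len reveals which key parts are used. It assumes a hypothetical student schema (age INT NULL, classId INT NULL, NAME VARCHAR(20) NULL, utf8mb4 at up to 4 bytes per character) and the common key_len rules: 4 bytes for INT, n * 4 + 2 bytes for VARCHAR(n), plus 1 byte for each nullable column. The hypothetical used_key_parts applies the rule above: the range column itself is used, the columns after it are not. With a different charset or schema, the numbers change.

```python
# Hypothetical schema: (type, length, nullable) for each indexed column.
STUDENT_COLUMNS = {
    "age": ("INT", None, True),
    "classId": ("INT", None, True),
    "NAME": ("VARCHAR", 20, True),
}
CHARSET_BYTES = 4  # assumed utf8mb4: up to 4 bytes per character


def key_part_len(col_type, length, nullable):
    """Bytes a key part adds to EXPLAIN's key_len (common rules, simplified)."""
    if col_type == "INT":
        size = 4
    elif col_type == "VARCHAR":
        size = length * CHARSET_BYTES + 2  # +2 bytes length prefix
    elif col_type == "CHAR":
        size = length * CHARSET_BYTES
    else:
        raise ValueError(f"type not modelled: {col_type}")
    return size + (1 if nullable else 0)  # +1 byte NULL flag


def used_key_parts(index_cols, eq_cols, range_cols=()):
    """Index columns usable under the leftmost-prefix and range rules."""
    eq = {c.lower() for c in eq_cols}
    rng = {c.lower() for c in range_cols}
    used = []
    for col in index_cols:
        if col.lower() in eq:
            used.append(col)
        elif col.lower() in rng:
            used.append(col)  # the range column itself is still used...
            break             # ...but the columns to its right are not
        else:
            break             # skipped column (leftmost prefix rule)
    return used


def key_len(parts):
    return sum(key_part_len(*STUDENT_COLUMNS[c]) for c in parts)


idx = ["age", "classId", "NAME"]
parts = used_key_parts(idx, eq_cols=["age", "name"], range_cols=["classId"])
print(parts, key_len(parts))  # ['age', 'classId'] 10
print(key_len(idx))           # 93: what all three columns would give
```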
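Both != and <> make the index unusable, as do IS NOT NULL and a LIKE pattern that starts with %; IS NULL and a LIKE prefix pattern can still use the index: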
CREATE INDEX idx_name ON student(NAME);
# ALL
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name <> 'abc' ;
# ALL
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE student.name != 'abc' ;
# ref
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age IS NULL;
# ALL
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age IS NOT NULL;
# range
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE NAME LIKE 'ab%';
# ALL
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE NAME LIKE '%ab%';
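With OR, when one column is indexed and the other is not, the indexed column can use the index but the non-indexed column still needs a full table scan. Together that costs more than a single full table scan, so the index is not used.
# Create an index on age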
CREATE INDEX idx_age ON student(age);
# classid after OR has no index, so age cannot use its index either
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 10 OR classid = 100;
# After creating an index on classid, the SQL above can use indexes
CREATE INDEX idx_cid ON student(classid);
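Keep character sets consistent: when they are not uniform, comparisons involve a conversion, which makes the index unusable.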
Index optimization for outer joins and inner joins
# Add an index on the joined (driven) table
CREATE INDEX Y ON book(card);
# Result: the driving table (type) is ALL, the driven table (book) is ref
EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;
# Index both tables
CREATE INDEX X ON `type`(card);
# Result: the driving table is index, the driven table is ref
EXPLAIN SELECT SQL_NO_CACHE * FROM `type` LEFT JOIN book ON type.card = book.card;
# For INNER JOIN, the query optimizer decides which table drives and which is driven
EXPLAIN SELECT SQL_NO_CACHE * FROM `type` INNER JOIN book ON type.card = book.card;
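Conclusions:
A JOIN across multiple tables is essentially a loop that matches rows between the tables. Before MySQL 5.5, MySQL supported only one join method, the nested loop; if the joined tables are large, the JOIN takes a very long time. Since 5.5, MySQL has used the BNLJ (Block Nested-Loop Join) algorithm to optimize this.
Driving table: the table listed higher in the EXPLAIN output
Driven table: the table listed lower in the EXPLAIN output
Simple Nested-Loop Join: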
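A simple nested-loop join compares every row of the driving table with every row of the driven table. The Python sketch below contrasts it with an index nested-loop join (an index on the driven table's join column, as in the EXPLAIN results above) and a block nested-loop join. It assumes in-memory lists of hypothetical (id, card) rows, a dict standing in for the B+tree index, and a join buffer measured in rows rather than bytes (MySQL sizes the real buffer with join_buffer_size). Note that BNLJ does not reduce the number of comparisons; it reduces how many times the driven table is read.

```python
def simple_nlj(outer, inner, key):
    """Simple Nested-Loop Join: every outer row is compared with every inner row."""
    result, comparisons = [], 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                result.append((o, i))
    return result, comparisons


def index_nlj(outer, inner, key):
    """Index Nested-Loop Join: one index probe on the driven table per outer row."""
    index = {}
    for i in inner:  # a dict standing in for a B+tree on the join column
        index.setdefault(i[key], []).append(i)
    result, probes = [], 0
    for o in outer:
        probes += 1
        result.extend((o, i) for i in index.get(o[key], []))
    return result, probes


def block_nlj(outer, inner, key, buffer_rows):
    """Block Nested-Loop Join: buffer outer rows, scan the inner table once per block."""
    result, inner_scans = [], 0
    for start in range(0, len(outer), buffer_rows):
        block = outer[start:start + buffer_rows]  # the join buffer
        inner_scans += 1
        for i in inner:
            for o in block:
                if o[key] == i[key]:
                    result.append((o, i))
    return result, inner_scans


CARD = 1  # position of the join column in each (id, card) tuple
type_rows = [(n, n % 20) for n in range(1, 101)]  # hypothetical `type` table
book_rows = [(n, n % 20) for n in range(1, 201)]  # hypothetical `book` table

r1, comparisons = simple_nlj(type_rows, book_rows, CARD)
r2, probes = index_nlj(type_rows, book_rows, CARD)
r3, inner_scans = block_nlj(type_rows, book_rows, CARD, buffer_rows=32)
print(sorted(r1) == sorted(r2) == sorted(r3))  # True
print(comparisons, probes, inner_scans)        # 20000 100 4
```

The simple loop reads the driven table once per driving row (100 times here), the block version only once per buffer fill (4 times), and the index version replaces the inner scan with a lookup.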
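Since MySQL 4.1, subqueries are supported: SELECT statements can be nested, with the result of one SELECT used as a condition of another SELECT.
Why subqueries are not very efficient:
MySQL supports two ways of sorting: FileSort and Index sorting.
Optimization suggestions:
# Create a composite index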
CREATE INDEX idx_age_classid_name ON student (age,classid,NAME);
# When sorting all rows, the index is not used (mainly because every row would need a lookup back to the table)
EXPLAIN SELECT SQL_NO_CACHE * FROM student ORDER BY age,classid;
# Now the index can be used (covering index, no lookup back to the table)
EXPLAIN SELECT SQL_NO_CACHE age,classid,name,id FROM student ORDER BY age,classid;
# With few rows, even the lookups back to the table are faster than FileSort, so the index is used
EXPLAIN SELECT SQL_NO_CACHE * FROM student ORDER BY age,classid LIMIT 10;
It mainly comes down to the leftmost prefix rule
# Which of the following cannot use the index?
# Not used
EXPLAIN SELECT * FROM student ORDER BY classid LIMIT 10;
# Not used
EXPLAIN SELECT * FROM student ORDER BY classid,NAME LIMIT 10;
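# Uses the index only if the index contains stuno, e.g. (age,classid,stuno); idx_age_classid_name above does not, so check EXPLAIN for Using filesort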
EXPLAIN SELECT * FROM student ORDER BY age,classid,stuno LIMIT 10;
# Used
EXPLAIN SELECT * FROM student ORDER BY age,classid LIMIT 10;
# Used
EXPLAIN SELECT * FROM student ORDER BY age LIMIT 10;
Check the sort direction
# Not used: age is DESC while classid is ASC
EXPLAIN SELECT * FROM student ORDER BY age DESC, classid ASC LIMIT 10;
# Not used: age is not given
EXPLAIN SELECT * FROM student ORDER BY classid DESC, NAME DESC LIMIT 10;
# Not used: the sort directions are mixed (ASC and DESC)
EXPLAIN SELECT * FROM student ORDER BY age ASC,classid DESC LIMIT 10;
# All descending, so the index can be used by traversing the leaf nodes in reverse
EXPLAIN SELECT * FROM student ORDER BY age DESC, classid DESC LIMIT 10;
# Only age is used
EXPLAIN SELECT * FROM student WHERE age=45 ORDER BY classid;
# Only age is used
EXPLAIN SELECT * FROM student WHERE age=45 ORDER BY classid,NAME;
# Not used
EXPLAIN SELECT * FROM student WHERE classid=45 ORDER BY age;
# Used: reads rows in age order first, then filters classid=45
EXPLAIN SELECT * FROM student WHERE classid=45 ORDER BY age LIMIT 10;
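The ORDER BY cases above can be condensed into a rule, sketched here in Python under simplified assumptions (a hypothetical helper index_can_sort, an index whose columns are all ascending, and no cost model): the index order can satisfy ORDER BY when the sort columns follow the index columns from the left, leading columns fixed by WHERE col = const may be skipped, and all sort directions are the same. Whether MySQL actually picks the index is still a cost decision, which is why WHERE classid=45 ORDER BY age uses it only with LIMIT.

```python
def index_can_sort(index_cols, order_by, eq_cols=()):
    """Can reading the index (forwards or backwards) return rows in ORDER BY order?

    index_cols: index columns, all ascending (as in the examples above).
    order_by: list of (column, "ASC" or "DESC").
    eq_cols: columns fixed by WHERE col = const; they hold a single value,
    so the index may skip them.
    """
    cols = [c.lower() for c in index_cols]
    eq = {c.lower() for c in eq_cols}
    directions = set()
    pos = 0
    for col, direction in order_by:
        col = col.lower()
        if col in eq:
            continue  # sorting by a constant changes nothing
        while pos < len(cols) and cols[pos] != col and cols[pos] in eq:
            pos += 1  # skip leading columns pinned by equality
        if pos == len(cols) or cols[pos] != col:
            return False  # gap in the leftmost prefix: filesort needed
        directions.add(direction.upper())
        pos += 1
    return len(directions) <= 1  # mixed ASC/DESC cannot follow one scan direction


idx = ["age", "classid", "NAME"]
print(index_can_sort(idx, [("classid", "ASC")]))                    # False
print(index_can_sort(idx, [("age", "DESC"), ("classid", "ASC")]))   # False
print(index_can_sort(idx, [("age", "DESC"), ("classid", "DESC")]))  # True
print(index_can_sort(idx, [("classid", "ASC"), ("NAME", "ASC")], eq_cols=["age"]))  # True
```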
# By default, FileSort is used: 1.417s
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME ;
# Build an index to avoid FileSort: 1.118s
CREATE INDEX idx_age_name ON student(age,NAME);
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME ;
# Build an index on the WHERE conditions
CREATE INDEX idx_age_stuno_name ON student(age,stuno,NAME);
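# This index is already very selective, so a FileSort on NAME is fine; whether the NAME column of the index gets used depends on how many rows remain after filtering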
EXPLAIN SELECT SQL_NO_CACHE * FROM student WHERE age = 30 AND stuno <101000 ORDER BY NAME ;
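Summary:
When you have to choose between indexing the range-condition column and the ORDER BY (or GROUP BY) column, first look at how many rows the condition filters out. If it filters out most rows and only a few remain to be sorted, put the index on the range column.
If the sort column is not in an index, filesort uses one of two algorithms: two-pass sort and single-pass sort.
Two-pass sort:
Scans the disk twice to get the final data: it reads the row pointers and the ORDER BY columns, sorts them, and then reads the corresponding rows according to the sorted result. In other words, it takes the sort column from disk, sorts it in the buffer, and then fetches the other columns from disk, which causes random I/O.
Single-pass sort:
Reads all the needed columns at once and sorts them in the buffer.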
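A toy Python comparison of the two filesort algorithms. It assumes the table is a dict of hypothetical rows keyed by id and ignores sort_buffer_size limits and on-disk merge passes; the function names are made up for the sketch. Two-pass keeps only (sort key, row pointer) in the buffer but reads every row again afterwards, while single-pass buffers whole rows and needs no second read.

```python
def two_pass_filesort(table, sort_col):
    """Two-pass sort: buffer (sort key, row pointer), sort, then re-read the rows."""
    buffer = [(row[sort_col], row_id) for row_id, row in table.items()]
    buffer.sort()
    buffered_values = 2 * len(buffer)
    # Second pass: fetch each full row again in sorted order (random I/O on disk).
    result = [table[row_id] for _, row_id in buffer]
    return result, {"buffered_values": buffered_values, "extra_row_reads": len(result)}


def single_pass_filesort(table, sort_col):
    """Single-pass sort: buffer whole rows and sort them; no second read."""
    buffer = sorted(table.items(), key=lambda item: (item[1][sort_col], item[0]))
    buffered_values = sum(len(row) for _, row in buffer)
    result = [row for _, row in buffer]
    return result, {"buffered_values": buffered_values, "extra_row_reads": 0}


# Hypothetical rows: id -> {column: value}
table = {i: {"id": i, "name": f"n{(i * 7) % 10}", "age": 20 + i % 5} for i in range(1, 11)}

r1, stats1 = two_pass_filesort(table, "name")
r2, stats2 = single_pass_filesort(table, "name")
print(r1 == r2)  # True
print(stats1)    # {'buffered_values': 20, 'extra_row_reads': 10}
print(stats2)    # {'buffered_values': 30, 'extra_row_reads': 0}
```

The trade-off is visible in the stats: single-pass saves the second round of row reads but needs more buffer space per row.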