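Background
1. Indexes
Only the index types supported by InnoDB are listed here:
- Primary key index (PRIMARY). A special unique index bound by the primary key constraint; NULL values are not allowed
- Unique index (UNIQUE). The indexed column allows NULL values but not duplicates
- Ordinary index (INDEX). The indexed column allows both NULL values and duplicates
- Composite index. An index built from several columns. Entries are sorted by the leftmost column first, then by the second column within equal values of the first, and so on. For that reason, the more selective a column is, the further left it should be placed in the composite index. Use of the index is subject to the leftmost-prefix rule: matching proceeds column by column from the left and stops at the first range condition (>, <, between, like)
All of these indexes are B+ trees underneath; a table with N indexes has N B+ trees
Indexes fall into two broad classes: the primary key index, also called the clustered index, and every other index, called a non-primary-key index or secondary index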
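The index kinds above can be sketched in DDL as follows. This is a minimal illustration only: the table `demo_user`, its columns, and the index names are hypothetical and do not come from the case study later in this article.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE demo_user (
    id    BIGINT       NOT NULL AUTO_INCREMENT,
    email VARCHAR(128) NULL,
    age   INT          NULL,
    city  VARCHAR(64)  NOT NULL,
    name  VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id),                 -- primary key index (clustered index)
    UNIQUE KEY uk_email (email),      -- unique index: no duplicates, NULL allowed
    KEY idx_age (age),                -- ordinary index: duplicates and NULL allowed
    KEY idx_city_name (city, name)    -- composite index: sorted by city, then name
) ENGINE = InnoDB;

-- Leftmost-prefix rule: both queries constrain the leftmost column (city),
-- so idx_city_name can be used for a lookup.
SELECT id FROM demo_user WHERE city = 'Beijing';
SELECT id FROM demo_user WHERE city = 'Beijing' AND name = 'Li';

-- The leftmost column is missing here, so a direct lookup into idx_city_name
-- by name is not possible (the optimizer may still scan the whole index).
SELECT id FROM demo_user WHERE name = 'Li';
```

Running EXPLAIN on each SELECT is the way to confirm which index, if any, the optimizer actually picks on a given server version and data set.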
2. MySQL EXPLAIN
The EXPLAIN command returns the execution plan of an SQL statement. Its output columns are described below:
EXPLAIN Output Columns
Column | JSON Name | Meaning |
---|---|---|
id | select_id | The SELECT identifier. Rows with the same id are executed from top to bottom; among different ids, the larger the value, the higher the priority and the earlier it is executed |
select_type | None | The SELECT type |
table | table_name | The table for the output row |
partitions | partitions | The matching partitions |
type | access_type | The join type |
possible_keys | possible_keys | The possible indexes to choose |
key | key | The index actually chosen |
key_len | key_length | The length of the chosen key |
ref | ref | The columns compared to the index, shows which columns or constants are compared to the index named in the key column to select rows from the table |
rows | rows | Estimate of rows to be examined. The estimated total number of rows probed by the access method named in the type column |
filtered | filtered | Percentage of rows filtered by table condition. The share of the rows found by the access method in type that remain after the query conditions are applied |
Extra | None | Additional information |
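First, MySQL reads the table using the access method shown in type and expects to examine rows records.
Next, MySQL applies the additional query conditions (reflected in Extra) to those rows records as a second filter.
Finally, n records satisfy the query, and filtered = n / rows, expressed as a percentage.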
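The three steps above can be read off a plan as in the sketch below. The table `demo_order`, its columns, and the numbers in the comments are made up for illustration; they are not real EXPLAIN output.

```sql
-- Hypothetical table and column names.
EXPLAIN SELECT * FROM demo_order WHERE status = 1 AND amount > 100;

-- Suppose the plan row for demo_order reported (made-up values):
--   rows     = 1000    -- rows MySQL expects to examine via the access method in type
--   filtered = 10.00   -- percent of those rows expected to pass the remaining conditions
-- Then the estimated number of rows that survive is
--   n = rows * filtered / 100 = 1000 * 10.00 / 100 = 100
```

Both rows and filtered are optimizer estimates, so the product is an estimate as well.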
EXPLAIN Join Types
The type column of EXPLAIN output describes how tables are joined.
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-join-types
The following list describes the join types, ordered from the best type to the worst
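- system: the table has only one row (a system table). This is a special case of the const type
- const: const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values.
eg:
SELECT * FROM tbl_name WHERE primary_key=1;
- eq_ref: equality lookup on a unique index. For each index key value exactly one row in the table matches. Typically seen with primary key or unique index lookups. Compared with const: const compares the primary or unique key directly with a constant, whereas eq_ref compares it with a value that varies (a column of a previously read table), so the lookup is performed many times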
eg:
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column=other_table.column;
SELECT * FROM ref_table,other_table
WHERE ref_table.key_column_part1=other_table.column
AND ref_table.key_column_part2=1;
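- ref: equality lookup on an ordinary (non-unique) index; returns every row that matches a given index value. ref is used if the join cannot select a single row based on the key value
- range: range lookup, in other words a partial index scan. The index is used to retrieve the rows in a given range. range can be used when an indexed column is compared to a constant using any of the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, LIKE, or IN() operators. A range scan on an indexed column is better than a full index scan: it only has to start at one point and stop at another, without scanning the whole index
- index: scans the entire index and reads its data from the index. Full Index Scan. The difference from ALL is that the index type only traverses the index tree. This is usually faster than ALL because the index is usually smaller than the table data. (Both index and ALL read everything, but index reads the index tree while ALL reads the table rows)
For example, when the query condition covers part of a composite index but does not follow the leftmost-prefix rule, the index access type may be chosen. It is far less efficient than a lookup that follows the leftmost-prefix rule, because an index scan walks the index entries one by one from the first entry and checks each of them against the condition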
from mysql8.0 manual:
The index join type is the same as ALL, except that only the index tree is scanned. (Both scan everything, but index scans only the index tree.) This occurs two ways:
- Covering index. If the index is a covering index for the queries and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan (a covering scan that needs no lookup back into the table) usually is faster than ALL because the size of the index usually is smaller than the table data.
- Full table scan in index order. A full table scan is performed using reads from the index to look up data rows in index order. Uses index does not appear in the Extra column. Note that, compared with ALL, both end up reading the data of the whole table, but without a covering index the index type has to read the index first and then fetch rows from the table with random reads, so in this case index is no faster than ALL
MySQL can use this join type when the query uses only columns that are part of a single index.
- ALL: full table scan; reads all the data of the whole table. Full Table Scan: every row is traversed to find the matching ones. A full table scan (also known as a sequential scan) is a scan made on a database where each row of the table is read in a sequential (serial) order and the columns encountered are checked for the validity of a condition.[1] Full table scans [2] are usually the slowest method of scanning a table due to the heavy amount of I/O reads required from the disk which consists of multiple seeks as well as costly disk to memory transfers. [from wiki]. Although a Full Table Scan traverses the whole table, it reads along the primary key with sequential I/O, so a full table scan is sometimes faster than a large number of lookups back into the table from a secondary index, because those lookups produce random I/O
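The remaining access types in the list above can be tried out with a sketch like the following. The table `demo_user` and its columns are hypothetical, and the type noted in each comment is only what one would typically expect on a populated table; the optimizer may choose differently depending on version, statistics, and data distribution.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE demo_user (
    id   BIGINT      NOT NULL AUTO_INCREMENT,
    age  INT         NULL,
    note VARCHAR(64) NULL,
    PRIMARY KEY (id),
    KEY idx_age (age)
) ENGINE = InnoDB;

-- Typically const: primary key compared with a constant.
EXPLAIN SELECT * FROM demo_user WHERE id = 1;

-- Typically ref: equality on a non-unique index.
EXPLAIN SELECT * FROM demo_user WHERE age = 30;

-- Typically range: bounded scan of idx_age (may fall back to ALL if the
-- range is estimated to match a large share of the table).
EXPLAIN SELECT * FROM demo_user WHERE age BETWEEN 20 AND 30;

-- Typically index: no WHERE clause, but idx_age alone covers the selected column.
EXPLAIN SELECT age FROM demo_user;

-- Typically ALL: note is not indexed, so every row has to be read.
EXPLAIN SELECT * FROM demo_user WHERE note = 'x';
```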
EXPLAIN Extra
- Using index. Indicates a covering index: the requested columns are read directly from the index, with no lookup back into the table
- Using where. A WHERE clause is used to restrict which rows to match against the next table or send to the client. In other words, the rows are filtered further at the server layer
- Using index condition. Indicates index condition pushdown. Tables are read by accessing index tuples and testing them first to determine whether to read full table rows. In this way, index information is used to defer (“push down”) reading full table rows unless it is necessary. See “Index Condition Pushdown Optimization”.
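As a rough sketch of when each Extra value tends to appear, consider the queries below. The table `demo_user` and its columns are hypothetical, and the Extra values in the comments are typical expectations rather than guaranteed output.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE demo_user (
    id   BIGINT      NOT NULL AUTO_INCREMENT,
    city VARCHAR(64) NOT NULL,
    name VARCHAR(64) NOT NULL,
    note VARCHAR(64) NULL,
    PRIMARY KEY (id),
    KEY idx_city_name (city, name)
) ENGINE = InnoDB;

-- Typically "Using index": every selected column is stored in idx_city_name,
-- so the table rows are never read.
EXPLAIN SELECT city, name FROM demo_user WHERE city = 'Beijing';

-- Typically "Using index condition": city = ... locates the index range;
-- name LIKE '%li%' cannot narrow the range, but it can be checked against
-- the index entry before the full row is fetched.
EXPLAIN SELECT * FROM demo_user WHERE city = 'Beijing' AND name LIKE '%li%';

-- Typically "Using where": note is not part of any index, so the condition
-- is evaluated at the server layer after the row has been read.
EXPLAIN SELECT * FROM demo_user WHERE city = 'Beijing' AND note = 'x';
```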
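Case study
Tables:
Three tables
- host. 500,000 records; primary key host.id; there is also a secondary index IDX_host_is_deleted on a column with only 2 distinct values
- resource_pool. 100 records; primary key resource_pool.id
- biz_module. 30,000 records; primary key biz_module.id
Goal:
Query every resource_pool together with the number of hosts in it: SELECT resource_pool.*, count(host.id) count
Original SQL
The query takes 7s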
SELECT resource_pool.*, count(host.id) count
FROM `resource_pool`
Inner JOIN biz_module
ON resource_pool.module = biz_module.path
Left JOIN host
ON host.module_id = biz_module.id and host.is_deleted = 0
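-- The host table has the two-valued index IDX_host_is_deleted (only two values, so its selectivity is extremely low), and EXPLAIN shows that this query uses it.
-- That causes a large number of lookups back into the table, i.e. a large amount of random I/O, which is very inefficient. It is much slower than a full table scan along the primary key (the primary key is the clustered index, so scanning it is sequential I/O)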
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20
Execution plan
id | table | type | key | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|
1 | resource_pool | index | PRIMARY | --- | 11 | 100 | Using temporary; Using filesort |
1 | biz_module | ref | IDX_biz_module_path | om2.resource_pool.module | 1 | 100 | Using where; Using index |
1 | host | ref | IDX_host_is_deleted | const | 191385 | 100 | Using where |
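Optimization:
Adding +0 to is_deleted makes every index on is_deleted unusable for this query
In addition, after adding +0 the left join with host was executed as a hash join, which improved query performance further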
SELECT resource_pool.*, count(host.id) count
FROM `resource_pool`
Inner JOIN biz_module
ON resource_pool.module = biz_module.path
Left JOIN host
ON host.module_id = biz_module.id and host.is_deleted +0 = 0
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20
or
SELECT resource_pool.*, count(host.id) count
FROM `resource_pool`
Inner JOIN biz_module
ON resource_pool.module = biz_module.path
Left JOIN host
ignore index(IDX_host_is_deleted) -- ignore the IDX_host_is_deleted index
ON host.module_id = biz_module.id and host.is_deleted = 0
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20
Execution plan
id | table | type | key | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|
1 | resource_pool | ALL | | | 11 | 100 | Using temporary; Using filesort |
1 | biz_module | ref | IDX_biz_module_path | om2.resource_pool.module | 1 | 100 | Using where; Using index |
1 | host | ALL | | | 382770 | 100 | Using where; Using join buffer (hash join) |
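Alternatively, move the is_deleted check out of the join condition and into the count, so that it is evaluated at the server layer: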
SELECT resource_pool.*, count(case when host.is_deleted = 0 then 1 end) count
FROM `resource_pool`
Inner JOIN biz_module
ON resource_pool.module = biz_module.path
Left JOIN host
ON host.module_id = biz_module.id
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20
These two ways of writing the query produce the same execution plan
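To compare the variants beyond the estimated plan, one option is EXPLAIN ANALYZE, which runs the statement and reports actual row counts and timings per plan step. The sketch below applies it to the ignore index variant and assumes MySQL 8.0.18 or later (the hash join in the plan above suggests such a version, but that is an assumption about the environment).

```sql
-- EXPLAIN ANALYZE really executes the query, so expect it to take as long
-- as the query itself.
EXPLAIN ANALYZE
SELECT resource_pool.*, count(host.id) count
FROM `resource_pool`
INNER JOIN biz_module
  ON resource_pool.module = biz_module.path
LEFT JOIN host IGNORE INDEX (IDX_host_is_deleted)
  ON host.module_id = biz_module.id AND host.is_deleted = 0
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20;
```

Running the same statement without the index hint gives the corresponding measurements for the original SQL.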
Exploring what happens when the Left JOIN condition of the original SQL is moved down into WHERE
To be continued
SELECT resource_pool.*, count(host.id) count
FROM `resource_pool`
Inner JOIN biz_module
ON resource_pool.module = biz_module.path
Left JOIN host
ON host.module_id = biz_module.id
WHERE host.is_deleted = 0
GROUP BY resource_pool.id
ORDER BY resource_pool.name
LIMIT 20