mysql版本
5.7
参考
本文转自手把手教你认识OPTIMIZER_TRACE,综合博主自身理解,有所修改及重新排版,转载请注明上述链接
mysql官方实例
1. 背景
众所周知,当遇到慢查询sql的时候,需要使用explain
关键字来查看该sql的执行计划,从中找出优化点,但是,有时候,光靠explain
是不足以得出全面的优化结论的,比如说,执行以下语句:
explain select * from product0 as t1 where t1.name = '产品名称';
结果:
不免会有这样的疑问:
- 已经为
name
字段创建了索引,可是为什么查询语句没有使用索引,而是走了全表扫描呢? - 如果存在多个
possible keys
,那么mysql又是如何抉择出合适的key的呢?
这一切,OPTIMIZER_TRACE
都将会为我们作出解答。
2. 定义
OPTIMIZER_TRACE
,见名知义,优化器跟踪,也就是跟踪语句的解析优化过程,并将跟踪到的信息记录到INFORMATION_SCHEMA.OPTIMIZER_TRACE
表中,该功能默认是不开启的,并且默认只记录最后一次结果到INFORMATION_SCHEMA.OPTIMIZER_TRACE
表中。
执行以下语句查看相关变量:
show variables like '%optimizer_trace%';
结果:
可以看到相关变量如下:
-
optimizer_trace
-
enabled
:启用/禁用optimizer_trace
功能 -
one_line
:执行结果中trace
字段的展示方式,on
代表单行展示,off
代表以json
格式展示
-
-
optimizer_trace_features
:控制执行结果中trace
字段需要打印的json
项和不需要打印的json
项,一般全部开启 -
optimizer_trace_limit
和optimizer_trace_offset
:这俩个参数类似limit offset,row_count
,默认值分别是1
和-1
,代表在INFORMATION_SCHEMA.OPTIMIZER_TRACE
表中只存储最近最后执行的一行结果数据 -
optimizer_trace_max_mem_size
:optimizer_trace
内存大小,如果跟踪信息超过这个大小,信息会被截断
3. 适用语句
SELECT/INSERT/REPLACE/UPDATE/DELETE
EXPLAIN
SET
DO
DECLARE/CASE/IF/RETURN
CALL
4. Demo
4.1 数据准备
CREATE TABLE t1 (
pk INT, col_int_key INT,
col_varchar_key VARCHAR(1), col_varchar_nokey VARCHAR(1)
);
INSERT INTO t1 VALUES
(10,7,'v','v'),(11,0,'s','s'),(12,9,'l','l'),(13,3,'y','y'),(14,4,'c','c'),
(15,2,'i','i'),(16,5,'h','h'),(17,3,'q','q'),(18,1,'a','a'),(19,3,'v','v'),
(20,6,'u','u'),(21,7,'s','s'),(22,5,'y','y'),(23,1,'z','z'),(24,204,'h','h'),
(25,224,'p','p'),(26,9,'e','e'),(27,5,'i','i'),(28,0,'y','y'),(29,3,'w','w');
CREATE TABLE t2 (
pk INT, col_int_key INT,
col_varchar_key VARCHAR(1), col_varchar_nokey VARCHAR(1),
PRIMARY KEY (pk)
);
INSERT INTO t2 VALUES
(1,4,'b','b'),(2,8,'y','y'),(3,0,'p','p'),(4,0,'f','f'),(5,0,'p','p'),
(6,7,'d','d'),(7,7,'f','f'),(8,5,'j','j'),(9,3,'e','e'),(10,188,'u','u'),
(11,4,'v','v'),(12,9,'u','u'),(13,6,'i','i'),(14,1,'x','x'),(15,5,'l','l'),
(16,6,'q','q'),(17,2,'n','n'),(18,4,'r','r'),(19,231,'c','c'),(20,4,'h','h'),
(21,3,'k','k'),(22,3,'t','t'),(23,7,'t','t'),(24,6,'k','k'),(25,7,'g','g'),
(26,9,'z','z'),(27,4,'n','n'),(28,4,'j','j'),(29,2,'l','l'),(30,1,'d','d'),
(31,2,'t','t'),(32,194,'y','y'),(33,2,'i','i'),(34,3,'j','j'),(35,8,'r','r'),
(36,4,'b','b'),(37,9,'o','o'),(38,4,'k','k'),(39,5,'a','a'),(40,5,'f','f'),
(41,9,'t','t'),(42,3,'c','c'),(43,8,'c','c'),(44,0,'r','r'),(45,98,'k','k'),
(46,3,'l','l'),(47,1,'o','o'),(48,0,'t','t'),(49,189,'v','v'),(50,8,'x','x'),
(51,3,'j','j'),(52,3,'x','x'),(53,9,'k','k'),(54,6,'o','o'),(55,8,'z','z'),
(56,3,'n','n'),(57,9,'c','c'),(58,5,'d','d'),(59,9,'s','s'),(60,2,'j','j'),
(61,2,'w','w'),(62,5,'f','f'),(63,8,'p','p'),(64,6,'o','o'),(65,9,'f','f'),
(66,0,'x','x'),(67,3,'q','q'),(68,6,'g','g'),(69,5,'x','x'),(70,8,'p','p'),
(71,2,'q','q'),(72,120,'q','q'),(73,25,'v','v'),(74,1,'g','g'),(75,3,'l','l'),
(76,1,'w','w'),(77,3,'h','h'),(78,153,'c','c'),(79,5,'o','o'),(80,9,'o','o'),
(81,1,'v','v'),(82,8,'y','y'),(83,7,'d','d'),(84,6,'p','p'),(85,2,'z','z'),
(86,4,'t','t'),(87,7,'b','b'),(88,3,'y','y'),(89,8,'k','k'),(90,4,'c','c'),
(91,6,'z','z'),(92,1,'t','t'),(93,7,'o','o'),(94,1,'u','u'),(95,0,'t','t'),
(96,2,'k','k'),(97,7,'u','u'),(98,2,'b','b'),(99,1,'m','m'),(100,5,'o','o');
4.2 执行
set optimizer_trace = 'enabled=on';
SELECT SUM(alias2.col_varchar_nokey) , alias2.pk AS field2 FROM t1 AS alias1
STRAIGHT_JOIN t2 AS alias2 ON alias2.pk = alias1.col_int_key WHERE alias1.pk
GROUP BY field2 ORDER BY alias1.col_int_key,alias2.pk ;
select * from information_schema.OPTIMIZER_TRACE;
set optimizer_trace = 'enabled=off';
执行结果:
5. 解析
各个字段解析如下:
-
QUERY
:跟踪的SQL
语句 -
TRACE
:跟踪信息(JSON
格式) -
MISSING_BYTES_BEYOND_MAX_MEM_SIZE
:跟踪信息过长时,被截断的跟踪信息的字节数 -
INSUFFICIENT_PRIVILEGES
:执行跟踪语句的用户是否有查看对象的权限。当不具有权限时,该列信息为1
且TRACE
字段为空。一般出现在调用带有SQL SECURITY DEFINER
的视图或者是存储过程的情况下
其中,TRACE
详细内容如下:
{
"steps" : [
{
"join_preparation" : {
"select#" : 1,
"steps" : [
{
"expanded_query" : "/* select#1 */ select sum(`t2`.`col_varchar_nokey`) AS `SUM(t2.col_varchar_nokey)`,`t2`.`pk` AS `field2` from (`t2` straight_join `t1` on((`t2`.`pk` = `t1`.`col_int_key`))) where (`t2`.`pk` between 10 and 100) group by `field2` order by `t1`.`col_int_key`,`t2`.`pk` limit 0,1000"
},
{
"transformations_to_nested_joins" : {
"expanded_query" : "/* select#1 */ select sum(`t2`.`col_varchar_nokey`) AS `SUM(t2.col_varchar_nokey)`,`t2`.`pk` AS `field2` from `t2` straight_join `t1` where ((`t2`.`pk` between 10 and 100) and (`t2`.`pk` = `t1`.`col_int_key`)) group by `field2` order by `t1`.`col_int_key`,`t2`.`pk` limit 0,1000",
"transformations" : [
"JOIN_condition_to_WHERE",
"parenthesis_removal"
]
}
},
{
"functional_dependencies_of_GROUP_columns" : {
"all_columns_of_table_map_bits" : [
0
],
"columns" : [
"training.t2.pk",
"training.t1.col_int_key"
]
}
}
]
}
},
{
"join_optimization" : {
"select#" : 1,
"steps" : [
{
"condition_processing" : {
"condition" : "WHERE",
"original_condition" : "((`t2`.`pk` between 10 and 100) and (`t2`.`pk` = `t1`.`col_int_key`))",
"steps" : [
{
"resulting_condition" : "((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation" : "equality_propagation"
},
{
"resulting_condition" : "((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation" : "constant_propagation"
},
{
"resulting_condition" : "((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation" : "trivial_condition_removal"
}
]
}
},
{
"substitute_generated_columns" : { }
},
{
"table_dependencies" : [
{
"depends_on_map_bits" : [ ],
"map_bit" : 0,
"row_may_be_null" : false,
"table" : "`t2`"
},
{
"depends_on_map_bits" : [
0
],
"map_bit" : 1,
"row_may_be_null" : false,
"table" : "`t1`"
}
]
},
{
"ref_optimizer_key_uses" : [
{
"equals" : "`t1`.`col_int_key`",
"field" : "pk",
"null_rejecting" : true,
"table" : "`t2`"
}
]
},
{
"rows_estimation" : [
{
"range_analysis" : {
"analyzing_range_alternatives" : {
"analyzing_roworder_intersect" : {
"cause" : "too_few_roworder_scans",
"usable" : false
},
"range_scan_alternatives" : [
{
"chosen" : true,
"cost" : 19.291000,
"index" : "PRIMARY",
"index_dives_for_eq_ranges" : true,
"index_only" : false,
"ranges" : [
"10 <= pk <= 100"
],
"rowid_ordered" : true,
"rows" : 91,
"using_mrr" : false
}
]
},
"chosen_range_access_summary" : {
"chosen" : true,
"cost_for_plan" : 19.291000,
"range_access_plan" : {
"index" : "PRIMARY",
"ranges" : [
"10 <= pk <= 100"
],
"rows" : 91,
"type" : "range_scan"
},
"rows_for_plan" : 91
},
"group_index_range" : {
"cause" : "not_single_table",
"chosen" : false
},
"potential_range_indexes" : [
{
"index" : "PRIMARY",
"key_parts" : [
"pk"
],
"usable" : true
}
],
"setup_range_conditions" : [ ],
"table_scan" : {
"cost" : 23.100000,
"rows" : 100
}
},
"table" : "`t2`"
},
{
"table" : "`t1`",
"table_scan" : {
"cost" : 1,
"rows" : 20
}
}
]
},
{
"considered_execution_plans" : [
{
"best_access_path" : {
"considered_access_paths" : [
{
"access_type" : "ref",
"chosen" : false,
"index" : "PRIMARY",
"usable" : false
},
{
"access_type" : "range",
"chosen" : true,
"cost" : 37.491000,
"range_details" : {
"used_index" : "PRIMARY"
},
"resulting_rows" : 91,
"rows_to_scan" : 91
}
]
},
"condition_filtering_pct" : 100,
"cost_for_plan" : 37.491000,
"plan_prefix" : [ ],
"rest_of_plan" : [
{
"best_access_path" : {
"considered_access_paths" : [
{
"access_type" : "scan",
"buffers_needed" : 1,
"chosen" : true,
"cost" : 365,
"resulting_rows" : 20,
"rows_to_scan" : 20,
"using_join_cache" : true
}
]
},
"chosen" : true,
"condition_filtering_pct" : 10,
"cost_for_plan" : 402.490000,
"plan_prefix" : [
"`t2`"
],
"rows_for_plan" : 182,
"table" : "`t1`"
}
],
"rows_for_plan" : 91,
"table" : "`t2`"
}
]
},
{
"attaching_conditions_to_tables" : {
"attached_conditions_computation" : [ ],
"attached_conditions_summary" : [
{
"attached" : "(`t2`.`pk` between 10 and 100)",
"table" : "`t2`"
},
{
"attached" : "(`t1`.`col_int_key` = `t2`.`pk`)",
"table" : "`t1`"
}
],
"original_condition" : "((`t1`.`col_int_key` = `t2`.`pk`) and (`t2`.`pk` between 10 and 100))"
}
},
{
"clause_processing" : {
"clause" : "ORDER BY",
"items" : [
{
"item" : "`t1`.`col_int_key`"
},
{
"item" : "`t2`.`pk`"
}
],
"original_clause" : "`t1`.`col_int_key`,`t2`.`pk`",
"resulting_clause" : "`t1`.`col_int_key`,`t2`.`pk`",
"resulting_clause_is_simple" : false
}
},
{
"clause_processing" : {
"clause" : "GROUP BY",
"items" : [
{
"item" : "`t2`.`pk`"
}
],
"original_clause" : "`field2`",
"resulting_clause" : "`field2`",
"resulting_clause_is_simple" : true
}
},
{
"refine_plan" : [
{
"table" : "`t2`"
},
{
"table" : "`t1`"
}
]
}
]
}
},
{
"join_execution" : {
"select#" : 1,
"steps" : [
{
"creating_tmp_table" : {
"tmp_table_info" : {
"key_length" : 4,
"location" : "memory (heap)",
"row_length" : 18,
"row_limit_estimate" : 932067,
"table" : "intermediate_tmp_table",
"unique_constraint" : false
}
}
},
{
"filesort_execution" : [ ],
"filesort_information" : [
{
"direction" : "asc",
"field" : "col_int_key",
"table" : "intermediate_tmp_table"
},
{
"direction" : "asc",
"field" : "pk",
"table" : "intermediate_tmp_table"
}
],
"filesort_priority_queue_optimization" : {
"cause" : "quicksort_is_cheaper",
"chosen" : false,
"limit" : 1000,
"memory_available" : 262144,
"row_size" : 17,
"rows_estimate" : 10
},
"filesort_summary" : {
"examined_rows" : 0,
"number_of_tmp_files" : 0,
"rows" : 0,
"sort_buffer_size" : 376,
"sort_mode" : ""
}
}
]
}
}
]
}
对上述json串进行格式化,可以明显看到以下三个子节点:
-
join_preparation
:准备阶段 -
join_optimization
:优化阶段 -
join_execution
:执行阶段
5.1 join_preparation
准备阶段
展开join_preparation
,可以看到主结构主要有俩个:
-
select#
:代表跟踪分析的是第几条语句,如果使用union
,那么这里就会有俩条 -
steps
:对应语句的执行步骤
5.1.1 expanded_query
{
"expanded_query":"/* select#1 */ select sum(`t2`.`col_varchar_nokey`) AS `SUM(t2.col_varchar_nokey)`,`t2`.`pk` AS `field2` from (`t2` straight_join `t1` on((`t2`.`pk` = `t1`.`col_int_key`))) where (`t2`.`pk` between 10 and 100) group by `field2` order by `t1`.`col_int_key`,`t2`.`pk` limit 0,1000"
}
可以看到,steps
中的expanded_query
内容是原本的select *
语句转化为具体库名、字段名、列名等后的结果
limit
语句应该是数据库可视化软件默认加上的,这里不去追究
5.1.2 transformations_to_nested_joins
{
"transformations_to_nested_joins":{
"expanded_query":"/* select#1 */ select sum(`t2`.`col_varchar_nokey`) AS `SUM(t2.col_varchar_nokey)`,`t2`.`pk` AS `field2` from `t2` straight_join `t1` where ((`t2`.`pk` between 10 and 100) and (`t2`.`pk` = `t1`.`col_int_key`)) group by `field2` order by `t1`.`col_int_key`,`t2`.`pk` limit 0,1000",
"transformations":[
"JOIN_condition_to_WHERE",
"parenthesis_removal"
]
}
}
从名字和expanded_query
内容来看,是将on
子句转化为where
子句
5.2 join_optimization
优化阶段
优化内容,相对内容多且复杂,主要分为以下几个部分:
-
condition_processing
:对where
子句和having
子句的处理 -
substitute_generated_columns
:替换虚拟生成列 -
table_dependencies
:梳理表之间的依赖关系 -
ref_optimizer_key_uses
: 如果优化器认为查询可以使用ref
的话,在这里列出可以使用的索引 -
rows_estimation
:估算表行数和扫描的代价。如果查询中存在range
扫描的话,对range
扫描进行计划分析及代价估算 -
considered_execution_plans
:对比各可行计划的代价,选择相对最优的执行计划 -
attaching_conditions_to_tables
:添加附加条件,使得条件尽可能筛选单表数据 -
clause_processing
:对DISTINCT
、GROUP BY
、ORDER BY
等语句进行优化 -
refine_plan
:优化后的执行计划
5.2.1 condition_processing
{
"condition_processing":{
"condition":"WHERE",
"original_condition":"((`t2`.`pk` between 10 and 100) and (`t2`.`pk` = `t1`.`col_int_key`))",
"steps":[
{
"resulting_condition":"((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation":"equality_propagation"
},
{
"resulting_condition":"((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation":"constant_propagation"
},
{
"resulting_condition":"((`t2`.`pk` between 10 and 100) and multiple equal(`t2`.`pk`, `t1`.`col_int_key`))",
"transformation":"trivial_condition_removal"
}
]
}
}
对条件语句where
和having
的处理:
-
condition
:条件语句类型,可以是where
或者having
-
original_condition
:原始条件查询语句 -
steps
:处理步骤-
resulting_condition
:转换之后的语句 -
transformation
:转换类型-
equality_propagation
:等值条件句转换 -
constant_propagation
:常量条件句转换 -
trivial_condition_removal
:无效条件移除的转换
-
-
demo中的where
字句只命中了equality_propagation
一种情况(虽然也有constant_propagation
和trivial_condition_removal
的部分,但是可以看到resulting_condition
并没有发生变化,也就是相当于只命中了equality_propagation
一种情况),可以通过以下语句命中全部三种情况:
select * from t1 join t2 on t1.pk=t2.pk+1 where t2.pk = 5 and 1 =1 ;
{
"condition_processing":{
"condition":"WHERE",
"original_condition":"((`t2`.`pk` = 5) and (1 = 1) and (`t1`.`pk` = (`t2`.`pk` + 1)))",
"steps":[
{
"resulting_condition":"((1 = 1) and (`t1`.`pk` = (5 + 1)) and multiple equal(5, `t2`.`pk`))",
"transformation":"equality_propagation"
},
{
"resulting_condition":"((1 = 1) and (`t1`.`pk` = 6) and multiple equal(5, `t2`.`pk`))",
"transformation":"constant_propagation"
},
{
"resulting_condition":"((`t1`.`pk` = 6) and multiple equal(5, `t2`.`pk`))",
"transformation":"trivial_condition_removal"
}
]
}
}
分析如下:
-
equality_propagation
:将原始语句中的t2.pk = 5
转化为multiple equal(5, t2.pk)
;之后t1.pk = (t2.pk + 1)
结合前面的t2.pk = 5
将t2.pk
做了替换,转化成了t1.pk = (5 + 1)
,其余保持不变 -
constant_propagation
:将上一步中的结果做常量计算,t1.pk = (5 + 1)
变成t1.pk = 6
-
trivial_condition_removal
:移除1 = 1这个恒为true的无效条件
5.2.2 table_dependencies
{
"table_dependencies":[
{
"depends_on_map_bits":[
],
"map_bit":0,
"row_may_be_null":false,
"table":"`t2`"
},
{
"depends_on_map_bits":[
0
],
"map_bit":1,
"row_may_be_null":false,
"table":"`t1`"
}
]
}
在这个结点中我们可以看到表之间的依赖关系:
-
table
:表名及其别名 -
row_may_be_null
:join
之后的列是否可能为null
,注意,不是表中列属性是否设置为null
,比如说左连接,那么后一张表中有些列可能为null
,此时row_may_be_null
显示为true
-
map_bit
:表序号 -
depends_on_map_bits
:依赖的表的map_bit
5.2.3 ref_optimizer_key_uses
{
"ref_optimizer_key_uses":[
{
"equals":"`t1`.`col_int_key`",
"field":"pk",
"null_rejecting":true,
"table":"`t2`"
}
]
}
该节点列出了所有可用的ref
类型的索引,如果使用了组合索引的多个部分,则会列出多个结构体。单个结构体会列出单表ref
使用的索引及其对应值。
5.2.4 rows_estimation
{
"rows_estimation":[
{
"range_analysis":{
"analyzing_range_alternatives":{
"analyzing_roworder_intersect":{
"cause":"too_few_roworder_scans",
"usable":false
},
"range_scan_alternatives":[
{
"chosen":true,
"cost":19.291,
"index":"PRIMARY",
"index_dives_for_eq_ranges":true,
"index_only":false,
"ranges":[
"10 <= pk <= 100"
],
"rowid_ordered":true,
"rows":91,
"using_mrr":false
}
]
},
"chosen_range_access_summary":{
"chosen":true,
"cost_for_plan":19.291,
"range_access_plan":{
"index":"PRIMARY",
"ranges":[
"10 <= pk <= 100"
],
"rows":91,
"type":"range_scan"
},
"rows_for_plan":91
},
"group_index_range":{
"cause":"not_single_table",
"chosen":false
},
"potential_range_indexes":[
{
"index":"PRIMARY",
"key_parts":[
"pk"
],
"usable":true
}
],
"setup_range_conditions":[
],
"table_scan":{
"cost":23.1,
"rows":100
}
},
"table":"`t2`"
},
{
"table":"`t1`",
"table_scan":{
"cost":1,
"rows":20
}
}
]
}
rows_estimation
展示估算的表扫描行数及其代价,从结构体中可以看到,是以表为单位展示对应结果的
首先看一下t1
,t1
表由于没有索引,所以走的是全表扫描,结构体也比较简单,只有俩个结点:
{
"table":"`t1`",
"table_scan":{
"cost":1,
"rows":20
}
}
-
table
:表名及其别名 -
table_scan
-
rows
:扫描行数 -
cost
:代价
-
接下来看一下t2,t2相对复杂很多,主要看一下range_analysis
,分为以下几个部分:
analyzing_range_alternatives
chosen_range_access_summary
group_index_range
potential_range_indexes
setup_range_conditions
table_scan
5.2.4.1 analyzing_range_alternatives
{
"analyzing_roworder_intersect":{
"cause":"too_few_roworder_scans",
"usable":false
},
"range_scan_alternatives":[
{
"chosen":true,
"cost":19.291,
"index":"PRIMARY",
"index_dives_for_eq_ranges":true,
"index_only":false,
"ranges":[
"10 <= pk <= 100"
],
"rowid_ordered":true,
"rows":91,
"using_mrr":false
}
]
}
分析可选方案的代价,包括
-
analyzing_roworder_intersect
:index merge
分析 -
range_scan_alternatives
:range
扫描分析
俩个阶段,分别针对不同的情况进行执行代价的分析,从中选择出更优的执行计划。
5.2.4.1.1 analyzing_roworder_intersect
由于示例没有使用index merge
,所以在这一段仅仅给出了不使用index merge
的原因。如果是语句可以使用index_merge
的情况,在该阶段会分析使用index_merge
过程中消耗的代价(index_scan_cost
、disk_sweep_cost
等),并汇总merge
的代价确认是否选择使用index_merge
以及对应使用的索引。
5.2.4.1.2 range_scan_alternatives
range
扫描分析针对所有可用于range
扫描的索引进行了代价分析,并根据分析结果确认选择使用的索引,包含如下字段:
-
choosen
:是否使用该索引 -
cost
:代价 -
index
:索引名称 -
index_dives_for_eq_ranges
:是否使用了index dive
。这个值会被参数eq_range_index_dive_limit
设定值影响 -
index_only
:是否是覆盖索引 -
ranges
:range
扫描的条件句范围 -
rowid_ordered
:扫描的结果集是否根据PK
排序 -
rows
:扫描行数 -
using_mrr
:是否使用mrr
5.2.4.2 chosen_range_access_summary
{
"chosen":true,
"cost_for_plan":19.291,
"range_access_plan":{
"index":"PRIMARY",
"ranges":[
"10 <= pk <= 100"
],
"rows":91,
"type":"range_scan"
},
"rows_for_plan":91
}
在前一个步骤中分析了各类索引使用的方法及代价,得出了一定的中间结果之后,在summary
阶段汇总前一阶段的中间结果确认最后的方案
-
range_access_plan
:range
扫描最终选择的执行计划。在该结构体中会给出执行计划的type
,使用的索引以及扫描行数。如果range_access_plan.type
是index_roworder_intersect
(即index merge
)的话,在该结构体下还会列intersect_of
结构体给出index merge
的具体信息。 -
rows_for_plan
:该执行计划扫描行数 -
cost_for_plan
:该执行计划代价 -
chosen
:是否选择该执行计划
5.2.4.3 group_index_range
{
"cause":"not_single_table",
"chosen":false
}
评估在使用了GROUP BY
或者是DISTINCT
的时候是否有适合的索引可用:
- 当语句中没有
GROUP BY
或者是DISTINCT
的时候,该结构体下显示"chosen":"false"
以及"cause":"'not_group_by_or_distinct"
- 如果语句中在多表关联时使用了GROUP BY或DISTINCT时,在该结构体下显示
"chosen":"false"
以及"cause":"not_single_table"
- 其他情况下会去尝试分析可用的索引(
potential_group_range_indexes
)并且计算对应的扫描行数及其所需代价
5.2.4.4 potential_range_indexes
[
{
"index":"PRIMARY",
"key_parts":[
"pk"
],
"usable":true
}
]
列出表中所有的索引,并分析其是否可用,结点释义如下:
-
index
:索引名称 -
key_parts
:建立该索引的字段 -
usable
:是否可用
5.2.4.5 table_scan
同前面t1
的释义,全表扫描的行数及代价
5.2.5 considered_execution_plans
[
{
"best_access_path":{
"considered_access_paths":[
{
"access_type":"ref",
"chosen":false,
"index":"PRIMARY",
"usable":false
},
{
"access_type":"range",
"chosen":true,
"cost":37.491,
"range_details":{
"used_index":"PRIMARY"
},
"resulting_rows":91,
"rows_to_scan":91
}
]
},
"condition_filtering_pct":100,
"cost_for_plan":37.491,
"plan_prefix":[
],
"rest_of_plan":[
{
"best_access_path":{
"considered_access_paths":[
{
"access_type":"scan",
"buffers_needed":1,
"chosen":true,
"cost":365,
"resulting_rows":20,
"rows_to_scan":20,
"using_join_cache":true
}
]
},
"chosen":true,
"condition_filtering_pct":10,
"cost_for_plan":402.49,
"plan_prefix":[
"`t2`"
],
"rows_for_plan":182,
"table":"`t1`"
}
],
"rows_for_plan":91,
"table":"`t2`"
}
]
负责对比各可行计划的代价,选择相对最优的执行计划,由于我们使用了straight join
,强制决定了连接顺序,由t2
表驱动t1
表,所以considered_execution_plans
下面只有一个元素,也就代表只有一个可行的执行计划,若是使用join
,那么此时就会有俩个元素,由优化器帮我们决定使用哪一个执行计划:
{
"considered_execution_plans":[
{
"best_access_path":{
"considered_access_paths":[
{
"access_type":"scan",
"chosen":true,
"cost":5,
"resulting_rows":20,
"rows_to_scan":20
}
]
},
"condition_filtering_pct":100,
"cost_for_plan":5,
"plan_prefix":[
],
"rest_of_plan":[
{
"best_access_path":{
"considered_access_paths":[
{
"access_type":"eq_ref",
"cause":"clustered_pk_chosen_by_heuristics",
"chosen":true,
"cost":24,
"index":"PRIMARY",
"rows":1
},
{
"access_type":"range",
"cause":"heuristic_index_cheaper",
"chosen":false,
"range_details":{
"used_index":"PRIMARY"
}
}
]
},
"chosen":true,
"condition_filtering_pct":100,
"cost_for_plan":29,
"plan_prefix":[
"`t1`"
],
"rows_for_plan":20,
"table":"`t2`"
}
],
"rows_for_plan":20,
"table":"`t1`"
},
{
"best_access_path":{
"considered_access_paths":[
{
"access_type":"ref",
"chosen":false,
"index":"PRIMARY",
"usable":false
},
{
"access_type":"range",
"chosen":true,
"cost":37.491,
"range_details":{
"used_index":"PRIMARY"
},
"resulting_rows":91,
"rows_to_scan":91
}
]
},
"condition_filtering_pct":100,
"cost_for_plan":37.491,
"plan_prefix":[
],
"pruned_by_cost":true,
"rows_for_plan":91,
"table":"`t2`"
}
]
}
字段释义:
-
best_access_path
:当前最优的执行顺序信息结果集,可以看到主要内容是considered_access_paths
。根据索引的使用与否和具体的使用方法可能会产生多个considered_access_paths
-
access_type
:参照explain
的type
,不同的type
会在considered_access_paths
下展示不同的字段,详见5.2.5.1
-
-
condition_filtering_pct
:类似于explain
中的filtered
列,这是一个估算值 -
rows_for_plan
:该执行计划最终的扫描行数,这里的行数其实也是估算值,是由considered_access_paths
的resulting_rows
相乘之后再乘以condition_filtering_pct
获得 -
cost_for_plan
:该执行计划的执行代价,由considered_access_paths
的cost
相加而得 -
plan_prefix
:前置表名称,也就是驱动表的名称 -
table
:分析的表对象名称及其别名 -
rest_of_plan
:下一张表的计划,理解为类似链表的next,可以看到,其下的结点与第一层结构是类似的
5.2.5.1 access_type
- 使用索引的情况(
t2 join t1
)
[
{
"access_type":"ref",
"chosen":false,
"index":"PRIMARY",
"usable":false
},
{
"access_type":"range",
"chosen":true,
"cost":37.491,
"range_details":{
"used_index":"PRIMARY"
},
"resulting_rows":91,
"rows_to_scan":91
}
]
可以看到,这时候有多条数据,第一条数据使用ref
类型的索引,但是该索引不可用("useable":false
),所以不被选择("chosen":false
);第二条数据使用了range
类型的索引,使用了名称为PRIMARY
的索引,并且展示了扫描的行数,扫描结果行数以及代价,最后的是在选择比较的结果中是否选择了该方式(chosen
)
- 使用扫描的情况(
t1 join t2
)
{
"access_type":"scan",
"chosen":true,
"cost":5,
"resulting_rows":20,
"rows_to_scan":20
}
该结构体下列出了该表的扫描行数(rows_to_scan
),由于这里没有列出index
所以能分析出这里的access_type
中的scan
在这里指的是全表扫描。
5.2.6 attaching_conditions_to_tables
{
"attaching_conditions_to_tables":{
"attached_conditions_computation":[
],
"attached_conditions_summary":[
{
"attached":"(`t2`.`pk` between 10 and 100)",
"table":"`t2`"
},
{
"attached":"(`t1`.`col_int_key` = `t2`.`pk`)",
"table":"`t1`"
}
],
"original_condition":"((`t1`.`col_int_key` = `t2`.`pk`) and (`t2`.`pk` between 10 and 100))"
}
}
基于considered_execution_plans
中已选执行计划改造原有的where
条件句并针对表的增加适当地附加条件便于单表数据的筛选。这部分条件的增改主要是为了便于ICP,但是ICP是否开启并不影响该部分的构造
-
original_condition
:在准备阶段原始SQL语句以及considered_execution_plans
中使用的索引基础上,改写语句,尽可能将原有语句中不能使用索引的条件句绑定到单表中对单表进行数据筛选 -
attached_conditions_computation
:使用启发式算法计算已使用的索引,如果已使用的索引的访问类型是ref
的话,计算使用range
方式访问是否能使用组合索引中更多的索引列,如果可以的话,用range
的方式替换ref
的访问方式 -
attached_conditions_summary
:针对上述的附加之后的情况汇总-
attached
:附加的条件或者是原语句中能直接下推给单表筛选的条件 -
table
:对象表及其别名
-
5.2.7 clause_processing
{
"clause_processing":{
"clause":"ORDER BY",
"items":[
{
"item":"`t1`.`col_int_key`"
},
{
"item":"`t2`.`pk`"
}
],
"original_clause":"`t1`.`col_int_key`,`t2`.`pk`",
"resulting_clause":"`t1`.`col_int_key`,`t2`.`pk`",
"resulting_clause_is_simple":false
}
}
主要是对DISTINCT
、GROUP BY
、ORDER BY
等语句进行优化
-
clause
:优化语句关键字(DISTINCT
、GROUP BY
、ORDER BY
) -
original_clause
:优化对象的原始语句 -
items
:original_clause中包含的对象-
item
:对象名 -
eq_ref_to_preceding_items
:与前置表关联的是否是唯一索引。在这个示例中,并没有这样的情况所以没有列出。如果语句稍微改写一下,将原始查询语句改写为t1 STRAIGHT_JOIN t2
,让t1
去驱动t2
表,就发现在分析items[1].item = t2.pk
时出现了该字段。这是由于t1
与t2
通过t2
的主键pk
列进行关联,这就意味着,t1
中的一行数据最多只能在t2
中关联出一列,所以在后续优化的结果语句中order by
的t2.pk
列被优化掉了,因为这一列已经确认唯一不需要再进行排序:
-
{
"clause_processing":{
"clause":"ORDER BY",
"items":[
{
"item":"`t1`.`col_int_key`"
},
{
"eq_ref_to_preceding_items":true,
"item":"`t2`.`pk`"
}
],
"original_clause":"`t1`.`col_int_key`,`t2`.`pk`",
"resulting_clause":"`t1`.`col_int_key`",
"resulting_clause_is_simple":true
}
}
-
resulting_clause_is_simple
:优化后的结果语句是否是简单语句 -
resulting_clause
:优化后的结果语句
那么显然,这一块我们需要重点关注的就是
original_clause
和resulting_clause
,通过对比结果,再去查看细节,探究mysql在对应从句上是做了哪些优化
5.2.8 refine_plan
{
"refine_plan":[
{
"table":"`t2`"
},
{
"table":"`t1`"
}
]
}
该阶段展示的是改善之后的执行计划,如图所示只展示了对应的表对象没有其他的字段的话,说明在之前已经确定的执行计划中没有需要再优化的地方,可直接应用
-
table
:涉及的表名及其别名 -
pushed_index_condition
:可使用到ICP的条件句 -
table_condition_attached
:在attaching_conditions_to_tables
阶段添加了附加条件的条件语句 -
access_type
:优化后的索引访问类型
5.3 join_execution
执行阶段
在join_execution
阶段,如果分析的语句是explain
的话,该阶段的代码是空结构体,只有在真正执行语句之后,该阶段的代码才会有具体的步骤过程
-
creating_tmp_table
:创建临时表 -
filesort_information
:文件排序信息 -
filesort_priority_queue_optimization
:文件排序-优先队列优化 -
filesort_execution
:文件排序执行 -
filesort_summary
:文件排序汇总信息
5.3.1 creating_tmp_table
{
"creating_tmp_table":{
"tmp_table_info":{
"key_length":4,
"location":"memory (heap)",
"row_length":18,
"row_limit_estimate":932067,
"table":"intermediate_tmp_table",
"unique_constraint":false
}
}
}
-
converting_tmp_table_to_ondisk
:将临时表落地到磁盘中,如果临时表的大小超过了设置的max_heap_table_size
或者是tmp_table_size
参数的话,那么将会出现该节点(本示例中并没有出现) -
tmp_table_info
:临时表信息-
table
:临时表的名称 -
row_length
:临时表的单行长度 -
key_length
:临时表索引长度 -
unique_constraint
:是否有使用唯一约束 -
location
:表存储位置,比如内存表memory (heap)
,或者是转换到磁盘的物理表disk (InnoDB)
-
row_limit_estimate
:该临时表中能存储的最大行数
-
5.3.2 filesort
{
"filesort_execution":[
],
"filesort_information":[
{
"direction":"asc",
"field":"col_int_key",
"table":"intermediate_tmp_table"
},
{
"direction":"asc",
"field":"pk",
"table":"intermediate_tmp_table"
}
],
"filesort_priority_queue_optimization":{
"cause":"quicksort_is_cheaper",
"chosen":false,
"limit":1000,
"memory_available":262144,
"row_size":17,
"rows_estimate":10
},
"filesort_summary":{
"examined_rows":0,
"number_of_tmp_files":0,
"rows":0,
"sort_buffer_size":376,
"sort_mode":""
}
}
在创建临时表之后如果临时表不需要转换为磁盘表的话,即开始对文件排序进行处理
-
filesort_information
:文件排序信息(如果有多列,filesort_information
下会有多个结构体)-
direction
:排序列是升序还是降序 -
table
:排序的表对象名 -
field
:排序列
-
-
filesort_priority_queue_optimization
:优先队列优化排序,一般在使用limit
子句的时候会使用优先队列-
usable
:是否有使用 -
cause
:没有使用的原因 -
limit
:限制查询的行数 -
memory_available
:可用内存大小 -
row_size
:单行大小 -
rows_estimate
:估算的行数
-
-
filesort_execution
:执行文件排序 -
filesort_summary
:文件排序汇总信息-
rows
:预计扫描行数 -
examined_rows
:参与排序的行数 -
number_of_tmp_files
:使用临时文件的个数,这个值为0代表全部使用sort_buffer
内存排序,否则表示使用了磁盘文件排序 -
sort_buffer_size
:使用的sort_buffer
的大小 -
sort_mode
:排序方式
-
6. 总结
在整个optimizer_trace中我们重点其实就是在跟踪记录TRACE的JSON树,我们通过这棵树中的内容可以具体去分析优化器究竟做了什么事情,进行了哪些选择,是基于什么原因做的选择,选择的结果及依据。这一系列都可以辅助验证我们的一些观点及优化,更好的帮助我们对我们的数据库的实例进行调整。