mysql order by 排序
索引排序 参考
order by 字段上上有索引可能就会用上索引排序,是否应用索引排序与sql以及优化器执行的成本决定。
表结构如下,有一个idx_key二级索引,一个主键索引
CREATE TABLE `app` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` varchar(10) DEFAULT NULL COMMENT 'APP类型名称 如:Android iOS',
`is_encryption` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否加密 1加密 0不加密',
`key` varchar(20) NOT NULL DEFAULT '0' COMMENT '加密key',
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT '更新时间',
`status` tinyint(1) NOT NULL DEFAULT '1' COMMENT '状态 1正常 0删除',
PRIMARY KEY (`id`),
KEY `idx_key` (`key`),
KEY `idx_name_status` (`name`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=458745 DEFAULT CHARSET=utf8mb4
### 加点数据
insert into app(`name`,`is_encryption`,`key`,`status`) values(substring(MD5(RAND()),1,20), 1, substring(MD5(RAND()),1,20),1);
insert into app(`name`,`is_encryption`,`key`,`status`) select substring(concat(`name`,md5(RAND())),1,20),`is_encryption`,substring(concat(`key`,md5(RAND())),1,20),`status` from app where 1;
-
如果排序的索引列不能满足查询的列以及索引顺序不一样,不能用索引排序。
例如:
select *from app where 1 order by `key`;
select `key` from app where 1 order by name,status;
select status from app where 1 order by status,name;
mysql> explain select id,`key`,status from app order by `key` desc;
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | app | ALL | NULL | NULL | NULL | NULL | 306528 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
-
查询语句上有where 条件,字段有索引,order by的上也有索引,但不是同一个索引树也不能用索引来排序。
例如:
mysql> explain select * from app where name='chujiu' order by `key`;
+----+-------------+-------+------+-----------------+-----------------+---------+-------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+-------+----------------------------------------------------+
| 1 | SIMPLE | app | ref | idx_name_status | idx_name_status | 43 | const | 32226 | Using index condition; Using where; Using filesort |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+-------+----------------------------------------------------+
1 row in set (0.00 sec)
- 查询语句上有where条件,筛选的字段上没有索引,orderby 字段上有索引,也不一定能用上索引,即使索引列包含查询的字段。通过扫描索引树上可以满足查的列,但不能通过索引树进行is_encryption筛选,优化器就认为使用文件排序成本更小。即使筛选的字段上有索引,但where查询的范围筛选的数据比较大,优化器基于成本考虑可能也不会使用索引排序。
mysql> explain select `key` from app where is_encryption=1 order by `key`;
+----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | app | ALL | NULL | NULL | NULL | NULL | 306528 | Using where; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)
- order by 多个字段,有联合索引,每个字段的顺序不一样,也是不能用到索引排序,因为索引树的每个节点里的值都是有序的,如果多个字段排序不一样就不能利用索引树排序,使用文件排序。如果多个字段,每个字段都有单独的索引,也是不能用上索引排序的。2个以上字段的索引,如果排序的字段是非连续的也不能用上索引,比如索引 idx(a,b,c),order by a,c 也不能用索引排序
mysql> explain select status from app where name > 'chujiu' order by name desc,status asc;
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
| 1 | SIMPLE | app | range | idx_name_status | idx_name_status | 43 | NULL | 153264 | Using where; Using index; Using filesort |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
- orderby 字句上使用函数或者表达式,这个其实根where条件上使用函数以及隐士转换为啥不能索引一样,从数据结构上来讲,因为如果有函数计算的话,那扫描一个节点就得对节点上的key先计算一次,这样io次数增加,成本增加。
mysql> explain select status from app where name > 'chujiu' order by substr(name,1) desc;
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
| 1 | SIMPLE | app | range | idx_name_status | idx_name_status | 43 | NULL | 153264 | Using where; Using index; Using filesort |
+----+-------------+-------+-------+-----------------+-----------------+---------+------+--------+------------------------------------------+
1 row in set (0.01 sec)
- 前缀索引同样不能用于索引排序。
文件排序,如果索引不能满足排序,mysql会执行文件排序。
优化器会在内存中预分配一个排序缓冲区,用于排序,如果结果集特别大超出了缓冲区,会生成一个临时文件用户排序。缓冲区的大小由sort_buffer_size和max_length_for_sort_data控制。
双路排序,从排序的列取出字段值和id值加载到内存中,进行排序,然后扫描已排好序的列,针对查询的数据,再回表查询数据。第一次从排序的列取出字段是一次io,排序后再取具体的数据是一次io,总共需要2次io操作,而排序后的数据在磁盘的分布不是有序的,成了随机io,这中排序导致性能特别低。
单路排序,其实和双路排序的区别就是在一开始排序之前就把要查询的数据加载到buffer中,这样只需要一次io操作就可以,但带来的结果就是buffer空间的增大,所以本质上也是空间换取时间。当读取数据超过buffer的容量时,就会导致多次读取数据,并创建临时表,最后多路合并,产生多次I/O,反而增加其I/O运算。
可以打开optimizer trace 跟踪下看看优化器执行的过程 参考
mysql> show global variables like '%sort_buffer_size%';
+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| innodb_sort_buffer_size | 1048576 |
| myisam_sort_buffer_size | 8388608 |
| sort_buffer_size | 262144 |
+-------------------------+---------+
3 rows in set (0.00 sec)
mysql> show global variables like '%optimizer_trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace | enabled=off,one_line=off |
| optimizer_trace_features | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit | 1 |
| optimizer_trace_max_mem_size | 16384 |
| optimizer_trace_offset | -1 |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.00 sec)
mysql> set optimizer_trace = "enabled=on";
Query OK, 0 rows affected (0.00 sec)
# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";
sql select *from app where 1 order by `key`;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
{
"steps":[
{
"join_preparation":{
"select#":1,
"steps":[
{
"expanded_query":"/* select#1 */ select `app`.`id` AS `id`,`app`.`name` AS `name`,`app`.`is_encryption` AS `is_encryption`,`app`.`key` AS `key`,`app`.`created_at` AS `created_at`,`app`.`updated_at` AS `updated_at`,`app`.`status` AS `status` from `app` where 1 order by `app`.`key`"
}
]
}
},
{
"join_optimization":{
"select#":1,
"steps":[
{
"condition_processing":{
"condition":"WHERE",
"original_condition":"1",
"steps":[
{
"transformation":"equality_propagation",
"resulting_condition":"1"
},
{
"transformation":"constant_propagation",
"resulting_condition":"1"
},
{
"transformation":"trivial_condition_removal",
"resulting_condition":null
}
]
}
},
{
"table_dependencies":[
{
"table":"`app`",
"row_may_be_null":false,
"map_bit":0,
"depends_on_map_bits":[
]
}
]
},
{
"rows_estimation":[
{
"table":"`app`",
"table_scan":{
"rows":636405,
"cost":2788
}
}
]
},
{
"considered_execution_plans":[
{
"plan_prefix":[
],
"table":"`app`",
"best_access_path":{
"considered_access_paths":[
{
"access_type":"scan",
"rows":636405,
"cost":130069,
"chosen":true,
"use_tmp_table":true
}
]
},
"cost_for_plan":130069,
"rows_for_plan":636405,
"sort_cost":636405,
"new_cost_for_plan":766474,
"chosen":true
}
]
},
{
"attaching_conditions_to_tables":{
"original_condition":null,
"attached_conditions_computation":[
],
"attached_conditions_summary":[
{
"table":"`app`",
"attached":null
}
]
}
},
{
"clause_processing":{
"clause":"ORDER BY",
"original_clause":"`app`.`key`",
"items":[
{
"item":"`app`.`key`"
}
],
"resulting_clause_is_simple":true,
"resulting_clause":"`app`.`key`"
}
},
{
"refine_plan":[
{
"table":"`app`",
"access_type":"table_scan"
}
]
},
{
"reconsidering_access_paths_for_index_ordering":{
"clause":"ORDER BY",
"index_order_summary":{
"table":"`app`",
"index_provides_order":false,
"order_direction":"undefined",
"index":"unknown",
"plan_changed":false
}
}
}
]
}
},
{
"join_execution":{
"select#":1,
"steps":[
{
"filesort_information":[
{
"direction":"asc",
"table":"`app`",
"field":"key"
}
],
"filesort_priority_queue_optimization":{
"usable":false,
"cause":"not applicable (no LIMIT)"
},
"filesort_execution":[
],
"filesort_summary":{
"rows":638976,
"examined_rows":638976,
"number_of_tmp_files":457,
"sort_buffer_size":261987,
"sort_mode":""
}
}
]
}
}
]
}
大概分为三个阶段,准备阶段join_preparation,优化阶段join_optimization,执行阶段join_execution。
准备阶段把字段已解析为数据库中的表字段,并且每个选择都用其编号进行了注释。
-
优化阶段分步骤走
条件处理 condition_processing
处理表的依赖 table_dependencies
估算行数 rows_estimation
可能执行的计划 considered_execution_plans
把条件附加到表上 attaching_conditions_to_tables
排序的字段处理 clause_processing
完善计划 refine_plan
重新考虑用于索引排序的访问路径 reconsidering_access_paths_for_index_ordering
-
执行阶段 join_execution 查找最佳的排序顺序
filesort_summary 展示了排序的一些信息,rows代表扫描的行数,number_of_tmp_files是使用的临时文件数量,sort_buffer_size说明了buffer的大小,sort_mode排序的方式,这表示排序缓冲区元组包含查询引用的排序关键字值和列。元组按排序键值排序,列值直接从元组中读取。
也可以看出使用了临时的文件排序,buffer空间不够,单路排序用了很多临时文件去排序。