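The MySQL optimizer has several execution strategies for optimizing subqueries, including rewriting them as joins, as semi-joins, or materializing them into temporary tables. Which strategy is used depends on the type and placement of the subquery.
Scalar Subqueries
A scalar subquery is a subquery that returns exactly one row. It can be optimized away and cached during execution.
In Example 13, a scalar subquery is used to find the CountryCode for Toronto.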
The key point is that the optimizer treats this as two queries, with costs of 1.20 and 862.60 respectively.
The second query (select_id: 2) has no usable index, so it performs a full table scan: the column in its attached_condition (`City`.`Name`) is not indexed.
Example 13: A scalar subquery
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (
SELECT CountryCode FROM City WHERE name='Toronto'
);
{
"query_block": {
"select_id": 1, # first query
"cost_info": {
"query_cost": "1.20"
},
"table": {
"table_name": "Country",
"access_type": "ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"Code"
],
"key_length": "3",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "0.20",
"prefix_cost": "1.20",
"data_read_per_join": "264"
},
"used_columns": [
...
],
"attached_condition": "(`world`.`Country`.`Code` = (/* select#2 */ select `world`.`City`.`CountryCode` from `world`.`City` where (`world`.`City`.`Name` = 'Toronto')))",
"attached_subqueries": [
{
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2, # second query
"cost_info": {
"query_cost": "862.60"
},
"table": {
"table_name": "City", # table read by the subquery
"access_type": "ALL", # full table scan
"rows_examined_per_scan": 4188,
"rows_produced_per_join": 418,
"filtered": "10.00",
"cost_info": {
"read_cost": "778.84",
"eval_cost": "83.76",
"prefix_cost": "862.60",
"data_read_per_join": "29K"
},
"used_columns": [
"Name",
"CountryCode"
],
"attached_condition": "(`world`.`City`.`Name` = 'Toronto')"
}
}
}
]
}
}
}
Once an index is added on that column, the query is optimized.
Example 14: Adding an index to improve the scalar subquery
ALTER TABLE City ADD INDEX n (Name);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (
SELECT CountryCode FROM City WHERE name='Toronto'
);
...
"optimized_away_subqueries": [
{
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2, # second query
"cost_info": {
"query_cost": "2.00"
},
"table": {
"table_name": "City",
"access_type": "ref", # index access
"possible_keys": [
"n"
],
"key": "n",
"used_key_parts": [
"Name"
],
"key_length": "35",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "1.00",
"prefix_cost": "2.00",
"data_read_per_join": "72"
},
...
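As a rough illustration of what `optimized_away_subqueries` implies, a non-dependent, cacheable scalar subquery can be thought of as being evaluated once, with its result then used as a constant. The two-step version below is a hand-written sketch of that idea, not the optimizer's actual internal rewrite; the user variable `@code` is a name introduced here for illustration.

```sql
-- Step 1: evaluate the scalar subquery once (an index lookup on n after Example 14).
-- Like the scalar subquery itself, this raises an error if more than one row matches.
SELECT CountryCode INTO @code FROM City WHERE Name = 'Toronto';

-- Step 2: the outer query becomes a primary-key lookup against a constant.
SELECT * FROM Country WHERE Code = @code;
```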
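IN Subquery (Unique)
Example 15 shows a subquery that returns a primary key, so its result is unique. Such a subquery can safely be converted into an inner join and still return the same result.
This kind of subquery is fairly efficient. The plan shows that the Country table is read first (using an index), and for each matching row the City rows are then looked up through the CountryCode index.
Example 15: An IN subquery that can be converted to a join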
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode IN (
SELECT Code FROM Country WHERE Continent = 'Asia'
);
...
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "1893.30"
},
"nested_loop": [
{
"table": {
"table_name": "Country", # Country table is read first
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"c"
],
"key": "c",
"used_key_parts": [
"Continent"
],
"key_length": "1",
"ref": [
"const"
],
"rows_examined_per_scan": 51,
"rows_produced_per_join": 51,
"filtered": "100.00",
"using_index": true, # index is used
"cost_info": {
"read_cost": "1.02",
"eval_cost": "51.00",
"prefix_cost": "52.02",
"data_read_per_join": "13K"
},
...
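The rewrite described for Example 15 can be written out by hand as an inner join. This is a sketch of an equivalent query, assuming `Country.Code` is the primary key (so each `City` row matches at most one `Country` row); it is not the exact internal form the optimizer produces.

```sql
-- Hand-written inner-join equivalent of the IN subquery in Example 15.
-- Because Country.Code is unique, the join cannot duplicate City rows.
SELECT City.*
FROM City
INNER JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Asia';
```

Running EXPLAIN FORMAT=JSON on both forms is one way to check whether the two plans match on a given server.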
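IN Subquery (Non-unique)
In Example 15, the subquery was rewritten as an inner join because it already returns distinct rows.
When a subquery's result is not distinct, the MySQL optimizer has to use a different strategy.
In Example 16, the subquery finds countries that have at least one official language. Because many countries have more than one official language, the subquery's result is not unique.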
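To see why a plain join is not equivalent here, compare the two hand-written queries below. They are illustrative sketches that reuse the tables and the `isOfficial=1` predicate from Example 16.

```sql
-- Plain inner join: a country with N official languages is returned N times.
SELECT Country.*
FROM Country
INNER JOIN CountryLanguage ON CountryLanguage.CountryCode = Country.Code
WHERE CountryLanguage.isOfficial = 1;

-- Semi-join semantics written as EXISTS: each qualifying country is returned once.
SELECT *
FROM Country
WHERE EXISTS (
  SELECT 1
  FROM CountryLanguage
  WHERE CountryLanguage.CountryCode = Country.Code
    AND CountryLanguage.isOfficial = 1
);
```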
Example 16: A subquery that cannot be rewritten as an inner join
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code IN (
SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1
);
...
"table": {
"table_name": "<subquery2>", # subquery
"access_type": "eq_ref",
"key": "",
"key_length": "3",
"ref": [
"world.Country.Code"
],
"rows_examined_per_scan": 1,
"materialized_from_subquery": {
"using_temporary_table": true, # temporary table is used
"query_block": {
"table": {
"table_name": "CountryLanguage",
"access_type": "ALL",
"possible_keys": [
"PRIMARY",
"CountryCode"
],
"rows_examined_per_scan": 984,
"rows_produced_per_join": 492,
"filtered": "50.00",
"cost_info": {
"read_cost": "104.40",
"eval_cost": "98.40",
"prefix_cost": "202.80",
"data_read_per_join": "19K"
},
"used_columns": [
"CountryCode",
"IsOfficial"
],
"attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 1)"
}
}
}
...
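The OPTIMIZER_TRACE output in Example 17 shows that the optimizer found the query cannot be rewritten as a plain join and has to be executed as a "semi-join" instead. The optimizer has several strategies for executing a semi-join: FirstMatch, MaterializeLookup (temporary table lookup), and DuplicatesWeedout. In this example it chose MaterializeLookup, the strategy with the lowest cost.
Example 17: Semi-join strategies for a subquery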
SET OPTIMIZER_TRACE="enabled=on";
SET optimizer_trace_max_mem_size = 1024 * 1024;
SELECT * FROM Country WHERE Code IN (
SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1
);
SELECT * FROM information_schema.optimizer_trace;
...
"semijoin_strategy_choice": [
{
"strategy": "FirstMatch", # first match
"recalculate_access_paths_and_cost": {
"tables": [
]
},
"cost": 499.63,
"rows": 239,
"chosen": true
},
{
"strategy": "MaterializeLookup", # temporary table lookup
"cost": 407.8, # lowest cost
"rows": 239,
"duplicate_tables_left": false,
"chosen": true
},
{
"strategy": "DuplicatesWeedout", # duplicate removal
"cost": 650.36,
"rows": 239,
"duplicate_tables_left": false,
"chosen": false
}
],
"chosen": true
}
...
{
"final_semijoin_strategy": "MaterializeLookup" # the temporary table strategy is the final choice
}
...
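To compare strategies on your own server, one option is the `SEMIJOIN` optimizer hint, which restricts the optimizer to the listed strategies. The sketch below assumes a MySQL version with optimizer hint support; the query block name `subq1` is arbitrary and chosen here for illustration. The optimizer may ignore the hint if the strategy is not applicable.

```sql
-- Ask for FirstMatch instead of the materialization strategy chosen in Example 17.
EXPLAIN FORMAT=JSON
SELECT /*+ SEMIJOIN(@subq1 FIRSTMATCH) */ *
FROM Country
WHERE Code IN (
  SELECT /*+ QB_NAME(subq1) */ CountryCode
  FROM CountryLanguage
  WHERE isOfficial=1
);
```

Comparing the `query_cost` of this plan with the one from Example 16 gives a rough check on the costs reported in the trace.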
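NOT IN Subquery
A NOT IN subquery can be executed with either a materialization (temporary table) strategy or an EXISTS strategy. To illustrate the difference between the two, consider the following examples: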
SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country);
SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country WHERE continent IN ('Asia', 'Europe', 'North America'));
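In the first query, the subquery is already more or less in its ideal form: code is the primary key of Country, so a set of distinct values can be built with an index scan alone.
In the second query, an additional condition is attached: continent IN ('Asia', 'Europe', 'North America'). Since NOT IN has to be evaluated for every row of City, it makes sense to create a temporary table that stores the rows matching the condition, so that the extra condition does not have to be re-checked for each City row.
Example 18: A NOT IN subquery that uses a temporary table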
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (
SELECT code FROM Country WHERE continent IN ('Asia', 'Europe', 'North America')
);
...
"attached_subqueries": [
{
"table": {
"table_name": "<materialized_subquery>", # uses a temporary table
"access_type": "eq_ref",
"key": "",
"key_length": "3",
"rows_examined_per_scan": 1,
"materialized_from_subquery": {
"using_temporary_table": true,
"dependent": true,
"cacheable": false,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "54.67"
},
...
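One way to look at the alternative to materialization is the `SUBQUERY` optimizer hint, which accepts `MATERIALIZATION` or `INTOEXISTS`. This is a sketch for comparison only: on newer 8.0 releases the optimizer may instead transform NOT IN into an antijoin, in which case the hint may have no effect.

```sql
-- Request the EXISTS-style strategy for the subquery from Example 18.
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (
  SELECT /*+ SUBQUERY(INTOEXISTS) */ code
  FROM Country
  WHERE continent IN ('Asia', 'Europe', 'North America')
);
```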
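Derived Tables
A derived table is the table produced by a subquery in the FROM clause of a SELECT query. Such a subquery does not have to be materialized into a temporary table; MySQL can usually "merge" it back into the outer query.
Example 19: A derived table merged back into the outer query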
EXPLAIN FORMAT=JSON
SELECT * FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) as CityTmp
WHERE Country.code=CityTmp.CountryCode AND CityTmp.name ='Toronto';
...
{
"table": {
"table_name": "City", # derived table
"access_type": "ref", # uses an index
"possible_keys": [
"CountryCode",
"n"
],
"key": "n",
"used_key_parts": [
"Name"
],
"key_length": "35",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 0,
"filtered": "5.00",
"cost_info": {
"read_cost": "1.00",
"eval_cost": "0.05",
"prefix_cost": "2.00",
"data_read_per_join": "3"
},
...
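A potential problem is that this "merging" makes some queries no longer valid. If you see syntax warnings after upgrading, you can turn off the derived_merge optimization. Doing so raises the query cost, because creating a temporary table is relatively expensive.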
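Two ways of turning merging off are sketched below: session-wide through the `derived_merge` flag of `optimizer_switch`, or for a single derived table with the `NO_MERGE` optimizer hint (assumed here to be available, as in MySQL 8.0). Either option is enough on its own; both are shown against the query from Example 19.

```sql
-- Option 1: disable derived-table merging for the current session.
SET SESSION optimizer_switch = 'derived_merge=off';

-- Option 2: leave the switch alone and prevent merging for one derived table.
EXPLAIN FORMAT=JSON
SELECT /*+ NO_MERGE(CityTmp) */ *
FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) as CityTmp
WHERE Country.code=CityTmp.CountryCode AND CityTmp.name ='Toronto';

-- After using option 1, restore the default setting.
SET SESSION optimizer_switch = 'derived_merge=default';
```

The hint keeps the change local to one query, which may be preferable when only a few statements are affected.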
Translated from:
Subqueries - The Unofficial MySQL 8.0 Optimizer Guide