Subqueries - The Unofficial MySQL 8.0 Optimizer Guide - Study Notes

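The MySQL optimizer has several execution strategies for subqueries, including rewriting them as joins, executing them as semi-joins, and materializing them into temporary tables. Which strategy is used depends on the type and placement of the subquery.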

Scalar subqueries

A scalar subquery is a subquery that returns exactly one row (a single value). It can be optimized away and cached during execution.

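In Example 13, a scalar subquery is used to find the CountryCode for Toronto.
The key point is that the optimizer treats this as two queries, with costs of 1.20 and 862.60 respectively (the query_cost values in the output below).
The second query (select_id: 2) has no usable index and therefore performs a full table scan, because the column in its attached_condition (`City`.`Name`) is not indexed.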

Example 13: A scalar subquery

EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (
  SELECT CountryCode FROM City WHERE name='Toronto'
);
{
  "query_block": {
    "select_id": 1,                      # first query (outer)
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "Country",
      "access_type": "ref",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "Code"
      ],
      "key_length": "3",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "264"
      },
      "used_columns": [
        ...
      ],
      "attached_condition": "(`world`.`Country`.`Code` = (/* select#2 */ select `world`.`City`.`CountryCode` from `world`.`City` where (`world`.`City`.`Name` = 'Toronto')))",
      "attached_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,              # second query (subquery)
            "cost_info": {
              "query_cost": "862.60"
            },
            "table": {
              "table_name": "City",      # table read by the subquery
              "access_type": "ALL",      # full table scan
              "rows_examined_per_scan": 4188,
              "rows_produced_per_join": 418,
              "filtered": "10.00",
              "cost_info": {
                "read_cost": "778.84",
                "eval_cost": "83.76",
                "prefix_cost": "862.60",
                "data_read_per_join": "29K"
              },
              "used_columns": [
                "Name",
                "CountryCode"
              ],
              "attached_condition": "(`world`.`City`.`Name` = 'Toronto')"
            }
          }
        }
      ]
    }
  }
}

Once an index is added on that column, the query is optimized.

Example 14: Adding an index to improve the scalar subquery

ALTER TABLE City ADD INDEX n (Name);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (
  SELECT CountryCode FROM City WHERE name='Toronto'
);
...
"optimized_away_subqueries": [
   {
   "dependent": false,
   "cacheable": true,
   "query_block": {
   "select_id": 2,              # second query (subquery)
   "cost_info": {
      "query_cost": "2.00"
   },
   "table": {
      "table_name": "City",
      "access_type": "ref",     # index lookup
      "possible_keys": [
         "n"
      ],
      "key": "n",
      "used_key_parts": [
         "Name"
      ],
      "key_length": "35",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
         "read_cost": "1.00",
         "eval_cost": "1.00",
         "prefix_cost": "2.00",
         "data_read_per_join": "72"
      },
...
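
Since the text above describes the optimizer treating the statement as two queries, one way to picture the optimized_away_subqueries entry is to run the two steps by hand. This is only an illustrative sketch: the value 'CAN' is an assumption based on the world sample data (Example 19 pairs Toronto with CountryCode 'CAN'), and the optimizer performs this substitution internally rather than issuing separate statements.

```sql
-- Step 1: the non-dependent, cacheable subquery (select_id 2).
-- With index n on City.Name this is a ref lookup instead of a full scan.
SELECT CountryCode FROM City WHERE Name = 'Toronto';

-- Step 2: the outer query with the subquery result substituted as a constant.
-- 'CAN' is assumed from the world sample data, not taken from the EXPLAIN output.
SELECT * FROM Country WHERE Code = 'CAN';
```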

IN subqueries (unique)

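Example 15 shows a subquery that returns a primary key, so its result is guaranteed to be unique. A subquery like this can safely be converted to an inner join and still return the same result.

This kind of subquery is fairly efficient. The plan shows that the Country table is accessed first (using an index), and for each matching row the City rows are then looked up through the CountryCode index.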

Example 15: An IN subquery that can be converted to a join

EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode IN (
  SELECT Code FROM Country WHERE Continent = 'Asia'
);
...
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
   "query_cost": "1893.30"
   },
   "nested_loop": [
   {
      "table": {
      "table_name": "Country",   # Country table is read first
      "access_type": "ref",
      "possible_keys": [
         "PRIMARY",
         "c"
      ],
      "key": "c",
      "used_key_parts": [
         "Continent"
      ],
      "key_length": "1",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 51,
      "rows_produced_per_join": 51,
      "filtered": "100.00",
      "using_index": true,      # covering index (index-only access)
      "cost_info": {
         "read_cost": "1.02",
         "eval_cost": "51.00",
         "prefix_cost": "52.02",
         "data_read_per_join": "13K"
      },
...
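
Because Country.Code is the primary key, the IN subquery in Example 15 can be written by hand as an inner join. The statement below is a sketch of that equivalent form for comparison; it is not the optimizer's literal internal rewrite.

```sql
-- Hand-written join that returns the same rows as Example 15.
-- Safe only because Country.Code is unique: each City row matches at most one Country row.
SELECT City.*
FROM City
INNER JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Asia';
```

Running EXPLAIN FORMAT=JSON on both forms is a quick way to check whether they end up with the same nested_loop plan on your server.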

IN subqueries (non-unique)

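In Example 15 the subquery could be rewritten as an inner join because it already returned distinct rows.
When the subquery result is not guaranteed to be distinct, the MySQL optimizer has to use a different strategy.

In Example 16, the subquery finds countries that have at least one official language. Because some countries have more than one official language, the subquery result is not unique.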

Example 16: A subquery that cannot be rewritten as an inner join

EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code IN (
  SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1
);
...
       "table": {
         "table_name": "",      # the materialized subquery
         "access_type": "eq_ref",
         "key": "",
         "key_length": "3",
         "ref": [
           "world.Country.Code"
         ],
         "rows_examined_per_scan": 1,
         "materialized_from_subquery": {
           "using_temporary_table": true,  # a temporary table is used
           "query_block": {
             "table": {
               "table_name": "CountryLanguage",
               "access_type": "ALL",
               "possible_keys": [
                 "PRIMARY",
                 "CountryCode"
               ],
               "rows_examined_per_scan": 984,
               "rows_produced_per_join": 492,
               "filtered": "50.00",
               "cost_info": {
                 "read_cost": "104.40",
                 "eval_cost": "98.40",
                 "prefix_cost": "202.80",
                 "data_read_per_join": "19K"
               },
               "used_columns": [
                 "CountryCode",
                 "IsOfficial"
               ],
               "attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 1)"
             }
           }
         }
...
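
To see why a plain inner join is not equivalent for Example 16, compare the two hand-written rewrites below. They are a sketch for illustration only; the alias official is a made-up name, and the second form mimics the idea of materialization rather than reproducing the optimizer's actual plan.

```sql
-- Not equivalent: a country with several official languages
-- appears once per matching CountryLanguage row.
SELECT Country.*
FROM Country
INNER JOIN CountryLanguage ON CountryLanguage.CountryCode = Country.Code
WHERE CountryLanguage.isOfficial = 1;

-- Equivalent: de-duplicate the subquery result first, then join against it.
-- "official" is a hypothetical alias chosen for this example.
SELECT Country.*
FROM Country
INNER JOIN (
  SELECT DISTINCT CountryCode
  FROM CountryLanguage
  WHERE isOfficial = 1
) AS official ON official.CountryCode = Country.Code;
```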

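The OPTIMIZER_TRACE output in Example 17 shows that the optimizer determined the query cannot be rewritten as a plain join and has to be executed as a "semi-join" instead. The optimizer has several strategies for executing a semi-join: FirstMatch, MaterializeLookup (looking rows up in a temporary table), and DuplicatesWeedout. In this example it chose the materialization strategy, which had the lowest cost.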

Example 17: Semi-join strategies for a subquery

SET OPTIMIZER_TRACE="enabled=on";
SET optimizer_trace_max_mem_size = 1024 * 1024;
SELECT * FROM Country WHERE Code IN (
  SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1
);
SELECT * FROM information_schema.optimizer_trace;
...
                   "semijoin_strategy_choice": [
                     {
                       "strategy": "FirstMatch",           # first match
                       "recalculate_access_paths_and_cost": {
                         "tables": [
                         ]
                       },
                       "cost": 499.63,
                       "rows": 239,
                       "chosen": true
                     },
                     {
                       "strategy": "MaterializeLookup",    # look up in a temporary table
                       "cost": 407.8,  # the lowest cost of the three
                       "rows": 239,
                       "duplicate_tables_left": false,
                       "chosen": true
                     },
                     {
                       "strategy": "DuplicatesWeedout",    # remove duplicates
                       "cost": 650.36,
                       "rows": 239,
                       "duplicate_tables_left": false,
                       "chosen": false
                     }
                   ],
                   "chosen": true
                 }
...
         {
           "final_semijoin_strategy": "MaterializeLookup"  # materialization is the final choice
         }
...
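
To compare the strategies listed in the trace, MySQL 8.0 optimizer hints can request a specific semi-join strategy for the subquery's query block. The statements below are a sketch, assuming a server where the SEMIJOIN hint is available; hints are requests, so the optimizer may still ignore one that does not apply.

```sql
-- Ask for FirstMatch instead of materialization.
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code IN (
  SELECT /*+ SEMIJOIN(FIRSTMATCH) */ CountryCode
  FROM CountryLanguage WHERE isOfficial=1
);

-- Ask for duplicate weedout.
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code IN (
  SELECT /*+ SEMIJOIN(DUPSWEEDOUT) */ CountryCode
  FROM CountryLanguage WHERE isOfficial=1
);

-- Turn tracing back off once the comparison is done.
SET optimizer_trace="enabled=off";
```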

NOT IN subqueries

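A NOT IN subquery can be executed either with materialization (a temporary table) or with the EXISTS strategy. To illustrate the difference between the two approaches, consider the following queries: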

  1. SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country);
  2. SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country WHERE continent IN ('Asia', 'Europe', 'North America'));

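In the first query, the subquery is already more or less in its ideal form. The code column is the primary key of Country, so a distinct set can be produced with an index scan.
In the second query, an extra condition is attached: continent IN ('Asia', 'Europe', 'North America'). Since every row of City has to be checked against the NOT IN list, it makes sense to create a temporary table holding the rows that match the condition, so that the extra condition does not have to be re-evaluated for each City row.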

Example 18: A NOT IN subquery executed with a temporary table

EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (
  SELECT code FROM Country WHERE continent IN ('Asia', 'Europe', 'North America')
);
...
     "attached_subqueries": [
       {
         "table": {
           "table_name": "",  # temporary table is used
           "access_type": "eq_ref",
           "key": "",
           "key_length": "3",
           "rows_examined_per_scan": 1,
           "materialized_from_subquery": {
             "using_temporary_table": true,
             "dependent": true,
             "cacheable": false,
             "query_block": {
               "select_id": 2,
               "cost_info": {
                 "query_cost": "54.67"
               },
...
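
The two approaches for NOT IN can be compared side by side with the SUBQUERY optimizer hint. This is a sketch under assumptions: it presumes a MySQL 8.0 server that honors the hint for this query, and on newer 8.0 releases the optimizer may instead turn NOT IN into an antijoin, in which case the hint may have no visible effect.

```sql
-- Request materialization of the subquery into a temporary table.
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (
  SELECT /*+ SUBQUERY(MATERIALIZATION) */ code FROM Country
  WHERE continent IN ('Asia', 'Europe', 'North America')
);

-- Request the IN-to-EXISTS transformation instead.
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (
  SELECT /*+ SUBQUERY(INTOEXISTS) */ code FROM Country
  WHERE continent IN ('Asia', 'Europe', 'North America')
);
```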

Derived tables

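A subquery in the FROM clause of a SELECT produces a derived table. Such a subquery does not have to be materialized into a temporary table; MySQL can usually "merge" it back into the outer query.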

Example 19: A derived table merged back into the outer query

EXPLAIN FORMAT=JSON
SELECT * FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) as CityTmp
WHERE Country.code=CityTmp.CountryCode AND CityTmp.name ='Toronto';
...
   {
      "table": {
      "table_name": "City",    # the derived table (merged, so City is read directly)
      "access_type": "ref",    # uses an index
      "possible_keys": [
         "CountryCode",
         "n"
      ],
      "key": "n",
      "used_key_parts": [
         "Name"
      ],
      "key_length": "35",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 0,
      "filtered": "5.00",
      "cost_info": {
         "read_cost": "1.00",
         "eval_cost": "0.05",
         "prefix_cost": "2.00",
         "data_read_per_join": "3"
      },
...

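The potential downside is that this merging can make some queries no longer valid. If you see a syntax warning after upgrading, you can turn off the derived_merge optimization. Expect query cost to go up when you do, because materializing a temporary table is relatively expensive.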
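
Turning off the merge can be done for a whole session or for a single derived table. The statements below are a sketch, assuming a MySQL 8.0 server where the optimizer_switch flag derived_merge and the NO_MERGE hint are available; the query is the one from Example 19.

```sql
-- Option 1: disable derived table merging for the current session only.
SET SESSION optimizer_switch = 'derived_merge=off';

-- Option 2: leave the switch alone and prevent merging of one derived table.
EXPLAIN FORMAT=JSON
SELECT /*+ NO_MERGE(CityTmp) */ *
FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) AS CityTmp
WHERE Country.code=CityTmp.CountryCode AND CityTmp.name ='Toronto';

-- Restore the default behavior for the session.
SET SESSION optimizer_switch = 'derived_merge=on';
```

The per-query hint is the narrower choice: it keeps merging enabled for every other query in the session.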

Translated from:
Subqueries - The Unofficial MySQL 8.0 Optimizer Guide
