MySQL-优化-ICP(Index condition pushdown)详解

文章目录

  • MySQL-优化-ICP(Index condition pushdown)详解
    • 一、关于ICP
    • 二、什么情况下可以用到MySQL ICP特性
    • 三、疑问:如下这个sql为什么会使用到ICP呢?
    • 四、代码控制
      • server层处理
      • engine层处理

MySQL-优化-ICP(Index condition pushdown)详解

一、关于ICP

ICP是index condition pushdown的简称,目的是为了减少server层和Innodb层的交互次数,加快查询效率。
官方文档给了一个例子,如下
假设有一张表people的二级索引INDEX (zipcode, lastname, firstname),那么对于如下的查询

SELECT * FROM people
WHERE zipcode='95054'
AND lastname LIKE '%etrunia%'
AND address LIKE '%Main Street%';

MySQL可以利用索引来检索zipcode=‘95054’,但是没有办法通过索引查找来检索lastname LIKE ‘%etrunia%’,如果没有ICP的话,第一次交互过程中,查到所有的匹配zipcode='95054’的主键,第二次根据主键拿到整行数据信息,server层再根据整行数据来过滤lastname LIKE ‘%etrunia%’,address LIKE ‘%Main Street%’。如果有ICP的话,第一次交互的过程中,就可以根据lastname LIKE '%etrunia%'进行过滤了。

如上文档只是给了一个例子,而并不是说一定必须是组合索引,条件中包含了最左侧条件,然后是其他索引列的模糊匹配,我们知道非左前缀的模糊匹配,是无法通过B+树进行检索的,所以要通过ICP来进行判断。

二、什么情况下可以用到MySQL ICP特性

首先必要条件就是要打开ICP的开关,默认时开启的,如下

mysql> show global variables like '%optimizer_switch%'
    -> ;
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name    | Value                                                                                                                                                                                                                                                                                                                                                                                                              |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=off,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=off,condition_fanout_filter=on,derived_merge=on |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>
mysql>
mysql> set global optimizer_switch='index_condition_pushdown=on'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like '%optimizer_switch%';
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name    | Value                                                                                                                                                                                                                                                                                                                                                                                                              |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=off,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=off,condition_fanout_filter=on,derived_merge=on |
+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

官方手册8.2.1.5 Index Condition Pushdown Optimization中有相关的介绍,关于什么时候可以利用到ICP特性的,如下

  • 只有在access type,也就是执行计划的type列为rang,ref,eq_ref,ref_or_null时,才有可能会用到ICP
  • ICP对于分区表同样生效,无论是Innodb,还是MyISAM。
  • 对于InnoDB类型的表来说,ICP只适用于二级索引,ICP的目的时为了读取整行数据的数量,减少IO,所以对于主键索引来讲,本身就是整行数据,所以不需要ICP
  • 不支持虚拟列上的索引
  • 不支持自查询
  • 引用存储函数的条件不能下推。存储引擎不能调用存储功能
  • 触发条件不能被下推,这点我也没有深入了解下,有时间看下。官方说明Triggered conditions cannot be pushed down. (For information about triggered conditions, see Section 8.2.2.4, “Optimizing Subqueries with the EXISTS Strategy”.)

下面对比下有无ICP时,数据检索过程的区别

关闭ICP时如下:

  • 读取二级索引元组,然后使用索引元组中的主键去聚集索引上定位整行数据
  • 拿其他的where条件来看此行数据是否满足,满足的话保留,不满足的话丢弃

打开ICP时如下:

  • 获取二级索引元组
  • 检测此二级索引元组是否满足下推的条件,如果不满足,读取下一条二级索引元组
  • 如果满足,通过此二级索引元组到主键索引中定位读取整行数据
  • 然后再检测此行数据是否满足其他未能下推的条件,如果满足保留,如果不满足,丢弃。

如果使用到了ICP的话,explain结果的extra列,会有Using index condition.

三、疑问:如下这个sql为什么会使用到ICP呢?

如下:

mysql> show create table test_in\G
*************************** 1. row ***************************
       Table: test_in
Create Table: CREATE TABLE `test_in` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `age` (`age`),
  KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=61596517 DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> explain select * from test_in where age in (1,2,3,4,5,6,7,8);
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test_in | NULL       | range | age           | age  | 5       | NULL |    8 |   100.00 | Using index condition |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

可以看到并不是只有在多列索引中才会用到ICP特性,对于索引条件的in或者or的查询,都会用到。而下推的条件正是in或者or条件。

这种索引下推对性能有很大的提升吗?这部分正在测试研究中。

四、代码控制

关于ICP的流程控制大体上分为两部分,一是server层判断是否可以进行ICP,并且确定ICP的条件,传递给存储引擎;二是存储引擎去真正的执行ICP的过程。

server层处理

server层的处理在sql优化阶段,入口函数为QEP_TAB::push_index_cond(JOIN_TAB const*, unsigned int, Opt_trace_object*) sql_select.cc:1769,如下

 if (condition() &&
      tbl->file->index_flags(keyno, 0, 1) &
      HA_DO_INDEX_COND_PUSHDOWN &&
      hint_key_state(join_->thd, tbl, keyno, ICP_HINT_ENUM,
                     OPTIMIZER_SWITCH_INDEX_CONDITION_PUSHDOWN) &&
      join_->thd->lex->sql_command != SQLCOM_UPDATE_MULTI &&
      join_->thd->lex->sql_command != SQLCOM_DELETE_MULTI &&
      !has_guarded_conds() &&
      type() != JT_CONST && type() != JT_SYSTEM &&
      !(keyno == tbl->s->primary_key &&
        tbl->file->primary_key_is_clustered())) //判断是否满足ICP的各种前置条件
  {
    DBUG_EXECUTE("where", print_where(condition(), "full cond",
                 QT_ORDINARY););
    Item *idx_cond= make_cond_for_index(condition(), tbl,
                                        keyno, other_tbls_ok); //生成下推条件
    DBUG_EXECUTE("where", print_where(idx_cond, "idx cond", QT_ORDINARY););
    if (idx_cond)
    {
      /*
        Check that the condition to push actually contains fields from
        the index. Without any fields from the index it is unlikely
        that it will filter out any records since the conditions on
        fields from other tables in most cases have already been
        evaluated.
      */
      idx_cond->update_used_tables();
      if ((idx_cond->used_tables() & table_ref->map()) == 0)
      {
        /*
          The following assert is to check that we only skip pushing the
          index condition for the following situations:
          1. We actually are allowed to generate an index condition on another
             table.
          2. The index condition is a constant item.
          3. The index condition contains an updatable user variable
             (test this by checking that the RAND_TABLE_BIT is set).
        */
        DBUG_ASSERT(other_tbls_ok ||                                  // 1
                    idx_cond->const_item() ||                         // 2
                    (idx_cond->used_tables() & RAND_TABLE_BIT) );     // 3
        DBUG_VOID_RETURN;
      }

      Item *idx_remainder_cond= 0;

      /*
        For BKA cache we store condition to special BKA cache field
        because evaluation of the condition requires additional operations
        before the evaluation. This condition is used in 
        JOIN_CACHE_BKA[_UNIQUE]::skip_index_tuple() functions.
      */
      if (join_tab->use_join_cache() &&
          /*
            if cache is used then the value is TRUE only 
            for BKA[_UNIQUE] cache (see setup_join_buffering() func).
            In this case other_tbls_ok is an equivalent of
            cache->is_key_access().
          */
          other_tbls_ok &&
          (idx_cond->used_tables() &
           ~(table_ref->map() | join_->const_table_map)))
      {
        cache_idx_cond= idx_cond;
        trace_obj->add("pushed_to_BKA", true);
      }
      else
      {
        idx_remainder_cond= tbl->file->idx_cond_push(keyno, idx_cond);
        DBUG_EXECUTE("where",
                     print_where(tbl->file->pushed_idx_cond, "icp cond", 
                                 QT_ORDINARY););
      }
      /*
        Disable eq_ref's "lookup cache" if we've pushed down an index
        condition. 
        TODO: This check happens to work on current ICP implementations, but
        there may exist a compliant implementation that will not work 
        correctly with it. Sort this out when we stabilize the condition
        pushdown APIs.
      */
      if (idx_remainder_cond != idx_cond)
      {
        ref().disable_cache= TRUE;
        trace_obj->add("pushed_index_condition", idx_cond);
      }

      Item *row_cond= make_cond_remainder(condition(), TRUE);
      DBUG_EXECUTE("where", print_where(row_cond, "remainder cond",
                   QT_ORDINARY););
      
      if (row_cond)
      {
        if (idx_remainder_cond)
          and_conditions(&row_cond, idx_remainder_cond);
        idx_remainder_cond= row_cond;
      }
      set_condition(idx_remainder_cond);
      trace_obj->add("table_condition_attached", idx_remainder_cond);
    }
  }

engine层处理

负责处理ICP逻辑在函数row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) row0sel.cc:5723,如下

	switch (row_search_idx_cond_check(buf, prebuilt, rec, offsets)) {
	case ICP_NO_MATCH:
		if (did_semi_consistent_read) {
			row_unlock_for_mysql(prebuilt, TRUE);
		}
		goto next_rec;
	case ICP_OUT_OF_RANGE:
		err = DB_RECORD_NOT_FOUND;
		goto idx_cond_failed;
	case ICP_MATCH:
		break;
	}

其中row_search_idx_cond_check来判断索引元组是否匹配下推的条件。

你可能感兴趣的:(MySQL,MySQL-性能测试与优化)