MySQL: 执行计划中filtered的计算方式和影响


filtered主要的功能是说明,引擎层扫描数据后根据额外的where条件过滤后剩下值的一个百分比,因为遇到一个问题和其相关,不由得想看看其估算过程,主要考虑如下

  • 主要代码5.7.22
  • 主要考虑等值过滤条件
  • 8.0.28带直方图测试

一、计算方式

经过debug发现其实filter的计算还是比较简单的,其主要函数为calculate_condition_filter

其中定义了几个宏
/// Filtering effect for equalities: col1 = col2
#define COND_FILTER_EQUALITY 0.1f
/// Filtering effect for inequalities: col1 > col2
#define COND_FILTER_INEQUALITY 0.3333f
/// Filtering effect for between: col1 BETWEEN a AND b
#define COND_FILTER_BETWEEN 0.1111f

对于等值来讲我们用到的就是COND_FILTER_EQUALITY宏,其固定为10%,下列的估算方式这个宏会参与计算。主要的计算流程为:

1. 每个不能使用索引的where条件计算为 max(1/n_rows,0.1)

这个部分来自Item_func_eq::get_filtering_effect,这里的Item_func_eq就是每个where等值条件,可以清晰看到

  return fld->get_cond_filter_default_probability(rows_in_table,
                                                  COND_FILTER_EQUALITY);//0.1f
                                                  
Item_field::get_cond_filter_default_probability

return std::max(static_cast(1/max_distinct_values), default_filter);   
                                              

2. 关于n_rows到底是什么?

那么这里有一个主要的问题,就是这个n_rows 行数到底代表的是什么,首先在函数calculate_condition_filter中可以看到,如下代码

    filtered*=
      tab->join()->where_cond->get_filtering_effect(tab->table_ref->map(),
                                                    used_tables,
                                                    &table->tmp_set,
                                          static_cast(tab->records()));

而这里的where_cond就是我们where条件后面用不到索引的那些条件,其中行数比较关键为tab->records(),如果简单看一下实际上这个值为

QEP_shared中的(执行计划相关)
  /**
     Either #rows in the table or 1 for const table.
     Used in optimization, and also in execution for FOUND_ROWS().
  */
  ha_rows   m_records;

其设置为 QEP_shared::set_records,

image.png

最终可以看到这个值实际上来自

tab->set_records(tab->found_records= tab->table()->file->stats.records);
其中tab->table()返回对象为TABLE结构体,也就是我们常说的table_cache

它就是innodb底层的统计数据,代表是表中预估的行数,可在函数ha_innobase::info_low中找到如下:

n_rows = ib_table->stat_n_rows;
....
stats.records = (ha_rows) n_rows;

简单说就是mysql.innodb_table_stats/information_schema.tables下那个表数据的行数。

3. 如果有多个条件则通过各自filter相乘的方式进行计算,得到最终值

这部分来自Item_cond_and::get_filtering_effect,and 也是一个item,其下面包含2个time,代码和注释说明了一切

  /*
    Calculated as "Conjunction of independent events":
       P(A and B ...) = P(A) * P(B) * ...
  */
  while ((item= it++))
    filtered*= item->get_filtering_effect(filter_for_table,
                                        read_tables,
                                        fields_to_ignore,
                                        rows_in_table);
  return filtered;

这个就不需要过多解释了吧。

4. 如果最终计算的filter和1/num_rows之间判断,取max(1/n_rows,0.1)

注意这里和前面不同,第1点是每个where条件的filter不同,这里是最终计算的filter

filtered 在 calculate_condition_filter 中进行计算,如下

filtered= max(filtered, 1.0f / tab->records());

5. 如果预估扫描的行数*计算的最终filter 小于 0.05则取0.05

注意这里和前面不同,第1点是每个where条件的filter不同,这里是最终计算的filtered 在 calculate_condition_filter 中进行计算,如下

  if ((filtered * fanout) < 0.05f)  //扇出
    filtered= 0.05f/static_cast(fanout);  

总的来说,filtered几乎是一个预估的计算,可以作为参考但是一般不会太准,其参与计算的元素主要包含:

  • 0.1 固定写死值
  • 0.05 固定写死的值
  • 1/num_rows 也没有过多的考虑偏移量的问题,其中num_rows 为表的总行数
  • 预估引擎层扫描的行数

这就可能导致执行计划中join 之类的选择错驱动表,因为这种算法的过滤性完全是估算的,没有基数等统计数据作为标准(或者索引下探动态统计dive ),当然到了8.0可以考虑直方图,最后我们来进行简单测试。下面来测试一下,看看是不是这样计算的。

二、测试

先建立2个表如下:

mysql> create table testfil(a int,b int,c int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into testfil values(1,1,1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into testfil values(2,2,2);
Query OK, 1 row affected (0.01 sec)

mysql> insert into testfil values(3,3,3);
Query OK, 1 row affected (0.01 sec)

mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into testfil values(4,4,4);
Query OK, 1 row affected (0.01 sec)

mysql> insert into testfil values(3,3,3);
Query OK, 1 row affected (0.00 sec)

mysql> select * from testfil;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    2 |    2 |
|    4 |    4 |    4 |
|    4 |    4 |    4 |
|    4 |    4 |    4 |
|    4 |    4 |    4 |
|    3 |    3 |    3 |
+------+------+------+
7 rows in set (0.00 sec)

mysql> alter table testfil add key(a);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> create table testfil2 like testfil;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into testfil2 select * from testfil;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0
...
mysql> insert into testfil2 select * from testfil2;
Query OK, 14336 rows affected (4.91 sec)
Records: 14336  Duplicates: 0  Warnings: 0

mysql> select table_name,n_rows from mysql.innodb_table_stats where table_name in ('testfil','testfil2');
+------------+--------+
| table_name | n_rows |
+------------+--------+
| testfil    |      7 |
| testfil2   |  28755 |
+------------+--------+
2 rows in set (0.09 sec)

mysql> select table_name,TABLE_ROWS from information_schema.tables  where table_name in ('testfil','testfil2');
+------------+------------+
| table_name | TABLE_ROWS |
+------------+------------+
| testfil    |          7 |
| testfil2   |      28755 |
+------------+------------+
2 rows in set (0.13 sec)

这里主要是建立2个表1个小表1个大表,我们分别来看。

  • 测试单个where条件,小表
mysql> desc select * from testfil where b=1 ;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql> desc select * from testfil where b=4 ;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

这里我们明显的看到,虽然等于4的行是比较多的,等于1的行只有1行,但是他们计算的filtered都是1/7左右,又因为1/7 > 0.1 因此显示为14.29%,很明显没有考虑数据的倾斜度

  • 测试单个where条件,大表
mysql> desc select * from testfil2 where b=1 ;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 28755 |    10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> desc select * from testfil2 where b=4 ;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 28755 |    10.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

这里也显然,1/28755< 0.1,因此都显示10%,不管数据偏移量。

  • 测试两个个where条件,小表
mysql> desc select * from testfil where b=1 and c=1;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql> desc select * from testfil where b=4 and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

这里依旧显示14.29%,因为filter初始化计算出来为(1/7)*(1/7),根据第4点小于1/7,依旧算1/7,也就是14.29%

DEBUG如下

1416      filtered= max(filtered, 1.0f / tab->records());
(gdb) p filtered
$1 = 0.0204081647 ((1/7)*(1/7))
(gdb) n
1428      if ((filtered * fanout) < 0.05f)
(gdb) p filtered
$2 = 0.142857149

依旧不考虑数据切斜度。

  • 测试两个where条件,大表
mysql> desc select * from testfil2 where b=1 and c=1;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 28755 |     1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql> desc select * from testfil2 where b=4 and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 28755 |     1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

这里很显然就是两个0.1相乘,0.1*0.1=0.01 也就是1%,而0.01 > 1/28755 ,因此直接显示

  • 测试带索引定位数据扫描和两个where条件

这种情况主要是要测试最后一点就是第5点的0.05测试出来,建立的表如下:

mysql> show create table testauto \G
*************************** 1. row ***************************
       Table: testauto
Create Table: CREATE TABLE `testauto` (
  `a` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> select count(*) from testauto;
+----------+
| count(*) |
+----------+
|    16384 |
+----------+
1 row in set (0.77 sec)

mysql> select table_name,n_rows from mysql.innodb_table_stats where table_name in ('testauto');
+------------+--------+
| table_name | n_rows |
+------------+--------+
| testauto   |  16188 |
+------------+--------+
1 row in set (0.10 sec)

这里字段a有索引,且里面的数字是唯一值,那么如下语句

mysql> desc select * from testauto where a=100 and b=1 and c=1;
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | testauto | NULL       | ref  | a             | a    | 4       | const |    1 |     5.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

我们看到filtered为5%,也就是第5点生效,因为 rows=1,计算出来的filter应该为0.01,而1*0.01< 0.05 所以直接置为0.05,也就是5%。看下DEBUG

1416      filtered= max(filtered, 1.0f / tab->records());
(gdb) 
1428      if ((filtered * fanout) < 0.05f)
(gdb) p filtered
$3 = 0.0100000007
(gdb) p fanout
$4 = 1
(gdb) n
1429        filtered= 0.05f/static_cast(fanout);
(gdb) n
1433      bitmap_clear_all(&table->tmp_set);
(gdb) p filtered
$5 = 0.0500000007

  • join测试
mysql> desc select * from testauto a,testfil2 b  where a.a=b.a and a.b=1 and a.c=1;
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref    | rows  | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ALL  | a             | NULL | NULL    | NULL   | 16188 |     1.00 | Using where |
|  1 | SIMPLE      | b     | NULL       | ref  | a             | a    | 5       | tp.a.a |     1 |   100.00 | NULL        |
+----+-------------+-------+------------+------+---------------+------+---------+--------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> desc select * from testauto a,testfil2 b  where a.a=b.a and a.b=1 and a.c=1 and a.a=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ref  | a             | a    | 4       | const |    1 |     5.00 | Using where |
|  1 | SIMPLE      | b     | NULL       | ref  | a             | a    | 5       | const | 4096 |   100.00 | NULL        |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> desc select * from testauto a,testfil2 b  where a.a=b.a and a.b=1 and a.c=1 and a.a=1 and b.c=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ref  | a             | a    | 4       | const |    1 |     5.00 | Using where |
|  1 | SIMPLE      | b     | NULL       | ref  | a             | a    | 5       | const | 4096 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

mysql> desc select * from testauto a,testfil2 b  where a.a=b.a and a.b=1 and a.c=1 and a.a=1 and b.c=1 and b.b=1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ref  | a             | a    | 4       | const |    1 |     5.00 | Using where |
|  1 | SIMPLE      | b     | NULL       | ref  | a             | a    | 5       | const | 4096 |     1.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+

这里过多分析了,实际上join中依旧生效,只是a.a=b.a关联条件要作为被驱动表的一个where条件来考虑。

三、8.0 简单测试(8.0.28)

依旧使用上面的表,进行测试

  • 不带直方图
mysql> desc select * from testfil where b=4  and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> desc select * from testfil2 where b=4  and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 86265 |     1.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

依旧不考虑数据的倾斜度。

  • 收集直方图后执行

我这里不同的数据很少,使用等宽直方图就可以了,如下:


mysql> ANALYZE TABLE testfil UPDATE HISTOGRAM ON b,c WITH 10 BUCKETS;
+----------------+-----------+----------+----------------------------------------------+
| Table          | Op        | Msg_type | Msg_text                                     |
+----------------+-----------+----------+----------------------------------------------+
| mytest.testfil | histogram | status   | Histogram statistics created for column 'b'. |
| mytest.testfil | histogram | status   | Histogram statistics created for column 'c'. |
+----------------+-----------+----------+----------------------------------------------+
2 rows in set (0.04 sec)

mysql> ANALYZE TABLE testfil2 UPDATE HISTOGRAM ON b,c WITH 10 BUCKETS;
+-----------------+-----------+----------+----------------------------------------------+
| Table           | Op        | Msg_type | Msg_text                                     |
+-----------------+-----------+----------+----------------------------------------------+
| mytest.testfil2 | histogram | status   | Histogram statistics created for column 'b'. |
| mytest.testfil2 | histogram | status   | Histogram statistics created for column 'c'. |
+-----------------+-----------+----------+----------------------------------------------+
2 rows in set (5.21 sec)

mysql> desc select * from testfil where b=4  and c=4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testfil | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    32.65 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> desc select * from testfil2 where b=4  and c=4;
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | testfil2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 86265 |    32.65 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 

我们可以看到直方图确实影响了filtered的计算,也更加准确了,但是8.0的直方图默认没有收集,因此使用不多,我的了解也比较有限。如果不使用直方图filter还是5.7的计算方式。

附录:使用的BUG点

1       breakpoint     keep y   0x0000000000eb13ac in main(int, char**) at /opt/percona-server-locks-detail-5.7.22/sql/main.cc:25
        breakpoint already hit 1 time
2       breakpoint     keep y   0x0000000001727699 in Explain_join::explain_rows_and_filtered() at /opt/percona-server-locks-detail-5.7.22/sql/opt_explain.cc:1480
        breakpoint already hit 14 times
3       breakpoint     keep y   0x0000000001589411 in calculate_condition_filter(JOIN_TAB const*, Key_use const*, unsigned long long, double, bool) 
                                                   at /opt/percona-server-locks-detail-5.7.22/sql/sql_planner.cc:1219
        breakpoint already hit 26 times
5       breakpoint     keep y   0x0000000000fadb41 in Item_equal::get_filtering_effect(unsigned long long, unsigned long long, st_bitmap const*, double) 
                                                   at /opt/percona-server-locks-detail-5.7.22/sql/item_cmpfunc.cc:7478
        breakpoint already hit 4 times
6       breakpoint     keep y   0x0000000000f83dc2 in Item_field::get_cond_filter_default_probability(double, float) const at /opt/percona-server-locks-detail-5.7.22/sql/item.cc:7905
        breakpoint already hit 3 times

你可能感兴趣的:(MySQL: 执行计划中filtered的计算方式和影响)