MySQL对COUNT(*)一直在优化。
# MySQL版本是8.0.20
[[email protected]]>\s
...
Server version: 8.0.20 MySQL Community Server - GPL
...
# 确认 long_query_time
[[email protected]]>select @@global.long_query_time, @@session.long_query_time;
+--------------------------+---------------------------+
| @@global.long_query_time | @@session.long_query_time |
+--------------------------+---------------------------+
| 0.010000 | 0.010000 |
+--------------------------+---------------------------+
# 执行 COUNT(*),耗时超过 0.01,但slow log没有记录
[[email protected]]>select count(*) from t1;
+----------+
| count(*) |
+----------+
| 799994 |
+----------+
1 row in set (0.27 sec)
这到底是为什么呢?
[[email protected]]>show global variables;
...
| log_slow_admin_statements | OFF |
| log_slow_extra | ON |
| log_slow_slave_statements | OFF |
| long_query_time | 0.010000 |
| slow_query_log | ON |
| slow_query_log_file | slow.log |
| log_output | FILE |
| min_examined_row_limit | 100 |
| log_queries_not_using_indexes | 1 |
| log_throttle_queries_not_using_indexes | 60 |
...
上面几个参数中,比较可疑有下面几个:
| min_examined_row_limit | 100 |
| log_queries_not_using_indexes | 1 |
| log_throttle_queries_not_using_indexes | 60 |
先说 log_queries_not_using_indexes,这表示把没有使用索引的SQL也当成slow query记录下来,但在本例中,是有走索引的:
[[email protected]]>desc select count(*) from t1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 42760
filtered: 100.00
Extra: Using index
由此也顺便排除了参数 log_throttle_queries_not_using_indexes 的嫌疑。
那么,只剩下参数 min_examined_row_limit 的嫌疑,它表示当扫描行数少于设定值时,这个SQL也不会被当做slow query记录下来。
那么,本例中的COUNT(*)是否符合这种情况呢?
我们先把参数 min_examined_row_limit 值设置为 0,也就是默认值。
[[email protected]]>set global min_examined_row_limit=0;
[[email protected]]>set session min_examined_row_limit=0;
[[email protected]]>select @@global.min_examined_row_limit, @@session.min_examined_row_limit;
+---------------------------------+----------------------------------+
| @@global.min_examined_row_limit | @@session.min_examined_row_limit |
+---------------------------------+----------------------------------+
| 0 | 0 |
+---------------------------------+----------------------------------+
再执行一次 COUNT(*) 查询
[[email protected]]>select count(*) from t1;
+----------+
| count(*) |
+----------+
| 43462 |
+----------+
1 row in set (0.02 sec)
果然,这次被记录到slow log中了
# Query_time: 0.026083 Lock_time: 0.000110 Rows_sent: 1 Rows_examined: 0
...
select count(*) from t1;
注意到 Rows_examined 的值是 0,嗯,好像不太科学?
到这里,原因查明了,参数 min_examined_row_limit 的值设置大于 0 了,而本例中的 COUNT(*) 操作因为 Rows_examined=0,所以不会被记录到slow log中。
• Rows_examined:
The number of rows examined by the server layer (not counting any processing internal to storage engines).
可能字面意思上看起来不太好理解,换个思路,其实就是我们执行完一个SQL后,查看状态变量中名为 Handler_read_% 的几个指标即可,例如:
[[email protected]]> flush status;
[[email protected]]> select count(*) from t1;
...
[[email protected]]> show status like 'handler%read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
可以看到以上几个值均为 0,因此 Rows_examined 也为 0,就不会被记录到slow log中了。
3.1 关于聚集索引并行读
说到这里,我还要隆重介绍MySQL 8.0的另一个新特性。
从8.0.14版本起,新增参数 innodb_parallel_read_threads,支持对聚集索引的并行扫描,需要满足以下几个条件:
参数 innodb_parallel_read_threads 值 > 0
只支持聚集索引
只支持无锁查询
不是INSERT…SELECT查询
主要用于加速以下两种场景:
CHECK TABLE操作
不带WHERE条件的全表COUNT(*)
因此,COUNT(*)也是可以并行读聚集索引的,从error log中可以看到类似下面的信息:
[Note] [MY-011825] [InnoDB] Parallel scan: 4
[Note] [MY-011825] [InnoDB] ranges: 130 max_threads: 4 split: 128 depth: 1
[Note] [MY-011825] [InnoDB] n: 20914
[Note] [MY-011825] [InnoDB] n: 18066
[Note] [MY-011825] [InnoDB] n: 4482
从上述日志能看出来几点:
设置了最高4个并行线程
实际并行3个线程,实际并行数从1~4,不一定每次都跑最高并行
分别扫描行数是 20914、18066、4482,即 COUNT() 结果总数是 43462
对t1表加上一个辅助索引后,再来看下面这个COUNT(*)
# 看起来这个查询是走辅助索引
[[email protected]]>desc select count(*) from t1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: index
possible_keys: NULL
key: k1
key_len: 5
ref: NULL
rows: 42760
filtered: 100.00
Extra: Using index
# 实际执行一把
[[email protected]]>select count(*) from t1;
+----------+
| count(*) |
+----------+
| 43462 |
+----------+
1 row in set (0.01 sec)
# 发现 Handler_read_% 的值还是 0
[[email protected]]>show status like 'handler%read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
且此时error log中依然有并行扫描的记录
[Note] [MY-011825] [InnoDB] Parallel scan: 4
[Note] [MY-011825] [InnoDB] ranges: 91 max_threads: 4 split: 88 depth: 1
[Note] [MY-011825] [InnoDB] n: 21493
[Note] [MY-011825] [InnoDB] n: 21486
[Note] [MY-011825] [InnoDB] n: 483
看到了么,实际上还是用到了聚集索引的并行扫描特性来加速。
提醒:上述error log中记录并行扫描聚集索引信息的功能在8.0.20中又被去掉了,上面之所以能看到这段日志是因为我排查到后面又回退到8.0.19版本了,有点费劲。。。
看下8.0.20 Release Notes
InnoDB: Unnecessary messages about parallel scans were printed to the error log. (Bug #30330448)
其实留着不挺好的嘛,搞不懂为毛要去掉的说。。。