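Symptom:
One day we noticed that comparisons on a varchar column were case-insensitive, and that a query which should have used an index lookup was scanning the whole table instead.
--Demo:
Create table ta: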
mysql> create table ta(id int unsigned not null auto_increment primary key,keyname varchar(20));
Query OK, 0 rows affected (0.16 sec)
mysql> insert into ta(keyname) values('AB'),('Ab'),('aB'),('ab');
mysql> create index ix_ta on ta(keyname);
--Query:
mysql> select * from ta where keyname='AB';
+----+---------+
| id | keyname |
+----+---------+
| 1 | AB |
| 2 | Ab |
| 3 | aB |
| 4 | ab |
+----+---------+
4 rows in set (0.00 sec)
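Note: the intent was to match only the record 'AB' exactly, but the query returned every record regardless of case.
--The following queries do match exactly: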
select * from ta where binary keyname='AB';
select * from ta where keyname COLLATE utf8mb4_bin ='AB';
--Rebuild the table and refresh its statistics:
mysql> optimize table ta;
+----------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------+----------+----------+-------------------------------------------------------------------+
| wuhan.ta | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| wuhan.ta | optimize | status | OK |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (2.09 sec)
mysql> show index from ta;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ta | 0 | PRIMARY | 1 | id | A | 4 | NULL | NULL | | BTREE | | |
| ta | 1 | ix_ta | 1 | keyname | A | 1 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
--Check the execution plans:
Method 1:
mysql> explain select * from ta where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | ta | NULL | ref | ix_ta | ix_ta | 83 | const | 4 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
Method 2:
mysql> explain select * from ta where binary keyname='AB';
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | ta | NULL | index | NULL | ix_ta | 83 | NULL | 4 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Method 3:
mysql> explain select * from ta where keyname COLLATE utf8mb4_bin ='AB';
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | ta | NULL | index | NULL | ix_ta | 83 | NULL | 4 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
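Method 1 uses the index in the most efficient way (type: ref), but the result set is not exact.
Method 2 returns exact results, but its access type is index, which is a scan of the entire index.
Method 3 also returns exact results, and its access type is likewise index.
In production, where the tables hold far more data, methods 2 and 3 ended up as full table scans (type: ALL). Applying BINARY or COLLATE to the indexed column turns it into an expression, so the optimizer can no longer use ix_ta for a lookup on keyname.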
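When the column collation cannot be changed, one possible workaround is to keep the plain equality so the optimizer can still do a ref lookup on ix_ta, and add the binary comparison as a second filter on the rows that lookup returns. This is a sketch that was not part of the original test, so verify the plan on your own data:

```sql
-- The first predicate can use ix_ta (case-insensitive lookup);
-- the second predicate filters the candidates down to the exact match.
SELECT *
FROM ta
WHERE keyname = 'AB'
  AND BINARY keyname = 'AB';

-- Inspect the plan before relying on this in production.
EXPLAIN
SELECT *
FROM ta
WHERE keyname = 'AB'
  AND BINARY keyname = 'AB';
```

This only pays off when the case-insensitive match is already selective; if thousands of rows differ only by case, the second filter still has to check each of them.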
On our 5.7 instances we chose the utf8mb4 character set with the utf8mb4_unicode_ci collation, as the table definition shows:
mysql> show create table ta\G
*************************** 1. row ***************************
Table: ta
Create Table: CREATE TABLE `ta` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyname` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_ta` (`keyname`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
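With the utf8mb4 character set, the two collations most commonly used are utf8mb4_general_ci (faster, and the default in 5.7) and utf8mb4_unicode_ci (more accurate comparison), but both are case-insensitive, as the _ci suffix indicates.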
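The case-insensitivity can be checked directly on string literals, without any table. The sketch below uses the _utf8mb4 introducer so that it does not depend on the connection character set; the column aliases are only illustrative names:

```sql
-- Compare the same pair of strings under three collations.
SELECT _utf8mb4'AB' COLLATE utf8mb4_general_ci = _utf8mb4'ab' AS general_ci_equal,
       _utf8mb4'AB' COLLATE utf8mb4_unicode_ci = _utf8mb4'ab' AS unicode_ci_equal,
       _utf8mb4'AB' COLLATE utf8mb4_bin        = _utf8mb4'ab' AS bin_equal;
```

The two _ci columns should come back as 1 (equal) and the _bin column as 0 (not equal).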
Collations available for utf8mb4 in MySQL 5.7:
SELECT * FROM information_schema.COLLATIONS c WHERE c.CHARACTER_SET_NAME='utf8mb4';
COLLATION_NAME          CHARACTER_SET_NAME  ID   IS_DEFAULT  IS_COMPILED  SORTLEN
----------------------  ------------------  ---  ----------  -----------  -------
utf8mb4_general_ci      utf8mb4             45   Yes         Yes          1
utf8mb4_bin             utf8mb4             46               Yes          1
utf8mb4_unicode_ci      utf8mb4             224              Yes          8
utf8mb4_icelandic_ci    utf8mb4             225              Yes          8
utf8mb4_latvian_ci      utf8mb4             226              Yes          8
utf8mb4_romanian_ci     utf8mb4             227              Yes          8
utf8mb4_slovenian_ci    utf8mb4             228              Yes          8
utf8mb4_polish_ci       utf8mb4             229              Yes          8
utf8mb4_estonian_ci     utf8mb4             230              Yes          8
utf8mb4_spanish_ci      utf8mb4             231              Yes          8
utf8mb4_swedish_ci      utf8mb4             232              Yes          8
utf8mb4_turkish_ci      utf8mb4             233              Yes          8
utf8mb4_czech_ci        utf8mb4             234              Yes          8
utf8mb4_danish_ci       utf8mb4             235              Yes          8
utf8mb4_lithuanian_ci   utf8mb4             236              Yes          8
utf8mb4_slovak_ci       utf8mb4             237              Yes          8
utf8mb4_spanish2_ci     utf8mb4             238              Yes          8
utf8mb4_roman_ci        utf8mb4             239              Yes          8
utf8mb4_persian_ci      utf8mb4             240              Yes          8
utf8mb4_esperanto_ci    utf8mb4             241              Yes          8
utf8mb4_hungarian_ci    utf8mb4             242              Yes          8
utf8mb4_sinhala_ci      utf8mb4             243              Yes          8
utf8mb4_german2_ci      utf8mb4             244              Yes          8
utf8mb4_croatian_ci     utf8mb4             245              Yes          8
utf8mb4_unicode_520_ci  utf8mb4             246              Yes          8
utf8mb4_vietnamese_ci   utf8mb4             247              Yes          8
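utf8mb4_bin: compares strings by the binary value of each character, so it is case-sensitive. The column still holds utf8mb4 text; only the comparison and sort order are binary.
So for columns that need case-sensitive matching we can set the collation to utf8mb4_bin: lookups are then exact and can still use the index.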
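For a new table, the collation can be set on just the column that needs it, leaving the table default unchanged. A minimal sketch, where tc and ix_tc are hypothetical names used only for illustration:

```sql
-- Only keyname is case-sensitive; other character columns added later
-- would inherit the table default (utf8mb4_unicode_ci).
CREATE TABLE tc (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  keyname VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
  KEY ix_tc (keyname)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
```

Scoping the change to one column avoids altering the comparison behaviour of every other text column in the table.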
--Verification:
mysql> create table tb(id int unsigned not null auto_increment primary key,keyname varchar(20));
Query OK, 0 rows affected (0.03 sec)
mysql> insert into tb(keyname)values('AB'),('Ab'),('aB'),('ab');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> create index ix_tb on tb(keyname);
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table tb CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table tb\G
*************************** 1. row ***************************
Table: tb
Create Table: CREATE TABLE `tb` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyname` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_tb` (`keyname`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)
mysql> optimize table tb;
+----------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------+----------+----------+-------------------------------------------------------------------+
| wuhan.tb | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| wuhan.tb | optimize | status | OK |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.06 sec)
mysql> select * from tb where keyname='AB';
+----+---------+
| id | keyname |
+----+---------+
| 1 | AB |
+----+---------+
1 row in set (0.00 sec)
mysql> explain select * from tb where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | tb | NULL | ref | ix_tb | ix_tb | 83 | const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
With the utf8mb4_bin collation, the plan for tb estimates rows = 1 with type ref.
Compare this with ta, which uses utf8mb4_unicode_ci: rows = 4, type ref.
mysql> explain select * from ta where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | ta | NULL | ref | ix_ta | ix_ta | 83 | const | 4 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Compare the indexes on ta and tb:
mysql> show index from ta;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ta | 0 | PRIMARY | 1 | id | A | 4 | NULL | NULL | | BTREE | | |
| ta | 1 | ix_ta | 1 | keyname | A | 1 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> show index from tb;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tb | 0 | PRIMARY | 1 | id | A | 4 | NULL | NULL | | BTREE | | |
| tb | 1 | ix_tb | 1 | keyname | A | 4 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
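The value to look at is Cardinality: 1 for ix_ta and 4 for ix_tb. Under utf8mb4_unicode_ci the four values compare as equal, so the index sees a single distinct key; under utf8mb4_bin they are four distinct keys.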
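The Cardinality difference can be reproduced by counting distinct values under each collation. This sketch assumes ta.keyname is still utf8mb4 with the utf8mb4_unicode_ci collation, as at this point in the walkthrough; the aliases are illustrative:

```sql
-- distinct_ci uses the column's own collation (case-insensitive);
-- distinct_bin forces a binary comparison for the same data.
SELECT COUNT(DISTINCT keyname)                     AS distinct_ci,
       COUNT(DISTINCT keyname COLLATE utf8mb4_bin) AS distinct_bin
FROM ta;
```

With the four rows inserted above, this should report 1 and 4, matching the Cardinality of ix_ta and ix_tb.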
--Join query verification:
mysql> select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where b.keyname='AB';
+---------+---------+
| keyname | keyname |
+---------+---------+
| AB | AB |
+---------+---------+
1 row in set (0.00 sec)
mysql> select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where a.keyname='AB';
+---------+---------+
| keyname | keyname |
+---------+---------+
| AB | AB |
| Ab | Ab |
| aB | aB |
| ab | ab |
+---------+---------+
4 rows in set (0.01 sec)
--Execution plans:
mysql> explain select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where b.keyname='AB'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b
partitions: NULL
type: ref
possible_keys: ix_tb
key: ix_tb
key_len: 83
ref: const
rows: 1
filtered: 100.00
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: a
partitions: NULL
type: index
possible_keys: ix_ta
key: ix_ta
key_len: 83
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index; Using join buffer (Block Nested Loop)
2 rows in set, 2 warnings (0.00 sec)
mysql> explain select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where a.keyname='AB'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
partitions: NULL
type: ref
possible_keys: ix_ta
key: ix_ta
key_len: 83
ref: const
rows: 4
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
partitions: NULL
type: ref
possible_keys: ix_tb
key: ix_tb
key_len: 83
ref: wuhan.a.keyname
rows: 1
filtered: 100.00
Extra: Using where; Using index
2 rows in set, 2 warnings (0.00 sec)
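In the first plan, table a is read with type index rather than ref. A likely cause is that the two join columns have different collations, so the join comparison does not match the ordering of ix_ta. Before joining on string columns it is worth checking that their collations agree; a sketch using information_schema:

```sql
-- List the character set and collation of the join columns in both tables.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('ta', 'tb')
  AND COLUMN_NAME = 'keyname';
```

If the two rows show different collations, aligning them lets either table be probed through its index during the join.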
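--Notes:
--Change only the table's default character set (existing columns are not converted):
alter table ta default CHARACTER SET utf8mb4;
--Convert the table's default character set and collation, together with all existing character columns:
alter table tb CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
--Change the character set and collation of a single column (note that this example uses utf8 and utf8_bin, not utf8mb4):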
mysql> alter table ta change keyname keyname varchar(20) CHARACTER SET utf8 COLLATE utf8_bin;
Query OK, 4 rows affected (0.04 sec)
Records: 4 Duplicates: 0 Warnings: 0
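To stay on utf8mb4, as the rest of this write-up does, the same single-column change can be written with MODIFY. A sketch, assuming the column definition is otherwise unchanged:

```sql
-- MODIFY keeps the column name and redefines its type, charset and collation.
-- Restate any other attributes (NOT NULL, DEFAULT, COMMENT) the column should keep.
ALTER TABLE ta
  MODIFY keyname VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
```

Any index on the column is rebuilt as part of the change, so on a large table plan for the time and load this takes.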
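--Conclusion:
If a production workload requires both of the following:
1. Strict case-sensitive comparison of English letters
2. A usable index on the varchar column
then give that column the utf8mb4_bin collation.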