MySQL 5.7: Case-Sensitive Comparison of varchar Values

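Symptom:
One day we noticed that comparisons on varchar string content were case-insensitive, and that a query which should have used the index ended up doing a full table scan.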

--Demo:
Create table ta:
mysql> create table ta(id int unsigned not null auto_increment primary key,keyname varchar(20));  
Query OK, 0 rows affected (0.16 sec)
mysql> insert into ta(keyname) values('AB'),('Ab'),('aB'),('ab');
mysql> create index ix_ta on ta(keyname);
--Query:
mysql> select * from ta where keyname='AB';
+----+---------+
| id | keyname |
+----+---------+
|  1 | AB      |
|  2 | Ab      |
|  3 | aB      |
|  4 | ab      |
+----+---------+
4 rows in set (0.00 sec)
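Note: the intent was to fetch exactly the row 'AB', but the query returned every row that matches when case is ignored.
--The following queries do match exactly: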
select * from ta where binary keyname='AB';
select * from ta where  keyname COLLATE utf8mb4_bin ='AB';
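The effect of the collation can also be checked directly on string literals, without any table. A minimal sketch; the `_utf8mb4` introducers are there only so the statement does not depend on the connection character set:

```sql
-- The same two literals, compared under a case-insensitive and a binary collation.
SELECT _utf8mb4'AB' = _utf8mb4'ab' COLLATE utf8mb4_unicode_ci AS ci_equal,
       _utf8mb4'AB' = _utf8mb4'ab' COLLATE utf8mb4_bin        AS bin_equal;
```

Under utf8mb4_unicode_ci the two literals should compare as equal; under utf8mb4_bin they should not.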
--Rebuild the table and refresh its index statistics:
mysql> optimize table ta;
+----------+----------+----------+-------------------------------------------------------------------+
| Table    | Op       | Msg_type | Msg_text                                                          |
+----------+----------+----------+-------------------------------------------------------------------+
| wuhan.ta | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| wuhan.ta | optimize | status   | OK                                                                |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (2.09 sec)
mysql> show index from ta;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ta    |          0 | PRIMARY  |            1 | id          | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
| ta    |          1 | ix_ta    |            1 | keyname     | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
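If only fresh index statistics are needed, ANALYZE TABLE is the lighter option. As the note in the output above says, OPTIMIZE TABLE on this table does a recreate followed by an analyze. A minimal sketch:

```sql
-- Refresh the index statistics (including Cardinality) without rebuilding the table.
ANALYZE TABLE ta;
-- Check the refreshed Cardinality values.
SHOW INDEX FROM ta;
```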
--Check the execution plans:
Method 1:
mysql> explain select * from ta where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | ta    | NULL       | ref  | ix_ta         | ix_ta | 83      | const |    4 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
Method 2:
mysql> explain select * from ta where binary keyname='AB';
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | ta    | NULL       | index | NULL          | ix_ta | 83      | NULL |    4 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Method 3:
mysql> explain select * from ta where  keyname COLLATE utf8mb4_bin ='AB';                                                          
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | ta    | NULL       | index | NULL          | ix_ta | 83      | NULL |    4 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

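Method 1 gets the best plan, an index lookup (type: ref), but the result is not exact because it matches every case variant.
Method 2 scans the whole index (type: index), but the result is exact.
Method 3 also scans the whole index (type: index), and the result is exact.
In production, where the tables hold far more data, Methods 2 and 3 wrap the column in a conversion (a cast to binary, or a different collation), so the index can no longer be used for a lookup and the queries degrade to a full table scan (type: ALL).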
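One way to keep the index lookup and still get an exact match on a case-insensitive column is to combine both predicates: the plain comparison narrows the candidates through the index, and the binary-collation comparison then filters them. A sketch against the ta table above; whether the optimizer actually picks ix_ta should be confirmed with EXPLAIN on real data:

```sql
-- keyname = 'AB' can use ix_ta under the column's own collation;
-- the COLLATE utf8mb4_bin predicate then keeps only the exact-case row.
SELECT *
FROM ta
WHERE keyname = 'AB'
  AND keyname COLLATE utf8mb4_bin = 'AB';

-- Inspect the plan for the combined predicate.
EXPLAIN SELECT *
FROM ta
WHERE keyname = 'AB'
  AND keyname COLLATE utf8mb4_bin = 'AB';
```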
The cause is that on 5.7 we had chosen the utf8mb4 character set with the utf8mb4_unicode_ci collation, as the table definition shows:
mysql> show create table ta\G
*************************** 1. row ***************************
       Table: ta
Create Table: CREATE TABLE `ta` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `keyname` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_ta` (`keyname`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
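The collation of each column can also be read from information_schema, which is convenient when many tables have to be checked. A minimal sketch, assuming the current database is the one that holds ta:

```sql
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ta'
  AND COLLATION_NAME IS NOT NULL;   -- skip non-string columns such as id
```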


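With the utf8mb4 character set, the collations commonly used are utf8mb4_general_ci (faster, and the default for utf8mb4) and utf8mb4_unicode_ci (more accurate comparison and sorting), but both are case-insensitive.
Collations available for utf8mb4 in MySQL 5.7: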
SELECT * FROM information_schema.COLLATIONS c WHERE c.CHARACTER_SET_NAME='utf8mb4';
COLLATION_NAME          CHARACTER_SET_NAME      ID  IS_DEFAULT  IS_COMPILED  SORTLEN  
----------------------  ------------------  ------  ----------  -----------  ---------
utf8mb4_general_ci      utf8mb4                 45  Yes         Yes                  1
utf8mb4_bin             utf8mb4                 46              Yes                  1
utf8mb4_unicode_ci      utf8mb4                224              Yes                  8
utf8mb4_icelandic_ci    utf8mb4                225              Yes                  8
utf8mb4_latvian_ci      utf8mb4                226              Yes                  8
utf8mb4_romanian_ci     utf8mb4                227              Yes                  8
utf8mb4_slovenian_ci    utf8mb4                228              Yes                  8
utf8mb4_polish_ci       utf8mb4                229              Yes                  8
utf8mb4_estonian_ci     utf8mb4                230              Yes                  8
utf8mb4_spanish_ci      utf8mb4                231              Yes                  8
utf8mb4_swedish_ci      utf8mb4                232              Yes                  8
utf8mb4_turkish_ci      utf8mb4                233              Yes                  8
utf8mb4_czech_ci        utf8mb4                234              Yes                  8
utf8mb4_danish_ci       utf8mb4                235              Yes                  8
utf8mb4_lithuanian_ci   utf8mb4                236              Yes                  8
utf8mb4_slovak_ci       utf8mb4                237              Yes                  8
utf8mb4_spanish2_ci     utf8mb4                238              Yes                  8
utf8mb4_roman_ci        utf8mb4                239              Yes                  8
utf8mb4_persian_ci      utf8mb4                240              Yes                  8
utf8mb4_esperanto_ci    utf8mb4                241              Yes                  8
utf8mb4_hungarian_ci    utf8mb4                242              Yes                  8
utf8mb4_sinhala_ci      utf8mb4                243              Yes                  8
utf8mb4_german2_ci      utf8mb4                244              Yes                  8
utf8mb4_croatian_ci     utf8mb4                245              Yes                  8
utf8mb4_unicode_520_ci  utf8mb4                246              Yes                  8
utf8mb4_vietnamese_ci   utf8mb4                247              Yes                  8
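utf8mb4_bin: compares and sorts strings by the binary code value of each character, so comparison is case-sensitive. The column still holds utf8mb4 text; the collation changes how values are compared, not what can be stored.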

So in practice we can set the utf8mb4_bin collation on the columns that need case-sensitive comparison: lookups are then exact and can still use the index.
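For a new table, the collation can be declared on the column itself, so the rest of the table keeps its default. A minimal sketch; the table name tc and index name ix_tc are made up for illustration:

```sql
CREATE TABLE tc (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- Only this column compares case-sensitively.
  keyname VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
  KEY ix_tc (keyname)
);
```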

--Verification:
mysql> create table tb(id int unsigned not null auto_increment primary key,keyname varchar(20));
Query OK, 0 rows affected (0.03 sec)

mysql> insert into tb(keyname)values('AB'),('Ab'),('aB'),('ab');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> create index ix_tb on tb(keyname);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table tb CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>  show create table tb\G
*************************** 1. row ***************************
       Table: tb
Create Table: CREATE TABLE `tb` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `keyname` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_tb` (`keyname`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> optimize table tb;
+----------+----------+----------+-------------------------------------------------------------------+
| Table    | Op       | Msg_type | Msg_text                                                          |
+----------+----------+----------+-------------------------------------------------------------------+
| wuhan.tb | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| wuhan.tb | optimize | status   | OK                                                                |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.06 sec)

mysql> select * from tb where keyname='AB';
+----+---------+
| id | keyname |
+----+---------+
|  1 | AB      |
+----+---------+
1 row in set (0.00 sec)

mysql> explain select * from tb where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | tb    | NULL       | ref  | ix_tb         | ix_tb | 83      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
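As shown, for the table with the utf8mb4_bin collation the estimated rows value is 1 and type is ref.
For comparison, for the table with utf8mb4_unicode_ci the estimated rows value is 4, also with type ref: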
mysql> explain select * from ta where keyname='AB';
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | ta    | NULL       | ref  | ix_ta         | ix_ta | 83      | const |    4 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Compare the indexes of ta and tb:
mysql> show index from ta;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ta    |          0 | PRIMARY  |            1 | id          | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
| ta    |          1 | ix_ta    |            1 | keyname     | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

mysql> show index from tb;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tb    |          0 | PRIMARY  |            1 | id          | A         |           4 |     NULL | NULL   |      | BTREE      |         |               |
| tb    |          1 | ix_tb    |            1 | keyname     | A         |           4 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
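The value to look at is Cardinality: 1 for ta and 4 for tb. Under the case-insensitive collation the four values count as one distinct key; under utf8mb4_bin they are four distinct keys.
--Join query check: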
mysql> select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where b.keyname='AB';
+---------+---------+
| keyname | keyname |
+---------+---------+
| AB      | AB      |
+---------+---------+
1 row in set (0.00 sec)

mysql> select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where a.keyname='AB'; 
+---------+---------+
| keyname | keyname |
+---------+---------+
| AB      | AB      |
| Ab      | Ab      |
| aB      | aB      |
| ab      | ab      |
+---------+---------+
4 rows in set (0.01 sec)
--Execution plans:
mysql> explain select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where b.keyname='AB'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
   partitions: NULL
         type: ref
possible_keys: ix_tb
          key: ix_tb
      key_len: 83
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
   partitions: NULL
         type: index
possible_keys: ix_ta
          key: ix_ta
      key_len: 83
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where; Using index; Using join buffer (Block Nested Loop)
2 rows in set, 2 warnings (0.00 sec)
mysql> explain select a.keyname,b.keyname from ta a inner join tb b on a.keyname=b.keyname where a.keyname='AB'\G 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
   partitions: NULL
         type: ref
possible_keys: ix_ta
          key: ix_ta
      key_len: 83
          ref: const
         rows: 4
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
   partitions: NULL
         type: ref
possible_keys: ix_tb
          key: ix_tb
      key_len: 83
          ref: wuhan.a.keyname
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
2 rows in set, 2 warnings (0.00 sec)
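The two join results differ because only the WHERE predicate on ta is evaluated with the case-insensitive collation. Our reading of MySQL's collation rules is that when a _ci column is compared with a _bin column of the same character set, the _bin collation is used, so the join condition itself is case-sensitive. That would also explain why ix_ta cannot be used for a lookup in the first plan. A sketch that makes both comparisons binary explicitly:

```sql
SELECT a.keyname, b.keyname
FROM ta a
INNER JOIN tb b
  ON a.keyname COLLATE utf8mb4_bin = b.keyname
-- Exact-case filter on ta, instead of the column's utf8mb4_unicode_ci comparison.
WHERE a.keyname COLLATE utf8mb4_bin = 'AB';
```

With the sample data this is expected to return only the AB pair, at the cost of the index lookup on ta.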


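--Notes:
--Change the table's default character set (existing columns are not converted):
alter table ta default CHARACTER SET utf8mb4;
--Convert the table, including its existing character columns, to a new character set and collation:
alter table tb CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
--Change the character set and collation of a single column: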
mysql> alter table ta change keyname keyname varchar(20) CHARACTER SET utf8 COLLATE  utf8_bin;   
Query OK, 4 rows affected (0.04 sec)
Records: 4  Duplicates: 0  Warnings: 0
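To stay on utf8mb4, the same column-level change can be written with the utf8mb4_bin collation. A sketch; the "4 rows affected" in the output above suggests that the rows are rewritten, so on a large table it is worth testing first:

```sql
ALTER TABLE ta
  MODIFY keyname VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

-- Check the plan of the exact-match query after the change.
EXPLAIN SELECT * FROM ta WHERE keyname = 'AB';
```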

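--Conclusion:
If a production workload requires both of the following:
1. Strict case-sensitive comparison of English letters
2. A usable index on the varchar column
then the column should be given the utf8mb4_bin collation.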




 
