mysql的collation大致的意思就是字符序。首先字符本来是不分大小的,那么对字符的>, = , < 操作就需要有个字符序的规则。collation做的就是这个事情,你可以对表进行字符序的设置,也可以单独对某个字段进行字符序的设置。一个字符类型,它的字符序有多个,比如:
下面是UTF8对应的字符序。
mysql> show collation like '%utf8%'
-> ;
+-----------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-----------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
mysql的字符序遵从命名惯例。以_ci( [case insensitive]表示大小写不敏感),以_cs([case sensitive]表示大小写敏感),以_bin(表示用编码值进行比较)。比如:
CREATE TABLE IF NOT EXISTS userINFO
(id INT AUTO_INCREMENT PRIMARY KEY,NAME VARCHAR(20)
) DEFAULT CHARSET utf8 COLLATE utf8_general_ci
这个表下面的两个sql会出现同样的结果
INSERT INTO userINFO(name) VALUES('aaa');
SELECT * FROM userINFO WHERE name='aaa'
SELECT * FROM userINFO WHERE name='Aaa'
也可以改成在某个列上面定义:
CREATE TABLE IF NOT EXISTS userINFO
(id INT AUTO_INCREMENT PRIMARY KEY,NAME VARCHAR(20) COLLATE utf8_bin
) DEFAULT CHARSET utf8
那么上面两个sql结果就会不一样了
所以,如果对字符大小敏感的话,最好将数据库中默认的utf8_general_ci设置为utf8_bin。