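Type | Description | Meaning of N | Has character set | Maximum length |
---|---|---|---|---|
char(n) | Fixed-length characters | Characters | Yes | 255 |
varchar(n) | Variable-length characters | Characters | Yes | 65535 |
binary(n) | Fixed-length binary bytes | Bytes | No | 255 |
varbinary(n) | Variable-length binary bytes | Bytes | No | 65535 |
tinyblob | Binary large object | Bytes | No | 255 |
blob(n) | Binary large object | Bytes | No | 65535 |
mediumblob | Binary large object | Bytes | No | 16M |
longblob | Binary large object | Bytes | No | 4G |
tinytext | Character large object | Bytes | Yes | 255 |
text(n) | Character large object | Bytes | Yes | 65535 |
mediumtext | Character large object | Bytes | Yes | 16M |
longtext | Character large object | Bytes | Yes | 4G |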
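CHAR(N) stores fixed-length character data, with N ranging from 0 to 255. Keep in mind that N counts characters, not bytes. VARCHAR(N) stores variable-length character data, with N ranging from 0 to 65535; here too, N counts characters.
For data larger than 65535 bytes, consider the larger TEXT or BLOB types, which can store up to 4G (LONGTEXT and LONGBLOB). The difference between them is that BLOB has no character set and is stored as raw binary.
A VARCHAR column can store up to 65535 bytes, so in MySQL the VARCHAR type is sufficient for the vast majority of scenarios.
In table design, besides defining columns as char or varchar to store characters, you also need to define the character set, because the same character maps to different binary values under different encodings. Common character sets include gbk and utf8, and utf8 is usually recommended as the default.
To list the character sets MySQL supports: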
mysql> show charset;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)
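Moreover, with the rapid growth of the mobile internet, it is recommended to set MySQL's default character set to UTF8MB4. Otherwise, some emoji cannot be stored under the UTF8 character set. Take the smiley-face emoji, whose UTF-8 encoding is 0xF09F988E: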
mysql> select cast(0xF09F988E as char charset utf8);
+---------------------------------------+
| cast(0xF09F988E as char charset utf8) |
+---------------------------------------+
| ???? |
+---------------------------------------+
1 row in set (0.00 sec)
mysql> select cast(0xF09F988E as char charset utf8mb4);
+------------------------------------------+
| cast(0xF09F988E as char charset utf8mb4) |
+------------------------------------------+
| 😎 |
+------------------------------------------+
1 row in set (0.00 sec)
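The same byte sequence can be inspected outside MySQL. The Python sketch below is an illustration added here, not part of the original session: it decodes 0xF09F988E as UTF-8 and checks how many bytes the character needs. The helper name `fits_mysql_utf8` is hypothetical, and it assumes that MySQL's utf8 stores at most 3 bytes per character, as the Maxlen column above indicates.

```python
# Decode the 4-byte sequence used in the example above as UTF-8.
raw = bytes.fromhex("F09F988E")
emoji = raw.decode("utf-8")

def fits_mysql_utf8(ch: str) -> bool:
    """Hypothetical helper: True if ch needs at most 3 UTF-8 bytes,
    the per-character limit (Maxlen) of MySQL's utf8 character set."""
    return len(ch.encode("utf-8")) <= 3

print(hex(ord(emoji)))         # 0x1f60e, a code point above U+FFFF
print(len(raw))                # 4 bytes in UTF-8
print(fits_mysql_utf8(emoji))  # False: this character needs utf8mb4
print(fits_mysql_utf8("中"))   # True: 3 bytes in UTF-8
```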
To display emoji on the command line, set the client's default character set to utf8mb4.
[client]
default-character-set = utf8mb4
... ...
If you force an emoji into a column whose character set is UTF8, MySQL raises the following error:
mysql> show create table emoji_test;
+------------+-------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------------------------------------------------------------------+
| emoji_test | CREATE TABLE `emoji_test` (
`a` varchar(100) NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+-------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'a' at row 1
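Starting with MySQL 8.0, the default character set is UTF8MB4; before 8.0 the default was Latin1. Because the default differs between versions, you should set the relevant parameter explicitly in the configuration file: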
[mysqld]
character-set-server = utf8mb4
... ...
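In addition, the maximum number of bytes behind CHAR(N) and VARCHAR(N) differs between character sets. In GBK one character takes at most 2 bytes, while in UTF8MB4 one character takes at most 4 bytes. So at the storage-engine level, under a multi-byte character set CHAR and VARCHAR are implemented the same way: both are stored as variable-length data.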
mysql> select cast(0x61 as char(1) charset utf8mb4);
+---------------------------------------+
| cast(0x61 as char(1) charset utf8mb4) |
+---------------------------------------+
| a |
+---------------------------------------+
1 row in set (0.00 sec)
mysql> select cast(0xF09F988E as char(1) charset utf8mb4);
+---------------------------------------------+
| cast(0xF09F988E as char(1) charset utf8mb4) |
+---------------------------------------------+
| 😎 |
+---------------------------------------------+
1 row in set (0.00 sec)
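The example above shows that CHAR(1) can hold both the 1-byte character 'a' and the 4-byte smiley-face emoji, so CHAR is in essence variable-length too.
Since UTF8MB4 is now the recommended default character set, you can replace every CHAR with VARCHAR in your table design; the underlying storage is essentially the same.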
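To make the per-character byte widths concrete, here is a small Python sketch. It uses Python's own codecs as stand-ins: "gbk" for MySQL's gbk and "utf-8" for utf8mb4. That mapping is an assumption made for illustration, and `byte_len` is a hypothetical helper.

```python
# Sample characters: an ASCII letter, a CJK character, and an emoji.
samples = ["a", "中", "\U0001F60E"]

def byte_len(ch: str, encoding: str):
    """Return the encoded length in bytes, or None if the encoding
    cannot represent the character at all."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for ch in samples:
    # Print the width of each character under both encodings.
    print(repr(ch), "gbk:", byte_len(ch, "gbk"), "utf-8:", byte_len(ch, "utf-8"))
```

The width of one character varies from 1 to 4 bytes in UTF-8, which is why a column declared with a fixed number of characters still needs a variable number of bytes.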
A collation is a set of rules for comparing and sorting strings. Every character set has a default collation, which you can view with the SHOW COLLATION command:
mysql> SHOW COLLATION LIKE 'utf8mb4%';
+------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.01 sec)
A collation name ending in _ci is case-insensitive, _cs is case-sensitive, and _bin compares the binary values of the stored characters. Note that MySQL compares strings with a case-insensitive collation by default:
mysql> select 'a'='A';
+---------+
| 'a'='A' |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
mysql> select cast('a' as char charset utf8mb4) collate utf8mb4_bin=cast('A' as char charset utf8mb4) collate utf8mb4_bin;
+-------------------------------------------------------------------------------------------------------------+
| cast('a' as char charset utf8mb4) collate utf8mb4_bin=cast('A' as char charset utf8mb4) collate utf8mb4_bin |
+-------------------------------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
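As a rough analogy only, the difference between a _ci and a _bin comparison can be sketched in Python. This is not how MySQL implements collations (real collations use per-character weights and also handle accents and padding); the function names `equal_ci` and `equal_bin` are hypothetical.

```python
def equal_ci(a: str, b: str) -> bool:
    """Rough analogue of a _ci collation: compare after case folding."""
    return a.casefold() == b.casefold()

def equal_bin(a: str, b: str) -> bool:
    """Rough analogue of a _bin collation: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

print(equal_ci("a", "A"))   # True, like 'a'='A' returning 1
print(equal_bin("a", "A"))  # False, like the utf8mb4_bin comparison returning 0
```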
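Keep in mind that the vast majority of business table designs do not need a case-sensitive collation, unless you are sure your business really requires one.
Character sets and collations can be set separately at the database, table, and column level; the finer the granularity, the higher the priority.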
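The precedence rule can be sketched as a simple lookup from the most specific level to the least specific. The function `effective_charset` and its parameters are hypothetical, and the server-level fallback is an assumption based on the character-set-server setting shown earlier. Note that MySQL resolves the inherited value when an object is created, whereas this sketch only shows the order of precedence.

```python
from typing import Optional

def effective_charset(column: Optional[str] = None,
                      table: Optional[str] = None,
                      database: Optional[str] = None,
                      server: str = "utf8mb4") -> str:
    """Return the first explicitly set character set, checking the most
    specific level first: column, then table, then database, then server."""
    for level in (column, table, database):
        if level is not None:
            return level
    return server

# A column-level setting wins over the table default.
print(effective_charset(column="utf8", table="utf8mb4"))
# Without a column-level setting, the table default applies.
print(effective_charset(table="gbk", database="latin1"))
```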
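Of course, many applications were designed without considering how the character set affects stored data, so a character set conversion is needed later. Many people then find that after the following operation they still cannot insert UTF8MB4 characters such as emoji: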
mysql> show create table emoji_test;
+------------+-------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------------------------------------------------------------------+
| emoji_test | CREATE TABLE `emoji_test` (
`a` varchar(100) NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+-------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'a' at row 1
mysql> alter table emoji_test charset utf8mb4;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
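In fact, the statement above only changes the table's default character set to UTF8MB4. Columns added later without an explicit character set will use UTF8MB4, but the character set of existing columns is not changed.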
mysql> show create table emoji_test;
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| emoji_test | CREATE TABLE `emoji_test` (
`a` varchar(100) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'a' at row 1
As shown, column a still uses UTF8 rather than UTF8MB4. To change the character set of existing columns, use ALTER TABLE … CONVERT TO …, which converts column a from UTF8 to UTF8MB4:
mysql> alter table emoji_test convert to charset utf8mb4;
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table emoji_test;
+------------+----------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------------+
| emoji_test | CREATE TABLE `emoji_test` (
`a` varchar(100) NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+------------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
Query OK, 1 row affected (0.04 sec)
mysql> select * from emoji_test;
+------+
| a |
+------+
| 😎 |
+------+
1 row in set (0.00 sec)
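The difference between the two ALTER TABLE forms can be captured in a toy model. The `Table` class below is purely illustrative and is not MySQL's implementation: it assumes every column is a character column and that each column records its own character set when it is created.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Table:
    """Toy model: a table default plus a recorded charset per column."""
    default_charset: str
    columns: Dict[str, str] = field(default_factory=dict)

    def add_column(self, name: str) -> None:
        # A new column without an explicit charset takes the table default.
        self.columns[name] = self.default_charset

    def alter_default_charset(self, charset: str) -> None:
        # Models ALTER TABLE ... CHARSET: only the table default changes.
        self.default_charset = charset

    def convert_to_charset(self, charset: str) -> None:
        # Models ALTER TABLE ... CONVERT TO CHARSET: the default and
        # every existing column change.
        self.default_charset = charset
        for name in self.columns:
            self.columns[name] = charset

t = Table(default_charset="utf8")
t.add_column("a")
t.alter_default_charset("utf8mb4")
print(t.columns["a"])   # utf8: the existing column is untouched
t.add_column("b")
print(t.columns["b"])   # utf8mb4: new columns use the new default
t.convert_to_charset("utf8mb4")
print(t.columns["a"])   # utf8mb4: converted along with the table
```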
VARCHAR(N) stores variable-length character data, with N nominally ranging from 0 to 65535. So can N really be as large as 65535 characters?
mysql> create table test_varchar2(a varchar(65535));
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table test_varchar(a varchar(16384));
ERROR 1074 (42000): Column length too big for column 'a' (max = 16383); use BLOB or TEXT instead
mysql> create table test_varchar(a varchar(16383));
Query OK, 0 rows affected (0.06 sec)
mysql> show create table test_varchar;
+--------------+---------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------------+---------------------------------------------------------------------------------------------------------+
| test_varchar | CREATE TABLE `test_varchar` (
`a` varchar(16383) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+--------------+---------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
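So what is the maximum length N of column a in table test_varchar?
(65535−1−2)/4=16383
Explanation:
Subtract 1 because actual row storage starts from the second byte (in MySQL's row-size accounting this byte holds the NULL flags of nullable columns such as a)
Subtract 2 because the varchar column uses a 2-byte header to record its length
Divide by 4 because the character set is utf8mb4, where one character takes at most 4 bytes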
mysql> create table test_varchar2(a int(10), b char(32), c varchar(16351));
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table test_varchar2(a int(10), b char(32), c varchar(16350));
Query OK, 0 rows affected (0.07 sec)
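So what is the maximum length N of column c in table test_varchar2?
(65535−1−2−4−32*4)/4=16350
Explanation:
Subtract 1 and 2 for the same reasons as above
Subtract 4 because the int type takes 4 bytes
Subtract 32*4 because each of the 32 characters of the char(32) column takes up to 4 bytes in utf8mb4
Summary: **How many characters can a varchar actually hold?** It depends on the table's character set: latin1, gbk, utf8, and utf8mb4 need at most 1, 2, 3, and 4 bytes per character respectively. You also have to subtract the space taken by the other columns.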
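The two calculations above follow the same formula, which can be written as a small Python sketch. The function `max_varchar_chars` and the constant names are hypothetical. It assumes a single varchar column with a 2-byte length prefix and the 1 byte of row overhead used in the calculations above, so treat the results as estimates to confirm against your own MySQL version.

```python
# Maximum bytes per character for the character sets named above.
MAX_BYTES_PER_CHAR = {"latin1": 1, "gbk": 2, "utf8": 3, "utf8mb4": 4}

ROW_LIMIT = 65535     # row size limit reported in ERROR 1118
ROW_OVERHEAD = 1      # the "subtract 1" in the calculations above
VARCHAR_PREFIX = 2    # length header of the varchar column

def max_varchar_chars(charset: str, other_columns_bytes: int = 0) -> int:
    """Largest N for a single VARCHAR(N) column, following
    (65535 - 1 - 2 - other_columns_bytes) / bytes_per_char."""
    available = ROW_LIMIT - ROW_OVERHEAD - VARCHAR_PREFIX - other_columns_bytes
    return available // MAX_BYTES_PER_CHAR[charset]

print(max_varchar_chars("utf8mb4"))              # 16383, as in test_varchar
print(max_varchar_chars("utf8mb4", 4 + 32 * 4))  # 16350, as in test_varchar2
for cs in MAX_BYTES_PER_CHAR:
    # Single-column estimate for each character set.
    print(cs, max_varchar_chars(cs))
```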