最常见的问题就是数据库插入查看或导入导出出现了乱码,这应该是所有用过mysql的人都遇到过了吧。所以解决乱码问题,就应该理解其原因:就是客户端连接设置的编码与表编码不一致!
数据库字符集有多种,linux 系统默认是 utf-8 编码,而 windows 默认为 gbk 编码。一般我们常用的系统都是中英文命名和存储数据d系统,所以服务器和mysql数据库设置编码为 utf8。mysql 还有字符集 utf8mb4,兼容 utf8 ,可保持 emoji 表情,但比 utf8 多占用1个字节。
查看linux系统编码
shell> cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
shell> echo $LANG
en_US.UTF-8
shell> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
查看数据库可设置的字符集及校对规则
mysql> SHOW CHARACTER SET;
mysql> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS ORDER BY CHARACTER_SET_NAME;
查看 mysql 数据库系统默认字符集
mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)
说明:
character_set_client :客户端使用的字符集
character_set_results :服务器查询结果到客户端字符集
character_set_connection :客户端到服务器默认转换的编码
character_set_database :数据库创建时默认的字符集(MySQL 5.7.6 后不推荐使用)
character_set_server :数据库创建时默认的字符集
character_set_system :系统默认元数据字符集,总是为 “utf8”
character_set_filesystem :导入导出时文件名字符集,默认binary则保持与client设置一样。
更改系统字符集,改为 “gb2312”
shell> vi /etc/sysconfig/i18n
LANG="zh_CN.GB2312"
#LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
设置生效
shell> . /etc/sysconfig/i18n
改了系统编码后,再访问查看mysql字符集.字符集没有什么变化.
mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+
在linux服务器端的终端访问mysql,再查看当前会话的字符集,发现客户端这几个字符集编码变为 “gb2312”!
mysql> SHOW VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | gb2312 |
| character_set_connection | gb2312 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | gb2312 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+
数据库默认字符集为 "character_set_database = latin1",所以当我们创建数据库没有指定字符集时,则使用默认设置。
mysql> CREATE DATABASE test;
mysql> SHOW CREATE DATABASE test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
此外,我们创建表的默认字符集为当前数据库的字符集。如下,新创建的表字符集也为 “latin1”
mysql> USE test;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> SHOW CREATE TABLE tab\G;
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
现在创建数据库,并且指定默认字符集为“utf8”,再创建一张未指定字符集的表和一张指定其他字符集的表。发现,未指定字符集的表默认字符集与数据库一致的。
mysql> CREATE DATABASE test2 CHARACTER SET utf8;
mysql> USE test2;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> CREATE TABLE tab1(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD)) DEFAULT CHARSET=utf8mb4;
mysql> SHOW CREATE TABLE tab\G;
*************************** 1. row ***************************
Table: tab
Create Table: CREATE TABLE `tab` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> SHOW CREATE TABLE tab1\G;
*************************** 1. row ***************************
Table: tab1
Create Table: CREATE TABLE `tab1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
查看所有数据库字符集及排序规则
mysql> SELECT SCHEMA_NAME,DEFAULT_CHARACTER_SET_NAME,DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
+--------------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+--------------------+----------------------------+------------------------+
| information_schema | utf8 | utf8_general_ci |
| mysql | latin1 | latin1_swedish_ci |
| performance_schema | utf8 | utf8_general_ci |
| sys | utf8 | utf8_general_ci |
| test | latin1 | latin1_swedish_ci |
| test2 | utf8 | utf8_general_ci |
+--------------------+----------------------------+------------------------+
查看所有表的排序规则(字符集)
mysql> SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA IN('test','test2');
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION |
+--------------+------------+--------------------+
| test | tab | latin1_swedish_ci |
| test2 | tab | utf8_general_ci |
| test2 | tab1 | utf8mb4_general_ci |
+--------------+------------+--------------------+
[navicat 测试]
现在开始插入表数据,我们插入数据到数据库表 test.tab,该数据库及表字符集都是 “latin1”。
当前我用 navicat 连接,该客户端默认设置的相关连接编码为:
character_set_client=utf8
character_set_connection=utf8
character_set_results=utf8
--插入英文字符,正常
mysql> use test;
mysql> insert into tab values(1,'123'),(2,'abc');
--插入中外字符,失败
mysql> insert into tab values(3,'文档');
1366 - Incorrect string value: '\xE6\x96\x87\xE6\xA1\xA3' for column 'name' at row 1
--怎么插入中文呢??现在将客户端会话编码设置为latin1,保持与数据库一致,再执行插入中文。
mysql> SET names latin1;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> insert into tab values(3,'文档');
mysql> select * from tab;
+----+------+
| id | name |
+----+------+
| 1 | 123 |
| 2 | abc |
| 3 | 文档 |
+----+------+
这会儿就没有报错了!查询也正常!现在将连接会话改为 utf8 ,再次查询。
mysql> SET names utf8;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | ?–???£ |
+----+--------+
因为表字符编码(latin1)与当前客户端会话的字符编码(utf8)不一样,所以查询出来了乱码!!
为了防止乱码出现,插入数据时客户端要与表字符集一样,查询数据时,客户端也要与表编码一样。
以下更详细一步进行测试,再次确认,系统当前默认配置的字符编码如下:
+--------------------------+-------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+-------------------------------------------------------------+
-- 现将客户端字符集 utf8 改为 latin1 ,插入时中文不乱码(下一步有测试),查询时乱码!
mysql> set character_set_client = latin1;
mysql> insert into tab values(4,'测试');
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | ?–???£ |
| 4 | ?μ?èˉ? |
+----+--------+
-- 单独将服务器查询结果字符集 utf8 改为 latin1,则查询结果正常;不改则乱码。
mysql> set character_set_results = latin1;
mysql> select * from tab;
+----+--------+
| id | name |
+----+--------+
| 1 | 123 |
| 2 | abc |
| 3 | 文档 |
| 4 | 测试 |
+----+--------+
客户端插入和查询都正常了,那么 character_set_connection 是干嘛用的呢?干嘛用的呢?
character_set_connection 只作为中间转换,如果字符串未设置字符集且设置了排序规则,默认字符集为character_set_connection设置的字符集,且会判断字符串显式设置的排序规则是否准确,不正确则出错。
+--------------------------------------------------------------------------------------+
| |
|character_set_client <------> character_set_connection <------> character_set_results |
| |
+--------------------------------------------------------------------------------------+
--设置 connection 字符集为 utf8,
mysql> set character_set_connection = utf8;
--只设置排序规则,但非当前 connection 对应的排序规则。错误!
mysql> select 'Müller' COLLATE latin1_german1_ci;
1253 - COLLATION 'latin1_german1_ci' is not valid for CHARACTER SET 'utf8'
--只设置排序规则,为当前connection对应的排序规则。正确!
mysql> select 'Müller' COLLATE utf8_general_ci;
--如果字符前面强制字符集 _latin1或_utf8 ,则不会进行connection转换。
【导出导入数据乱码问题】
数据库表字符集为 latin1 ,客户端字符集默认与linux操作系统设置一致为 utf8.导出数据时乱码!!
shell> mysqldump -uroot -p --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'?–???£'),(4,'?μ?èˉ?');
UNLOCK TABLES;
现在设置导出时字符集为 latin1 ,与表一样。结果正常!!
shell> mysqldump -uroot -p --default-character-set=latin1 --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'文档'),(4,'测试');
UNLOCK TABLES;
导入同理,若导入导出出现了乱码,那只要设置相同的字符集即可。
【更改字符集】
客户端的都设置,为会话级别:
mysql> SET names latin1;
客户端单个设置,为会话级别:
mysql> set character_set_connection = utf8;
mysql> set @@session.character_set_connection = utf8;
客户端单个设置,为全局设置:
mysql> set global character_set_client = gb2312;
mysql> set @@global.character_set_client = latin1;
永久设置,在配置文件中更改:
shell> vi /etc/my.cnf
[client]
default_character_set = utf8
[mysql] #客户端设置
default_character_set = utf8
[mysqld] #服务器设置
init-connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci
#default-character-set = utf8
skip-character-set-client-handshake#忽略客户端字符集,使用服务器设置