MySQL数据库查询结果乱码,这是大家比较常见的情形。到底是什么原因导致出现查询结果为乱码呢,本文主要通过演示来理解乱码产生的原因,以及如何解决字符集乱码,供大家参考。
一、字符编码对比
SELECT hex(convert('love' USING latin1)) latin_value,
hex(convert('love' USING gb2312)) gb2312_value,
hex(convert('love' USING gbk)) gbk_value,
hex(convert('love' USING utf8)) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| 6C6F7665 | 6C6F7665 | 6C6F7665 | 6C6F7665 |
+-------------+--------------+-----------+------------+
SELECT hex(convert('爱' USING latin1)) latin_value,
hex(convert('爱' USING gb2312)) gb2312_value,
hex(convert('爱' USING gbk)) gbk_value,
hex(convert('爱' USING utf8)) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| 3F | B0AE | B0AE | E788B1 |
+-------------+--------------+-----------+------------+
SELECT convert(0x3F USING latin1) latin_value,
convert(0xB0AE USING gb2312) gb2312_value,
convert(0xB0AE USING gbk) gbk_value,
convert(0xE788B1 USING utf8) utf8_value;
+-------------+--------------+-----------+------------+
| latin_value | gb2312_value | gbk_value | utf8_value |
+-------------+--------------+-----------+------------+
| ? | 爱 | 爱 | 爱 |
+-------------+--------------+-----------+------------+
二、乱码测试
1、环境准备
# grep -Ev "^#|^$" /etc/my.cnf -- 查看当前my.cnf配置
mysql> show variables like 'version';
+---------------+------------+
| Variable_name | Value |
+---------------+------------+
| version | 5.7.23-log |
+---------------+------------+
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
DROP TABLE IF EXISTS sakila.colum_charset;
CREATE TABLE sakila.colum_charset
(
id int not null auto_increment primary key,
c1 varchar(20),
c2 char(20) CHAR SET gbk,
c3 varchar(20) CHARSET gb2312,
c4 char(20) CHARACTER SET utf8,
c5 varchar(20) CHARSET utf8mb4
);
mysql> show create table sakila.colum_charset\G
*************************** 1. row ***************************
Table: colum_charset
Create Table: CREATE TABLE `colum_charset` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c1` varchar(20) DEFAULT NULL,
`c2` char(20) CHARACTER SET gbk DEFAULT NULL,
`c3` varchar(20) CHARACTER SET gb2312 DEFAULT NULL,
`c4` char(20) CHARACTER SET utf8 DEFAULT NULL,
`c5` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1
2、基于默认字符插入数据(gbk)
– character_set_client gbk
– character_set_connection gbk
– character_set_results gbk
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'爱','爱','爱','爱','爱');
ERROR 1366 (HY000): Incorrect string value: '\xB0\xAE' for column 'c1' at row 1
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'love','爱','爱','爱','爱');
Query OK, 1 row affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
+----+------+------+------+------+------+
3、三个变量全部设置为utf8插入数据
mysql> set names ‘utf8’;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','心','心','心','心');
Query OK, 1 row affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
+----+-------+------+------+------+------+
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','屌','屌','屌','屌'); -- c3列为gb2312编码
ERROR 1366 (HY000): Incorrect string value: '\xE5\xB1\x8C' for column 'c3' at row 1
4、单个变量character_set_connection设置为latin1插入数据
mysql> set character_set_connection=latin1;
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','情','情','情','情');
Query OK, 1 row affected, 4 warnings (0.00 sec)
mysql> show warnings \G
*************************** 1. row ***************************
Level: Warning
Code: 1300
Message: Invalid utf8 character string: '\xE6\x83\x85'
-- 乱码出现
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
5、单个变量character_set_connection设置为gb2312插入数据
mysql> set character_set_connection=gb2312;
INSERT INTO sakila.colum_charset(id,
c1,
c2,
c3,
c4,
c5)
VALUES (NULL,'heart','屌','屌','屌','屌');
Query OK, 1 row affected, 4 warnings (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
6、单个变量character_set_results设置为latin1
– 测试返回数据
mysql> set character_set_results=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | ? | ? | ? | ? |
| 2 | heart | ? | ? | ? | ? |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
6、单个变量character_set_results设置为gb2312
– 测试返回数据
mysql> set character_set_results=gb2312;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | | | | |
| 2 | heart | | | | |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
6、单个变量character_set_results设置为gbk
– 测试返回数据
mysql> set character_set_results=gbk;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | | | | |
| 2 | heart | | | | |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
7、单个变量character_set_results设置为utf8
– 测试返回数据
mysql> set character_set_results=utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.colum_charset;
+----+-------+------+------+------+------+
| id | c1 | c2 | c3 | c4 | c5 |
+----+-------+------+------+------+------+
| 1 | love | 爱 | 爱 | 爱 | 爱 |
| 2 | heart | 心 | 心 | 心 | 心 |
| 3 | heart | ? | ? | ? | ? |
| 4 | heart | ? | ? | ? | ? |
+----+-------+------+------+------+------+
4 rows in set (0.00 sec)
8、本地环境变量影响客户端字符集设定
– 在my.cnf中未配置客户端字符集,如果配置后,则使用配置文件中设定的字符集
[root@centos7 ~]# export LANG=en_US.UTF-8
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
[root@centos7 ~]# export LANG=zh_CN.GBK
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
[root@centos7 ~]# export LANG=zh_CN.GB2312
[root@centos7 ~]# mysql -e "show variables like 'character%'"
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | gb2312 |
| character_set_connection | gb2312 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gb2312 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
==========================================================
结论:
character_set_client: 客户端发送的数据是什么编码?
character_set_connection: 告诉字符集转换器,转换成什么编码?
character_set_results: 查询的结果用什么编码?
如果以上三者都为字符集N,可简写为set names ‘N’;
乱码产生的原因如下:
a、插入或读取时对应编码环节发生转换导致数据丢失。
b、如果两个字符集之间无法进行无损编码转换,一定会出现乱码。
解决方案:
1、一定要保证character_set_connection字符集大于等于client字符集,否则会丢失数据
比如: latin1 < gb2312 < gbk < utf8,
若设置set character_set_client = gb2312,
那么至少connection的字符集要大于等于gb2312,否则就会丢失数据
2、一定要保证character_set_results大于等于数据存入的字符集,否则会丢失数据
比如:如存储的字符为utf8,而返回character_set_results为gbk,数据被截断
3、所有变量使用统一的字符编码,如utf8或者utf8mb4