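MySQL Character Set Mojibake

Garbled query results are one of the most common problems MySQL users run into. What actually causes them? This article walks through a series of experiments to show where the garbling comes from and how to fix it, for your reference.

I. Comparing character encodings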

	SELECT hex(convert('love' USING latin1)) latin_value,
	       hex(convert('love' USING gb2312)) gb2312_value,
	       hex(convert('love' USING gbk))    gbk_value,
	       hex(convert('love' USING utf8))   utf8_value;
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| 6C6F7665    | 6C6F7665     | 6C6F7665  | 6C6F7665   |
	+-------------+--------------+-----------+------------+
	
	SELECT hex(convert('爱' USING latin1)) latin_value,
	       hex(convert('爱' USING gb2312)) gb2312_value,
	       hex(convert('爱' USING gbk))    gbk_value,
	       hex(convert('爱' USING utf8))   utf8_value;
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| 3F          | B0AE         | B0AE      | E788B1     |
	+-------------+--------------+-----------+------------+  
	
	SELECT convert(0x3F USING latin1)   latin_value,
	       convert(0xB0AE USING gb2312) gb2312_value,
	       convert(0xB0AE USING gbk)    gbk_value,
	       convert(0xE788B1 USING utf8) utf8_value;
   
	+-------------+--------------+-----------+------------+
	| latin_value | gb2312_value | gbk_value | utf8_value |
	+-------------+--------------+-----------+------------+
	| ?           | 爱           | 爱        | 爱         |
	+-------------+--------------+-----------+------------+
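
The same comparison can be reproduced outside the database. The sketch below is a minimal illustration, assuming Python's codec names are acceptable stand-ins for the MySQL character sets (`hex_in_charsets` and `CODECS` are names made up for this example). It shows that ASCII text has the same bytes everywhere, while 爱 differs per character set and collapses to `?` (0x3F) in latin1.

```python
# Python codec names standing in for the MySQL character sets used above.
# Assumption: the mapping is illustrative only. MySQL's latin1 is closer to
# cp1252, which makes no difference for the sample strings in this article.
CODECS = {
    "latin1": "latin-1",
    "gb2312": "gb2312",
    "gbk": "gbk",
    "utf8": "utf-8",
}


def hex_in_charsets(text):
    """Return {charset: uppercase hex of `text` encoded in that charset}.

    A character the charset cannot represent is replaced by '?' (0x3F),
    which mirrors the 3F shown for latin1 in the CONVERT() output above.
    """
    return {
        name: text.encode(codec, errors="replace").hex().upper()
        for name, codec in CODECS.items()
    }
```

Calling `hex_in_charsets("love")` gives `6C6F7665` for every entry, and calling it with 爱 gives the same four values as the second query above.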

II. Mojibake tests

1. Environment setup

	# grep -Ev "^#|^$" /etc/my.cnf   -- show the current my.cnf settings
	
	mysql> show variables like 'version';
	+---------------+------------+
	| Variable_name | Value      |
	+---------------+------------+
	| version       | 5.7.23-log |
	+---------------+------------+

	mysql> show variables like '%character%';
	
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gbk                        |
	| character_set_connection | gbk                        |
	| character_set_database   | latin1                     |
	| character_set_filesystem | binary                     |
	| character_set_results    | gbk                        |
	| character_set_server     | latin1                     |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+

	DROP TABLE IF EXISTS sakila.colum_charset;
	
	CREATE TABLE sakila.colum_charset
	(
	   id int not null auto_increment primary key,
	   c1 varchar(20),
	   c2 char(20) CHAR SET gbk,
	   c3 varchar(20) CHARSET gb2312,
	   c4 char(20) CHARACTER SET utf8,
	   c5 varchar(20) CHARSET utf8mb4
	);
	
	mysql> show create table sakila.colum_charset\G
	*************************** 1. row ***************************
	       Table: colum_charset
	Create Table: CREATE TABLE `colum_charset` (
	  `id` int(11) NOT NULL AUTO_INCREMENT,
	  `c1` varchar(20) DEFAULT NULL,
	  `c2` char(20) CHARACTER SET gbk DEFAULT NULL,
	  `c3` varchar(20) CHARACTER SET gb2312 DEFAULT NULL,
	  `c4` char(20) CHARACTER SET utf8 DEFAULT NULL,
	  `c5` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL,
	  PRIMARY KEY (`id`)
	) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1

2. Inserting data with the session defaults (gbk)

	-- character_set_client     = gbk
	-- character_set_connection = gbk
	-- character_set_results    = gbk

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'爱','爱','爱','爱','爱');
	        
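	ERROR 1366 (HY000): Incorrect string value: '\xB0\xAE' for column 'c1' at row 1
	-- c1 has no explicit character set, so it inherits the table default latin1, which cannot represent 爱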

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'love','爱','爱','爱','爱');
	Query OK, 1 row affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+------+------+------+------+------+
	| id | c1   | c2   | c3   | c4   | c5   |
	+----+------+------+------+------+------+
	|  1 | love | 爱   | 爱   | 爱   | 爱   |
	+----+------+------+------+------+------+

3. Inserting data with all three variables set to utf8

	mysql> set names 'utf8';
	Query OK, 0 rows affected (0.00 sec)

	mysql> show variables like '%character%';
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | utf8                       |
	| character_set_connection | utf8                       |
	| character_set_database   | latin1                     |
	| character_set_filesystem | binary                     |
	| character_set_results    | utf8                       |
	| character_set_server     | latin1                     |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','心','心','心','心');        
	Query OK, 1 row affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	+----+-------+------+------+------+------+

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','屌','屌','屌','屌');  -- column c3 uses gb2312, which has no code for 屌
	        
	ERROR 1366 (HY000): Incorrect string value: '\xE5\xB1\x8C' for column 'c3' at row 1
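
The failure above comes down to whether a column's character set contains the character at all. As a rough check outside MySQL, the sketch below tries a strict encode per column. The column-to-codec table mirrors the test table, and both `can_encode` and `columns_rejecting` are hypothetical helpers written for this illustration, not MySQL functions.

```python
# Codecs for the four columns that receive the Chinese character.
# Assumption: Python codecs approximate the MySQL character sets; utf8 and
# utf8mb4 are both mapped to Python's utf-8 here.
COLUMN_CODECS = {
    "c2": "gbk",
    "c3": "gb2312",
    "c4": "utf-8",
    "c5": "utf-8",
}


def can_encode(text, codec):
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)  # strict mode: raises on a missing character
    except UnicodeEncodeError:
        return False
    return True


def columns_rejecting(text, column_codecs=COLUMN_CODECS):
    """List the columns whose character set cannot store `text`."""
    return [
        column
        for column, codec in column_codecs.items()
        if not can_encode(text, codec)
    ]
```

For 心 the list is empty, which matches the successful insert of row 2. For 屌 only `c3` is reported, because the character exists in gbk and utf8 but not in gb2312.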

4. Inserting data with only character_set_connection set to latin1

	mysql> set character_set_connection=latin1;

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','情','情','情','情');
	Query OK, 1 row affected, 4 warnings (0.00 sec)
	
	mysql> show warnings \G
	*************************** 1. row ***************************
	  Level: Warning
	   Code: 1300
	Message: Invalid utf8 character string: '\xE6\x83\x85'
	
	-- the garbled row appears (id 3)
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+        
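
To see where the `?` comes from, it helps to follow one value through the write path: the bytes typed in the terminal, their interpretation as character_set_client, the conversion to character_set_connection, and finally the conversion to the column's character set. The following is only a simplified model under stated assumptions: Python codecs stand in for MySQL character sets, the connection step substitutes `?` for unrepresentable characters (as the warnings above suggest), and the column step is strict (as the earlier ERROR 1366 suggests). `simulate_insert` is a name invented for this sketch.

```python
def simulate_insert(text, client, connection, column, terminal="utf-8"):
    """Model the conversions a string literal goes through on INSERT.

    terminal   -- encoding of the bytes the user actually types (assumed)
    client     -- character_set_client
    connection -- character_set_connection
    column     -- character set of the target column
    Returns the text as it would end up stored in the column.
    """
    # 1. The terminal turns the typed characters into bytes.
    wire = text.encode(terminal)
    # 2. The server interprets those bytes as character_set_client.
    as_client = wire.decode(client, errors="replace")
    # 3. The literal is converted to character_set_connection; characters
    #    missing from that character set are replaced by '?'.
    in_connection = as_client.encode(connection, errors="replace").decode(connection)
    # 4. The value is converted to the column character set. This step is
    #    strict in the model, so an unrepresentable character raises
    #    UnicodeEncodeError, standing in for ERROR 1366.
    return in_connection.encode(column).decode(column)
```

With client utf8 and connection latin1, 情 already becomes `?` in step 3, so every column stores `?` regardless of its own character set. With client and connection both gbk and a latin1 column, the model fails in step 4, which corresponds to the rejected insert into c1.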

5. Inserting data with only character_set_connection set to gb2312

	mysql> set character_set_connection=gb2312;

	INSERT INTO sakila.colum_charset(id,
	                                 c1,
	                                 c2,
	                                 c3,
	                                 c4,
	                                 c5)
	VALUES (NULL,'heart','屌','屌','屌','屌');
	
	Query OK, 1 row affected, 4 warnings (0.00 sec)
    
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+

6a. Reading data back with only character_set_results set to latin1
-- test the data returned to the client

	mysql> set character_set_results=latin1;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | ?    | ?    | ?    | ?    |
	|  2 | heart | ?    | ?    | ?    | ?    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

6b. Reading data back with only character_set_results set to gb2312
-- test the data returned to the client

	mysql> set character_set_results=gb2312;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  |    |    |    |    |
	|  2 | heart |    |    |    |    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

6c. Reading data back with only character_set_results set to gbk
-- test the data returned to the client

	mysql> set character_set_results=gbk;
	Query OK, 0 rows affected (0.00 sec)

	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  |    |    |    |    |
	|  2 | heart |    |    |    |    |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)

7. Reading data back with only character_set_results set to utf8
-- test the data returned to the client

	mysql> set character_set_results=utf8;
	Query OK, 0 rows affected (0.00 sec)
	
	mysql> select * from sakila.colum_charset;
	+----+-------+------+------+------+------+
	| id | c1    | c2   | c3   | c4   | c5   |
	+----+-------+------+------+------+------+
	|  1 | love  | 爱   | 爱   | 爱   | 爱   |
	|  2 | heart | 心   | 心   | 心   | 心   |
	|  3 | heart | ?    | ?    | ?    | ?    |
	|  4 | heart | ?    | ?    | ?    | ?    |
	+----+-------+------+------+------+------+
	4 rows in set (0.00 sec)             
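
The four character_set_results tests can be modelled the same way on the read path. The sketch below assumes the terminal displays bytes as UTF-8, which the article does not state but which would explain why the utf8 results are readable while the gb2312 and gbk results are not. `simulate_select` is a hypothetical helper, and Python codecs again stand in for MySQL character sets.

```python
def simulate_select(stored, results, terminal="utf-8"):
    """Model how a stored value reaches the screen on SELECT.

    stored   -- the text held in the column
    results  -- character_set_results
    terminal -- encoding the terminal uses to display bytes (assumed UTF-8)
    """
    # The server converts the value to character_set_results; characters
    # missing from that character set are sent as '?'.
    wire = stored.encode(results, errors="replace")
    # The terminal decodes the bytes with its own encoding. Bytes that are
    # not valid in that encoding show up as U+FFFD replacement characters
    # (or, on some terminals, as nothing at all).
    return wire.decode(terminal, errors="replace")
```

In this model latin1 turns 爱 into `?` on the server side, gb2312 and gbk deliver correct bytes that a UTF-8 terminal cannot display, and utf8 round-trips cleanly. Rows that were already stored as `?` stay `?` in every case, since `?` is plain ASCII.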

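8. The local environment variable LANG affects the client character set
-- No client character set is configured in my.cnf here; if one is configured, the character set from the configuration file is used instead.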

	[root@centos7 ~]# export LANG=en_US.UTF-8
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | utf8                       |
	| character_set_connection | utf8                       |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | utf8                       |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	[root@centos7 ~]# export LANG=zh_CN.GBK
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gbk                        |
	| character_set_connection | gbk                        |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | gbk                        |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
	
	[root@centos7 ~]# export LANG=zh_CN.GB2312
	[root@centos7 ~]# mysql -e "show variables like 'character%'"
	+--------------------------+----------------------------+
	| Variable_name            | Value                      |
	+--------------------------+----------------------------+
	| character_set_client     | gb2312                     |
	| character_set_connection | gb2312                     |
	| character_set_database   | utf8                       |
	| character_set_filesystem | binary                     |
	| character_set_results    | gb2312                     |
	| character_set_server     | utf8                       |
	| character_set_system     | utf8                       |
	| character_sets_dir       | /usr/share/mysql/charsets/ |
	+--------------------------+----------------------------+
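
In all three runs the client, connection, and results character sets follow the codeset part of LANG. The helper below only reproduces that observed mapping. It is a hypothetical illustration, not the detection logic of the mysql client, and the `OBSERVED` table contains nothing beyond the three cases shown above.

```python
# Codeset (lowercased) -> MySQL character set, as observed in the runs above.
OBSERVED = {
    "utf-8": "utf8",
    "gbk": "gbk",
    "gb2312": "gb2312",
}


def charset_for_lang(lang):
    """Return the client character set observed for a LANG value, or None.

    LANG has the form language_TERRITORY.codeset[@modifier]; only the
    codeset part is used here.
    """
    _, _, codeset = lang.partition(".")
    codeset = codeset.partition("@")[0].lower()
    return OBSERVED.get(codeset)
```

For example, `charset_for_lang("zh_CN.GBK")` returns `"gbk"`, and a value without a codeset, such as `"C"`, returns `None` because the article shows no run for it.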

==========================================================
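Conclusions:
character_set_client: the encoding the server assumes for the statements the client sends.
character_set_connection: the encoding the server converts those statements into before processing them.
character_set_results: the encoding the server uses for the results it returns to the client.
If all three should be the same character set N, they can be set in one statement: set names 'N';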

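Garbled data arises for the following reasons:
a. On insert or on read, one of the conversion steps above changes the encoding and loses data.
b. If the target character set has no code for a character, the conversion cannot be lossless: the character is replaced by ? or the statement is rejected.
c. The bytes are converted correctly but shown by a terminal that expects a different encoding, which is the likely reason the gb2312 and gbk result tests display unreadable cells instead of ?.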

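Solutions:
1. Make sure character_set_connection can represent every character that character_set_client can, otherwise data is lost.
In terms of the characters they cover: gb2312 < gbk < utf8. latin1 fits into this chain only for its ASCII half, because several of its accented letters, such as ñ, have no gb2312 code, although utf8 covers all of latin1.
If you set character_set_client = gb2312,
the connection character set must cover at least gb2312, otherwise data is lost.
2. Make sure character_set_results can represent every character stored in the columns, otherwise data is lost.
For example, if the stored data is utf8 and character_set_results is gbk, any character that gbk cannot represent comes back as ?.

3. Use one character encoding for all variables, such as utf8 or utf8mb4. In MySQL, utf8 stores at most three bytes per character, so utf8mb4 is needed for characters outside the Basic Multilingual Plane, such as emoji.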
