使用UTF8字符集存储中文生僻字
一、相关学习BLOG
https://www.cnblogs.com/jyzhao/p/8654412.html http://blog.itpub.net/781883/viewspace-1411259/ https://www.qqxiuzi.cn/bianma/zifuji.php https://blog.csdn.net/iteye_7853/article/details/82516888
二、需求详情:
客户提出,关于氮卓斯汀变更为氮䓬斯汀,系统出现乱码问题
产生问题的原因为:oracle数据库字符集为:ZHS16GBK,对于部分生僻字是无法正常保存的。
三、客户提出的解决方案:
1. 修改数据库字符集为:UTF-8。此方法需对oracle字符集进行修改,但修改后,可能会将原有数据全部变成乱码。
2. 程序改造:将所有会涉及到生僻字的字段(例如产品名称、通用名等),存入数据库时,转码为16进制存,然后读取时再进行解码后展示到页面。此方法涉及修改代码庞大,且数据库内容可读性很差,手动刷数据、导出数据难度也很大。
四、解决思路:
1)直接修改数据库字符集,除非是子集修改为超集,否则不建议修改,从上述链接blog可以发现强行将db字符集从gbk修改为utf8后,plsql登录提示存在字符不匹配现象;
2)应用程序修改,代码量大,且可读写性太差;
3)建议将生僻字业务表,迁移至utf8 db库中存储(与开发人员沟通,实际存储生僻字的表只有20余个,可以单独对这些表进行迁移,业务修改查询表的代码(通过db_link),或者直接连接新的db,再或者通过创建db_link+同义词指向迁移后的远程表进行查询不修改应用代码(应用不修改,无感知);
五、实验测试
1.测试环境导出业务表
2.导入到UTF8环境下,进行读写测试
5.1源环境导出
修改字符集报错 SQL> alter database character set al32utf8; alter database character set al32utf8 * 第 1 行出现错误: ORA-12712: 新字符集必须为旧字符集的超集 SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%' PARAMETER VALUE ------------------------------ ------------------------------ NLS_CHARACTERSET ZHS16GBK NLS_NCHAR_CHARACTERSET AL16UTF16 SQL> conn scott/tiger SQL> create table test(id int,c_name varchar2(200)); 表已创建。 SQL> insert into test values(1,'板蓝根'); SQL> insert into test values(2,'氮䓬斯汀'); SQL> commit; SQL> insert into test values(3,'氮卓斯汀'); SQL> commit; SQL> select * from test ID C_NAME ---------- -------------------- 1 板蓝根 2 氮?斯汀 3 氮卓斯汀 C:\Users\Thinkpad>exp scott/tiger FILE=C:\Users\Thinkpad\Desktop\temp\hr_test.dmp TABLES=test Export: Release 11.2.0.4.0 - Production on 星期三 6月 26 13:20:58 2019 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. 连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 已导出 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集 即将导出指定的表通过常规路径... . . 正在导出表 TEST导出了 3 行 成功终止导出, 没有出现警告。
5.2目标环境导入
SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%'; PARAMETER VALUE ------------------------------ ------------------------------ NLS_CHARACTERSET AL32UTF8 NLS_NCHAR_CHARACTERSET AL16UTF16 $env|grep LANG NLS_LANG=american_america.ZHS16GBK LANG=en_US.UTF-8 enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=y Import: Release 11.2.0.4.0 - Production on Wed Jun 26 01:27:22 2019 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Export file created by EXPORT:V11.02.00 via conventional path import done in ZHS16GBK character set and AL16UTF16 NCHAR character set import server uses AL32UTF8 character set (possible charset conversion) . importing SCOTT's objects into SCOTT . importing SCOTT's objects into SCOTT . . importing table "TEST" 3 rows imported Import terminated successfully without warnings. SQL> select * from test; ID C_NAME ---------- ------------------------------ 1 2 3 ˹͡ 以上Oracle进行字符转换后,中文字符直接配置为Null 修改语言格式,让Oracle无需进行字符转换 export NLS_LANG=american_america.AL32UTF8 enmo:/home/oracleimp scott/tiger file=/home/oracle/hr_test.dmp full=y Export file created by EXPORT:V11.02.00 via conventional path import done in AL32UTF8 character set and AL16UTF16 NCHAR character set export client uses ZHS16GBK character set (possible charset conversion) . importing SCOTT's objects into SCOTT . importing SCOTT's objects into SCOTT . . importing table "TEST" 3 rows imported Import terminated successfully without warnings. enmo:/home/oraclesqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Wed Jun 26 02:40:32 2019 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options
数据验证
SQL> conn scott/tiger Connected. SQL> select * from test; ID C_NAME ------------------------------------ 1 板蓝根 2 氮?斯汀 3 氮卓斯汀 本次数据是有了, UTF8字符集 SQL> select dump('氮卓斯汀') from dual; DUMP('氮卓斯汀') -------------------------------------------------------------- Typ=96 Len=12: 230,176,174,229,141,147,230,150,175,230,177,128 GBK字符集 SQL> select dump('氮卓斯汀') from dual; DUMP('氮卓斯汀') --------------------------------------------- Typ=96 Len=8: 181,170,215,191,203,185,205,161 SQL> desc scott.test 名称 是否为空? 类型 ----------------------------------------- -------- ---------------------------- ID NUMBER(38) C_NAME VARCHAR2(200) 对于两套环境test表字段进行收缩,可以发现UTF8字符集表,实际存储是使用三个字节存储一个汉字 UTF8 SQL> alter table scott.test modify c_name varchar2(8); alter table scott.test modify c_name varchar2(8) * ERROR at line 1: ORA-01441: cannot decrease column length because some value is too big SQL> alter table scott.test modify c_name varchar2(12); Table altered. GBK GBK存储中文两个字节存储一个汉字 SQL> alter table scott.test modify c_name varchar2(8); 表已更改。