AL32UTF8 / UTF8 (Unicode) Database Character Set Implications [ID 788156.1]

In this Document

Purpose
Scope
Details
A) Often asked questions:
A.1) Do I need to use Nchar, Nvarchar2 or Nclob?
A.2) Does my Operating System need to support Unicode or do I need to install character sets on OS level?
A.3) What are the Unicode character sets and the Unicode versions in Oracle RDBMS?
A.4) Is -insert language or character here- supported/defined/known in an Oracle AL32UTF8/UTF8 database?
A.5) I also want to upgrade to a new Oracle version, do I go to AL32UTF8 before or after the upgrade?
B) Server side implications.
B.1) Storage.
B.2) How much will my database grow when going to AL32UTF8?
B.3) Codepoints for characters may change in AL32UTF8.
B.4) The meaning of SP2-0784, ORA-29275 and ORA-600 [kole_t2u], [34] errors / losing characters when using convert.
B.5) Going to AL32UTF8 from another NLS_CHARACTERSET.
B.6) ORA-01401 / ORA-12899 while importing data in an AL32UTF8 database (or move data using dblinks).
B.7) Object and user names using non-US7ASCII characters.
B.8) The password of a user can only contain single-byte data in 10g and below.
B.9) When using DBMS_LOB.LOADFROMFILE.
B.10) When using UTL_FILE
B.11) When using sqlldr or external tables.
B.12) Make sure you do not store "binary" (pdf, doc, docx, jpeg, png, etc. files) or encrypted data (passwords) in character datatypes (CHAR, VARCHAR2, LONG, CLOB).
B.13) String functions work with characters, not bytes (length, like, substr ...).
B.14) LPad and Rpad count in "display units" not characters.
B.15) Using LIKE and INSTR.
B.16) Character functions that are returning character values might silently truncate data.
B.17) Column sizes triple when using Materialized Views / CTAS through a database link.
B.18) When fetching data from non-AL32UTF8 databases using cursors (PL/SQL)
B.19) When using HTMLDB.
B.20) When using non-US7ASCII names in directory or file names.
B.21) When using XDB (xmltype).
B.22) Upper and NLS_upper give unexpected results on the Micro symbol or the Turkish i and I characters.
B.23) Lower and NLS_lower do not handle Greek Sigma Uppercase / capital Σ to lowercase conversion based on position of the Sigma symbol.
B.24) After going to AL32UTF8, the error ORA-24816: Expanded non LONG bind data supplied after actual LONG or LOB column may be seen
B.25) What is the impact on CPU and memory usage?
C) The Client side.
C.1) Common misconceptions about NLS_LANG.
C.2) Configuring your UNIX client to be a UTF-8 (Unicode) client.
C.3) Configuring your Microsoft Windows client to be a UTF-8 (Unicode) client.
C.4) The default column width of output in sqlplus will change.
C.5) Configuring your web based client to be a Unicode client.
C.6) Using Sqlplus to run scripts inserting non-US7ASCII data.
C.7) Spooling files using sqlplus is much slower using NLS_LANG set to UTF8 or AL32UTF8
C.8) Using Oracle Applications.
C.9) Using Portal.
C.10) Oracle Forms PDF and Unicode
C.11) Changing a database to AL32UTF8 hosting an OracleAS 10g Metadata Repository.
D) Known Issues
E) Other useful references
References

Applies to:

Oracle Server - Standard Edition - Version 8.0.3.0 and later
Oracle Server - Enterprise Edition - Version 8.0.3.0 and later
Information in this document applies to any platform.

Purpose

To provide some practical hints on how to deal with the effects of moving to an AL32UTF8 database character set and using Unicode clients.
To do the actual conversion to AL32UTF8, see Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode) or Note 1272374.1 The Database Migration Assistant for Unicode (DMU) Tool.

While this note is written for going to AL32UTF8/UTF8, most of the facts also apply when changing to any other multibyte character set (ZHS16GBK, ZHT16MSWIN950, ZHT16HKSCS, ZHT16HKSCS31, KO16MSWIN949, JA16SJIS ...); simply substitute AL32UTF8 with the xx16xxxx target character set. In that case, however, going to AL32UTF8 would simply be a far better idea; see "Choosing a database character set means choosing Unicode".

Scope

This note ignores any difference between AL32UTF8 and UTF8 and uses AL32UTF8 throughout; all information in this note is, however, equally valid for UTF8.
Choosing between UTF8 and AL32UTF8 (in 9i and up) is discussed here: Note 237593.1 Problems connecting to AL32UTF8 databases from older versions (8i and lower)
In short: if your whole setup (!) (all clients and all servers) is 9i or higher, use AL32UTF8 as NLS_CHARACTERSET, unless the application layer or vendor imposes restrictions (for example Oracle Applications releases lower than 12).
If there are older 8i or lower clients use UTF8 and not AL32UTF8 as NLS_CHARACTERSET.

IMPORTANT: Do NOT use Expdp/Impdp when going to (AL32)UTF8 or another multibyte character set on ALL 10g versions lower than 10.2.0.4 (including 10.1.0.5). 11.1.0.6 is also affected.
It will cause data corruption unless you have applied Patch 5874989 on the Impdp side; Expdp is not affected. The "old" exp/imp tools are not affected either. This problem is fixed in the 10.2.0.4 and 11.1.0.7 patch sets.
For Windows the fix is included in:
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later, see Note 276548.1
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later, see Note 342443.1

Details

A) Often asked questions:

A.1) Do I need to use Nchar, Nvarchar2 or Nclob?

People often think that data types like NCHAR, NVARCHAR2 or NCLOB (NLS_NCHAR_CHARACTERSET / National Character set data types) need to be used to have UNICODE support in Oracle.

This is simply not true.

The NLS_NCHAR_CHARACTERSET defines the encoding of NCHAR, NVARCHAR2 and NCLOB columns and is always Unicode in 9i and up (see Note 276914.1 The National Character Set in Oracle 9i 10g and 11g).
The NLS_CHARACTERSET defines the encoding of "normal" CHAR, VARCHAR2, LONG and CLOB columns. These can also be used for storing Unicode, in which case an AL32UTF8 or UTF8 NLS_CHARACTERSET database is needed.

If your database gives this result:

SQL> select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

VALUE
----------------------------------------
AL32UTF8

When using AL32UTF8 all "normal" CHAR, VARCHAR2 , LONG and CLOB data types are "Unicode" and you can store any language in the world in CHAR, VARCHAR2 , LONG and CLOB data types.

It is not possible to use AL16UTF16 as NLS_CHARACTERSET. AL16UTF16 can only be used as NLS_NCHAR_CHARACTERSET; see Note 276914.1 The National Character Set in Oracle 9i 10g and 11g.
It is often assumed that simply using NCHAR, NVARCHAR2 or NCLOB is "less work" to make an application "Unicode" than changing the NLS_CHARACTERSET. However, N-types are rather poorly supported in (other vendor) programming languages and at the application level in general. Using N-types requires explicit support for them on the client application/programming side.
This is why Oracle advises against N-types and recommends an AL32UTF8 or UTF8 NLS_CHARACTERSET instead.

A.2) Does my Operating System need to support Unicode or do I need to install character sets on OS level?

For a Unicode database, Oracle does not need "Unicode support" from the OS the database runs on, because the Oracle AL32UTF8 implementation does not depend on OS features.
It is, for example, perfectly possible to run and use an AL32UTF8 database on a Unix system that has no UTF-8 locale installed. It is, however, advisable to configure your OS to use UTF-8 so that you can use this environment as a UTF-8 *client*.

There is also no need to "install Unicode" for the Oracle database/client software itself. All character sets known in an Oracle version, including the Unicode character sets, are always installed; you simply cannot choose not to install them.
Note that this is about using the Oracle definitions (AL32UTF8 as NLS_CHARACTERSET or in NLS_LANG). If you want to use, for example, sqlplus as a UTF-8 client on a Unix system, then you may need UTF-8 support from the OS for the application to work properly.

A.3) What are the Unicode character sets and the Unicode versions in Oracle RDBMS?

For information on the Unicode character sets in Oracle and the versions of Unicode supported please see: Note 260893.1 Unicode character sets in the Oracle database

Choosing between UTF8 and AL32UTF8 (in 9i and up) is discussed here: Note 237593.1 Problems connecting to AL32UTF8 databases from older versions (8i and lower)
In short: if your whole setup (!) (all clients and all servers) is 9i or higher, use AL32UTF8 as NLS_CHARACTERSET, unless the application layer or vendor imposes restrictions (for example Oracle Applications releases lower than 12).
If there are older 8i or lower clients use UTF8 and not AL32UTF8 as NLS_CHARACTERSET.

A.4) Is -insert language or character here- supported/defined/known in an Oracle AL32UTF8/UTF8 database?

The short answer, when using AL32UTF8, is "yes".
For some character repertoires, such as HKSCS-2004, UTF8 may not be ideal. If you want to be 100% sure, check the Unicode version of the Oracle release and have a look at http://www.unicode.org or at Note 1051824.6 What languages are supported in an Unicode (UTF8/AL32UTF8) database?
More likely, the real question is whether the client environment can support the language in question, not whether an AL32UTF8 database can.

Note that the ability to store a language in a Unicode database is unrelated to the Oracle Installer "Product Language" choice. The installed "Product Language" refers to the translation of the database messages. See Note 985974.1 Changing the Language of RDBMS (Error) Messages.
Even if you only install the English "product language", you can store any language in an AL32UTF8 database.

A.5) I also want to upgrade to a new Oracle version, do I go to AL32UTF8 before or after the upgrade?

If your current Oracle version is 8.1.7 or lower, it is best to upgrade to a higher release first. There are two main reasons: a) you can then use AL32UTF8 (not possible in 8i), and b) Csscan has a few issues in 8.1.7 that might cause confusion.

If your current Oracle version is 9i or higher, both options (before or after) are good; it simply depends on your preference or on the application changes needed. We would, however, advise against doing the upgrade and the character set change at the same time, so that any issue that arises can be traced to either the upgrade or the character set change. Since an upgrade or a character set change NEEDS proper testing and QA, this is of course less relevant for production systems: by then the changes have already been well tested.

Please do not run csscan in the lower version, upgrade, and then run csalter. After the upgrade you need to run csminst.sql and csscan again.

Upgrading to 11.2.0.3 (or higher) before migrating to AL32UTF8 might be a good idea. The new Database Migration Assistant for Unicode (DMU) can then be used instead of Csscan/csalter, and the DMU can do the conversion to AL32UTF8 without the need to export/import (part of) the dataset.
The DMU tool is supported against Oracle 11.2.0.3 and higher and selected older version and platform combinations.
For more information please see Note 1272374.1 The Database Migration Assistant for Unicode (DMU) Tool and the DMU pages on OTN.
From Oracle 12c onwards, the DMU will be the only tool available to migrate to Unicode.

B) Server side implications.

B.1) Storage.

AL32UTF8 is a variable-width character set, which means that the code for one character can be 1, 2, 3 or 4 bytes long. This is a big difference from character sets like WE8ISO8859P1 or WE8MSWIN1252, where one character is always one byte.

US7ASCII characters (A-Z, a-z, 0-9 and ./?,*# etc.) are always 1 byte in AL32UTF8. For most West European languages the impact on the whole dataset is therefore rather limited: only "special" characters use more bytes than in an 8-bit character set, and most Western languages use them far less often than A-Z.
When converting a Cyrillic or Arabic system to AL32UTF8, the impact on the whole dataset will be bigger, since all the Cyrillic or Arabic data takes considerably more bytes to store.

Note that ANY character OTHER than US7ASCII (A-Z, a-z, 0-9 and ./?,*# ..) takes more bytes to store the same character, so at COLUMN level this may have a big impact.
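The following Python sketch makes the 1-to-4-byte range concrete by measuring how many bytes individual characters need. Python is used only because it runs outside the database. The sketch assumes that Python's "utf-8" codec behaves like AL32UTF8 and "cp1252" like WE8MSWIN1252. Neither codec is an Oracle API.

```python
# Python's "utf-8" codec stands in for AL32UTF8 and "cp1252" for
# WE8MSWIN1252 (an illustrative mapping, not an Oracle API).

def byte_widths(text: str) -> dict:
    """Map each distinct character in text to its UTF-8 byte length."""
    return {ch: len(ch.encode("utf-8")) for ch in dict.fromkeys(text)}


sample = "A\u00e9\u20ac\U0001D11E"  # 'A', 'e acute', Euro sign, musical G clef
for ch, width in byte_widths(sample).items():
    print(f"U+{ord(ch):04X}: {width} byte(s) in UTF-8")

# In WE8MSWIN1252 the first three characters take one byte each.
print(len("A\u00e9\u20ac".encode("cp1252")))  # 3
print(len("A\u00e9\u20ac".encode("utf-8")))   # 6
```

The same three characters need twice the storage in UTF-8, which is why BYTE-sized columns can overflow after the change.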

Columns need to be big enough to store the additional bytes. By default the column size is defined in BYTES, not in CHARACTERS. For example, "CREATE TABLE t (col VARCHAR2(2000));" (t and col are placeholder names) defines a column that can store 2000 bytes.

From 9i onwards it's possible to define the column length with the number of characters you want to store, regardless of the characterset.
How this works, what the limits and current known problems are is explained in Note 144808.1 Examples and limits of BYTE and CHAR semantics usage

More info on how AL32UTF8 encoding works can be found in Note 69518.1 Storing and Checking Character Codepoints in a UTF8/AL32UTF8 (Unicode) database.

Note: UTF8 uses 1, 2, 3 or 6 bytes per character. Any AL32UTF8 character that takes 4 bytes is stored as 2 times 3 bytes in UTF8. The number of "real life" characters that take 6 bytes is limited; in practice they mainly occur with (some) Chinese characters. See Note 69518.1 and Note 787371.1.
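The following Python sketch shows the 4-byte versus 6-byte difference. The function cesu8_encode is a hypothetical helper. It assumes that Oracle's UTF8 stores a supplementary character as its UTF-16 surrogate pair, with each surrogate encoded in 3 bytes (the CESU-8 scheme). AL32UTF8 is modelled by standard UTF-8.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as Oracle UTF8 is assumed to store it (CESU-8 style)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the code point into a UTF-16 surrogate pair ...
            cp -= 0x10000
            high, low = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            # ... and write each surrogate as its own 3-byte sequence.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")  # same bytes as AL32UTF8 below U+10000
    return bytes(out)


ch = "\U00020000"  # a CJK Extension B ideograph (supplementary character)
print(len(ch.encode("utf-8")), len(cesu8_encode(ch)))  # 4 6
```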

B.2) How much will my database grow when going to AL32UTF8?

The biggest expansion will be seen with CLOBs. If the source database uses an 8-bit character set (WE8ISO8859P1, WE8MSWIN1252 etc.), then populated CLOB columns will double in disk size. See Note 257772.1 CLOBs and NCLOBs character set storage in Oracle Release 8i, 9i, 10g and 11g.

An estimate of the expansion is listed in the Csscan .txt output file under the Expansion header; see Note 444701.1 Csscan output explained.
We advise always using Csscan when going to AL32UTF8; see point B.5).

For non-CLOB data the expansion is typically a few percent for West European databases, since most characters are actually US7ASCII characters.
Databases storing other language groups, like Arabic or Cyrillic, will see a higher overall data expansion than West European databases.
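The following Python sketch gives a rough, client-side way to reason about the expansion. The estimate_expansion function is hypothetical and is no substitute for the Csscan Expansion report. It makes two assumptions:

- VARCHAR2/CHAR data grows the way UTF-8 does.
- CLOB data is stored at 2 bytes per character (modelled with UTF-16), as the CLOB doubling described above implies.

```python
def estimate_expansion(rows, source_encoding="cp1252"):
    """Percentage growth when rows move from a single-byte character set to
    AL32UTF8 (modelled as UTF-8) and, for CLOB data, to 2 bytes per character
    (modelled as UTF-16BE). Both models are illustrative assumptions."""
    src = sum(len(r.encode(source_encoding)) for r in rows)
    if src == 0:
        return {"varchar2_growth_pct": 0.0, "clob_growth_pct": 0.0}
    utf8 = sum(len(r.encode("utf-8")) for r in rows)
    clob = sum(len(r.encode("utf-16-be")) for r in rows)
    return {
        "varchar2_growth_pct": 100.0 * (utf8 - src) / src,
        "clob_growth_pct": 100.0 * (clob - src) / src,
    }


# Mostly-ASCII West European text grows a little; CLOB data doubles.
print(estimate_expansion(["Hello world", "Café crème"]))
# Cyrillic text from a CL8MSWIN1251-style source doubles in both cases.
print(estimate_expansion(["Привет"], "cp1251"))
```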

B.3) Codepoints for characters may change in AL32UTF8.

There is a common misconception that a character always has the same code; for example, the pound sign is often referred to as the "code 163" character. This is not correct: a character has a given code only in a given character set (!). The code itself means nothing if you do not know which character set is being used.

The difference may look small, but it's not.

The pound sign, for example, is indeed "code 163" (A3 in hex) in the WE8ISO8859P1 and WE8MSWIN1252 character sets, but in AL32UTF8 the pound sign is code 49827 (C2 A3 in hex).
When using chr(163) in an AL32UTF8 database, code 163 is an illegal character, because a lone A3 byte is not a valid character in UTF8. The pound sign is chr(49827) in a UTF8/AL32UTF8 system.

So be careful when using, for example, the CHR() function: the code for a character depends on the database character set!

Instead of CHR() it is far better to use UNISTR('\<codepoint>'), where <codepoint> is the hexadecimal Unicode code of the character (for example UNISTR('\00A3') for the pound sign). UNISTR() (new in 9i) always works in every character set that contains the character. There is, for example, no need to change the UNISTR value for the Euro symbol when changing from WE8MSWIN1252 to AL32UTF8.
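The following Python sketch reproduces the pound-sign example outside the database. It assumes latin-1, cp1252 and utf-8 as stand-ins for WE8ISO8859P1, WE8MSWIN1252 and AL32UTF8. The code point U+00A3 is the value UNISTR('\00A3') refers to, and it stays the same whichever encoding is used.

```python
pound = "\u00a3"  # code point U+00A3, i.e. what UNISTR('\00A3') denotes

latin1_code = pound.encode("latin-1")[0]                  # WE8ISO8859P1 stand-in
cp1252_code = pound.encode("cp1252")[0]                   # WE8MSWIN1252 stand-in
utf8_code = int.from_bytes(pound.encode("utf-8"), "big")  # AL32UTF8 stand-in

print(latin1_code, cp1252_code, utf8_code, hex(utf8_code))  # 163 163 49827 0xc2a3

# A lone A3 byte (the "code 163" from above) is not valid UTF-8 on its own.
try:
    bytes([163]).decode("utf-8")
    lone_byte_valid = True
except UnicodeDecodeError:
    lone_byte_valid = False
print(lone_byte_valid)  # False
```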

For more info on how to check/find the code for a character in AL32UTF8 and using Unistr please see Note 69518.1 Storing and Checking Character Codepoints in an UTF8/AL32UTF8 (Unicode) database

Only US7ASCII characters (A-Z, a-z, 0-9 etc.) have the same codes in AL32UTF8 as in US7ASCII, WE8ISO8859P1, AR8MSWIN1256 etc., so using chr() for any value above 127 is best avoided.

B.4) The meaning of SP2-0784, ORA-29275 and ORA-600 [kole_t2u], [34] errors / losing characters when using convert.

If you receive errors like "SP2-0784: Invalid or incomplete character beginning 0xC4 returned", "ORA-29275: partial multibyte character" or "ORA-600 [kole_t2u], [34]", then you are storing data in a character datatype that is NOT using AL32UTF8 encoding.
Less often, an AL32UTF8 database returns "ORA-00911: invalid character" or "ORA-24812: character set conversion to or from UCS2 failed". In most cases these mean the same thing as the other errors.

The way UTF8/AL32UTF8 works (see Note 69518.1 Storing and Checking Character Codepoints in an UTF8/AL32UTF8 (Unicode) database) gives some easy ways to check (a big part of) the validity of a data stream. For example, a code between 00 and 7F (hexadecimal notation) can only be followed by another code between 00 and 7F or by a lead code between C2 and F4 (C2 and EF for UTF8). If the code following a code between 00 and 7F is, for example, C1 or 80, then something is wrong.
Another example: a code between 00 and 7F cannot follow directly after a lead code between C2 and EF. The number of continuation codes (80 to BF) that must follow a lead code depends on its range:
- after a lead code between C2 and DF: exactly one
- after a lead code between E0 and EF: two
- after a lead code between F0 and F4 (AL32UTF8 only): three
If one of those "rules" is violated, Oracle will give an ORA-29275 error.
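The following Python sketch writes the byte-structure rules above as a simplified check. first_invalid_offset is a hypothetical helper, not what Oracle runs internally. Like Oracle's checks, it does not catch every illegal sequence: it ignores overlong 3/4-byte forms, surrogates and code points above U+10FFFF. Python's own bytes.decode("utf-8") does reject those.

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first character whose bytes break the lead/
    continuation rules, or None if no violation is found (simplified check)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:            # 00-7F: single-byte US7ASCII character
            need = 0
        elif 0xC2 <= b <= 0xDF:  # lead byte of a 2-byte character
            need = 1
        elif 0xE0 <= b <= 0xEF:  # lead byte of a 3-byte character
            need = 2
        elif 0xF0 <= b <= 0xF4:  # lead byte of a 4-byte character (AL32UTF8)
            need = 3
        else:                    # stray 80-BF, or C0, C1, F5-FF: never valid here
            return i
        for j in range(i + 1, i + 1 + need):
            if j >= n or not 0x80 <= data[j] <= 0xBF:
                return i         # missing or wrong continuation byte (80-BF)
        i += 1 + need
    return None


print(first_invalid_offset("£ and €".encode("utf-8")))  # None
print(first_invalid_offset(b"abc\xc4"))                  # 3: lone lead byte 0xC4
```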

The ORA-29275: partial multibyte character error results from Oracle doing very low-level "sanity" checks on character strings to see whether the code sequence is valid AL32UTF8. The checks cannot catch 100% of the cases, so some illegal code sequences may not be detected. This depends entirely on the dataset, which is why the error often appears "randomly" and may not appear for some rows.
The checks are enhanced in every version; the checks in 11g, for example, are far better/stricter than in 9i.
This is done to avoid wrong result sets from functions and to reduce the risk of injection problems leading to security problems.

With Clob data you will not encounter ORA-29275 but ORA-600 [kole_t2u], [34]. See Note 734474.1 ORA-600 [kole_t2u], [34] - description, bugs, and reasons

SP2-0784 is a pure client side error/warning returned by sqlplus, it means the same as ORA-29275.

Note that these errors cannot be "turned off", nor should they be. They indicate a serious problem with your setup that needs to be resolved.

Any character data type like CHAR, VARCHAR2, LONG and CLOB expects the data to be in the encoding defined by the NLS_CHARACTERSET. Storing data in an encoding other than the NLS_CHARACTERSET is not supported. Any data using a different encoding should be considered BINARY, and BINARY data types like RAW or BLOB should be used to store and process (!) it.
This is most often seen with "encrypted" data (passwords etc.) stored in a VARCHAR2. If this is happening with "encrypted" data, then see Note 1297507.1 Problems with (Importing) Encrypted Data After Character Set Change Using Other NLS_CHARACTERSET Database or Upgrading the (client) Oracle Version

This is also true when using the CONVERT function. Any conversion from the NLS_CHARACTERSET to another character set produces binary data, because the result is no longer in the NLS_CHARACTERSET. When using the CONVERT function you might even see errors like ORA-12703: this character set conversion is not supported.
Note that this is expected behaviour and that the CONVERT function should not be used in normal application logic. If data needs to be stored in a non-UTF8 encoding, UTL_RAW.CAST_TO_RAW and UTL_RAW.CONVERT should be used and the result should be stored as RAW or BLOB.

There is only one solution: use CHARACTER datatypes for what they are designed for, which is storing data in the NLS_CHARACTERSET encoding.
If you want to write out files in a character set other than AL32UTF8, use UTL_FILE; see point B.10).

To find all *stored* data that might give these errors, a Unicode client like SQL Developer can be used to check the data and update it.
Csscan can be used to find non-AL32UTF8 codes in a database.
Install it first:
Note 458122.1 Installing and configuring CSSCAN in 8i and 9i
Note 745809.1 Installing and configuring CSSCAN in 10g and 11g

Then run Csscan with this syntax:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=<current NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2
* Always run Csscan connecting with a 'sysdba' connection/user; do not use the "system" or "csmig" user.
* The <current NLS_CHARACTERSET> is seen in NLS_DATABASE_PARAMETERS:
select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';
* The TOCHAR=<current NLS_CHARACTERSET> is not a typo: the idea is to check the CURRENT character set for codes that are not defined in this NLS_CHARACTERSET.
* The above syntax does a FULL database scan. If you want to scan only certain tables or users, please see "E) Do I need to always run a full database scan?" in Note 444701.1 Csscan output explained.

For an overview of the Csscan output and what it means, please see Note 444701.1 Csscan output explained.
Any data reported as "Lossy" is data that will give ORA-29275 and/or ORA-600 [kole_t2u], [34].
For encrypted data, see Note 1297507.1 mentioned above.
If this data is not "encrypted" data but "normal" data, it is stored using non-UTF8 codes (= a bad client configuration when the data was loaded). This data needs to be CORRECTED in one of these ways:
- reloading it in a correct manner
- a manual update
- if the actual encoding can be found, using UTL_RAW.CAST_TO_RAW, UTL_RAW.CONVERT and UTL_RAW.CAST_TO_VARCHAR2 and then updating the row
It might not be possible to correct all data, and it is impossible to validate in a scripted way whether the meaning of the data is correct.
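The following Python sketch follows the same idea as the UTL_RAW.CAST_TO_RAW / UTL_RAW.CONVERT / UTL_RAW.CAST_TO_VARCHAR2 repair described above. It takes the raw bytes, reinterprets them in the encoding they were really written in, and re-encodes them as UTF-8. repair_value is a hypothetical helper. The actual encoding (cp1252, cp1251, ...) is an assumption that you must establish yourself.

```python
def repair_value(stored: bytes, actual_encoding: str) -> bytes:
    """Return AL32UTF8 (UTF-8) bytes for a stored value.

    Bytes that already decode as UTF-8 are returned unchanged. Otherwise they
    are reinterpreted in actual_encoding (an assumption the caller must verify,
    e.g. "cp1252" for WE8MSWIN1252) and re-encoded as UTF-8.
    """
    try:
        stored.decode("utf-8")
        return stored
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if actual_encoding does not
        # define one of the bytes, i.e. the guess is wrong.
        return stored.decode(actual_encoding).encode("utf-8")


# Hypothetical case: a WE8MSWIN1252 client with a wrong NLS_LANG stored 'Café'
# without conversion.
bad = "Café".encode("cp1252")            # b'Caf\xe9', not valid UTF-8
print(repair_value(bad, "cp1252"))       # b'Caf\xc3\xa9'
```

As the text warns, bytes that happen to form valid UTF-8 are left untouched, so the result still needs human review.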

Known Oracle bugs that can give ORA-29275 or ORA-600 [KOLE_T2U]:

Bug 4562807 ORA-600 [KOLE_T2U] - only related to Oracle Text, happens on bad input data
Fixed in 10.2.0.4 patchset and 11.1 and higher base release.
Bug 6268409 ORA-29275 ERROR WHEN QUERYING THE SQL_REDO/UNDO COLUNMS IN V$LOGMNR_CONTENTS
Fixed in 10.2.0.5 , 11.1.0.7 and up
Bug 5915741 ORA-29275 selecting from V$SESSION with Multibyte DB
Fixed in 10.2.0.5 , 11.1.0.6 and up
Bug 10334711 ORA-600 [KOLE_T2U] when using EXTENDED feature of AUDIT_TRAIL
Fixed in 11.2.0.3 patchset and 12.1 base release.

B.5) Going to AL32UTF8 from another NLS_CHARACTERSET.

To change a database NLS_CHARACTERSET to AL32UTF8 using csscan/csalter or export/import into a new AL32UTF8 db, we suggest following Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode)

Please note that it is strongly recommended to follow the above note when changing a WHOLE database to AL32UTF8, even when using a full exp/imp into a new AL32UTF8 db.
If you are exporting/importing only certain users or table(s) between existing databases and the target database is an UTF8 or AL32UTF8 database please see:
Note 1297961.1 ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database.

For migration to AL32UTF8 (and the deprecated UTF8), there is a new tool available called the Database Migration Assistant for Unicode (DMU). The DMU is a unique next-generation migration tool providing an end-to-end solution for migrating your databases from legacy encodings to Unicode. DMU's intuitive user-interface greatly simplifies the migration process and lessens the need for character set migration expertise by guiding the DBA through the entire migration process as well as automating many of the migration tasks.
The DMU tool is supported against Oracle 11.2.0.3 and higher and selected older version and platform combinations.
For more information please see Note 1272374.1 The Database Migration Assistant for Unicode (DMU) Tool and the DMU pages on OTN.
From Oracle 12c onwards, the DMU will be the only tool available to migrate to Unicode.

Please see the following note for an Oracle Applications database: Note 124721.1 Migrating an Applications Installation to a New Character Set.
This is the only method supported by Oracle Applications. If you have any doubt, log an Oracle Applications SR for assistance.

B.6) ORA-01401 / ORA-12899 while importing data in an AL32UTF8 database (or move data using dblinks).

If import gives errors like:

IMP-00019: row rejected due to ORACLE error 1401
IMP-00003: ORACLE error 1401 encountered
ORA-01401: inserted value too large for column

or from 10g onwards:

ORA-02374: conversion error loading table "TEST"."NTEST"
ORA-12899: value too large for column COMMENT (actual: 6028, maximum: 4000)

or if using a dblink to move data from a non-AL32UTF8 db gives ORA-01401 / ORA-12899, then the columns cannot handle the "growth in bytes" of the data.
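Before importing or moving data over a dblink, you can spot values that will no longer fit by measuring their UTF-8 byte length, as in the following Python sketch. It assumes a column defined with BYTE semantics and uses a hypothetical helper, values_too_long. The 4000 default matches the ORA-12899 example above.

```python
def values_too_long(values, max_bytes=4000):
    """Return (index, utf8_bytes) for every value whose UTF-8 (AL32UTF8)
    encoding exceeds max_bytes, i.e. would not fit a BYTE-semantics column."""
    too_long = []
    for i, value in enumerate(values):
        size = len(value.encode("utf-8"))
        if size > max_bytes:
            too_long.append((i, size))
    return too_long


rows = ["a" * 4000, "é" * 2001, "é" * 2000]   # all fit 4000 bytes in WE8MSWIN1252
print(values_too_long(rows))                  # [(1, 4002)]
```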

If you are

  • exporting/importing certain users or table(s) between existing databases and one database is an UTF8 or AL32UTF8 database
  • moving data from a non-AL32UTF8 db to an AL32UTF8 db using dblinks

please see: Note 1297961.1 ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database.

If you are trying to convert a WHOLE database to AL32UTF8 using export/import please follow: Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode)

B.7) Object and user names using non-US7ASCII characters.

We strongly suggest never using non-US7ASCII characters in a database name or a database link name. See also the limitations listed in the documentation set under "Restrictions on Character Sets Used to Express Names". For any type of name that has "no" under the "Variable Width" column, you NEED to use only US7ASCII characters (a-z, A-Z, 0-9) in an AL32UTF8 database. Using non-US7ASCII characters for these names is not supported and has to be corrected before going to AL32UTF8. In general, it is a very good idea to avoid non-US7ASCII characters in database object names whenever possible.

(See "Schema Object Naming Rules" in the "Database SQL Reference" part of the docset.)

Object names can be from 1 to 30 bytes long with these exceptions:

  • Names of databases are limited to 8 bytes.
  • Names of database links can be as long as 128 bytes.

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements008.htm#i27570

This select

SQL> select object_name from dba_objects where object_name <> convert(object_name,'US7ASCII');

will return all objects having a non-US7ASCII name. If column names, schema objects or comments have non-US7ASCII names that take more than 30 bytes in AL32UTF8, the only option is to rename the affected objects or users to a name that occupies at most 30 bytes.
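The same test can be run on a list of names fetched from DBA_OBJECTS or DBA_USERS. check_name below is a hypothetical Python helper that flags two problems: non-US7ASCII names, and names that exceed the 30-byte limit once encoded in UTF-8 (the AL32UTF8 stand-in).

```python
MAX_NAME_BYTES = 30  # object/user name limit quoted above


def check_name(name: str) -> list:
    """List the problems a name would have in an AL32UTF8 database."""
    problems = []
    if not name.isascii():
        problems.append("contains non-US7ASCII characters")
    size = len(name.encode("utf-8"))
    if size > MAX_NAME_BYTES:
        problems.append(f"{size} bytes in AL32UTF8 (max {MAX_NAME_BYTES})")
    return problems


for name in ["EMP", "ÄNDERUNG", "ÄÖÜ" * 10]:   # hypothetical names
    print(name, check_name(name))
```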

A username can be at most 30 bytes long:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_8003.htm#SQLRF01503

and "...This name can contain only characters from your database character set and must follow the rules described in the section "Schema Object Naming Rules". Oracle recommends that the user name contain at least one single-byte character regardless of whether the database character set also contains multibyte characters."

This select

SQL> select username from dba_users where username <> convert(username,'US7ASCII');

will return all users having a non-US7ASCII name.

B.8) The password of a user can only contain single-byte data in 10g and below.

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_8003.htm#SQLRF01503

Passwords can contain only single-byte characters from the database character set, regardless of whether the character set also contains multibyte characters. This means that in an AL32UTF8 database the user password can only contain US7ASCII characters, as these are the only single-byte characters in AL32UTF8.
This may cause a problem if you migrate from, for example, a CL8MSWIN1251 database where your users can use Cyrillic in their passwords: in CL8MSWIN1251 Cyrillic letters take one single byte, in AL32UTF8 they do not.

Passwords are stored in a hashed way, so this will not show up in the Csscan result. For those users you will need to reset the password to a US7ASCII string.
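A byte-length check shows which passwords will need a reset. In the following Python sketch password_ok_before_11g is a hypothetical helper; cp1251 stands in for CL8MSWIN1251 and utf-8 for AL32UTF8. Since passwords are hashed, this only helps when you know or are choosing the plaintext, for example when validating new passwords.

```python
def password_ok_before_11g(password: str) -> bool:
    """True if every character is single-byte in AL32UTF8 (i.e. US7ASCII)."""
    return all(len(ch.encode("utf-8")) == 1 for ch in password)


cyrillic = "пароль"  # hypothetical Cyrillic password
print(len(cyrillic.encode("cp1251")), len(cyrillic.encode("utf-8")))  # 6 12
print(password_ok_before_11g("Secret123!"), password_ok_before_11g(cyrillic))  # True False
```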

This restriction is lifted in 11g, where multibyte characters can be used in a password string. Please note that passwords need to be changed in 11g before they use the new 11g hashing system. Please see Note 429465.1 11g R1 New Feature Case Sensitive Passwords and Strong User Authentication

B.9) When using DBMS_LOB.LOADFROMFILE.

When using DBMS_LOB.LOADFROMFILE please read Note 267356.1 Character set conversion when using DBMS_LOB.

B.10) When using UTL_FILE

When using UTL_FILE please read Note 227531.1 Character set conversion when using UTL_FILE.

B.11) When using sqlldr or external tables.

When using Sqlldr or external tables, make sure to define the correct character set of the file in the control file. The character set of the database has no direct relation to the encoding used in the file. Just because the database uses AL32UTF8, it does not follow that AL32UTF8 is the correct value for NLS_LANG or for the character set in the control file. You need to specify the encoding of the file sqlldr is loading.
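The following Python sketch illustrates why you declare the file's own encoding. The same file reads correctly with its real encoding, and fails when it is assumed to be UTF-8 just because the database is AL32UTF8. The file name and contents are hypothetical, and cp1252 stands in for a WE8MSWIN1252 data file. The sketch does not show the sqlldr control-file syntax itself.

```python
import tempfile
from pathlib import Path


def load_lines(path: Path, file_encoding: str) -> list:
    """Read a data file using the encoding the FILE was written in
    (the analogue of the control-file character set), not the database's."""
    return path.read_text(encoding=file_encoding).splitlines()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "customers.dat"                     # hypothetical file
    data_file.write_bytes("Müller;Zürich\n".encode("cp1252"))   # WE8MSWIN1252 data

    loaded = load_lines(data_file, "cp1252")   # correct: the file's real encoding
    try:
        load_lines(data_file, "utf-8")         # wrong: assumes the file is AL32UTF8
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(loaded, utf8_failed)
```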

Please read Note 227330.1 Character Sets & Conversion - Frequently Asked Questions
18. What is the best way to load non-US7ASCII characters using SQL*Loader or External Tables?

B.12) Make sure you do not store "binary" (pdf, doc, docx, jpeg, png, etc. files) or encrypted data (passwords) in character datatypes (CHAR, VARCHAR2, LONG, CLOB).

If binary data (like PDF, doc, docx, jpeg, png, etc. files) or encrypted data like hashed/encrypted passwords is stored/handled as a CHAR, VARCHAR2, LONG or CLOB datatype, then data loss is expected, especially when using an AL32UTF8 database (even without using exp/imp). Errors like ORA-29275 or ORA-600 [kole_t2u], [34] may also appear.

The ONLY supported data types to store "raw" binary data (like PDF, doc, docx, jpeg, png, etc. files) or encrypted data like hashed/encrypted passwords are LONG RAW or BLOB.
If you want to store binary data (like PDF, doc, docx, jpeg, png, etc. files) or encrypted data like hashed/encrypted passwords in a CHAR, VARCHAR2, LONG or CLOB datatype, then it must first be converted in the application layer to a "character set safe" representation like base64.
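Here is a minimal sketch of such a character-set-safe representation, using Python's standard base64 module. The payload bytes are made up for illustration. Base64 text consists only of US7ASCII characters, so it is stored identically in AL32UTF8 and in any other ASCII-based NLS_CHARACTERSET.

```python
import base64


def to_char_safe(blob: bytes) -> str:
    """Encode arbitrary bytes as base64 text (US7ASCII characters only)."""
    return base64.b64encode(blob).decode("ascii")


def from_char_safe(text: str) -> bytes:
    """Decode base64 text back to the original bytes."""
    return base64.b64decode(text, validate=True)


# Hypothetical binary payload: a PDF-like header followed by arbitrary bytes.
payload = b"%PDF" + bytes([0xC4, 0x00, 0xFF])

stored = to_char_safe(payload)        # safe to put in a VARCHAR2/CLOB column
restored = from_char_safe(stored)     # what the application reads back
print(stored, restored == payload)

# Treating the raw bytes as text instead fails: 0xC4 0x00 is not valid UTF-8.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid as character data:", exc.reason)
```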

Note 1297507.1 Problems with (Importing) Encrypted Data After Character Set Change Using Other NLS_CHARACTERSET Database or Upgrading the (client) Oracle Version
Note 1307346.1 DBMS_LOB Loading and Extracting Binary File To Oracle Database

B.13) String functions work with characters, not bytes (length, like, substr ...).

Functions like LENGTH and SUBSTR count in characters, not bytes. In an AL32UTF8 database the result of LENGTH or SUBSTR will therefore differ from the number of BYTES the string uses.
Application logic often uses SUBSTR and LENGTH to prepare or limit a string; such code NEEDS to be checked.

There are of course exceptions like LENGTHB, SUBSTRB and INSTRB, which explicitly deal with bytes. The exact length of a string in BYTES in an AL32UTF8 environment is never known upfront, so operations based on BYTE length should be avoided.
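The following Python sketch models AL32UTF8 as UTF-8. It shows the difference between character and byte counts, and why cutting a string at a byte boundary is risky. lengthb and truncate_to_bytes are hypothetical helpers; this is not a reproduction of Oracle's LENGTHB or SUBSTRB behaviour.

```python
def lengthb(s: str) -> int:
    """Byte length of s in UTF-8 (the AL32UTF8 stand-in)."""
    return len(s.encode("utf-8"))


def truncate_to_bytes(s: str, max_bytes: int) -> str:
    """Keep as many WHOLE characters as fit in max_bytes of UTF-8."""
    out, used = [], 0
    for ch in s:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            break
        out.append(ch)
        used += size
    return "".join(out)


text = "Straße €5"
print(len(text), lengthb(text))        # 9 12
print(truncate_to_bytes("€uro", 4))    # €u

# Cutting the raw bytes instead splits the 3-byte Euro sign in half.
try:
    "€uro".encode("utf-8")[:2].decode("utf-8")
except UnicodeDecodeError:
    print("byte cut left a partial multibyte character")
```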

Note that substrB might generate a different result than expected in a UTF8 env:

-- the euro symbol

From the ITPUB blog, link: http://blog.itpub.net/17252115/viewspace-753243/. Please credit the source when reposting; otherwise legal liability will be pursued.
