Character Encoding Schemes
Multi-byte encoding schemes are needed for Asian languages because these languages use thousands of characters. A double-byte encoding scheme can support up to 65536 characters. Some multi-byte encoding schemes use the value of the most significant bit to indicate if a byte represents a single-byte character or is the first or second byte of a double-byte character. In other schemes, control codes differentiate single-byte from double-byte characters. A shift-out code indicates that the following bytes are double-byte characters until a shift-in code is encountered.
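The most-significant-bit rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GB2312 is used as a sample encoding in which every byte of a double-byte character has its high bit set, and `split_characters` is a hypothetical helper name, not part of any Oracle or Python API.

```python
def split_characters(data: bytes) -> list[bytes]:
    """Split a byte string into characters using the high-bit rule.

    Assumption: a byte with the most significant bit set starts a
    double-byte character; any other byte is a single-byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:          # high bit set: lead byte of a double-byte character
            chars.append(data[i:i + 2])
            i += 2
        else:                       # high bit clear: single-byte (ASCII) character
            chars.append(data[i:i + 1])
            i += 1
    return chars


sample = "A中B".encode("gb2312")     # one ASCII letter, one Chinese character, one ASCII letter
print([c.hex() for c in split_characters(sample)])
```

Schemes that rely on shift-out and shift-in control codes cannot be decoded byte by byte like this, because the meaning of a byte depends on the most recent shift code.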
There are two general groups of encoding schemes, those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encoding for other characters used in languages other than English. ASCII and EBCDIC use different encodings, even for the Latin characters.
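As a rough illustration of the two groups, the following sketch uses Python's built-in codecs. The code pages `cp037` and `cp500` are chosen here only as two examples of EBCDIC-based schemes; they are assumptions for the demonstration, not encodings named in the text above.

```python
text = "AZ"

ascii_bytes = text.encode("ascii")
ebcdic_us = text.encode("cp037")    # an EBCDIC code page (US/Canada)
ebcdic_intl = text.encode("cp500")  # another EBCDIC code page (International)

# ASCII and EBCDIC disagree even on the Latin letters.
print(ascii_bytes.hex(), ebcdic_us.hex())

# Within the EBCDIC group the Latin letters agree...
print(ebcdic_us == ebcdic_intl)

# ...but other characters, such as square brackets, may be encoded differently.
print("[".encode("cp037") == "[".encode("cp500"))
```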
--------------------------
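Most NLS parameters can be set in three ways: as an initialization parameter in the initialization file, as an environment variable on the client, or for the current session with ALTER SESSION. For example, for NLS_TERRITORY:
NLS_TERRITORY = FRANCE
setenv NLS_TERRITORY FRANCE
ALTER SESSION SET NLS_TERRITORY = FRANCE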
Parameter | Description |
NLS_CALENDAR | Calendar system |
NLS_CURRENCY | Local currency symbol |
NLS_DATE_FORMAT | Default date format |
NLS_DATE_LANGUAGE | Default language for dates |
NLS_ISO_CURRENCY | ISO international currency symbol |
NLS_LANGUAGE | Default language |
NLS_NUMERIC_CHARACTERS | Decimal character and group separator |
NLS_SORT | Character sort sequence |
NLS_SPECIAL_CHARS | |
NLS_TERRITORY | Default territory |
Many different calendar systems are in use throughout the world. NLS_CALENDAR specifies which calendar system Oracle uses.
NLS_CALENDAR can have one of the following values:
For example, if NLS_CALENDAR is set to "Japanese Imperial", the date format is "YY-MM-DD", and the date is February 17, 1907, then SYSDATE is displayed as follows:
SELECT SYSDATE FROM DUAL;
SYSDATE
--------
07-02-17
NLS_CURRENCY
This parameter specifies the character string returned by the number format mask L, the local currency symbol, overriding that defined implicitly by NLS_TERRITORY. For example, to set the local currency symbol to "Dfl " (including a space), the parameter should be set as follows:
NLS_CURRENCY = "Dfl "
In this case, the query
SELECT TO_CHAR(TOTAL, 'L099G999D99') "TOTAL"
FROM ORDERS WHERE CUSTNO = 586
would return
TOTAL
-------------
Dfl 12.673,49
You can alter the default value of NLS_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_CURRENCY command.
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
NLS_DATE_FORMAT
Defines the default date format to use with the TO_CHAR and TO_DATE functions. The default value of this parameter is determined by NLS_TERRITORY. The value of this parameter can be any valid date format mask, and the value must be surrounded by double quotes. For example:
NLS_DATE_FORMAT = "MM/DD/YYYY"
As another example, to set the default date format to display Roman numerals for months, you would include the following line in your initialization file:
NLS_DATE_FORMAT = "DD RM YY"
In this case, the query
SELECT TO_CHAR(SYSDATE) CURRDATE
FROM DUAL;
would return
CURRDATE
---------
13 II 91
The value of this parameter is stored in the tokenized internal date format. Each format element occupies two bytes, and each string occupies the number of bytes in the string plus a terminator byte. Also, the entire format mask has a two-byte terminator. For example, "MM/DD/YY" occupies 12 bytes internally because there are three format elements (3 x 2 bytes), two one-byte strings (the two slashes, 2 x 2 bytes including their terminators), and the two-byte terminator for the format mask. The tokenized format for the value of this parameter cannot exceed 24 bytes.
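The size rule above can be worked through in a small Python sketch. It is only an approximation under stated assumptions: `FORMAT_ELEMENTS` is a hypothetical, incomplete list of date format elements, literal characters are assumed to be single-byte, and `tokenized_size` is an illustrative name rather than anything Oracle exposes.

```python
# Hypothetical subset of date format elements (assumption, not the full Oracle list).
FORMAT_ELEMENTS = ["YYYY", "MONTH", "HH24", "MON", "DAY",
                   "YY", "MM", "DD", "RM", "DY", "HH", "MI", "SS"]

MAX_TOKENIZED_BYTES = 24  # limit stated in the text


def tokenized_size(mask: str) -> int:
    """Estimate the tokenized size of a date format mask in bytes."""
    elements = sorted(FORMAT_ELEMENTS, key=len, reverse=True)  # longest match first
    upper = mask.upper()
    size = 0
    literal = 0  # length of the literal string currently being accumulated
    i = 0
    while i < len(mask):
        match = next((e for e in elements if upper.startswith(e, i)), None)
        if match:
            if literal:
                size += literal + 1  # string bytes plus one terminator byte
                literal = 0
            size += 2                # each format element occupies two bytes
            i += len(match)
        else:
            literal += 1
            i += 1
    if literal:
        size += literal + 1
    return size + 2                  # two-byte terminator for the whole mask


for mask in ("MM/DD/YY", "DD RM YY", "DD-MON-YYYY"):
    size = tokenized_size(mask)
    print(mask, size, size <= MAX_TOKENIZED_BYTES)
```

For "MM/DD/YY" the sketch reproduces the 12 bytes given in the text: 6 bytes for the three elements, 4 bytes for the two slashes with their terminators, and 2 bytes for the mask terminator.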
Note: The applications you design may need to allow for a variable-length default date format. Also, the parameter value must be surrounded by double quotes: single quotes are interpreted as part of the format mask.
You can alter the default value of NLS_DATE_FORMAT by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_FORMAT command.
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
NLS_DATE_LANGUAGE
This parameter specifies the language for the spelling of day and month names by the functions TO_CHAR and TO_DATE, overriding that specified implicitly by NLS_LANGUAGE. NLS_DATE_LANGUAGE has the same syntax as the NLS_LANGUAGE parameter, and all supported languages are valid values. For example, to specify the date language as French, the parameter should be set as follows:
NLS_DATE_LANGUAGE = FRENCH
In this case, the query
SELECT TO_CHAR(SYSDATE, 'Day:Dd Month yyyy')
FROM DUAL;
would return
Mercredi:13 Février 1991
Month and day name abbreviations are also in the language specified, for example:
Me:13 Fév 1991
The default date format also uses the language-specific month name abbreviations. For example, if the default date format is DD-MON-YYYY, the above date would be inserted using:
INSERT INTO tablename VALUES ('13-Fév-1991');
The abbreviations for AM, PM, AD, and BC are also returned in the language specified by NLS_DATE_LANGUAGE. Note that numbers spelled using the TO_CHAR function always use English spellings; for example:
SELECT TO_CHAR(TO_DATE('27-Fév-91'),'Day: ddspth Month')
FROM DUAL;
would return:
Mercredi: twenty-seventh Février
You can alter the default value of NLS_DATE_LANGUAGE by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_LANGUAGE command.
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
NLS_ISO_CURRENCY
This parameter specifies the character string returned by the number format mask C, the ISO currency symbol, overriding that defined implicitly by NLS_TERRITORY.
Local currency symbols can be ambiguous; for example, a dollar sign ($) can refer to US dollars or Australian dollars. ISO Specification 4217 1987-07-15 defines unique "international" currency symbols for the currencies of specific territories (or countries).
For example, the ISO currency symbol for the US Dollar is USD, for the Australian Dollar AUD. To specify the ISO currency symbol, the corresponding territory name is used.
NLS_ISO_CURRENCY has the same syntax as the NLS_TERRITORY parameter, and all supported territories are valid values. For example, to specify the ISO currency symbol for France, the parameter should be set as follows:
NLS_ISO_CURRENCY = FRANCE
In this case, the query
SELECT TO_CHAR(TOTAL, 'C099G999D99') "TOTAL"
FROM ORDERS WHERE CUSTNO = 586
would return
TOTAL
-------------
FRF12.673,49
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
NLS_NUMERIC_CHARACTERS
This parameter specifies the decimal character and grouping separator, overriding those defined implicitly by NLS_TERRITORY. The decimal character separates the integer and decimal parts of a number. The grouping separator is the character returned by the number format mask G. For example, to set the decimal character to a comma and the grouping separator to a period, the parameter should be set as follows:
NLS_NUMERIC_CHARACTERS = ",."
Both characters are single byte and must be different. Either can be a space.
Note: When the decimal character is not a period (.) or when a group separator is used, numbers appearing in SQL statements must be enclosed in quotes. For example:
INSERT INTO SIZES (ITEMID, WIDTH, QUANTITY)
VALUES (618, '45,5', TO_NUMBER('1.234','9G999'));
You can alter the default value of NLS_NUMERIC_CHARACTERS by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_NUMERIC_CHARACTERS command.
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
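To make the effect of the two characters concrete, here is a minimal Python sketch that mimics how a setting such as ",." changes the way a number is displayed. It does not call Oracle; `format_number` is a hypothetical helper, and the two-character argument simply follows the order used by the parameter (decimal character first, then group separator).

```python
def format_number(value: float, nls_numeric_characters: str, decimals: int = 2) -> str:
    """Format a number using a decimal character and a group separator.

    The argument must be exactly two characters: decimal character first,
    group separator second.
    """
    decimal_char, group_char = nls_numeric_characters  # raises ValueError if not 2 chars
    if decimal_char == group_char:
        raise ValueError("decimal character and group separator must differ")
    text = f"{value:,.{decimals}f}"  # e.g. 12,673.49 with comma grouping and period decimal
    # Swap both characters in one pass so they do not overwrite each other.
    return text.translate(str.maketrans({",": group_char, ".": decimal_char}))


print(format_number(12673.49, ",."))   # comma as decimal character, period as group separator
print(format_number(12673.49, ".,"))   # period as decimal character, comma as group separator
```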
NLS_SORT
This parameter specifies the type of sort for character data, overriding that defined implicitly by NLS_LANGUAGE. The syntax of NLS_SORT is:
NLS_SORT = { BINARY | name }
BINARY specifies a binary sort and name specifies a particular linguistic sort sequence. For example, to specify the linguistic sort sequence called German, the parameter should be set as follows:
NLS_SORT = German
The name given to a linguistic sort sequence has no direct connection to language names. Usually, however, each supported language will have an appropriate linguistic sort sequence defined that uses the same name.
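Note: Setting the NLS_SORT initialization parameter to anything other than BINARY causes a sort to use a full table scan, regardless of the path the optimizer chooses. BINARY is the exception because indexes are built according to a binary order of keys.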
You can alter the default value of NLS_SORT by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_SORT command.
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.
A complete list of linguistic definitions is provided in the "Linguistic Definitions" table.
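The difference between a binary sort and a linguistic sort can be sketched in Python. This is not Oracle's German sort sequence: `linguistic_key` is a hypothetical, simplified key that ignores accents and case at the first level, chosen only to show why the two orders differ.

```python
import unicodedata


def binary_sort(words):
    """Sort by the encoded byte values, as a binary sort does."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def linguistic_key(word):
    """Simplified linguistic key: strip accents and fold case first,
    then fall back to the original string to break ties."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), word)


def linguistic_sort(words):
    return sorted(words, key=linguistic_key)


words = ["Zebra", "apfel", "\u00c4pfel", "Apfel"]  # "\u00c4" is A with umlaut
print(binary_sort(words))      # uppercase before lowercase, accented letters last
print(linguistic_sort(words))  # all "apfel" variants grouped before "Zebra"
```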
_______________________________________________________________________
NLS Data
This section lists supported languages, territories, storage character sets, Arabic/Hebrew display character sets, linguistic definitions, and calendars.
Table C-2 Oracle Character Sets for Operating System Locales
Operating System Locale | Character Set |
The following storage character sets are supported in Oracle Server release 7.3:
The following Arabic/Hebrew display character sets are supported in Oracle Server release 7.3:
Name | Description |
AR8ASMO708PLUS | ASMO 708 Plus 8-bit Latin/Arabic |
AR7ASMO449PLUS | ASMO 449 Plus 7-bit Latin/Arabic |
AR7AMEER | Ameer 7-bit Latin/Arabic |
AR8XBASIC | XBASIC Right-to-Left Arabic Character Set |
AR8NAFITHA711T | Nafitha Enhanced 711 Client 8-bit Latin/Arabic |
AR8SAKHR707T | SAKHR 707 Client 8-bit Latin/Arabic |
AR8MUSSAD768T | Mussa'd Alarabi/2 768 Client 8-bit Latin/Arabic |
AR8ADOS710T | Arabic MS-DOS 710 Client 8-bit Latin/Arabic |
AR8ADOS720T | Arabic MS-DOS 720 Client 8-bit Latin/Arabic |
AR8APTEC715T | APTEC 715 Client 8-bit Latin/Arabic |
AR8NAFITHA721T | Nafitha International 721 Client 8-bit Latin/Arabic |
AR7SEDCOT | SEDCO/ESPRIT/DATA GENERAL 7-bit Latin/Arabic |
AR8HPARABIC8T | HP ARABIC8 8-bit Latin/Arabic |
_____________________________________________________________________
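Excerpted from itpub
Both AL16UTF16 and UTF8 can be used as the national character set.
AL16UTF16 is a fixed-width, double-byte Unicode character set.
UTF8 is a variable-width Unicode character set that uses one to three bytes per character.
European characters are stored in one or two bytes in UTF8 but in two bytes in AL16UTF16, so UTF8 saves space for them.
Asian characters are stored in three bytes in UTF8, so they need more space than in AL16UTF16.
Because AL16UTF16 is a fixed-width encoding, it is faster to process than the variable-width UTF8.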
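The space trade-off can be checked with a short Python sketch. Python's `utf-8` and `utf-16-be` codecs stand in here for the Oracle character sets UTF8 and AL16UTF16; that mapping is an assumption made for illustration, and the sample strings are arbitrary.

```python
def storage_sizes(text: str) -> tuple[int, int]:
    """Return (bytes in UTF-8, bytes in UTF-16) for the given text."""
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-be"))  # "-be" avoids the 2-byte byte-order mark
    return utf8_size, utf16_size


samples = {
    "English": "character",
    "French": "caractère",
    "Russian": "символ",
    "Chinese": "字符集",
}

for language, text in samples.items():
    utf8_size, utf16_size = storage_sizes(text)
    print(f"{language}: {len(text)} characters, "
          f"{utf8_size} bytes in UTF-8, {utf16_size} bytes in UTF-16")
```

Text made up mostly of ASCII characters is smaller in UTF-8, while the Chinese sample needs three bytes per character in UTF-8 against two in UTF-16.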
A passage from the Oracle documentation on character set types follows. In short, the database character set stores variable-width characters, while the national character set can store fixed-width and variable-width multibyte characters.
Character Set Types
The CREATE DATABASE statement has the CHARACTER SET clause and the
additional optional clause NATIONAL CHARACTER SET to declare the character set
to be used as the database character set and the national character set. Neither
character set can be changed after creating the database. If no NATIONAL
CHARACTER SET clause is present, the national character set defaults to the
database character set.
Because the database character set is used to identify and to hold SQL and PL/SQL
source code, it must have either EBCDIC or 7-bit ASCII as a subset, whichever is
native to the platform. Therefore, it is not possible to use a fixed-width, multibyte
character set as the database character set, only as the national character set.
The data types NCHAR, NVARCHAR2, and NCLOB are provided to declare columns
as variants of the basic types CHAR, VARCHAR2, and CLOB, to note that they are
stored using the national character set and not the database character set.
• To declare a fixed-length character item that uses the national character set, use the
data type specification NCHAR [(size)].
• To declare a variable-length character item that uses the national character set, use
the data type specification NVARCHAR2 (size).
• To declare a character large object (CLOB) item containing fixed-width, multibyte
characters that uses the national character set, use the data type specification
NCLOB (size).
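Efficiency
The encoding rules above lead to the following conclusions:
1. Each English letter or digit takes 1 byte;
2. Letters of the pan-European and Slavic languages take 2 bytes;
3. Chinese characters take 3 bytes.
So UTF8 is a very attractive scheme for English but less economical for Chinese: a Chinese character takes only 2 bytes in either ANSI or Unicode/UCS2, but 3 bytes in UTF8.
The following statistics show the average number of bytes per character needed to store a file in UTF8:
1. Latin-script languages average 1.1 bytes;
2. Greek, Russian, Arabic, and Hebrew average 1.7 bytes;
3. Most other scripts, such as Chinese, Japanese, Korean, and Hindi, use about 3 bytes;
4. Only very rarely used characters and symbols take 4 bytes or more.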