Oracle数据库多语言支持

一、NLS Parameter

Oracle called globalization support National Language Support (NLS), 
以下内容整理自Oracle Online Document以及http://www.dbapool.com/forumthread/topic_6463.html

可以从Session, Instance, Database三个级别来对NLS参数进行设置。
Session级别的设置覆盖Instance级别的设置,Instance级别的设置覆盖Database级别的设置。
相应地,Oracle提供NLS_SESSION_PARAMETERS,NLS_INSTANCE_PARAMETERS,NLS_DATABASE_PARAMETERS这3个视图来查看NLS参数。
此外,还可以在SQL函数中指定NLS参数,这些函数称作locale-dependent。

A) NLS_SESSION_PARAMETERS
此视图中的参数可以通过"ALTER SESSION "语句来进行设置。
ALTER SESSION SET parameter_name=parameter_value [ parameter_name=parameter_value ]... ;
如果没有通过"ALTER SESSION "语句设置过,那么使用NLS_LANG环境变量的定义值。(NLS_LANG环境变量下面详细说明)
如果NLS_LANG也没有被定义,那么沿用NLS_INSTANCE_PARAMETERS中的设定值。
这些参数影响到客户端的错误信息显示语言;日期,数字,货币等的显示格式;排序结果等。

B) NLS_INSTANCE_PARAMETERS
此视图中中的参数设定是数据库启动时参照init.ora中的设定而来。 可以通过"ALTER SYSTEM"语句更改。
NLS_LANGUAGE设定决定了alert.log和trace files中的Server错误信息的语言。

C) NLS_DATABASE_PARAMETERS
此视图中的参数是创建数据库时决定的。数据库创建好以后不能改动,不要通过修改系统表格来修改设定。
如果INSTANCE和SESSION参数没有设定,那么默认使用这里的参数。
NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET 参数是创建数据库时决定的,不能被INSTANCE和SESSION参数覆盖,不要在创建完数据库后试图修改它们,这种尝试会损坏数据库。(关于这2个参数下面有详细说明)

注1:NLS_DATABASE_PARAMETERS视图等价于SELECT name,value$ from sys.props$ where name like '%NLS%'。
注2:V$NLS_PARAMETERS视图给出了当前的SESSION参数以及NLS_DATABASE_PARAMETERS视图中的2个CHARACTERSET参数。
注3:select userenv ('language') from dual; select sys_context('userenv','language') from dual;这2个语句执行结果等同于
          <当前Session的Language>_<当前Session的Territory> .<数据库的字符集NLS_CHARACTERSET>,
          这个结果貌似NLS_LANG,但是实际上不是,NLS_LANG定义的是客户端的字符集。
注4:select userenv ('lang') from dual; 返回的是Session参数NLS_LANGUAGE的简写。
注5:V$NLS_VALID_VALUES视图返回NLS_LANGUAGE, NLS_SORT, NLS_TERRITORY, NLS_CHARACTERSET的有效取值。

D) Locale-dependent SQL Function
SQL functions whose behavior depends on the values of NLS parameters are called locale-dependent.
依赖于NLS参数的SQL函数包括:
•TO_CHAR
•TO_DATE
•TO_NUMBER
•NLS_UPPER
•NLS_LOWER
•NLS_INITCAP
•NLSSORT
In all of the preceding functions, you can specify these NLS parameters:以上函数中可以指定的参数包括:
•NLS_DATE_FORMAT
•NLS_DATE_LANGUAGE
•NLS_NUMERIC_CHARACTERS
•NLS_CURRENCY
•NLS_ISO_CURRENCY
•NLS_DUAL_CURRENCY
•NLS_CALENDAR
•NLS_SORT
此外对于NLSSORT函数,还可以指定以下参数。
•NLS_LANGUAGE
•NLS_TERRITORY
按照'parameter=value' ['parameter=value']...的格式来指定参数。
例如,
select to_char(sysdate,'DD-MON-YYYY','NLS_DATE_LANGUAGE=''SIMPLIFIED CHINESE''') from dual;结果返回 "04-8月 -2011"
select * from employees order by nlssort(name,'NLS_SORT=SCHINESE_PINYIN_M'); 

E) NLS_LANG
NLS_LANG这个参数比较特殊,它是通过环境变量(UNIX)或者注册表(WINDOWS)定义的。它分三部分,格式如下:
    NLS_LANG = <Language>_<Territory>.<Charset>
<Language>部分设定了SESSION的NLS_LANGUAGE参数;
_<Territory>部分设定了SESSION的NLS_TERRITORY参数;
.<Charset>部分设定了CLIENT的字符集。它会影响到如何转换客户端的字符集到数据库字符集。
这三部分可以只定义任意一个/几个部分。
如果只有_<Territory>部分,<Language>默认等于AMERICAN;
如果只有<Language>部分,_<Territory>的值根据<Language>的值来决定;
如果既没有_<Territory>部分也没有<Language>部分,这2部分认等于AMERICAN_AMERICA;
如果没有定义NLS_LANG,那么默认值<Language>_<Territory>.US7ASCII,其中<Language>_<Territory>部分根据NLS_INSTANCE_PARAMETERS中的值来决定。

NLS_LANG参数不能通过ALTER SESSION或者ALTER SYSTEM来更改(但是可以更改NLS_LANGUAGE和NLS_TERRITORY),也不能定义在init.ora中。
数据库创建的时候定义此参数也不影响数据库的设定,包括不会影响数据库的NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET 参数。
NLS_LANG参数的客户端字符集定义部分<Charset>不能通过查询任何系统表格或视图得到。
(感觉实际上这个参数应该只会影响到客户端,也就是NLS_SESSION_PARAMETERS)

NLS_LANG直接设定了NLS_LANGUAGE和NLS_TERRITORY参数,还间接设定了其他的NLS参数。
NLS_DATE_LANGUAGE 和 NLS_SORT 根据NLS_LANGUAGE决定取值;
NLS_CURRENCY, NLS_DUAL_CURRENCY, NLS_ISO_CURRENCY, NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, NLS_TIMESTAMP_TZ_FORMAT, NLS_NUMERIC_CHARACTERS 根据NLS_TERRITORY决定取值。

注1:NLS_LANG在WINDOWS注册表中的位置:HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_xxx


There are 3 levels at which you can set NLS parameters: Database, Instance and Session.
If a parameter is defined at more than one level then the rules on which one takes precedence are quite straightforward:
1. NLS database settings are superseded by NLS instance settings
2. NLS database & NLS instance settings are superseded by NLS session settings

Applications can check the session, instance, and database NLS parameters by querying the following data dictionary views:
A) NLS_SESSION_PARAMETERS
NLS_SESSION_PARAMETERS shows the NLS parameters and their values for the session that is querying the view.
It does not show information about the character set.
The values of NLS parameters set by "ALTER SESSION ". If there is no explicit "ALTER SESSION " statement done then it reflects the setting of the corresponding NLS parameter on the client derived from the NLS_LANG variable.

The NLS_LANG parameter has three components: language, territory, and character set. Specify it in the following format, including the punctuation:
NLS_LANG = <Language>_<Territory>.<Charset>
If NLS_LANG is specified with only the <Territory> part then AMERICAN is used as default <Language>.
If NLS_LANG is specified with only the <Language> part then the <Territory> defaults to a setting based on <Language>.
If NLS_LANG is specified without the <Language>_<Territory> part then the <Language>_<Territory> part defaults to AMERICAN_AMERICA.
If the NLS_LANG is not set at all, then it defaults to <Language>_<Territory>.US7ASCII and the values for the <Language>_<Territory> part used are the ones found in
NLS_INSTANCE_PARAMETERS.

The language and territory components of the NLS_LANG parameter determine the default values for other detailed NLS parameters, such as date format, numeric characters, and linguistic sorting.
If NLS_DATE_LANGUAGE or NLS_SORT are not set then they are derived from NLS_LANGUAGE.
If NLS_CURRENCY, NLS_DUAL_CURRENCY, NLS_ISO_CURRENCY, NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, NLS_TIMESTAMP_TZ_FORMAT, NLS_NUMERIC_CHARACTERS are not set then they are derived from NLS_TERRITORY

* NLS_LANG cannot be changed by ALTER SESSION, NLS_LANGUAGE and NLS_TERRITORY can.
* The <clients characterset> part of NLS_LANG is NOT shown in any system table or view.
* On Windows you have two possible options, normally the NLS_LANG is set in the registry, but it can also be set in the environment, however this is not often done and generally not recommended to do so. The value in the environment takes precedence over the value in the registry and is used for ALL Oracle_Homes on the server if defined as a system environment variable.
* Oracle strongly recommends that you set the NLS_LANG on the client at least to NLS_LANG=.<clients characterset>

B) NLS_INSTANCE_PARAMETERS
NLS_INSTANCE_PARAMETERS shows the current NLS instance parameters that have been explicitly set and the values of the NLS instance parameters.
These are the settings in the init.ora of the database at the moment that the database was started or set through ALTER SYSTEM.
If the parameter is not explicitly set in the init.ora or defined by ALTER SYSTEM then its value is NOT derived from a "higher" parameter (we are talking about parameters like NLS_SORT that derive a default from NLS_LANGUAGE in NLS_SESSION_PARAMETERS, this is NOT the case for NLS_INSTANCE_PARAMETERS)
* NLS_LANG is not an init.ora parameter; NLS_LANGUAGE and NLS_TERRITORY are so you need to set NLS_LANGUAGE and NLS_TERRITORY separately.
* You cannot define the <clients characterset> or NLS_LANG in the init.ora.The client characterset is defined by the NLS_LANG on the client OS (see above).
* You cannot define the database characterset in the init.ora. The database characterset is defined by the "Create Database" command.
* These settings take precedence above the NLS_DATABASE_PARAMETERS.
* These values are used for the NLS_SESSION_PARAMETERS if the client the NLS_LANG is NOT set.
* The NLS_LANGUAGE in the instance parameters also declares the language for the server error messages in alert.log and in trace files.

C) NLS_DATABASE_PARAMETERS
NLS_DATABASE_PARAMETERS shows the values of the NLS parameters for the database. The values are stored in the database.
Defaults to AMERICAN_AMERICA if there are no parameters explicitly set in the init.ora during database creation time. If there is parameters set in the init.ora during database creation you see them here. There is no way to change these after the database creation. Do NOT attempt to update system tables to bypass these settings! These settings are used to give the database a default if the INSTANCE and SESSION parameters are not set.
The database (national) character set NLS_(NCHAR)_CHARACTERSET) is defined by the "Create Database" command.
* The NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET parameters cannot be overridden by instance or session parameters.
They are defined by the value specified in the "CREATE DATABASE command and are not intended to be changed afterwards dynamically. Do NOT update system tables to change the character set. This can corrupt your database and potentially make it impossible to open the database again.
* Setting the NLS_LANG during the creation of the database does not influence the NLS_DATABASE_PARAMETERS.
* The NLS_LANG set during the database creation has NO impact on the database National Characterset.

 

二、使用Unicode
在需要支持多国语言的环境里,使用Unicode可以轻松解决数据存储问题。
Oracle中支持以下几种Unicode编码
•AL32UTF8
使用1,2,3个字节进行编码,扩展字符集使用4个字节。
The AL32UTF8 character set supports the latest version of the Unicode standard. It encodes characters in one, two, or three bytes. Supplementary characters require four bytes. It is for ASCII-based platforms.

•UTF8
使用1,2,3个字节进行编码,扩展字符集使用看作2个字符处理,占6个字节。Oracle建议使用AL32UTF8作为数据库字符集。
The UTF8 character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms.
Supplementary characters inserted into a UTF8 database do not corrupt the data in the database. A supplementary character is treated as two separate, user-defined characters that occupy 6 bytes. Oracle recommends that you switch to AL32UTF8 for full support of supplementary characters in the database character set.

•UTFE
The UTFE character set is for EBCDIC platforms. It is similar to UTF8 on ASCII platforms, but it encodes characters in one, two, three, and four bytes. Supplementary characters are converted as two 4-byte characters.

•UTF-16
使用2个字节进行编码,扩展字符集使用4个字节。
One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used for internal processing by Java since version J2SE 5.0 and by Microsoft Windows since version 2000.

(查看Unicode编码的一个方法:按Alt + X 组合键,MS Word会将光标前面的字符同其十六进制的四位 Unicode 编码进行互相转换)

Oracle中支持Unicode有2种方式
可以使用2种方式来支持Unicode。
• Unicode Database。创建DB的时候指定数据库字符集为AL32UTF8 ( CREATE DATABASE...CHARACTER SET AL32UTF8),SQL字符类型(CHAR, VARCHAR2, CLOB, and LONG)的字段中存储的就是UTF-8编码的Unicode数据。

• Unicode Data Type。指定列或变量的类型为national character data types。包括NCHAR, NVARCHAR2, and NCLOB。national character data types可以使用UTF8 or AL16UTF16作为字符集,默认为AL16UTF16。创建数据库的时候可以指定(CREATE DATABASE...NATIONAL CHARACTER SET AL16UTF16/UTF8)。
注意,NCHAR和NVARCHAR2的长度是字符数,不是字节数。
如果使用AL16UTF16字符集,NCHAR和NVARCHAR2的最大长度分别是1000和2000个字符。The maximum length limits for the NCHAR and NVARCHAR2 columns are 1000 and 2000 characters, respectively. Because the data is fixed-width, the lengths are guaranteed.( 实际上,如果包含Unicode扩展字符,最大长度也有可能到不了1000和2000?)
如果使用UTF8字符集,NCHAR和NVARCHAR2的最大长度分别是2000和4000个字符。
但是实际可以存储的字符数量还是受限于2000和4000个字节。例如NVARCHAR2(2000),如果存储1000个汉字(3000字节)+1000个英文没有问题,但是如果存储2000个汉字(6000字节)就不行了。

You can store Unicode characters in an Oracle Database in two ways:
• You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL character data types (CHAR, VARCHAR2, CLOB, and LONG).
• You can declare columns and variables that have SQL national character data types.
The SQL national character data types are NCHAR, NVARCHAR2, and NCLOB. They are also called Unicode data types, because they are used only for storing Unicode data.
The national character set, which is used for all SQL national character data types, is specified when the database is created. The national character set can be either UTF8 or AL16UTF16 (default).
When you declare a column or variable of the type NCHAR or NVARCHAR2, the length that you specify is the number of characters, not the number of bytes.

查看数据库字符集和National Character字符集的方法。
select * from v$nls_parameters;
NLS_CHARACTERSET - 数据库字符集
NLS_NCHAR_CHARACTERSET - National Character字符集

选择Unicode Database还是Unicode Data Type,Oracle给出的建议:
以下情况建议使用Unicode Database

Situation

Explanation

You need easy code migration for Java or PL/SQL.
Java和PL/SQL代码迁移方便

If your existing application is mainly written in Java and PL/SQL and your main concern is to minimize the code changes required to support multiple languages, then you may want to use a Unicode database solution. If the data types used to stored data remain as SQL CHAR data types, then the Java and PL/SQL code that accesses these columns does not need to change.

You have evenly distributed multilingual data.
很多地方使用多语言数据

If the multilingual data is evenly distributed in existing schema tables and you are not sure which tables contain multilingual data, then you should use a Unicode database because it does not require you to identify the kind of data that is stored in each column.

Your SQL statements and PL/SQL code contain Unicode data.
PL/SQL的代码中包含Unicode数据

You must use a Unicode database. SQL statements and PL/SQL code are converted into the database character set before being processed. If the SQL statements and PL/SQL code contain characters that cannot be converted to the database character set, then those characters are lost. A common place to use Unicode data in a SQL statement is in a string literal.

You want to store multilingual documents in BLOB format and use Oracle Text for content searching.
在BLOG类型中存储多语言的文档

You must use a Unicode database. The BLOB data is converted to the database character set before being indexed by Oracle Text. If your database character set is not UTF8, then data is lost when the documents contain characters that cannot be converted to the database character set.


以下情况建议使用Unicode Data Type

Situation Explanation

You want to add multilingual support incrementally.
逐步增加多语言支持

If you want to add Unicode support to the existing database without migrating the character set, then consider using Unicode data types to store Unicode data. You can add columns of the SQLNCHAR data types to existing tables or new tables to support multiple languages incrementally.

You want to build a packaged application.
提供组件给用户

If you are building a packaged application to sell to customers, then you may want to build the application using SQLNCHAR data types. The SQLNCHAR data type is a reliable Unicode data type in which the data is always stored in Unicode, and the length of the data is always specified in UTF-16 code units. As a result, you need to test the application only once. The application will run on customer databases with any database character set.

You want better performance with single-byte database character sets.

If performance is your main concern, then consider using a single-byte database character set and storing Unicode data in the SQLNCHAR data types.如果效率很重要,考虑使用single-byte的数据库字符集,Unicode数据使用NCHAR的数据类型存储。

You require UTF-16 support in Windows clients.
Windows客户端使用UTF-16

If your applications are written in Visual C/C++ or Visual Basic running on Windows, then you may want to use the SQLNCHAR data types. You can store UTF-16 data in SQLNCHAR data types in the same way that you store it in thewchar_t buffer in Visual C/C++ andstring buffer in Visual Basic. You can avoid buffer overflow in client applications because the length of thewchar_t andstring data types match the length of the SQLNCHAR data types in the database.


======================================================================================

在需要支持多国语言,包括文档存储的应用中,最好的办法是Unicode Database+ Unicode Data Type的方式。

实验环境:
NLS_CHARACTERSET = AL32UTF8
NLS_NCHAR_CHARACTERSET = AL16UTF16

测试对中日韩三种语言进行支持,

字符串 UTF8编码 UTF16编码
中国 E4 B8 AD, E5 9B BD 4E2D, 56FD
ペン E3 83 9A, E3 83 B3 30DA, 30F3
한글 ED 95 9C, EA B8 80 D55C, AE00


create table tbl_varchar(col varchar2(6));
insert into tbl_varchar values('中国');
insert into tbl_varchar values('ペン');
insert into tbl_varchar values('한글');
commit;

查看数据库中存储的数据,
select dump(col, 16) from tbl_varchar;
DUMP(COL,16)
-------------------------------
Typ=1 Len=6: e4,b8,ad,e5,9b,bd
Typ=1 Len=6: e3,83,9a,e3,83,b3
Typ=1 Len=6: ed,95,9c,ea,b8,80
可以看到是以UTF8格式存储,每个字符都占用了3个字节。
下面验证varchar2的长度是按照字节指定的。
insert into tbl_varchar values('中国A');
执行后,得到错误结果:
SQL Error: ORA-12899: value too large for column "HR"."TBL_VARCHAR"."COL" (actual: 7, maximum: 6)
'中国A'虽然是3个字符,但是如果用UTF8编码,是7个字节。

create table tbl_nvarchar(col nvarchar2(2));
insert into tbl_nvarchar values('中国');
insert into tbl_nvarchar values('ペン');
insert into tbl_nvarchar values('한글');
commit;

查看数据库中存储的数据,
select dump(col, 16) from tbl_nvarchar;
DUMP(COL,16)
------------------------
Typ=1 Len=4: 4e,2d,56,fd
Typ=1 Len=4: 30,da,30,f3
Typ=1 Len=4: d5,5c,ae,0

可以看到是以UTF16格式存储,每个字符都占用了2个字节。

接下来尝试一下中文排序功能。
select value from v$nls_valid_values where parameter='SORT' and value like '%SCHINESE%';
返回可用的中文排序方法
SCHINESE_PINYIN_M    -- 按照拼音排序
SCHINESE_STROKE_M -- 按照笔划(第一顺序)、部首(第二顺序)排序
SCHINESE_RADICAL_M -- 按照部首(第一顺序)、笔划(第二顺序)排序

先插入数据,
delete from tbl_nvarchar;
insert into tbl_nvarchar values('荀彧');
insert into tbl_nvarchar values('荀攸');
insert into tbl_nvarchar values('蒋干');
insert into tbl_nvarchar values('董卓');
insert into tbl_nvarchar values('刘备');
insert into tbl_nvarchar values('刘表');
insert into tbl_nvarchar values('郭嘉');

按照拼音排序结果:
alter session set nls_sort='SCHINESE_PINYIN_M';
select * from tbl_nvarchar order by col;
COL 
--- 
董卓 
郭嘉 
蒋干 
刘备 
刘表 
荀攸 
荀彧 

按照笔画排序结果:
alter session set nls_sort='SCHINESE_STROKE_M';
select * from tbl_nvarchar order by col;
COL 
--- 
刘表 
刘备 
荀攸 
荀彧 
郭嘉 
董卓 
蒋干 

按照部首排序结果:
alter session set nls_sort='SCHINESE_RADICAL_M';
select * from tbl_nvarchar order by col;
COL 
--- 
刘表 
刘备 
荀攸 
荀彧 
董卓 
蒋干 
郭嘉

NLS_COMP和NLS_SORT参数
Oracle默认是采用binary进行排序,这对于例如中文的排序来说,是不恰当的。
使用这两个参数可以指定排序的方法,比如拼音或是,要注意可能会引起性能问题。
解决方法是使用NLSSORT函数来建立一个函数索引。

NLS_COMP = { BINARY | LINGUISTIC | ANSI }
BINARY
Normally, comparisons in the WHERE clause and in PL/SQL blocks is binary unless you specify theNLSSORT function.
LINGUISTIC
Comparisons for all SQL operations in the WHERE clause and in PL/SQL blocks should use the linguistic sort specified in theNLS_SORT parameter.To improve the performance, you can also define a linguistic index on the column for which you want linguistic comparisons.
ANSI
A setting of ANSI is for backwards compatibility; in general, you should setNLS_COMP toLINGUISTIC


NLS_SORT = { BINARY | linguistic_definition
}
NLS_SORT specifies the collating sequence for character value comparison in various SQL operators and clauses, for example, ORDER BY, GROUP BY, comparison conditions (=, <>, <=, >=), IN, BETWEEN, LIKE, MIN/MAX, GREATEST/LEAST, and INSTR.

If the value is BINARY, then comparison is based directly on byte values in the binary encoding of the character values being compared. The ordering depends on the character set of the compared values, which is either the database character set (for VARCHAR2, CHAR, LONG, and CLOB) or the national character set (for NVARCHAR2, NCHAR, and NCLOB).

If the value is a named linguistic sort, then comparison is defined by this sort. A linguistic sort uses various rules to achieve ordering expected by speakers of one or more natural languages. This is usually the same ordering that is used in dictionaries and/or telephone directories in those languages.

The BINARY comparison is faster and uses less resources than any linguistic comparison but for text in a natural language, it does not provide ordering expected by users.
The value of NLS_SORT affects execution plans of queries. Because a standard index cannot be used as a source of values sorted in a linguistic order, an explicit sort operation must usually be performed instead of an index range scan. A functional index on the NLSSORT function may be defined to provide values sorted in a linguistic order and reintroduce the index range scan to the execution plan.

下面做个测试:
首先来看看可用的中文排序方法:
select value from v$nls_valid_values where parameter='SORT' and value like '%SCHINESE%';
返回可用的中文排序方法
SCHINESE_PINYIN_M -- 按照拼音排序
SCHINESE_STROKE_M -- 按照笔划(第一顺序)、部首(第二顺序)排序
SCHINESE_RADICAL_M -- 按照部首(第一顺序)、笔划(第二顺序)排序

下面来做测试。
create index emp_ename_idx on emp(ename);
现在察看select * from emp where ename='Mike';的执行计划,
explain plan for select * from emp where ename='Mike';
select * from table(dbms_xplan.display);
可以看到oracle会走索引emp_ename_idx。

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    46 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP           |     1 |    46 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_ENAME_IDX |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ENAME"=U'Mike')


alter session set nls_comp='LINGUISTIC';
alter session set nls_sort='SCHINESE_PINYIN_M';
之后再查看执行计划,得到结果:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    46 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    46 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NLSSORT("ENAME",'nls_sort=''SCHINESE_PINYIN_M''')=HEXTORAW
              ('0230021B022301FE0000020202020007020202') )


可以看到,现在走全表扫描了。
接下来,如果创建如下的索引
create index emp_ename_idx_nlssort on emp(nlssort(ename,'NLS_SORT=SCHINESE_PINYIN_M'));
再查看执行计划,发现又走索引emp_ename_idx_nlssort了。

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |     1 |    46 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP                   |     1 |    46 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_ENAME_IDX_NLSSORT |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(NLSSORT("ENAME",'nls_sort=''SCHINESE_PINYIN_M''')=HEXTORAW('0230021B022301FE000
              0020202020007020202') )

你可能感兴趣的:(oracle,nls)