1、varchar2类型介绍
1.1 Oracle数据库常用的数据类型
介绍varchar2类型的同时,我们顺便了解下Oracle数据库支持的其他数据类型。
如下摘自Oracle官方文档Built-in Data Type Summary
,简单介绍了Oracle中常用的数据类型。
Code | Data Type | Description | Memo |
---|---|---|---|
1 | VARCHAR2(size [BYTE | CHAR]) |
Variable-length character string having maximum length size bytes or characters. Maximum size is 4000 bytes or characters, and minimum is 1 byte or 1 character. You must specify size for VARCHAR2.BYTE indicates that the column will have byte length semantics. CHAR indicates that the column will have character semantics. | varchar2中指定的长度参数,指的是最多可以存储的长度,实际存储超过这个长度的内容会报错。类型中var 的意思是说,存储小于该长度的内容时,实际占用的存储根据具体存储内容长度而定。比如定义varchar2(10) 实际存储内容为3就占用长度为3的存储空间。相比于定长的char ,可以节省存储空间。 |
1 | NVARCHAR2(size) |
Variable-length Unicode character string having maximum length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 4000 bytes. You must specify size for NVARCHAR2. | 主要是用来存储Unicode字符的 |
2 | NUMBER [ (p [, s]) ] |
Number having precision p and scale s. The precision p can range from 1 to 38. The scale s can range from -84 to 127. Both precision and scale are in decimal digits. A NUMBER value requires from 1 to 22 bytes. | |
2 | FLOAT [(p)] |
A subtype of the NUMBER data type having precision p. A FLOAT value is represented internally as NUMBER. The precision p can range from 1 to 126 binary digits. A FLOAT value requires from 1 to 22 bytes. | |
8 | LONG |
Character data of variable length up to 2 gigabytes, or 231 -1 bytes. Provided for backward compatibility. | |
12 | DATE |
Valid date range from January 1, 4712 BC, to December 31, 9999 AD. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 7 bytes. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It does not have fractional seconds or a time zone. | 表示的日期范围可以是公元前4712年1月1日至公元9999年12月31日。You can add and subtract constants to and from a DATE value, and these numbers will be interpreted as numbers of days. For example, SYSDATE+1 will be tomorrow. You cannot multiply or divide DATE values. |
100 | BINARY_FLOAT |
32-bit floating point number. This data type requires 4 bytes. | |
101 | BINARY_DOUBLE |
64-bit floating point number. This data type requires 8 bytes. | |
180 | TIMESTAMP [(fractional_seconds_precision)] |
Year, month, and day values of date, as well as hour, minute, and second values of time, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values of fractional_seconds_precision are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is 7 or 11 bytes, depending on the precision. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It contains fractional seconds but does not have a time zone. | |
181 | TIMESTAMP [(fractional_seconds_precision)] WITH TIME ZONE |
All values of TIMESTAMP as well as time zone displacement value, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 13 bytes. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, and TIMEZONE_MINUTE. It has fractional seconds and an explicit time zone. | |
231 | TIMESTAMP [(fractional_seconds_precision)] WITH LOCAL TIME ZONE |
All values of TIMESTAMP WITH TIME ZONE, with the following exceptions: Data is normalized to the database time zone when it is stored in the database. When the data is retrieved, users see the data in the session time zone. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is 7 or 11 bytes, depending on the precision. | |
182 | INTERVAL YEAR [(year_precision)] TO MONTH |
Stores a period of time in years and months, where year_precision is the number of digits in the YEAR datetime field. Accepted values are 0 to 9. The default is 2. The size is fixed at 5 bytes. | |
183 | INTERVAL DAY [(day_precision)] TO SECOND [(fractional_seconds_precision)] |
Stores a period of time in days, hours, minutes, and seconds, where day_precision is the maximum number of digits in the DAY datetime field. Accepted values are 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the SECOND field. Accepted values are 0 to 9. The default is 6. The size is fixed at 11 bytes. | |
23 | RAW(size) |
Raw binary data of length size bytes. Maximum size is 2000 bytes. You must specify size for a RAW value. | |
24 | LONG RAW |
Raw binary data of variable length up to 2 gigabytes. | |
69 | ROWID |
Base 64 string representing the unique address of a row in its table. This data type is primarily for values returned by the ROWID pseudocolumn. | |
208 | UROWID [(size)] |
Base 64 string representing the logical address of a row of an index-organized table. The optional size is the size of a column of type UROWID. The maximum size and default is 4000 bytes. | |
96 | CHAR [(size [BYTE | CHAR])] |
Fixed-length character data of length size bytes or characters. Maximum size is 2000 bytes or characters. Default and minimum size is 1 byte. BYTE and CHAR have the same semantics as for VARCHAR2. | |
96 | NCHAR[(size)] |
Fixed-length character data of length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 2000 bytes. Default and minimum size is 1 character. | |
112 | CLOB |
A character large object containing single-byte or multibyte characters. Both fixed-width and variable-width character sets are supported, both using the database character set. Maximum size is (4 gigabytes - 1) * (database block size) . |
|
112 | NCLOB |
A character large object containing Unicode characters. Both fixed-width and variable-width character sets are supported, both using the database national character set. Maximum size is (4 gigabytes - 1) * (database block size) . Stores national character set data. |
|
113 | BLOB |
A binary large object. Maximum size is (4 gigabytes - 1) * (database block size) . |
|
114 | BFILE |
Contains a locator to a large binary file stored outside the database. Enables byte stream I/O access to external LOBs residing on the database server. Maximum size is 4 gigabytes. |
需要注意的是,如果在声明varchar2类型长度的时候,没有显式指定是BYTE还是CHAR,缺省的情况取决于数据库参数NLS_LENGTH_SEMANTICS
。具体可以通过如下sql查询,如下是一个例子:
select name,value from v$parameter where name='nls_length_semantics';
结果:
NAME | VALUE |
---|---|
nls_length_semantics | BYTE |
在这个例子中,缺省的情况,就是按照字节长度来计算。
1.2 Oracle中varchar2和varchar的区别
截止现在,varchar和varchar2是没有区别的。
按照ANSI的标准,varchar在Oracle中是被保留的,用来后续支持区分NULL和空字符串。VARCHAR2现在不区分NULL和空字符串,以后也不会区分。如果程序对于NULL和空字符串等同有依赖,就应该使用varchar2类型。
2、Oracle中截取字符串
2.1 substr函数
官网说明如下:
{ SUBSTR
| SUBSTRB
| SUBSTRC
| SUBSTR2
| SUBSTR4
}
(char, position [, substring_length ])
The SUBSTR functions return a portion of char, beginning at character position, substring_length characters long. SUBSTR calculates lengths using characters as defined by the input character set. SUBSTRB uses bytes instead of characters. SUBSTRC uses Unicode complete characters. SUBSTR2 uses UCS2 code points. SUBSTR4 uses UCS4 code points.
If position is 0, then it is treated as 1.
If position is positive, then Oracle Database counts from the beginning of char to find the first character.
If position is negative, then Oracle counts backward from the end of char.
If substring_length is omitted, then Oracle returns all characters to the end of char. If substring_length is less than 1, then Oracle returns null.
char can be any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. Both position and substring_length must be of datatype NUMBER, or any datatype that can be implicitly converted to NUMBER, and must resolve to an integer. The return value is the same datatype as char. Floating-point numbers passed as arguments to SUBSTR are automatically converted to integers.
用法介绍:
substr 函数:截取字符串
语法:SUBSTR(string,start, [length])
string:表示源字符串,即要截取的字符串。
start:开始位置,从1开始查找。如果start是负数,则从string字符串末尾开始算起。
length:可选项,表示截取字符串长度。
下面是几个例子:
SELECT '截取所有字符串' func, 'SUBSTR(''Hello SQL!'', 1)' expr, SUBSTR('Hello SQL!', 1) expr_val FROM dual union all
SELECT '从第2个字符开始,截取到末尾' func, 'SUBSTR(''Hello SQL!'', 2)' expr, SUBSTR('Hello SQL!', 2) expr_val FROM dual union all
SELECT '从倒数第4个字符开始,截取到末尾' func, 'SUBSTR(''Hello SQL!'', -4)' expr, SUBSTR('Hello SQL!', -4) expr_val FROM dual union all
SELECT '从第3个字符开始,截取6个字符' func, 'SUBSTR(''Hello SQL!'', 3, 6)' expr, SUBSTR('Hello SQL!', 3, 6) expr_val FROM dual union all
SELECT '从倒数第4个字符开始,截取3个字符' func, 'SUBSTR(''Hello SQL!'', -4, 3)' expr, SUBSTR('Hello SQL!', -4, 3) expr_val FROM dual union all
SELECT '从截取开始位置超过字符串长度' func, 'SUBSTR(''Hello SQL!'', 20, 3)' expr, SUBSTR('Hello SQL!', 20, 3) expr_val FROM dual
结果如下:
注:上图中的-
表示空。
2.2 根据指定字符出现的位置截取字符串
这里需要用到oracle中的instr函数,用法如下。
instr 函数:返回子字符串在源字符串中的位置
语法:INSTR(string,child_string,[start],[show_time])
string:表示源字符串。
child_string:子字符串,即要查找的字符串。
start:可选项,开始位置,默认从1开始。如果为负数,则从右向左检索。
show_time:可选项,表示子字符串第几次出现在源字符串当中,默认第1次,负数则报错。
返回指定字符出现的位置从1开始,如果找不到返回0
下面是几个例子:
SELECT '查找字符串中l出现的位置' func, 'INSTR(''Hello SQL!'', ''l'')' expr, INSTR('Hello SQL!', 'l') expr_val FROM dual union all
SELECT '查找字符串中H出现的位置' func, 'INSTR(''Hello SQL!'', ''H'')' expr, INSTR('Hello SQL!', 'H') expr_val FROM dual union all
SELECT '查找字符串中l出现的位置' func, 'INSTR(''Hello SQL!'', ''l'', 1)' expr, INSTR('Hello SQL!', 'l', 1) expr_val FROM dual union all
SELECT '查找字符串中ll出现的位置' func, 'INSTR(''Hello SQL!'', ''ll'')' expr, INSTR('Hello SQL!', 'll') expr_val FROM dual union all
SELECT '查找字符串中aa出现的位置' func, 'INSTR(''Hello SQL!'', ''aa'')' expr, INSTR('Hello SQL!', 'aa') expr_val FROM dual union all
SELECT '查找字符串中l第2次出现的位置' func, 'INSTR(''Hello SQL!'', ''l'', 1, 2)' expr, INSTR('Hello SQL!', 'l', 1, 2) expr_val FROM dual
结果如下:
结合instr比如我们想截取某个字符出现位置之前的字符,下面是一个例子。
-- 截取空格前的内容
SELECT SUBSTR('Hello SQL!', 1, INSTR('Hello SQL!', ' ')-1) FROM dual --Hello
-- 截取空格后的内容
SELECT SUBSTR('Hello SQL!', INSTR('Hello SQL!', ' ')+1) FROM dual --SQL!
2.3 substrb按字节截取字符串
2.3.1 Oracle数据中文占几个字节
可以通过查看如下参数(查询当前数据库的字符集)来了解当前的oracle环境中一个汉字占几个字节:
SELECT * FROM v$nls_parameters WHERE PARAMETER='NLS_CHARACTERSET';
如果value=ZHS16GBK
,那么一个汉字占用2个字节;如果value=AL32UTF8
,那么一个汉字占用3个字节。
另外一个参数也可以:
select userenv('language') from dual;
结果如下:
列 | USERENV('LANGUAGE') |
---|---|
值 | AMERICAN_AMERICA.AL32UTF8 |
这里AL32UTF8
说明当前环境下,一个汉字占3个字节。
实际上,也可以通过语句进行测试,如下:
select 'length(''美喵泡泡'')' expr, length('美喵泡泡') expr_val from dual union all
select 'lengthb(''美喵泡泡'')' expr, lengthb('美喵泡泡') expr_val from dual
结果如下:
expr | expr_val |
---|---|
length('美喵泡泡') |
4 |
lengthb('美喵泡泡') |
12 |
2.3.2 Oracle多字节数据截取
如下是几个跨字节截取的例子:
select 'dump(''美喵泡泡'')' expr, dump('美喵泡泡') expr_val from dual union all
select 'substrb(''美喵泡泡'', 1, 3)' expr, substrb('美喵泡泡', 1, 3) expr_val from dual union all
select 'dump(substrb(''美喵泡泡'', 1, 3))' expr, dump(substrb('美喵泡泡', 1, 3)) expr_val from dual union all
select 'substrb(''美喵泡泡'', 1, 5)' expr, substrb('美喵泡泡', 1, 5) expr_val from dual union all
select 'dump(substrb(''美喵泡泡'', 1, 5))' expr, dump(substrb('美喵泡泡', 1, 5)) from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 2)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 2) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 2))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 2)) expr_val from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 3)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 3) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 3))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 3)) expr_val from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 4)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 4) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 4))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 4)) expr_val from dual
结果如下:
这个地方还是需要注意的,按照字节截取的时候,落单的两个字节会被oracle识别为两个字符。
参考资料
- Oracle Built-In Data Types
- What is the difference between varchar and varchar2 in Oracle?
- oracle中的数据类型:varchar2(byte)和varchar2(char)
- ORACLE 中汉字占几个字节?