在pre10g的很长时间内,Oracle仅提供exp/imp导入导出工具,虽然这2个实用程序十分有效(现在也是如此),但因为它们受限于client/server模式工具自身的限制,以普通用户程序的身份来运转数以TB计的数据,其才不堪大用!DataPump是10g以后主推的数据抽取/导入工具,不同于exp/imp工具,DataPump是一个服务器端的实用程序,因为运行在服务器上故而DataPump进程可以直接访问数据文件与SGA(无需借shadow进程之手),与exp/imp工具相比使用DataPump可以获得显著的性能改善。DataPump可以通过直接路径或外部表路径这两种方法导出数据;其中直接路径避开了数据库高速缓存。当使用直接路径模式抽取数据时,DataPump从磁盘直接读取数据文件,抽取和格式化文件内容,最后将内容写出到转储文件。该种模式和SGA交互等待少,其导入导出速度直接取决于数据库所在磁盘速度和cpu;因此,直接路径极为快速。 外部表路径模式将使用到数据库的高速缓存buffer cache,通过外部表路径方法导出数据时,DataPump使用普通的SELECT操作将数据块从数据文件中读入buffer cache,为了写出转储文件,数据会在缓存中被格式化。通过外部表路径导入数据时,DataPump根据转储文件的内容构造标准的插入语句,并且通过将数据块从数据文件读至缓存来执行这些语句,插入操作按照标准的样式在缓存中完成;如同任何普通DML操作一样,外部表路径也会同时产生撤销和重做。 DataPump自身会根据对象的复杂性作出使用直接路径还是外部表路径的选择;对于较复杂的对象(后文将列出)而言,为了分解复杂性而必须同SGA进行交互,此情况下Data Pump只能采用外部表模式。我们还可以通过使用access_method参数来控制其行为,当然这仅在我们确认Data Pump作出了错误选择时才有必要。
若满足右列条件EXPDP将采用direct_path即直接路径模式 |
表结构允许使用直接路径模式,举例而言: 表上没有启用针对SELECT操作的fine-grained access control 非队列表(queue table) 表上没有BFILE和opaque类型的列,或包含有opaque列的对象类型 表上没有加密列 表上没有被废弃的旧类型列 若表上存在LONG或LONG RAW类型列,则此列只能是最后一列 使用Expdp执行导出任务时没有为相关表指定QUERY, SAMPLE, or REMAP_DATA等参数 需要导出的表或分区相对较少(多达250M),亦或者表或分区其实很大,但导出任务无法工作在并行模式(未指定parallel参数,或parallel参数设置为1) |
若满足右列条件EXPDP将采用external_table即外部表模式 |
数据结构不满足在直接路径模式下抽取的条件,举例而言: 表上启用了针对SELECT操作的精细粒度控制 队列表 表上包含了BFILE或opaque类型列,或者包含有opaque列的对象类型 表上存在加密列 表上存在被废弃的旧类型列 表上存在LONG或LONG RAW类型列,且不是最后列 数据结构满足使用直接路径模式的条件,但执行导出任务时相关表上指定了QUERY, SAMPLE, or REMAP_DATA等参数 数据结构满足使用直接路径模式的条件,但相关的表或分区相对较大(大于250M),此时并行SQL可以用来加速数据抽取 |
若满足右列条件IMPDP将采用direct_path即直接路径模式 |
数据结构满足使用直接路径模式的条件,举例而言: 当导入某单一表分区时该分区表上没有建立全局索引,这一点也包括分区的对象表 没有基于LOB列建立的域索引(domain index) 非cluster表 表上没有BFILE列或opaque类型列 表上没有嵌入了opaque类型的VARRAY列 表上没有加密列 没有启用补全日志(Supplemental logging)且表上没有LOB类型列 若导入表已预先建立了表建构,则需满足以下条件: 表上没有激活的触发器 并且 若是分区表则应有索引 并且 表上上没有启用针对INSERT操作的精细粒度控制 并且 表上除去check类型约束不存在其他类型约束 并且 表上没有unique的索引 执行导入任务时没有为相关表指定QUERY, REMAP_DATA等参数 需要导入的表或分区相对较小(少于250M),或者表或分区其实很大,但无法以并行模式导入(未指定parallel参数或指定其为1) |
若满足右列条件IMPDP将采用external_table即外部表路径 |
当数据结构不满足在直接路径模式下导入的条件,举例而言: 当导入某单一表分区时该分区表上建有全局索引,这一点也包括分区的对象表 表上有基于LOB列建立的域索引(domain index) cluster表 表上有BFILE列或opaque类型列 表上有嵌入了opaque类型的VARRAY列 表上有加密列 启用了补全日志且表上有至少一个LOB列 若导入表已预先建立了表建构,且满足以下条件: 表上有激活的触发器 或者 是分区表且没有任何索引 或者 表上启用了针对INSERT操作的精细粒度控制 或者 表上除去check类型约束还还有其他类型约束 或者 表上有unique的索引 执行导入任务时有为相关表指定QUERY, REMAP_DATA等参数 数据结构满足使用直接路径模式的条件,但相关的表或分区相对较大(大于250M),此时并行SQL可以用来加速数据导入 |
我们有必要深入了解一下外部表路径究竟是如何工作的:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
/* 执行expdp导出任务,TCACHE表大小为465M,这里显示指定了外部表路径 */
[maclean@rh2 ~]$ expdp maclean/maclean directory=dumps tables=TCACHE dumpfile=tcache1.dmp access_method=external_table
Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 24 August, 2010 20:31:48
Copyright (c) 2003, 2007, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "MACLEAN"."SYS_EXPORT_TABLE_07": maclean/******** directory=dumps tables=TCACHE dumpfile=tcache1.dmp access_method=external_table
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 544 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."TCACHE" 465.6 MB 4999999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_07" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_07 is:
/s01/dump/tcache1.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_07" successfully completed at 20:32:18
/* 这次我们指定了并行度为4 */
[maclean@rh2 ~]$ expdp maclean/maclean directory=dumps tables=TCACHE dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp,tcache5.dmp access_method=external_table parallel=4
Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 24 August, 2010 20:32:49
Copyright (c) 2003, 2007, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Starting "MACLEAN"."SYS_EXPORT_TABLE_07": maclean/******** directory=dumps tables=TCACHE dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp,tcache5.dmp access_method=external_table parallel=4
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 544 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."TCACHE" 465.6 MB 4999999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_07" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_07 is:
/s01/dump/tcache2.dmp
/s01/dump/tcache3.dmp
/s01/dump/tcache4.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_07" successfully completed at 20:33:04
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
SQL> @?/rdbms/admin/awrrpt
接着我们分析该时段内的awr报告!
从AWR报告中,我们不难找到以下语句:
SQL ID 3qwsywpvtdyjp:
CREATE TABLE "ET$088200010001" ( "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" ) ORGANIZATION EXTERNAL ( TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY DUMPS ACCESS PARAMETERS (DEBUG =0 DATAPUMP INTERNAL TABLE "MACLEAN"."TCACHE" JOB ( "MACLEAN", "SYS_EXPORT_TABLE_07", 1) WORKERID 1 PARALLEL 1 VERSION COMPATIBLE ENCRYPTPASSWORDISNULL ) LOCATION ('bogus.dat') ) PARALLEL 1 REJECT LIMIT UNLIMITED AS SELECT /*+ PARALLEL(KU$, 1) */ "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" FROM RELATIONAL("MACLEAN"."TCACHE" ) KU$
SQL ID 1rxax27p7anr3:
CREATE TABLE "ET$088000020001" ( "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" ) ORGANIZATION EXTERNAL ( TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY DUMPS ACCESS PARAMETERS (DEBUG =0 DATAPUMP INTERNAL TABLE "MACLEAN"."TCACHE" JOB ( "MACLEAN", "SYS_EXPORT_TABLE_07", 1) WORKERID 2 PARALLEL 2 VERSION COMPATIBLE ENCRYPTPASSWORDISNULL ) LOCATION ('bogus.dat') ) PARALLEL 2 REJECT LIMIT UNLIMITED AS SELECT /*+ PARALLEL(KU$, 2) */ "OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY" FROM RELATIONAL("MACLEAN"."TCACHE" ) KU$
以上就是Expdp在导出数据时使用的创建组织类型为ORACLE_DATAPUMP的外部表的SQL语句;后者语句中指定的并行度为2,而非我所指定的4,因该是Oracle考虑到所需导出表的具体大小,设置过高的并行度反而会适得其反,所以对设定值做了修正。
启用了并行的SQL 1rxax27p7anr3执行时间为22.63s,反而要比不使用并行慢;可见在TCACHE表这个数量级,完全没有必要使用并行导出。 接着我们来探究一下外部表路径的导入:
SQL> truncate table tcache;
Table truncated.
SQL> set pagesize 1400;
SQL> set linesize 140;
/* 执行外部表路径的数据导入,并行度1 */
[maclean@rh2 admin]$ impdp maclean/maclean directory=dumps tables=TCACHE content=data_only dumpfile=tcache1.dmp access_method=external_table
/* 在以上SQL执行期间查询V$SQL动态视图的相关语句*/
select sql_text,
sql_id,
cpu_time,
elapsed_time,
disk_reads,
buffer_gets,
rows_processed
from v$sql
where sql_text like '%TCACHE%'
and sql_text not like '%like%';
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
------------- ---------- ------------ ---------- ----------- --------------
INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
_TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
ULL,NULL,NULL,NULL,NULL)
6tn47a220d34q 408 408 0 0 0
INSERT /*+APPEND+*/ /*+PARALLEL("TCACHE",1)+*/ INTO RELATIONAL("MACLEAN"."TCACHE" ) ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID",
"DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY") SELECT "OWNE
R", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPO
RARY", "GENERATED", "SECONDARY" FROM "ET$087B00060001" KU$
66q4w8c4ak0wf 31576 31576 0 608 0
/* 当导入job完成后再次查询 */
select sql_text,
sql_id,
cpu_time,
elapsed_time,
disk_reads,
buffer_gets,
rows_processed
from v$sql
where sql_text like '%TCACHE%'
and sql_text not like '%like%';
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
------------- ---------- ------------ ---------- ----------- --------------
INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
_TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
ULL,NULL,NULL,NULL,NULL)
6tn47a220d34q 408 408 0 0 0
/* 执行完成后一句SQL立马消失了!? Oracle好像很不情愿我们看到外部表路径使用的INSERT语句 :) */
INSERT /*+APPEND+*/ /*+PARALLEL("TCACHE",1)+*/
INTO RELATIONAL
("MACLEAN"."TCACHE")
("OWNER",
"OBJECT_NAME",
"SUBOBJECT_NAME",
"OBJECT_ID",
"DATA_OBJECT_ID",
"OBJECT_TYPE",
"CREATED",
"LAST_DDL_TIME",
"TIMESTAMP",
"STATUS",
"TEMPORARY",
"GENERATED",
"SECONDARY")
SELECT "OWNER",
"OBJECT_NAME",
"SUBOBJECT_NAME",
"OBJECT_ID",
"DATA_OBJECT_ID",
"OBJECT_TYPE",
"CREATED",
"LAST_DDL_TIME",
"TIMESTAMP",
"STATUS",
"TEMPORARY",
"GENERATED",
"SECONDARY"
FROM "ET$087F00230001" KU$
/* 这里的KU$代表的就是以ORACLE_DATAPUMP转储文件形式存放在数据库外的外部表 */
/* 可以看到这里尝试使用了APPEND插入模式,以减少undo和redo的产生*/
SQL> truncate table tcache;
Table truncated.
/* 若启用并行导入呢? */
[maclean@rh2 admin]$ impdp maclean/maclean directory=dumps tables=TCACHE content=data_only dumpfile=tcache2.dmp,tcache3.dmp,tcache4.dmp access_method=external_table parallel=3
........
select sql_text,
sql_id,
cpu_time,
elapsed_time,
disk_reads,
buffer_gets,
rows_processed
from v$sql
where sql_text like '%TCACHE%'
and sql_text not like '%like%';
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
------------- ---------- ------------ ---------- ----------- --------------
INSERT /*+APPEND+*/ /*+PARALLEL("TCACHE",1)+*/ INTO RELATIONAL("MACLEAN"."TCACHE" ) ("OWNER", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID",
"DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPORARY", "GENERATED", "SECONDARY") SELECT "OWNE
R", "OBJECT_NAME", "SUBOBJECT_NAME", "OBJECT_ID", "DATA_OBJECT_ID", "OBJECT_TYPE", "CREATED", "LAST_DDL_TIME", "TIMESTAMP", "STATUS", "TEMPO
RARY", "GENERATED", "SECONDARY" FROM "ET$087A00040001" KU$
2mvs15623ssvy 5910289 5910289 0 73578 0
INSERT /*+ SYS_DL_CURSOR */ INTO RELATIONAL("MACLEAN"."TCACHE") ("OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","DATA_OBJECT_ID","OBJECT
_TYPE","CREATED","LAST_DDL_TIME","TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,N
ULL,NULL,NULL,NULL,NULL)
6tn47a220d34q 1167 1167 0 0 0
/* 似乎因为沿用了APPEND插入模式,插入语句的并行度并未上升 */
/* 因为Oracle不想同我们分享它的小秘密,所以awr和ash报告中都不记录外部表路径插入SQL的相关信息 */
/* 好吧,它有权保持沉默! */
That's great!