A case I ran into recently.
Normally, expdp is more efficient than the traditional exp, so many users prefer expdp. However, the tool has some limitations and performance issues; since it is relatively new, it cannot be perfect. Below is the case I met recently.
Environment:
OS: AIX 5.3.0.0, 8 physical CPUs, 20 GB physical memory
DB: 10.2.0.3
This case happened last Saturday during an application release deployment. The plan included a step to export the whole database before deploying the scripts. The database is not big (around 100 GB), and we estimated the expdp would take about an hour. From experience, a 200 GB database takes less than an hour with parallel=7, so we thought one hour was enough for this export.
The export job was started by a DBA in the US at 7:00 AM with parallel=4. Given the machine configuration, parallel=4 did not look like a problem.
As far as I know, there are some guidelines for setting PARALLEL:
1, Set the degree of parallelism to two times the number of CPUs, then tune from there.
2, For Data Pump Export, the PARALLEL parameter value should be less than or equal to the number of dump files.
3, For Data Pump Import, the PARALLEL parameter value should not be much larger than the number of files in the dump file set.
4, A PARALLEL greater than one is only available in Enterprise Edition.
Based on the items above, parallel=4 should not have been a problem.
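The export-side guidelines above can be turned into a quick sanity check. This is a minimal sketch, not an Oracle tool: the `ExportPlan` class and `check_parallel` function are hypothetical names, rule 1 is interpreted loosely as "warn above 2 x CPUs", and rule 3 (import side) is left out. The post does not say how many dump files the job used or which edition was installed, so the usage below simply assumes 4 dump files and Enterprise Edition.

```python
from dataclasses import dataclass


@dataclass
class ExportPlan:
    """Hypothetical description of an expdp job, used only for this sketch."""
    parallel: int
    physical_cpus: int
    dump_files: int           # number of dump files the job can write to
    enterprise_edition: bool


def check_parallel(plan: ExportPlan) -> list[str]:
    """Return warnings for the export-side rules of thumb (rules 1, 2 and 4)."""
    warnings = []
    # Rule 1: start at about 2 x CPUs and tune from there; flag anything above it.
    if plan.parallel > 2 * plan.physical_cpus:
        warnings.append("PARALLEL exceeds 2 x CPU count")
    # Rule 2: for export, PARALLEL should not exceed the number of dump files.
    if plan.parallel > plan.dump_files:
        warnings.append("PARALLEL exceeds number of dump files")
    # Rule 4: PARALLEL > 1 is only available in Enterprise Edition.
    if plan.parallel > 1 and not plan.enterprise_edition:
        warnings.append("PARALLEL > 1 needs Enterprise Edition")
    return warnings


# Assumed values for our job: 8 CPUs, parallel=4, 4 dump files, Enterprise Edition.
print(check_parallel(ExportPlan(parallel=4, physical_cpus=8, dump_files=4, enterprise_edition=True)))  # []
```

As the rest of this case shows, passing these checks says nothing about how individual tables will be unloaded.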
When I got to the office at 9:00 AM, the export was still running, and the expdp log showed it was still exporting metadata. Parallel cannot use multiple processes to export metadata, which might explain a long metadata phase, but the metadata should be very small, so this guess did not hold. Next we checked the expdp command that was used and found no problem. Then we attached to the job with "expdp attach=??" to check its status; the output looked like this:
Worker 1 Status:
State: EXECUTING
Object Schema: Usera
Object Name: Tablea
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 436
Completed Rows:55,801
Worker Parallelism: 1
Worker 2 Status:
State: WORK WAITING
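When a job has many workers, counting which ones are actually busy is easier with a small script than by eye. This is a minimal sketch that assumes the STATUS text looks like the excerpt above ("Worker N Status:" followed by a "State:" line); the function name `worker_states` is hypothetical. A job-level "State:" line that appears before any worker header is ignored.

```python
import re

SAMPLE = """\
Worker 1 Status:
State: EXECUTING
Object Schema: Usera
Object Name: Tablea
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 436
Completed Rows:55,801
Worker Parallelism: 1
Worker 2 Status:
State: WORK WAITING
"""


def worker_states(status_text: str) -> dict[int, str]:
    """Map worker number -> State, parsed from expdp STATUS output."""
    states = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = re.match(r"Worker (\d+) Status:", line)
        if header:
            current = int(header.group(1))
            continue
        state = re.match(r"State:\s*(.+)", line)
        if state and current is not None:
            states[current] = state.group(1).strip()
            current = None  # only the first State line after a header belongs to that worker
    return states


print(worker_states(SAMPLE))  # {1: 'EXECUTING', 2: 'WORK WAITING'}
```

With parallel=4 requested, seeing a single EXECUTING worker with "Worker Parallelism: 1" is the signal that the table data is being unloaded serially.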
The status showed that only one Data Pump worker was doing any work, unloading data from a single table, Usera.tablea, which is relatively big (around 45 GB). It had exported about 55,000 of the table's roughly 110,000 rows. I expected this table to take a while, but not the two hours the job had already been running, so we were sure something was abnormal.
The database workload was very light. At that point we had no idea what was wrong with expdp, so we tried another approach: exporting the database without the PARALLEL option. That took one hour, so we concluded the problem was related to the PARALLEL option.
Because of the tight schedule, there was no time to dig into the issue. Since the non-parallel expdp had worked well and finished in one hour, we continued deploying the scripts.
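A rough back-of-the-envelope calculation with the numbers above shows how far off the parallel job was. This sketch treats all inputs as approximations from the post (45 GB, about 110,000 rows, 55,801 completed rows, about two hours elapsed, about 100 GB in one hour for the serial run). Because the two hours also include the metadata phase, the observed rate is a lower bound.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Approximate figures taken from the post.
table_bytes = 45 * GIB   # Usera.tablea is about 45 GB
total_rows = 110_000     # total rows in Usera.tablea
done_rows = 55_801       # "Completed Rows" in the STATUS output
elapsed_s = 2 * 3600     # job started at 7:00 AM, status checked around 9:00 AM

avg_row_bytes = table_bytes / total_rows
observed_rate = done_rows * avg_row_bytes / elapsed_s  # bytes/s, lower bound
remaining_s = (total_rows - done_rows) * avg_row_bytes / observed_rate

# Non-parallel run: roughly 100 GB in about one hour.
serial_rate = 100 * GIB / 3600  # bytes/s

print(f"average row size : ~{avg_row_bytes / 1024:.0f} KiB")
print(f"parallel job     : ~{observed_rate / MIB:.1f} MiB/s")
print(f"serial job       : ~{serial_rate / MIB:.1f} MiB/s")
print(f"time left on this table at the observed rate: ~{remaining_s / 3600:.1f} h")
```

Even with generous rounding, the parallel job was unloading this table roughly an order of magnitude slower than the serial job exported the whole database.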
Today I tested with parallel=8 and parallel=16 and saw no improvement at all, so I went to check the table Usera.tablea:
SQL> desc Usera.tablea
Name Null? Type
--------------------------- -------- -----------
ID NOT NULL NUMBER(18)
DATA LONG RAW
The table structure is very simple, but the data type of the last column gave me a hint: maybe expdp has a problem handling LONG RAW.
I then found Metalink Doc ID 813396.1, whose solution is to specify the access method explicitly:
expdp access_method=direct_path parallel=8
In my test, with access_method=direct_path specified, the export was indeed fast: the whole job took 45 minutes.
However, Metalink Doc ID 552424.1 lists four methods of loading/unloading data:
1, direct path mode
2, external table mode
3, data file copying mode
4, network link import mode
It also lists the conditions under which direct path is used:
EXPDP will use DIRECT_PATH mode if:
2.1. The structure of a table allows a Direct Path unload, i.e.:
- The table does not have fine-grained access control enabled for SELECT.
- The table is not a queue table.
- The table does not contain one or more columns of type BFILE or opaque, or an object type containing opaque columns.
- The table does not contain encrypted columns.
- The table does not contain a column of an evolved type that needs upgrading.
- If the table has a column of datatype LONG or LONG RAW, then this column is the last column.
2.2. The parameters QUERY, SAMPLE, or REMAP_DATA were not used for the specified table in the Export Data Pump job.
2.3. The table or partition is relatively small (up to 250 MB), or the table or partition is larger, but the job cannot run in parallel because the parameter PARALLEL was not specified (or was set to 1).
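To see why our table ended up on a slow path only when PARALLEL was set, the three conditions can be written as a small decision function. This is a simplified model of the note's rules as quoted above, not Oracle's actual implementation; `TableInfo` and `default_access_method` are hypothetical names, and condition 2.1 is collapsed into a single boolean.

```python
from dataclasses import dataclass

SMALL_SEGMENT_MB = 250  # threshold quoted in Doc 552424.1


@dataclass
class TableInfo:
    structure_allows_direct_path: bool  # 2.1: no FGAC, not a queue table, LONG/LONG RAW last, ...
    uses_query_sample_or_remap: bool    # 2.2: QUERY, SAMPLE or REMAP_DATA used for this table
    size_mb: float                      # size of the table or partition


def default_access_method(t: TableInfo, parallel: int) -> str:
    """Method expdp would pick when ACCESS_METHOD is not given (rules 2.1 to 2.3)."""
    if not t.structure_allows_direct_path:   # 2.1 not met
        return "external_table"
    if t.uses_query_sample_or_remap:         # 2.2 not met
        return "external_table"
    if t.size_mb <= SMALL_SEGMENT_MB or parallel <= 1:  # 2.3 met
        return "direct_path"
    return "external_table"                  # large segment in a parallel job


# Usera.tablea: LONG RAW is the last column, about 45 GB.
tablea = TableInfo(structure_allows_direct_path=True, uses_query_sample_or_remap=False, size_mb=45 * 1024)
print(default_access_method(tablea, parallel=4))  # external_table
print(default_access_method(tablea, parallel=1))  # direct_path
```

Under this reading, parallel=4 flips the large LONG RAW table from direct path to external table, which matches what we observed: the serial job was fast and the parallel one was not.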
From my tests, these conditions seem to conflict with our case. My understanding is that expdp uses direct path for such a table by default only when the job is not parallel; with a parallel job, we have to force it to unload data with access_method=direct_path explicitly.
Another interesting question: what if the LONG RAW column is not the last column? In my test, specifying direct_path in that case fails with:
ORA-31696: unable to export/import TABLE_DATA:"USERA"."TABLEB" using client specified DIRECT_PATH method
In this case, we have to use the external table method. According to the same note:
EXPDP will use EXTERNAL_TABLE mode if:
3.1. Data cannot be unloaded in Direct Path mode, because of the structure of the table, i.e.:
- Fine-grained access control for SELECT is enabled for the table.
- The table is a queue table.
- The table contains one or more columns of type BFILE or opaque, or an object type containing opaque columns.
- The table contains encrypted columns.
- The table contains a column of an evolved type that needs upgrading.
- The table contains a column of type LONG or LONG RAW that is not last.
3.2. Data could also have been unloaded in "Direct Path" mode, but the parameters QUERY, SAMPLE, or REMAP_DATA were used for the specified table in the Export Data Pump job.
3.3. Data could also have been unloaded in "Direct Path" mode, but the table or partition is relatively large (> 250 MB) and parallel SQL can be used to speed up the unload even more.
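Both lists hinge on whether a LONG or LONG RAW column is the last column, which is what separates Usera.tablea (direct path can be forced) from TABLEB (ORA-31696). A minimal sketch of that check, assuming the column types are given in column order, for example as listed by DESC or by DBA_TAB_COLUMNS ordered by COLUMN_ID. The function name is hypothetical, and the TABLEB layout below is made up for illustration because the post does not show it.

```python
LONG_TYPES = {"LONG", "LONG RAW"}


def long_column_ok_for_direct_path(column_types: list[str]) -> bool:
    """True unless a LONG/LONG RAW column appears somewhere other than last.

    column_types must be listed in column order.
    """
    last = len(column_types) - 1
    positions = [i for i, t in enumerate(column_types) if t.upper() in LONG_TYPES]
    return all(i == last for i in positions)


usera_tablea = ["NUMBER(18)", "LONG RAW"]              # from the DESC output above
tableb_example = ["NUMBER(18)", "LONG RAW", "DATE"]    # hypothetical layout
print(long_column_ok_for_direct_path(usera_tablea))    # True
print(long_column_ok_for_direct_path(tableb_example))  # False
```

A False result means forcing access_method=direct_path is expected to fail, so the table has to go through the external table path.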
Summary:
When exporting tables with LONG/LONG RAW columns: if the column is the last column of the table and the job uses PARALLEL, specify access_method=direct_path explicitly; otherwise, run the export without the PARALLEL option and let expdp choose the load/unload method itself.
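The summary can be encoded as a helper that emits expdp parameter-file lines (one key=value per line, passed with parfile=). This is a sketch of the recommendation above, not an Oracle rule: the function names are hypothetical, the table is assumed to contain a LONG/LONG RAW column, and required parameters such as directory, dumpfile and full are deliberately left out and would have to be added for a real job.

```python
def recommended_params(long_is_last: bool, want_parallel: int) -> dict[str, str]:
    """Apply the summary to a table that has a LONG/LONG RAW column."""
    if long_is_last and want_parallel > 1:
        # LONG column is last: keep parallelism, but force direct path (Doc 813396.1).
        return {"parallel": str(want_parallel), "access_method": "direct_path"}
    # LONG column not last (or no parallelism wanted): run serially, let expdp choose.
    return {"parallel": "1"}


def to_parfile(params: dict[str, str]) -> str:
    """Render parameters as parfile lines."""
    return "".join(f"{key}={value}\n" for key, value in params.items())


print(to_parfile(recommended_params(long_is_last=True, want_parallel=8)), end="")
# parallel=8
# access_method=direct_path
```

Keeping the decision in one place makes it easy to review before a release window, when there is no time to experiment.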
References:
Metalink Doc IDs 813396.1 and 552424.1
From the "ITPUB Blog", link: http://blog.itpub.net/45188/viewspace-1022263/. If you repost this article, please cite the source; otherwise legal action will be taken.