Reposted for reference. Source: http://www.dba-oracle.com/t_optimize_sql_loader_sqlldr_performance.htm
The article text and benchmark data follow.
1. Use Direct Path Loads. The conventional path loader essentially loads the data by using standard insert statements. The direct path loader (direct=true) loads directly into the Oracle data files and creates blocks in Oracle database block format. Because no SQL is issued, the entire process is much less taxing on the database. There are certain cases, however, in which direct path loads cannot be used (for example, clustered tables). To prepare the database for direct path loads, the script $ORACLE_HOME/rdbms/admin/catldr.sql must be executed.
2. Disable Indexes and Constraints. For conventional data loads only, disabling indexes and constraints can greatly enhance the performance of SQL*Loader.
3. Use a Larger Bind Array. For conventional data loads only, larger bind arrays limit the number of calls to the database and increase performance. The size of the bind array is specified using the bindsize parameter. The bind array's size is equivalent to the number of rows it contains (rows=) times the maximum length of each row. Also see the columnarrayrows and streamsize parameters.
4. Use ROWS=n. For conventional data loads only, rows specifies the number of rows per commit and is related to bindsize. Issuing fewer commits improves performance, so a larger rows value generally speeds up the load (see the benchmark below).
5. Use Parallel Loads. Available with direct path data loads only, this option allows multiple SQL*Loader jobs to execute concurrently.
$ sqlldr control=first.ctl parallel=true direct=true
$ sqlldr control=second.ctl parallel=true direct=true
6. Use Fixed Width Data. Fixed width data format saves Oracle some processing when parsing the data. The savings can be tremendous, depending on the type of data and number of rows.
7. Disable Archiving During Load. While this may not be feasible in certain environments, disabling database archiving can increase performance considerably.
8. Use unrecoverable. The unrecoverable option (unrecoverable load data) disables the writing of the data to the redo logs. This option is available for direct path loads only.
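The relationship between bindsize and rows described in items 3 and 4 comes down to simple arithmetic. The Python sketch below is only an illustration: the 120-byte maximum row length is a hypothetical figure (it is not taken from the benchmark), and the helper names are made up for this example. It shows how large a bind array a given rows setting would need, and how many rows fit in a given bindsize.

```python
def bind_array_bytes(rows: int, max_row_bytes: int) -> int:
    """Bind array size = number of rows times the maximum length of each row."""
    return rows * max_row_bytes


def rows_that_fit(bindsize_bytes: int, max_row_bytes: int) -> int:
    """Largest row count whose bind array still fits inside bindsize_bytes."""
    if max_row_bytes <= 0:
        raise ValueError("max_row_bytes must be positive")
    return bindsize_bytes // max_row_bytes


# Settings from the second benchmark row: bindsize=512000 rows=10000.
bindsize = 512000
requested_rows = 10000
max_row_bytes = 120  # ASSUMPTION: hypothetical row length, for illustration only

needed = bind_array_bytes(requested_rows, max_row_bytes)
fit = rows_that_fit(bindsize, max_row_bytes)

print(f"rows={requested_rows} would need {needed} bytes")
print(f"bindsize={bindsize} holds at most {fit} rows of {max_row_bytes} bytes")
```

Under that assumed row length, rows=10000 would ask for more memory than bindsize=512000 provides, so the two parameters should be sized together. The sqlldr log is the place to check how large a bind array was actually used for a given load.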
Benchmark data
From the book "Advanced Oracle Utilities" we see a valid benchmark of SQL*Loader performance.
"Using the table table_with_one_million_rows, the following benchmark tests were performed with the various SQL*Loader options. The table was truncated after each test.
SQL*Loader Option | Elapsed Time (Seconds) | Time Reduction |
direct=false rows=64 | 135 | - |
direct=false bindsize=512000 rows=10000 | 92 | 32% |
direct=false bindsize=512000 rows=10000, DB in noarchivelog mode | 85 | 37% |
direct=true | 47 | 65% |
direct=true unrecoverable | 41 | 70% |
direct=true unrecoverable, fixed width data | 41 | 70% |
SQL*Loader test results indicate conventional path loads take longest.
The results above indicate that conventional path loads take the longest. However, the bindsize and rows parameters can aid the performance under these loads. The test involving the conventional load didn’t come close to the performance of the direct path load with the unrecoverable option specified.
It is also worth noting that the fastest import time achieved for this table (earlier) was 67 seconds, compared to 41 for SQL*Loader direct path, a 39% reduction in execution time. This proves that SQL*Loader can load the same data faster than import.
These tests did not compensate for indexes. All database load operations will execute faster when indexes are disabled.
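The reduction figures quoted above, including the 39% comparison with import, follow from (baseline - elapsed) / baseline. A short Python check using only the timings given in the benchmark (the function name and the shortened labels are illustrative):

```python
def reduction_pct(baseline_s: float, elapsed_s: float) -> int:
    """Percentage of elapsed time saved relative to the baseline, rounded."""
    return round((baseline_s - elapsed_s) / baseline_s * 100)


# Elapsed times in seconds, copied from the benchmark table.
results = [
    ("direct=false rows=64", 135),
    ("direct=false bindsize=512000 rows=10000", 92),
    ("same, DB in noarchivelog mode", 85),
    ("direct=true", 47),
    ("direct=true unrecoverable", 41),
]

baseline = results[0][1]  # the conventional path run with rows=64
for label, seconds in results:
    print(f"{label}: {seconds}s, {reduction_pct(baseline, seconds)}% reduction")

# Fastest import (67s) versus SQL*Loader direct path with unrecoverable (41s).
print(f"import vs sqlldr: {reduction_pct(67, 41)}% reduction")
```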
Another SQL*Loader benchmark test
This benchmark by Warren Koch tests a SQL*Loader (sqlldr) import of 2m rows, using direct path with index skip set for the baseline. Here are the results for differing values of column array rows and stream size in the SQL*Loader timings.
CONTROL FILE SNIPPET:
OPTIONS (DIRECT=TRUE, SKIP=TRUE, ERRORS=50, rows=500000, COLUMNARRAYROWS=xx, STREAMSIZE=yy)
UNRECOVERABLE LOAD DATA
TRUNCATE
into table F15_ADPLS_NEXTASSY
fields terminated by X'9' optionally enclosed by X'1F'
TRAILING NULLCOLS {…}
This is for a load of 1,964,601 rows (about 2 million) from a delimited file. I know I could achieve much higher speeds going to a fixed width format but my data source precludes that.
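Where the data source can be post-processed, one option is to pad a delimited file into fixed-width records before loading. The Python sketch below is a rough illustration under stated assumptions: the column widths and sample record are hypothetical, the function names are invented for this example, and it says nothing about how SQL*Loader parses records internally. It only shows the idea behind fixed-width data, namely that each field sits at a known offset, so no scanning for delimiters is needed.

```python
from typing import List, Sequence


def to_fixed_width(line: str, widths: Sequence[int], sep: str = "\t") -> str:
    """Pad each delimited field with spaces to its column width."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(widths):
        raise ValueError(f"expected {len(widths)} fields, got {len(fields)}")
    padded = []
    for value, width in zip(fields, widths):
        if len(value) > width:
            raise ValueError(f"value {value!r} is longer than {width} characters")
        padded.append(value.ljust(width))
    return "".join(padded)


def parse_fixed_width(record: str, widths: Sequence[int]) -> List[str]:
    """Slice fields out at fixed offsets and strip the padding."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(record[pos:pos + width].rstrip())
        pos += width
    return fields


# ASSUMPTION: hypothetical widths and a made-up tab-delimited record.
widths = [6, 10, 4]
record = to_fixed_width("42\tWIDGET\tA1\n", widths)
print(repr(record))
print(parse_fixed_width(record, widths))
```

Whether the conversion pays off depends on the data. In the first benchmark above, adding fixed width data to a direct path unrecoverable load left the elapsed time unchanged at 41 seconds.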
Report headings:
Columns = COLUMNARRAYROWS parameter setting (xx)
Stream = STREAMSIZE parameter setting (yy)
CPU = CPU time in minutes (from sqlldr log)
Elapsed = Elapsed time in minutes (from sqlldr log)
Main = Total stream buffers loaded by SQL*Loader main thread (from sqlldr log)
Load = Total stream buffers loaded by SQL*Loader load thread (from sqlldr log)