转载备忘,地址:http://www.dba-oracle.com/t_optimize_sql_loader_sqlldr_performance.htm

以下为正文和benchmark数据


1.Use Direct Path           Loads - The conventional path loader essentially loads the data by           using standard insert statements.  The direct path loader (direct=true)           loads directly into the Oracle data files and creates blocks in Oracle           database block format.  The fact that SQL is not being issued makes           the entire process much less taxing on the database.  There are           certain cases, however, in which direct path loads cannot be used           (clustered tables).  To prepare the database for direct path loads,           the script $ORACLE_HOME/rdbms/admin/catldr.sql.sql must be           executed.

2.Disable Indexes           and Constraints.  For conventional data loads only, the disabling           of indexes and constraints can greatly enhance the performance of           SQL*Loader.  

3.Use a Larger           Bind Array.  For conventional data loads only, larger bind arrays           limit the number of calls to the database and increase performance.            The size of the bind array is specified using the bindsizeparameter.  The           bind array's size is equivalent to the number of rows it contains (rows=)           times the maximum length of each row. Also see the columnarrayrows and streamsize parameters.

4.Use ROWS=n           .  For conventional data loads only,          rows specifies the number of rows per commit and is related to bindsize.  Issuing fewer           commits will enhance performance, and the larger rows parameter affects performance (see benchmark below).  

5.Use Parallel           Loads.  Available with direct path data loads only, this option           allows multiple SQL*Loader jobs to execute concurrently.

$ sqlldr           control=first.ctl  parallel=true direct=true

$ sqlldr           control=second.ctl parallel=true direct=true

6.Use Fixed Width           Data.  Fixed width data format saves Oracle some processing when           parsing the data.  The savings can be tremendous, depending on the           type of data and number of rows.  

7.Disable           Archiving During Load.  While this may not be feasible in certain           environments, disabling database archiving can increase performance           considerably.

8.Use          unrecoverable.  The           unrecoverable option (unrecoverable load data) disables the writing of           the data to the redo logs.  This option is available for direct path           loads only.


Benchmark data

From the book "Advanced Oracle Utilities" we see a valid benchmark of SQL*Loader performance.

"Using the table table_with_one_million_rows, the following benchmark tests were performed with the various SQL*Loader options. The table was truncated after each test.

SQL*Loader Option

Elapsed Time (Seconds)

Time Reduction

direct=false
rows=64
135
-
direct=false
bindsize=512000
rows=10000
92
32%
direct=false
bindsize=512000
rows=10000
DB in noarchivelog mode
85
37%
direct=true
47
65%
direct=true
unrecoverable
41
70%
direct=true
unrecoverable
fixed width data
41
70%

SQL*Loader test results indicate conventional path loads take longest.


The results above indicate that conventional path loads take the longest. However, the bindsize and rows parameters can aid the performance under these loads. The test involving the conventional load didn’t come close to the performance of the direct path load with the unrecoverable option specified.

It is also worth noting that the fastest import time achieved for this table (earlier) was 67 seconds, compared to 41 for SQL*Loader direct path – a 39% reduction in execution time. This proves that SQL*Loader can load the same data faster than import.

These tests did not compensate for indexes. All database load operations will execute faster when indexes are disabled.


Another SQL*Loader benchmark test

This benchmark byWarren Koch tests a SQL*Loader (sqlldr) import of 2m rows, using direct path with index skip set for the baseline. Here's results for differing values for column array and stream size in SQL*Loader timings.

CONTROL FILE SNIPPET:

OPTIONS (DIRECT=TRUE, SKIP=TRUE, ERRORS=50, rows=500000, COLUMNARRAYROWS=xx, STREAMSIZE=yy)
UNRECOVERABLE LOAD DATA

TRUNCATE
into table F15_ADPLS_NEXTASSY

fields terminated by X'9' optionally enclosed by X'1F'
TRAILING NULLCOLS {…}

This is for a load of 1,964,601 rows (about 2 million) from a delimited file. I know I could achieve much higher speeds going to a fixed width format but my data source precludes that.

Report headings:

columns = COLUMNARRAYROWS Parameter Setting (xx)

Stream = STREAMSIZE parameter setting (yy)

CPU = CPU time in minutes (from sqlldr log)

Elapsed = Elapsed time in minutes (from sqlldr log)

Main = Total stream buffers loaded by SQL*Loader main thread (from sqlldr log)

Load = Total stream buffers loaded by SQL*Loader load thread (from sqlldr log)



Columns   Stream Elap time  CPU     Main    Load
  100   256,000 04:12.0   02:52.3 19908      0
1,000   256,000 04:14.4   02:57.3 2219     218
5,000    256,000 04:17.3   03:05.9 515    1,350
50,000    256,000 04:19.1   03:09.9 515    1,350

5,000     500,000 04:15.9   03:02.8 515      512
5,000     512,000 04:14.6   03:02.7 515      512
5,000     128,000 04:11.0   03:05.5 515    2,908
5,000      64,000 04:10.0   03:06.7 515     5876