Updating Large Tables in Parallel
The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a large table in parallel, in two high-level steps:
(1) Group sets of rows in the table into smaller chunks.
(2) Apply the desired UPDATE statement to the chunks in parallel, committing each time you have finished processing a chunk.
-- In short, DBMS_PARALLEL_EXECUTE works in two steps: first it splits a large table into many small chunks, then it processes those chunks in parallel.
This technique is recommended whenever you are updating a lot of data. Its advantages are:
(1) You lock only one set of rows at a time, for a relatively short time, instead of locking the entire table.
(2) You do not lose work that has been done if something fails before the entire operation finishes.
(3) You reduce rollback space consumption.
(4) You improve performance.
See Also:
Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_PARALLEL_EXECUTE package
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_parallel_ex.htm#ARPLS233
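-- The link above documents the package in detail.
Parallel execution can improve SQL performance to some extent. I have written about parallel execution on my blog:
Oracle Parallel Execution
http://blog.csdn.net/xujinyang/article/details/6832630
I mention that article because of one particular issue: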
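In older Oracle releases, parallel DELETE, UPDATE and MERGE operations were only started when the target was a partitioned table. The reason is that for a partitioned table Oracle assigns one parallel server process to each partition so the partitions can be processed concurrently, a model that does not apply to a non-partitioned table.
If we need to update a large table that is not partitioned, we can use the DBMS_PARALLEL_EXECUTE package to run the work in parallel instead.
DBMS_PARALLEL_EXECUTE splits the large table into many small chunks and then processes the chunks in parallel, which is roughly like turning a non-partitioned table into a partitioned one.
Note that this package is only available from Oracle 11g Release 2 onward.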
The following content is reproduced from:
http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php
SQL> conn / as sysdba;
Connected.
SQL> grant create job to icd;
Grant succeeded.
SQL> conn icd/icd;
Connected.
SQL> CREATE TABLE test_tab (
2 id NUMBER,
3 description VARCHAR2(50),
4 num_col NUMBER,
5 CONSTRAINT test_tab_pk PRIMARY KEY (id)
6 );
Table created.
SQL> INSERT /*+ APPEND */ INTO test_tab
2 SELECT level,
3 'Description for ' || level,
4 CASE
5 WHEN MOD(level, 5) = 0 THEN 10
6 WHEN MOD(level, 3) = 0 THEN 20
7 ELSE 30
8 END
9 FROM dual
10 CONNECT BY level <= 500000;
500000 rows created.
SQL> commit;
Commit complete.
SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);
PL/SQL procedure successfully completed.
SQL> SELECT num_col, COUNT(*)
2 FROM test_tab
3 GROUP BY num_col
4 ORDER BY num_col;
NUM_COL COUNT(*)
---------- ----------
10 100000
20 133333
30 266667
The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.
SQL> BEGIN
2 DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
3 END;
4 /
PL/SQL procedure successfully completed.
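A minimal sketch of passing the optional comment. It assumes the comment is the second positional parameter of CREATE_TASK, and it uses a hypothetical task name, test_task_c, so it does not clash with test_task. The script drops the task again at the end so later queries in this walkthrough are unaffected.

```sql
-- Sketch: a second, hypothetical task (test_task_c) created with a comment.
BEGIN
  -- Positional arguments (assumed order): task name, then the optional task comment.
  DBMS_PARALLEL_EXECUTE.create_task('test_task_c', 'Add 10 to NUM_COL in TEST_TAB');
END;
/

-- Confirm that the task exists.
SELECT task_name, status
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task_c';

-- Remove it again so it does not clutter the queries that follow.
EXEC DBMS_PARALLEL_EXECUTE.drop_task('test_task_c');
```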
Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.
SQL> COLUMN task_name FORMAT A10
SQL> SELECT task_name,
2 status
3 FROM user_parallel_execute_tasks;
TASK_NAME STATUS
---------- -------------------
test_task CREATED
The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.
SQL> SELECT DBMS_PARALLEL_EXECUTE.generate_task_name FROM dual;
GENERATE_TASK_NAME
-----------------------------------------------------
TASK$_1
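In PL/SQL the generated name is usually captured in a variable and passed straight to CREATE_TASK. The sketch below assumes the function is called with its default prefix (TASK$_ in the output above); the variable name l_task is illustrative only.

```sql
SET SERVEROUTPUT ON
DECLARE
  l_task VARCHAR2(128);
BEGIN
  -- No argument: the default prefix is used, giving names such as TASK$_1.
  l_task := DBMS_PARALLEL_EXECUTE.generate_task_name;
  DBMS_PARALLEL_EXECUTE.create_task(task_name => l_task);
  DBMS_OUTPUT.put_line('Created task ' || l_task);
  -- Chunking and running the task would go here.
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
```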
There are three ways to split a large table into chunks:
(1) CREATE_CHUNKS_BY_ROWID
(2) CREATE_CHUNKS_BY_NUMBER_COL
(3) CREATE_CHUNKS_BY_SQL
Chunks that have already been created can be removed with DROP_CHUNKS.
The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.
SQL> BEGIN
2 dbms_parallel_execute.create_chunks_by_rowid(task_name => 'test_task',
3 table_owner => 'icd',
4 table_name => 'test_tab',
5 by_row => true,
6 chunk_size => 10000);
7 end;
8 /
PL/SQL procedure successfully completed.
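For comparison, setting BY_ROW to FALSE makes CHUNK_SIZE a block count. This sketch uses a separate, hypothetical task (test_task_blk) and an illustrative size of 100 blocks; the resulting number of chunks depends on how many blocks TEST_TAB occupies, so no figure is shown.

```sql
-- Sketch: chunk by blocks instead of rows (BY_ROW => FALSE).
DECLARE
  l_task VARCHAR2(30) := 'test_task_blk';  -- hypothetical task name
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task(task_name => l_task);
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => USER,
                                               table_name  => 'TEST_TAB',
                                               by_row      => FALSE,
                                               chunk_size  => 100);  -- 100 blocks per chunk
END;
/

-- How many block-based chunks were produced?
SELECT COUNT(*) AS chunk_count
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task_blk';
```

When you are done experimenting, remove it with EXEC DBMS_PARALLEL_EXECUTE.drop_task('test_task_blk').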
Once the chunks have been created, the task status changes to CHUNKED.
SQL> COLUMN task_name FORMAT A10
SQL> SELECT task_name,
2 status
3 FROM user_parallel_execute_tasks;
TASK_NAME STATUS
---------- -------------------
test_task CHUNKED
The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status,start_rowid, end_rowid
2 FROM user_parallel_execute_chunks
3 WHERE task_name = 'test_task'
4 ORDER BY chunk_id;
CHUNK_ID STATUS START_ROWID END_ROWID
---------- -------------------- ------------------ ------------------
2 UNASSIGNED AAATMCAAMAABSMIAAA AAATMCAAMAABSMPCcP
3 UNASSIGNED AAATMCAAMAABSMgAAA AAATMCAAMAABSMnCcP
4 UNASSIGNED AAATMCAAMAABSMoAAA AAATMCAAMAABSMvCcP
...
73 UNASSIGNED AAATMCAAMAABS0yAAA AAATMCAAMAABS1jCcP
74 UNASSIGNED AAATMCAAMAABS1kAAA AAATMCAAMAABS1/CcP
73 rows selected.
Dropping the chunks:
SQL> begin
2 dbms_parallel_execute.drop_chunks('test_task');
3 end;
4 /
PL/SQL procedure successfully completed.
Checking the task status again shows that it is back to CREATED.
SQL> SELECT task_name,
2 status
3 FROM user_parallel_execute_tasks;
TASK_NAME STATUS
---------- -------------------
test_task CREATED
The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload based on a number column. It uses the specified column's min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized, the column must contain a continuous sequence of numbers, like that generated by a sequence.
BEGIN
dbms_parallel_execute.create_chunks_by_number_col(task_name => 'test_task',
table_owner => 'ICD',
table_name => 'TEST_TAB',
table_column => 'ID',
chunk_size => 10000);
END;
/
The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status, start_id,end_id
2 FROM user_parallel_execute_chunks
3 WHERE task_name = 'test_task'
4 ORDER BY chunk_id;
CHUNK_ID STATUS START_ID END_ID
---------- -------------------- ---------- ----------
75 UNASSIGNED 1 10000
76 UNASSIGNED 10001 20000
77 UNASSIGNED 20001 30000
78 UNASSIGNED 30001 40000
......
122 UNASSIGNED 470001 480000
123 UNASSIGNED 480001 490000
124 UNASSIGNED 490001 500000
50 rows selected.
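The chunk count can be sanity-checked with a little arithmetic. Assuming the ID column is contiguous (as it is in TEST_TAB), the number of chunks should be CEIL((MAX(id) - MIN(id) + 1) / chunk_size); this is a rough check, not a description of the package internals.

```sql
-- Rough check of the expected chunk count for chunk_size => 10000.
SELECT MIN(id) AS min_id,
       MAX(id) AS max_id,
       CEIL((MAX(id) - MIN(id) + 1) / 10000) AS expected_chunks
FROM   test_tab;
```

For IDs 1 to 500000 this gives (500000 - 1 + 1) / 10000 = 50, which matches the 50 chunks listed above.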
The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.
Drop the chunks created earlier:
SQL> exec dbms_parallel_execute.drop_chunks('test_task');
PL/SQL procedure successfully completed.
DECLARE
l_stmt CLOB;
BEGIN
l_stmt:= 'SELECT DISTINCT num_col, num_col FROM test_tab';
DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
sql_stmt => l_stmt,
by_rowid => FALSE);
END;
/
The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status, start_id,end_id
2 FROM user_parallel_execute_chunks
3 WHERE task_name = 'test_task'
4 ORDER BY chunk_id;
CHUNK_ID STATUS START_ID END_ID
---------- -------------------- ---------- ----------
141 UNASSIGNED 10 10
142 UNASSIGNED 30 30
143 UNASSIGNED 20 20
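A user-defined query can also build balanced ID ranges that CREATE_CHUNKS_BY_NUMBER_COL would not, for example when the ID column has gaps. The sketch below is one possible approach, assuming a hypothetical task named test_task_ntile: NTILE(20) splits the ordered IDs into 20 buckets with (near) equal row counts, and each bucket's MIN(id) and MAX(id) become one chunk's start and end ID. Because the IDs are unique, the ranges do not overlap.

```sql
DECLARE
  l_task VARCHAR2(30) := 'test_task_ntile';  -- hypothetical task name
  l_stmt CLOB;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task(task_name => l_task);
  -- Each row returned by this query defines one chunk: (start_id, end_id).
  l_stmt := 'SELECT MIN(id), MAX(id)
               FROM (SELECT id, NTILE(20) OVER (ORDER BY id) AS bucket
                       FROM test_tab)
              GROUP BY bucket';
  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/

SELECT chunk_id, start_id, end_id
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task_ntile'
ORDER  BY start_id;
```

With 500,000 contiguous IDs this should yield 20 chunks of 25,000 IDs each (1 to 25000, 25001 to 50000, and so on). Drop the task afterwards with EXEC DBMS_PARALLEL_EXECUTE.drop_task('test_task_ntile').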
Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.
There are two ways to run a task and several procedures to control a running task.
The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to ':start_id' and ':end_id', which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.
DECLARE
l_sql_stmt VARCHAR2(32767);
BEGIN
l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
SET t.num_col = t.num_col + 10
WHERE rowid BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.run_task(task_name => 'test_task',
sql_stmt => l_sql_stmt,
language_flag =>DBMS_SQL.NATIVE,
parallel_level => 10);
END;
/
The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.
The DBMS_PARALLEL_EXECUTE package also allows you to code the task run manually. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can then process the chunk yourself and set its status. The example below shows the processing of a workload chunked by rowid.
DECLARE
l_sql_stmt VARCHAR2(32767);
l_chunk_id NUMBER;
l_start_rowid ROWID;
l_end_rowid ROWID;
l_any_rows BOOLEAN;
BEGIN
l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
SET t.num_col = t.num_col + 10
WHERE rowid BETWEEN :start_id AND :end_id';
LOOP
-- Get next unassigned chunk.
DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name => 'test_task',
chunk_id => l_chunk_id,
start_rowid=> l_start_rowid,
end_rowid => l_end_rowid,
any_rows => l_any_rows);
EXIT WHEN l_any_rows = FALSE;
BEGIN
-- Manually execute the work.
EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;
-- Set the chunk status as processed.
DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
chunk_id => l_chunk_id,
status =>DBMS_PARALLEL_EXECUTE.PROCESSED);
EXCEPTION
WHEN OTHERS THEN
-- Record chunk error.
DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
chunk_id => l_chunk_id,
status =>DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
err_num => SQLCODE,
err_msg => SQLERRM);
END;
-- Commit work.
COMMIT;
END LOOP;
END;
/
A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.
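A minimal sketch of stopping and restarting a task. It assumes RUN_TASK is still executing for test_task in another session, and that calling RESUME_TASK with only the task name reuses the statement and settings originally passed to RUN_TASK (as the resume examples later in this article do).

```sql
-- Session 2, while RUN_TASK is still running in session 1:
EXEC DBMS_PARALLEL_EXECUTE.stop_task('test_task');

-- Later, process the remaining chunks again:
EXEC DBMS_PARALLEL_EXECUTE.resume_task('test_task');
```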
The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.
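For example, after a run you might purge the finished chunks and confirm that only unprocessed ones (if any) remain. This is a sketch against the test_task used throughout the article.

```sql
BEGIN
  -- Removes chunks in PROCESSED or PROCESSED_WITH_ERROR status.
  DBMS_PARALLEL_EXECUTE.purge_processed_chunks('test_task');
END;
/

SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
GROUP  BY status;
```

Note that purging also removes the error details stored with PROCESSED_WITH_ERROR chunks, so inspect those first if you need them.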
The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to be performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.
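A sketch of how an administrator could check a task owned by ICD. The grantee ops_admin is a hypothetical account, and the argument order of ADM_TASK_STATUS (task owner, then task name) is an assumption to verify against the package reference.

```sql
-- As a suitably privileged user (for example SYSDBA):
GRANT ADM_PARALLEL_EXECUTE_TASK TO ops_admin;  -- ops_admin is hypothetical

-- Then, connected as OPS_ADMIN:
SET SERVEROUTPUT ON
DECLARE
  l_status NUMBER;
BEGIN
  -- Assumed argument order: task owner, task name.
  l_status := DBMS_PARALLEL_EXECUTE.adm_task_status('ICD', 'test_task');
  IF l_status = DBMS_PARALLEL_EXECUTE.FINISHED THEN
    DBMS_OUTPUT.put_line('ICD.test_task finished');
  ELSE
    DBMS_OUTPUT.put_line('ICD.test_task status code: ' || l_status);
  END IF;
END;
/
```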
The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the FINISHED or FINISHED_WITH_ERROR constants. If the status is not FINISHED, the task can be resumed using the RESUME_TASK procedure.
DECLARE
l_try NUMBER;
l_status NUMBER;
BEGIN
-- If the task finished with errors, resume it at most twice.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.resume_task('test_task');
l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
END LOOP;
END;
/
The status of the task and the chunks can also be queried.
COLUMN task_name FORMAT A10
SELECT task_name,
status
FROM user_parallel_execute_tasks;
TASK_NAME STATUS
---------- -------------------
test_task FINISHED
If there were errors, the chunks can be queried to identify the problems.
SELECT status, COUNT(*)
FROM user_parallel_execute_chunks
GROUP BY status
ORDER BY status;
STATUS COUNT(*)
-------------------- ----------
PROCESSED_WITH_ERROR 3
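To see what actually went wrong, the same view records an error code and message for each failed chunk. A sketch, assuming the ERROR_CODE and ERROR_MESSAGE columns of USER_PARALLEL_EXECUTE_CHUNKS:

```sql
-- List the failing chunks with the error recorded for each one.
COLUMN error_message FORMAT A60
SELECT chunk_id, error_code, error_message
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
AND    status = 'PROCESSED_WITH_ERROR'
ORDER  BY chunk_id;
```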
The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.
SELECT job_prefix
FROM user_parallel_execute_tasks
WHERE task_name = 'test_task';
JOB_PREFIX
------------------------------
TASK$_368
This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.
COLUMN job_name FORMAT A20
SELECT job_name, status
FROM user_scheduler_job_run_details
WHERE job_name LIKE (SELECT job_prefix || '%'
FROM user_parallel_execute_tasks
WHERE task_name = 'test_task');
JOB_NAME STATUS
-------------------- ------------------------------
TASK$_205_3 SUCCEEDED
TASK$_205_9 SUCCEEDED
TASK$_205_5 SUCCEEDED
TASK$_205_7 SUCCEEDED
TASK$_205_1 SUCCEEDED
TASK$_205_2 SUCCEEDED
TASK$_205_6 SUCCEEDED
TASK$_205_8 SUCCEEDED
TASK$_205_4 SUCCEEDED
TASK$_205_10 SUCCEEDED
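Because each job keeps pulling unassigned chunks until none are left, the chunks are not necessarily spread evenly across the jobs. Before dropping the task, a query like the following (assuming the JOB_NAME column of USER_PARALLEL_EXECUTE_CHUNKS) shows how many chunks each job handled.

```sql
-- How many chunks did each scheduler job process?
COLUMN job_name FORMAT A20
SELECT job_name, COUNT(*) AS chunks_processed
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
GROUP  BY job_name
ORDER  BY job_name;
```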
Once the job is complete you can drop the task, which also drops the associated chunk information.
BEGIN
DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/
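A quick way to confirm the cleanup is to check that neither the task nor its chunks are still visible; both counts below are expected to be 0 after DROP_TASK.

```sql
SELECT COUNT(*) AS remaining_tasks
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task';

SELECT COUNT(*) AS remaining_chunks
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task';
```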
The following example shows the processing of a workload chunked by rowid.
DECLARE
l_task VARCHAR2(30) :='test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name => l_task,
table_owner => 'ICD',
table_name => 'TEST_TAB',
by_row => TRUE,
chunk_size => 10000);
l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
SET t.num_col = t.num_col + 10
WHERE rowid BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.run_task(task_name => l_task,
sql_stmt => l_sql_stmt,
language_flag =>DBMS_SQL.NATIVE,
parallel_level => 10);
-- If the task finished with errors, resume it at most twice.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.resume_task(l_task);
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
END LOOP;
DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.
CREATE OR REPLACE PROCEDURE process_update(p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
UPDATE test_tab t
SET t.num_col = t.num_col + 10
WHERE id BETWEEN p_start_id AND p_end_id;
END;
/
DECLARE
l_task VARCHAR2(30) :='test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name => l_task,
table_owner => 'ICD',
table_name => 'TEST_TAB',
table_column => 'ID',
chunk_size => 10000);
l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';
DBMS_PARALLEL_EXECUTE.run_task(task_name => l_task,
sql_stmt => l_sql_stmt,
language_flag =>DBMS_SQL.NATIVE,
parallel_level=> 10);
-- If the task finished with errors, resume it at most twice.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.resume_task(l_task);
l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
END LOOP;
DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.
DECLARE
l_task VARCHAR2(30) :='test_task';
l_stmt CLOB;
l_sql_stmt VARCHAR2(32767);
l_chunk_id NUMBER;
l_start_id NUMBER;
l_end_id NUMBER;
l_any_rows BOOLEAN;
BEGIN
DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';
DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
sql_stmt => l_stmt,
by_rowid => FALSE);
l_sql_stmt := 'UPDATE test_tab t
SET t.num_col = t.num_col
WHERE t.num_col BETWEEN :start_id AND :end_id';
LOOP
-- Get next unassigned chunk.
DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => l_task,
chunk_id => l_chunk_id,
start_id => l_start_id,
end_id => l_end_id,
any_rows => l_any_rows);
EXIT WHEN l_any_rows = FALSE;
BEGIN
-- Manually execute the work.
EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;
-- Set the chunk status as processed.
DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
chunk_id => l_chunk_id,
status => DBMS_PARALLEL_EXECUTE.PROCESSED);
EXCEPTION
WHEN OTHERS THEN
-- Record chunk error.
DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
chunk_id => l_chunk_id,
status => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
err_num => SQLCODE,
err_msg => SQLERRM);
END;
-- Commit work.
COMMIT;
END LOOP;
DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/