Using dbms_parallel_execute in Oracle 11g to Update Large Tables in Parallel

Original post: https://blog.csdn.net/tianlesoftware/article/details/6603010
  1.  About dbms_parallel_execute
Updating Large Tables in Parallel
       The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a large table in parallel, in two high-level steps:
       (1) Group sets of rows in the table into smaller chunks.
       (2) Apply the desired UPDATE statement to the chunks in parallel, committing each time you have finished processing a chunk.
       -- In other words, the package works in two steps: it first splits the large table into many small chunks, then processes those chunks in parallel.
       This technique is recommended whenever you are updating a lot of data. Its advantages are:
       (1) You lock only one set of rows at a time, for a relatively short time, instead of locking the entire table.
       (2) You do not lose work that has been done if something fails before the entire operation finishes.
       (3) You reduce rollback space consumption.
       (4) You improve performance.
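The two steps above can be modelled outside Oracle. The sketch below is a minimal, serial Python illustration using the standard-library sqlite3 module; it is not DBMS_PARALLEL_EXECUTE itself. It groups rows into id ranges and commits after each range, so a failure part-way through would lose at most one chunk of work. The table layout, the chunk size, and the helper name update_in_chunks are assumptions made for illustration.

```python
import sqlite3


def update_in_chunks(conn, chunk_size):
    """Apply an UPDATE one id range at a time, committing after every chunk.

    Illustrative model only: DBMS_PARALLEL_EXECUTE runs chunks in parallel
    scheduler jobs, whereas this loop is serial. Assumes a non-empty table.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM test_tab").fetchone()
    committed_chunks = 0
    start = lo
    while start <= hi:
        end = min(start + chunk_size - 1, hi)
        conn.execute(
            "UPDATE test_tab SET num_col = num_col + 10 WHERE id BETWEEN ? AND ?",
            (start, end),
        )
        conn.commit()  # work done for this chunk is now durable
        committed_chunks += 1
        start = end + 1
    return committed_chunks


# Hypothetical miniature of the test table: 1000 rows, all with num_col = 30.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_tab (id INTEGER PRIMARY KEY, num_col INTEGER)")
conn.executemany(
    "INSERT INTO test_tab (id, num_col) VALUES (?, ?)",
    [(i, 30) for i in range(1, 1001)],
)
conn.commit()

chunks_done = update_in_chunks(conn, chunk_size=100)
total = conn.execute("SELECT SUM(num_col) FROM test_tab").fetchone()[0]
print(chunks_done, total)
```

Committing per chunk is what keeps locks short-lived and undo usage small; the parallel part is a separate concern handled by the scheduler jobs.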
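  See Also:
       Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_PARALLEL_EXECUTE package
       http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_parallel_ex.htm#ARPLS233
       -- This link has detailed usage notes for the package.
       Parallelism can improve SQL performance to some extent. My blog covers parallel execution here:
       Oracle Parallel Execution
       http://blog.csdn.net/tianlesoftware/article/details/5854583
  I mention that article because of one issue it raises:
       According to that article, Oracle restricts parallel DELETE, UPDATE, and MERGE: it starts a parallel operation only when the target object is a partitioned table. The reason given is that, for a partitioned table, Oracle starts one parallel server process per partition to process the data concurrently, which has no meaning for a non-partitioned table.
       If we need to update a large table that is not partitioned, we can use the dbms_parallel_execute package to run the update in parallel.
       The dbms_parallel_execute package splits the large table into many small chunks and then processes the chunks in parallel, which is similar to turning a non-partitioned table into a partitioned one.
       Note that this package is available only from Oracle 11g onward (it was introduced in 11g Release 2).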
  2.  Usage
The following content is taken from:
       http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php
  2.1 The operations require the CREATE JOB privilege, so grant it first
SQL> conn / as sysdba;
Connected.
SQL> grant create job to icd;
Grant succeeded.
SQL> conn icd/icd;
Connected.
  2.2 Create the test table and insert data
SQL> CREATE TABLE test_tab (
       id          NUMBER,
       description VARCHAR2(50),
       num_col     NUMBER,
       CONSTRAINT test_tab_pk PRIMARY KEY (id)
     );
Table created.
SQL> INSERT /*+ APPEND */ INTO test_tab
     SELECT level,
            'Description for ' || level,
            CASE
              WHEN MOD(level, 5) = 0 THEN 10
              WHEN MOD(level, 3) = 0 THEN 20
              ELSE 30
            END
     FROM   dual
     CONNECT BY level <= 500000;
500000 rows created.
SQL> commit;
Commit complete.
  2.3 Gather statistics
SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);
PL/SQL procedure successfully completed.
SQL> SELECT num_col, COUNT(*)
     FROM   test_tab
     GROUP BY num_col
     ORDER BY num_col;
     NUM_COL   COUNT(*)
---------- ----------
        10     100000
        20     133333
        30     266667
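The three counts follow directly from the CASE expression used in the INSERT. As a quick cross-check, the sketch below re-applies the same rule in Python to the levels 1 to 500000; the function name num_col_for is a hypothetical helper, not part of the original test.

```python
from collections import Counter


def num_col_for(level):
    """Mirror the CASE expression that populates test_tab.num_col."""
    if level % 5 == 0:
        return 10
    if level % 3 == 0:
        return 20
    return 30


# Same range as CONNECT BY level <= 500000.
counts = Counter(num_col_for(level) for level in range(1, 500001))
for value in sorted(counts):
    print(value, counts[value])
```

Multiples of 5 take the first branch, so only multiples of 3 that are not also multiples of 5 end up with 20.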
  2.4 Create the task
       The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.
SQL> BEGIN
       DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
     END;
     /
PL/SQL procedure successfully completed.
       Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.
SQL> COLUMN task_name FORMAT A10
SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;
TASK_NAME  STATUS
---------- -------------------
test_task  CREATED
       The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.
SQL> SELECT DBMS_PARALLEL_EXECUTE.generate_task_name FROM dual;
GENERATE_TASK_NAME
-----------------------------------------------------
TASK$_1
  2.5 Split the workload into chunks
       There are three ways to split a large table into chunks:
       (1) CREATE_CHUNKS_BY_ROWID
       (2) CREATE_CHUNKS_BY_NUMBER_COL
       (3) CREATE_CHUNKS_BY_SQL
       Chunks that have already been created can be removed with DROP_CHUNKS.
  2.5.1 CREATE_CHUNKS_BY_ROWID
       The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.
SQL> BEGIN
       dbms_parallel_execute.create_chunks_by_rowid(task_name   => 'test_task',
                                                    table_owner => 'icd',
                                                    table_name  => 'test_tab',
                                                    by_row      => true,
                                                    chunk_size  => 10000);
     END;
     /
PL/SQL procedure successfully completed.
  Once the chunks have been created, the task status changes to CHUNKED.
SQL> COLUMN task_name FORMAT A10
SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;
TASK_NAME  STATUS
---------- -------------------
test_task  CHUNKED
       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status, start_rowid, end_rowid
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;
  CHUNK_ID STATUS               START_ROWID        END_ROWID
---------- -------------------- ------------------ ------------------
         2 UNASSIGNED           AAATMCAAMAABSMIAAA AAATMCAAMAABSMPCcP
         3 UNASSIGNED           AAATMCAAMAABSMgAAA AAATMCAAMAABSMnCcP
         4 UNASSIGNED           AAATMCAAMAABSMoAAA AAATMCAAMAABSMvCcP
...
        73 UNASSIGNED           AAATMCAAMAABS0yAAA AAATMCAAMAABS1jCcP
        74 UNASSIGNED           AAATMCAAMAABS1kAAA AAATMCAAMAABS1/CcP
73 rows selected.
  Drop the chunks:
SQL> BEGIN
       dbms_parallel_execute.drop_chunks('test_task');
     END;
     /
PL/SQL procedure successfully completed.
  Check the task status again: it is back to CREATED.
SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;
TASK_NAME  STATUS
---------- -------------------
test_task  CREATED
  2.5.2 CREATE_CHUNKS_BY_NUMBER_COL
       The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload up based on a number column. It uses the specified column's min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized the column must contain a continuous sequence of numbers, like that generated by a sequence.
BEGIN
  dbms_parallel_execute.create_chunks_by_number_col(task_name    => 'test_task',
                                                    table_owner  => 'ICD',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
END;
/
       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;
  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
        75 UNASSIGNED                    1      10000
        76 UNASSIGNED                10001      20000
        77 UNASSIGNED                20001      30000
        78 UNASSIGNED                30001      40000
......
       122 UNASSIGNED               470001     480000
       123 UNASSIGNED               480001     490000
       124 UNASSIGNED               490001     500000
50 rows selected.
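The ranges in this output can be reproduced with simple arithmetic on the column's minimum, maximum, and the chunk size. The Python sketch below is a model of that arithmetic, not Oracle's implementation; number_col_chunks is a hypothetical helper, and clamping the last range to the column maximum is an assumption (the output above cannot confirm it, because 500000 is an exact multiple of the chunk size).

```python
def number_col_chunks(min_id, max_id, chunk_size):
    """Return consecutive (start_id, end_id) ranges covering min_id..max_id."""
    chunks = []
    start = min_id
    while start <= max_id:
        # Assumption: the final range stops at max_id.
        end = min(start + chunk_size - 1, max_id)
        chunks.append((start, end))
        start = end + 1
    return chunks


# IDs 1..500000 with chunk_size 10000, as in the test table.
chunks = number_col_chunks(1, 500000, 10000)
print(len(chunks), chunks[0], chunks[-1])
```

The ranges depend only on the min and max values, not on which IDs actually exist, which is why gaps in the column lead to chunks holding unequal numbers of rows.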
  2.5.3 CREATE_CHUNKS_BY_SQL
       The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.
  Drop the chunks created earlier:
SQL> exec dbms_parallel_execute.drop_chunks('test_task');
PL/SQL procedure successfully completed.
DECLARE
  l_stmt CLOB;
BEGIN
  -- Each distinct num_col value becomes one chunk (start_id = end_id).
  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';
  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/
       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.
SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;
  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       141 UNASSIGNED                   10         10
       142 UNASSIGNED                   30         30
       143 UNASSIGNED                   20         20
  2.6 Run the task
       Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.

       There are two ways to run a task and several procedures to control a running task.
  2.6.1 RUN_TASK
       The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to the ':start_id' and ':end_id', which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.
DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  -- The ROWID hint must name the alias used in the statement (t).
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/
       The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.
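The point that parallelism is set by the number of jobs rather than the number of chunks can be pictured as a fixed pool of workers draining a shared queue. The Python sketch below models that scheduling pattern with threads; it is an illustration only, and run_task here is a hypothetical stand-in that merely shares its name with the Oracle procedure.

```python
import queue
import threading


def run_task(chunks, process_chunk, parallel_level):
    """Process every chunk using parallel_level workers that pull from one queue."""
    pending = queue.Queue()
    for chunk in chunks:
        pending.put(chunk)

    processed = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                chunk = pending.get_nowait()  # take the next unassigned chunk
            except queue.Empty:
                return  # nothing left to do
            process_chunk(chunk)
            with lock:
                processed.append((worker_id, chunk))

    workers = [
        threading.Thread(target=worker, args=(n,)) for n in range(parallel_level)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # like RUN_TASK, return only when all work is finished
    return processed


# 50 chunks handled by 10 workers; the work itself is a no-op here.
chunks = [(start, start + 9999) for start in range(1, 500001, 10000)]
results = run_task(chunks, lambda chunk: None, parallel_level=10)
print(len(results), len({worker_id for worker_id, _ in results}))
```

Each chunk is processed exactly once no matter how many workers there are; adding workers changes only how many chunks are in flight at the same time.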

  2.6.2 User-defined framework
       The DBMS_PARALLEL_EXECUTE package allows you to manually code the task run. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can then manually process the chunk and set its status. The example below shows the processing of a workload chunked by rowid.
DECLARE
  l_sql_stmt    VARCHAR2(32767);
  l_chunk_id    NUMBER;
  l_start_rowid ROWID;
  l_end_rowid   ROWID;
  l_any_rows    BOOLEAN;
BEGIN
  -- The ROWID hint must name the alias used in the statement (t).
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';
  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name   => 'test_task',
                                          chunk_id    => l_chunk_id,
                                          start_rowid => l_start_rowid,
                                          end_rowid   => l_end_rowid,
                                          any_rows    => l_any_rows);
    EXIT WHEN l_any_rows = FALSE;
    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;
      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;
    -- Commit work.
    COMMIT;
  END LOOP;
END;
/
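The control flow of this hand-written loop (fetch a chunk, do the work, record success or the error, continue) can be shown without a database. The Python sketch below is a simplified model: the chunk IDs, the ranges, and the helper names process_all and flaky_work are hypothetical, and the status strings merely mirror the package constants.

```python
def process_all(chunks, work):
    """Run work() on each chunk and record a per-chunk status and error text."""
    status = {chunk_id: "UNASSIGNED" for chunk_id in chunks}
    errors = {}
    for chunk_id, (start_id, end_id) in chunks.items():
        try:
            work(start_id, end_id)
            status[chunk_id] = "PROCESSED"
        except Exception as exc:
            # One failing chunk must not stop the remaining chunks.
            status[chunk_id] = "PROCESSED_WITH_ERROR"
            errors[chunk_id] = str(exc)
    return status, errors


def flaky_work(start_id, end_id):
    """Hypothetical unit of work that fails for one particular range."""
    if start_id == 101:
        raise ValueError("simulated failure for range 101-200")


chunks = {1: (1, 100), 2: (101, 200), 3: (201, 300)}
status, errors = process_all(chunks, flaky_work)
print(status)
print(errors)
```

Catching the exception inside the loop body is the important design choice: the error is stored against the chunk, and the loop moves on to the next one.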
  2.6.3 Task control
       A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.

       The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.

       The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to be performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.

  2.7 Check the task status
       The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the 'FINISHED' or 'FINISHED_WITH_ERROR' constants. If the status is not 'FINISHED', then the task can be resumed using the RESUME_TASK procedure.
DECLARE
  l_try    NUMBER;
  l_status NUMBER;
BEGIN
  -- If there is an error, resume the task at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task('test_task');
    l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  END LOOP;
END;
/
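The retry loop above is a bounded "resume until finished" pattern. The Python sketch below models it under one stated assumption: a resume re-attempts only the chunks that are not yet PROCESSED. The helper names run_with_retries and fails_once_for_chunk_2 are hypothetical, and the status strings only mirror the package constants.

```python
def run_with_retries(chunk_ids, attempt_chunk, max_resumes=2):
    """Run all chunks once, then resume up to max_resumes times while any failed."""
    status = {chunk_id: "UNASSIGNED" for chunk_id in chunk_ids}

    def run_pass():
        # Assumption: chunks already PROCESSED are not attempted again.
        for chunk_id in chunk_ids:
            if status[chunk_id] != "PROCESSED":
                ok = attempt_chunk(chunk_id)
                status[chunk_id] = "PROCESSED" if ok else "PROCESSED_WITH_ERROR"

    def task_status():
        if all(s == "PROCESSED" for s in status.values()):
            return "FINISHED"
        return "FINISHED_WITH_ERROR"

    run_pass()  # stands in for the initial RUN_TASK
    resumes = 0
    while resumes < max_resumes and task_status() != "FINISHED":
        resumes += 1
        run_pass()  # stands in for RESUME_TASK
    return task_status(), resumes


attempts = {}


def fails_once_for_chunk_2(chunk_id):
    """Hypothetical work: chunk 2 fails on its first attempt only."""
    attempts[chunk_id] = attempts.get(chunk_id, 0) + 1
    return not (chunk_id == 2 and attempts[chunk_id] == 1)


final_status, resumes_used = run_with_retries([1, 2, 3], fails_once_for_chunk_2)
print(final_status, resumes_used, attempts)
```

Bounding the number of resumes matters because a chunk that fails for a permanent reason, such as a constraint violation, would otherwise be retried forever.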
  The status of the task and the chunks can also be queried.
COLUMN task_name FORMAT A10
SELECT task_name,
       status
FROM   user_parallel_execute_tasks;
TASK_NAME  STATUS
---------- -------------------
test_task  FINISHED
  If there were errors, the chunks can be queried to identify the problems.
SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
GROUP BY status
ORDER BY status;
STATUS                 COUNT(*)
-------------------- ----------
PROCESSED_WITH_ERROR          3
       The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.
SELECT job_prefix
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task';
JOB_PREFIX
------------------------------
TASK$_368
       This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.
COLUMN job_name FORMAT A20
SELECT job_name, status
FROM   user_scheduler_job_run_details
WHERE  job_name LIKE (SELECT job_prefix || '%'
                      FROM   user_parallel_execute_tasks
                      WHERE  task_name = 'test_task');
JOB_NAME             STATUS
-------------------- ------------------------------
TASK$_205_3          SUCCEEDED
TASK$_205_9          SUCCEEDED
TASK$_205_5          SUCCEEDED
TASK$_205_7          SUCCEEDED
TASK$_205_1          SUCCEEDED
TASK$_205_2          SUCCEEDED
TASK$_205_6          SUCCEEDED
TASK$_205_8          SUCCEEDED
TASK$_205_4          SUCCEEDED
TASK$_205_10         SUCCEEDED
  2.8 Drop the task
       Once the job is complete you can drop the task, which will drop the associated chunk information also.
  BEGIN
 DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/
  3.  Examples
  3.1 Test 1
       The following example shows the processing of a workload chunked by rowid.
DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'TEST',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);
  -- The ROWID hint must name the alias used in the statement (t).
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
  -- If there is an error, resume the task at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
  3.2 Test 2
       The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.
CREATE OR REPLACE PROCEDURE process_update (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  -- No ROWID hint here: rows are selected by id range, not by rowid.
  UPDATE test_tab t
  SET    t.num_col = t.num_col + 10
  WHERE  id BETWEEN p_start_id AND p_end_id;
END;
/
DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => l_task,
                                                    table_owner  => 'TEST',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
  l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';
  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
  -- If there is an error, resume the task at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
  3.3 Test 3
       The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.
DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_stmt     CLOB;
  l_sql_stmt VARCHAR2(32767);
  l_chunk_id NUMBER;
  l_start_id NUMBER;
  l_end_id   NUMBER;
  l_any_rows BOOLEAN;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';
  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
  -- No ROWID hint: rows are selected by num_col value, not by rowid.
  -- The SET clause is a no-op, as in the source example.
  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col
                 WHERE  num_col BETWEEN :start_id AND :end_id';
  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               start_id  => l_start_id,
                                               end_id    => l_end_id,
                                               any_rows  => l_any_rows);
    EXIT WHEN l_any_rows = FALSE;
    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;
      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;
    -- Commit work.
    COMMIT;
  END LOOP;
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
 
