2015-05-21 Created By BaoXinjian
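I. Summary
I used to think of MERGE INTO as something you reach for only because it is convenient in particular situations. Today I found out how large an improvement MERGE INTO can be over UPDATE when updating data.
In fact, the update branch of MERGE INTO is no different from a plain UPDATE. What differs is that the execution plan changes once MERGE INTO is used.
MERGE is the most concise approach and, in the test below, by far the most efficient one, so prefer it when updating large volumes of data.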
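1. Basic syntax
merge into test1
using test2
on (test1.id = test2.id)
when matched then
  update set test1.name = nvl2(test1.name, test2.name, test1.name);
UPDATE through an inline view: this approach requires a primary key on test2.id. (The reason is easy to see: each test1.id must correspond to exactly one row in test2. If test2 held several matching rows, which one would be used to update test1?)
Alternatively, use on (test1.id = test2.id and test1.name = test2.name ....), comparing several columns to pin down a unique row, much like a unique index. The conditions are joined with AND, not commas, and Oracle rejects a MERGE that updates a column referenced in its ON clause (ORA-38104).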
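The inline-view UPDATE mentioned above is not shown in the text, so here is a minimal sketch of what it could look like. It assumes the same illustrative test1/test2 tables with id and name columns, and the constraint name test2_pk is made up for the example.

```sql
-- Assumption: test2.id holds unique, non-null values, so a primary key can be added.
-- The constraint name test2_pk is hypothetical.
alter table test2 add constraint test2_pk primary key (id);

-- Update through a join view. The primary key guarantees that each test1 row
-- joins to at most one test2 row, which makes test1 key-preserved in the view.
update (select t1.name as old_name,
               t2.name as new_name
          from test1 t1
          join test2 t2 on t1.id = t2.id)
   set old_name = new_name;
```

Without the key on test2.id, Oracle refuses this statement with ORA-01779 (cannot modify a column which maps to a non key-preserved table). MERGE has no such precondition, but if several test2 rows match the same test1 row it fails at run time with ORA-30926 (unable to get a stable set of rows in the source tables), so the uniqueness requirement applies either way.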
2. Use parallelism to speed up updates of large volumes of data:
merge /*+ parallel(test1, 4) */ into test1
using test2
on (test1.id = test2.id)
when matched then
  update set test1.name = nvl2(test1.name, test2.name, test1.name);
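One detail the hinted statement above leaves out: parallel DML is disabled by default in an Oracle session, so the hint alone typically parallelizes only the query side of the MERGE. A hedged sketch of the full sequence, using the same illustrative tables:

```sql
-- Parallel DML must be enabled per session before the hint can apply
-- to the update part of the MERGE as well as to the scans.
alter session enable parallel dml;

merge /*+ parallel(test1, 4) */ into test1
using test2
on (test1.id = test2.id)
when matched then
  update set test1.name = nvl2(test1.name, test2.name, test1.name);

-- After a parallel DML statement the session has to commit or roll back
-- before it can read or modify test1 again (otherwise ORA-12838).
commit;
```

Whether a degree of 4 helps depends on the hardware and on the size of the tables. It was not measured in this post.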
II. Test case - Update / Merge Into
1. Create the test data
create table test1 as select * from dba_objects where rownum <= 10000; -- 10,000 rows
create table test2 as select * from dba_objects; -- 73,056 rows
2. Time and cost of a direct UPDATE
SQL> alter system flush shared_pool;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> set linesize 400 pagesize 400
SQL> set autot trace
SQL> set timing on
SQL> update test1 t1
  2     set t1.object_name = (select t2.object_name
  3                             from test2 t2
  4                            where t2.object_id = t1.object_id);
10000 rows updated.
Elapsed: 00:06:33.35
Execution Plan
----------------------------------------------------------
   0      UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=2923252 Card=10011 Bytes=790869)
   1    0   UPDATE OF 'TEST1'
   2    1     TABLE ACCESS (FULL) OF 'TEST1' (TABLE) (Cost=40 Card=10011 Bytes=790869)
   3    1     TABLE ACCESS (FULL) OF 'TEST2' (TABLE) (Cost=292 Card=772 Bytes=60988)
Statistics
----------------------------------------------------------
        430  recursive calls
      11122  db block gets
   15275257  consistent gets
       1175  physical reads
    4058752  redo size
        520  bytes sent via SQL*Net to client
        668  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
      10000  rows processed
3. Time and cost with MERGE INTO
SQL> alter system flush shared_pool;
System altered.
Elapsed: 00:00:00.45
SQL> alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.71
SQL> merge into test1 t1
  2  using test2 t2
  3  on (t1.object_id = t2.object_id)
  4  when matched then
  5    update set t1.object_name = t2.object_name;
10000 rows merged.
Elapsed: 00:00:00.92
Execution Plan
----------------------------------------------------------
   0      MERGE STATEMENT Optimizer=ALL_ROWS (Cost=1243 Card=10011 Bytes=1321452)
   1    0   MERGE OF 'TEST1'
   2    1     VIEW
   3    2       HASH JOIN (Cost=1243 Card=10011 Bytes=4264686)
   4    3         TABLE ACCESS (FULL) OF 'TEST1' (TABLE) (Cost=40 Card=10011 Bytes=2192409)
   5    3         TABLE ACCESS (FULL) OF 'TEST2' (TABLE) (Cost=292 Card=77163 Bytes=15972741)
Statistics
----------------------------------------------------------
       1224  recursive calls
      10279  db block gets
       1586  consistent gets
       1191  physical reads
    2803872  redo size
        526  bytes sent via SQL*Net to client
        634  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
      10000  rows processed
III. Execution plans
1. Execution plan of the UPDATE
SQL> set autot off
SQL> update /*+gather_plan_statistics*/ test1 t1
  2     set t1.object_name = (select t2.object_name
  3                             from test2 t2
  4                            where t2.object_id = t1.object_id);
10000 rows updated.
Elapsed: 00:04:32.81
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
SQL_ID  c8qt9a54qgmqg, child number 0
-------------------------------------
update /*+gather_plan_statistics*/ test1 t1 set t1.object_name = (select t2.object_name from test2 t2 where t2.object_id = t1.object_id)
Plan hash value: 3883393169
--------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |      1 |        |      0 |00:04:32.73 |     10M |
|   1 |  UPDATE            | TEST1 |      1 |        |      0 |00:04:32.73 |     10M |
|   2 |   TABLE ACCESS FULL| TEST1 |      1 |  10011 |  10000 |00:00:00.17 |     133 |
|*  3 |   TABLE ACCESS FULL| TEST2 |  10000 |    772 |  10000 |00:04:31.51 |     10M |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."OBJECT_ID"=:B1)
Note
-----
   - dynamic sampling used for this statement (level=2)
26 rows selected.
Elapsed: 00:00:01.38
2. Execution plan of the MERGE INTO
SQL> merge /*+gather_plan_statistics*/
  2  into test1 t1
  3  using test2 t2
  4  on (t1.object_id = t2.object_id)
  5  when matched then
  6    update set t1.object_name = t2.object_name;
10000 rows merged.
Elapsed: 00:00:00.52
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID  9n4tc6tvwaj9c, child number 0
-------------------------------------
merge /*+gather_plan_statistics*/ into test1 t1 using test2 t2 on (t1.object_id = t2.object_id) when matched then update set t1.object_name = t2.object_name
Plan hash value: 818823782
----------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |       |      1 |        |      0 |00:00:00.47 |   11458 |
|   1 |  MERGE               | TEST1 |      1 |        |      0 |00:00:00.47 |   11458 |
|   2 |   VIEW               |       |      1 |        |  10000 |00:00:00.33 |    1179 |
|*  3 |    HASH JOIN         |       |      1 |  10011 |  10000 |00:00:00.25 |    1179 |
|   4 |     TABLE ACCESS FULL| TEST1 |      1 |  10011 |  10000 |00:00:00.08 |     133 |
|   5 |     TABLE ACCESS FULL| TEST2 |      1 |  77163 |  73056 |00:00:00.26 |    1046 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
Note
-----
   - dynamic sampling used for this statement (level=2)
28 rows selected.
Elapsed: 00:00:00.15
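IV. Analysis of the results
1. Comparison of the test results: UPDATE and MERGE INTO both update 10,000 rows.
The UPDATE took about six and a half minutes (00:06:33.35) and 15,275,257 logical reads (consistent gets);
the MERGE INTO took under one second (00:00:00.92) and 1,586 logical reads. The gap is enormous.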
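To put that gap in proportion, the autotrace figures can simply be divided out. This is a small sketch: the column aliases are my own, and the inputs are copied from the two SQL*Plus runs above (elapsed times 00:06:33.35 and 00:00:00.92).

```sql
-- All constants are taken from the autotrace output of the two test runs.
select round(15275257 / 10000)          as update_gets_per_row,
       round(15275257 / 1586)           as gets_ratio_update_vs_merge,
       round((6 * 60 + 33.35) / 0.92)   as elapsed_ratio_update_vs_merge
  from dual;
```

The UPDATE spends roughly 1,500 consistent gets per updated row. That is the same order of magnitude as one full scan of test2, which shows 1,046 buffers in the MERGE plan.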
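2. Looking at the execution plans, this result is easy to understand:
the UPDATE works much like a nested loop: for every row being updated, the queried table is scanned once (Starts = 10000 on the TEST2 full scan);
the MERGE INTO chooses a hash join here, so each table gets one full table scan and is read only once.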
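The repeated full scans in the UPDATE plan follow from test2 having no index on object_id, so each execution of the subquery has nothing better available than a full scan. The following variation is a sketch that was not measured in this post; the index name is hypothetical.

```sql
-- Assumption: not part of the original test. The index name is made up.
create index test2_object_id_idx on test2 (object_id);

-- With the index in place, each execution of the correlated subquery can
-- use an index lookup instead of a full scan of test2.
-- The WHERE EXISTS clause restricts the update to rows that have a match,
-- which is what "when matched then update" does in the MERGE.
update test1 t1
   set t1.object_name = (select t2.object_name
                           from test2 t2
                          where t2.object_id = t1.object_id)
 where exists (select 1
                 from test2 t2
                where t2.object_id = t1.object_id);
```

Without the WHERE EXISTS clause, the correlated UPDATE sets object_name to NULL for any test1 row that has no match in test2. The two statements in the test only agree because test1 was built from the same dba_objects view as test2. The subquery must also return at most one row per test1 row, otherwise Oracle raises ORA-01427.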
3. Oracle's guidance, as I understand it, is likewise to use MERGE INTO in place of UPDATE when updating large volumes of data.
Thanks and Regards
Reference: http://blog.csdn.net/xiexbb/article/details/4242063