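Does adding ROWNUM = 1 to a SQL statement improve performance? Sometimes, but not always. I ran into a case where adding ROWNUM = 1 did not bring the improvement I had expected.
Background: a piece of PL/SQL code uses the query below to check whether any record matching the conditions exists.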
SELECT 'X'
  FROM MTL_MATERIAL_TRANSACTIONS A, MTL_TRANSACTION_TYPES B
 WHERE A.ORGANIZATION_ID = :b1
   AND B.TRANSACTION_TYPE_ID = A.TRANSACTION_TYPE_ID
   AND B.TRANSACTION_ACTION_ID = A.TRANSACTION_ACTION_ID
   AND ROWNUM = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    423.38   20756.18    3779615    5714592          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    423.39   20756.20    3779615    5714592          0           0

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 98 (APPS)

Rows      Row Source Operation
--------  ---------------------------------------------------
       0  COUNT STOPKEY (cr=5714592 pr=3779615 pw=0 time=0 us)
       0   NESTED LOOPS (cr=5714592 pr=3779615 pw=0 time=0 us)
32780990    NESTED LOOPS (cr=274762 pr=267808 pw=0 time=14400615 us cost=8 size=18 card=1)
     114     TABLE ACCESS FULL MTL_TRANSACTION_TYPES (cr=7 pr=13 pw=0 time=0 us cost=2 size=7 card=1)
32780990     INDEX RANGE SCAN MTL_MATERIAL_TRANSACTIONS_U2 (cr=274755 pr=267795 pw=0 time=14379618 us cost=3 size=0 card=15)(object id 806637)
       0    TABLE ACCESS BY GLOBAL INDEX ROWID MTL_MATERIAL_TRANSACTIONS PARTITION: ROW LOCATION ROW LOCATION (cr=5439830 pr=3511807 pw=0 time=0 us cost=6 size=11 card=1)

Rows      Execution Plan
--------  ---------------------------------------------------
       0  SELECT STATEMENT MODE: ALL_ROWS
       0   COUNT (STOPKEY)
       0    NESTED LOOPS
32780990     NESTED LOOPS
     114      TABLE ACCESS MODE: ANALYZED (FULL) OF 'MTL_TRANSACTION_TYPES' (TABLE)
32780990      INDEX MODE: ANALYZED (RANGE SCAN) OF 'MTL_MATERIAL_TRANSACTIONS_U2' (INDEX (UNIQUE))
       0     TABLE ACCESS MODE: ANALYZED (BY GLOBAL INDEX ROWID) OF 'MTL_MATERIAL_TRANSACTIONS' (TABLE) PARTITION:ROW LOCATION

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3        0.00          0.00
  db file sequential read                   3779603        1.54      20386.59
  db file scattered read                          3        0.05          0.09
  gc cr grant 2-way                          274917        0.41         90.72
  gc current block 2-way                      21959        0.21         10.07
  latch: object queue header operation           23        0.00          0.02
  latch: gc element                               1        0.00          0.00
  gc remaster                                    20        0.18          0.75
  gcs drm freeze in enter server mode            24        0.33          1.30
********************************************************************************
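MTL_MATERIAL_TRANSACTIONS is a table with tens of millions of rows, and adding an index does not help much: even with an index on ORGANIZATION_ID + TRANSACTION_TYPE_ID + TRANSACTION_ACTION_ID, each key would still match a large number of rows. In other words, the selectivity of the data is poor.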
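ROWNUM = 1 did not deliver the large speedup I had expected. My guess at the reason: MTL_MATERIAL_TRANSACTIONS and MTL_TRANSACTION_TYPES are joined first, and only then is the first row taken from the join result. The join itself is very expensive, so performance is very poor. The trace fits this reading: the inner nested loop returned 32,780,990 index entries, but no row survived the table access (0 rows fetched), so COUNT STOPKEY never received a first row to stop on.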
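The reasoning above can be illustrated with a toy model. This is only a sketch of the idea, not how Oracle's executor is implemented: the function names, the in-memory data layout, and the row counts are all made up for illustration. A lazy nested-loop join feeds a "stop after one row" step, and a counter records how many candidate rows had to be visited.

```python
from itertools import islice


def nested_loop_join(type_ids, txn_orgs_by_type, org_id, counter):
    """Lazily yield joined rows, the way a row source feeds COUNT STOPKEY.

    txn_orgs_by_type maps a transaction type to the organization ids of
    its transactions (a stand-in for index range scan + table access).
    """
    for type_id in type_ids:                      # outer table: full scan
        for txn_org in txn_orgs_by_type.get(type_id, []):
            counter["visited"] += 1               # one candidate row visited
            if txn_org == org_id:                 # filter on ORGANIZATION_ID
                yield "X"


def first_row(type_ids, txn_orgs_by_type, org_id):
    """Model of ROWNUM = 1: pull at most one row from the join."""
    counter = {"visited": 0}
    join = nested_loop_join(type_ids, txn_orgs_by_type, org_id, counter)
    rows = list(islice(join, 1))                  # stop after the first row
    return rows, counter["visited"]


# Hypothetical data: 3 transaction types, 1000 transactions each, all org 1.
TYPE_IDS = [1, 2, 3]
TXNS = {t: [1] * 1000 for t in TYPE_IDS}

hit_rows, hit_cost = first_row(TYPE_IDS, TXNS, org_id=1)    # a match exists
miss_rows, miss_cost = first_row(TYPE_IDS, TXNS, org_id=2)  # no match at all
```

In this model the stop key pays off only when a matching row turns up early. When nothing matches, every candidate row is still visited before the query can report "no rows", which is the shape of the trace above.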
Solution:
Use EXISTS to check whether a matching record exists (an existence check). Performance improved dramatically.
select 'X'
  into v_garbage
  from DUAL
 where exists (SELECT 'X'
                 FROM mtl_material_transactions a, mtl_transaction_types b
                WHERE a.organization_id = org_id
                  AND b.transaction_type_id = a.transaction_type_id
                  AND b.transaction_action_id = a.transaction_action_id);
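For readers who want to try the shape of this existence check without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an analogy under stated assumptions, not the original environment: SQLite has no DUAL table, so SELECT EXISTS (...) stands in for the outer query, the two tables contain only the columns used in the join, and the sample rows and the helper names make_demo_db and record_exists are invented. It says nothing about Oracle's execution plan or timings.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with tiny, made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mtl_transaction_types (
            transaction_type_id   INTEGER,
            transaction_action_id INTEGER
        );
        CREATE TABLE mtl_material_transactions (
            organization_id       INTEGER,
            transaction_type_id   INTEGER,
            transaction_action_id INTEGER
        );
        INSERT INTO mtl_transaction_types VALUES (1, 10), (2, 20);
        -- The first row joins to type (1, 10); the second matches no type.
        INSERT INTO mtl_material_transactions VALUES (100, 1, 10), (100, 2, 99);
        """
    )
    return conn


def record_exists(conn, org_id):
    """Return True if at least one joined row exists for org_id."""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 'X'
              FROM mtl_material_transactions a, mtl_transaction_types b
             WHERE a.organization_id = ?
               AND b.transaction_type_id = a.transaction_type_id
               AND b.transaction_action_id = a.transaction_action_id
        )
        """,
        (org_id,),
    ).fetchone()
    return bool(row[0])  # SQLite returns 1 or 0 for EXISTS


conn = make_demo_db()
found = record_exists(conn, 100)
missing = record_exists(conn, 200)
```

One difference worth keeping in mind: the PL/SQL form with SELECT ... INTO raises NO_DATA_FOUND when no row matches, so the calling code has to handle that exception, whereas this sketch simply returns False.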
In addition, on EXISTS vs IN, see Tom's post:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:953229842074
Tom: can you give me some example at which situation IN is better than exist, and vice versa.
Well, the two are processed very very differently.

    Select * from T1 where x in ( select y from T2 )

is typically processed as:

    select *
      from t1, ( select distinct y from t2 ) t2
     where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table -- typically.

As opposed to

    select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

    for x in ( select * from t1 )
    loop
       if ( exists ( select null from t2 where y = x.x )
       then
          OUTPUT THE RECORD
       end if
    end loop

It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).

So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery ( select y from T2 ) is "huge" and takes a long time. But the table T1 is relatively small and executing ( select null from t2 where y = x.x ) is very very fast (nice index on t2(y)). Then the exists will be faster as the time to full scan T1 and do the index probe into T2 could be less then the time to simply full scan T2 to build the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more appropriate.

If both the subquery and the outer table are huge -- either might work as well as the other -- depends on the indexes and other factors.
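The two processing styles Tom describes can be mimicked in a few lines of Python. This is a rough sketch of the description in the quote, not of what any particular Oracle version actually does: the tables are plain lists of values, a dict plays the role of an index, and the names build_index, in_style and exists_style are made up for illustration.

```python
from collections import defaultdict


def build_index(values):
    """Map each value to the positions where it occurs (a stand-in index)."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value].append(pos)
    return index


def in_style(t1_index, t2):
    """IN: evaluate the subquery once, make it distinct, then join to T1.

    The join probes the index on T1(x) once per distinct y.
    """
    distinct_y = set(t2)                          # select distinct y from t2
    matches = []
    for y in distinct_y:
        matches.extend(t1_index.get(y, []))       # index lookup on T1(x)
    return sorted(matches)


def exists_style(t1, t2_index):
    """EXISTS: full scan of T1, with one index probe into T2 per row."""
    return [pos for pos, x in enumerate(t1) if x in t2_index]


# Hypothetical column values: t1.x and t2.y.
t1 = [1, 2, 2, 5]
t2 = [2, 2, 3, 5, 5]

rows_via_in = in_style(build_index(t1), t2)
rows_via_exists = exists_style(t1, build_index(t2))
```

Both functions return the same positions in t1. What differs is where the work goes: in_style pays for scanning and de-duplicating all of t2 up front, while exists_style pays for one probe per row of t1, which matches the trade-off in the quote.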