反连接(anti-join)
反关连的基本语法
反连接是从一个结果集合中返回不在另一个结果集中的数据行。
例如:在scott/tiger的用户下执行
select* from emp where deptno not in (select deptno from dept where loc= 'NEW YORK')
执行反关连运算连接还有 not exists
如上述语句还可以表述为
select* from emp
where not exists (select null from dept where loc='NEW YORK'and deptno=emp.deptno)
但是not in 与 not esists还是有区别的,请看下面的语句
selectcount(*) from emp where empno not in ( select mgr from emp where mgr);
COUNT(*)
----------
0
selectcount(*) from emp T1
wherenot exists ( select null from emp T2 where t2.mgr = t1.empno );
COUNT(*)
----------
9
为什么在上述语句中使用not in返回0值哪?
观察emp表字字段mgr的行数据,我们发现有一条数据是null值, 由于null不等于null,所以not in (null)都返回 false,
如:select count(*) from emp where empno not in (7369,null)
COUNT(*)
----------
0
所以返回0条记录。而exists则是判断null值是否存在。
关于null的运算,参见null.
运用not in来实现反关连时,要注意其中的歧义。
在绝大多数情况下,开发人员编写not in的时候,实际含义是 not in (select c from t where c is not null and…).
例如
selectcount(*) from emp where empno not in ( select mgr from emp where mgr is not null);
COUNT(*)
----------
8
其他反关连的写法:
withtem_dept as
(selectdeptno from dept where loc= 'NEW YORK')
selectcount(*)
fromemp a left outer join tem_dept b on (a.deptno=b.deptno)
where b.deptno IS NULL
注:,那些原来反关连的数据实际上是本左关连中那些deptno为空的数据,这个语句将反关连连接转化为关连连接。
也可以用minus方法,求出在反关连的数据结合。
反关连操作的效率
据我们观察,反关连的最佳写法与条件与数据库版本有关系,本节试图给出9i 10g 11g不同版本下反关连的效率测试结果,并给出不同版本下的最佳的反关连写法。
[11g EE版本]
select* from v$version
BANNER ----------------------------------------------------------------------------------------- |
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production |
PL/SQL Release 11.1.0.6.0 - Production |
CORE 11.1.0.6.0 Production |
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production |
NLSRTL Version 11.1.0.6.0 - Production |
在其他数据库中,开发人员一般惧怕反关连,因为在这些数据库中,在大数量的情况下,反关连常常存在效率低下的问题。在下面部分,我们研究oracle中的反关连的效率。
connscott/tiger
droptable table_1 purge;
droptable table_2 purge;
createtable table_1
asselect
cast(rownumas int) a,
cast(rownum+10as int) b,
cast(dbms_random.string('i',10)as varchar2(10)) c
fromdual connect by level<=50000
createtable table_2
asselect
cast(rownumas int) a,
cast(rownum+10as int) b,
cast(dbms_random.string('i',10)as varchar2(10)) c
fromdual connect by level<=49950
table_1与table_2的结构如下:
desctable_1 (desc table_2)
----------------------------------
ANUMBER(38,0),
BNUMBER(38,0),
CVARCHAR2(10)
altertable table_2 modify a not null;
createindex idx_table2_a on table_2(a);
现在我们要从table_1找出列a的值,不在table_2中的a列值的记录,从构建表的脚本我们看出 table_1的50000条中有49950条的记录存在table_2中,只有50条记录不存在table_2中。这是一个反关连查询。
我们打开tkprof对测试结果进行跟踪。诊断结果如下:
1:********************************************************************************
select /*2null1*/ c from table_1 where a not in (select a fromtable_2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 299 0 50
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.04 0.05 0 299 0 50
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI SNA (cr=299 pr=0 pw=0time=2 us cost=87 size=899100 card=49950)
49950 INDEX FAST FULL SCANIDX_TABLE2_A (cr=117 pr=0 pw=0 time=777 us cost=32 size=199800card=49950)(object id 80213)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=759 us cost=53 size=700000 card=50000)
2:********************************************************************************
select /*2null2*/ c
from
table_1where not exists (select null fromtable_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 299 0 50
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.04 0.05 0 299 0 50
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI (cr=299 pr=0 pw=0 time=3 us cost=87 size=899100 card=49950)
49950 INDEX FAST FULL SCANIDX_TABLE2_A (cr=117 pr=0 pw=0 time=782 us cost=32 size=199800card=49950)(object id 80213)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=745 us cost=53 size=700000 card=50000)
3:********************************************************************************
SQL ID : g9zd4b5q4886v
select /*2null3*/ table_1.c from table_1 leftouter join table_2 on (table_1.a=table_2.a)
where table_2.rowid is null
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.06 0.07 0 299 0 50
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.06 0.07 0 299 0 50
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 FILTER (cr=299 pr=0pw=0 time=9 us)
50000 HASH JOIN OUTER (cr=299 pr=0 pw=0 time=3770 us cost=215 size=1500000 card=50000)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=790 us cost=53 size=700000 card=50000)
49950 INDEX FAST FULL SCANIDX_TABLE2_A (cr=117 pr=0 pw=0 time=780 us cost=32 size=799200card=49950)(object id 80213)
现在我们去掉not null约束
alter table table_2 modify a null
analyze table table_1 compute statistics
analyze table table_2 compute statistics
for table
for all indexes
for all indexed columns
select/*1null1*/ c
from
table_1 where a not in (select a from table_2)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 363 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.04 0.05 0 363 0 50
Misses in library cache during parse: 1
Optimizermode: ALL_ROWS
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI NA (cr=363 pr=0 pw=0 time=3us cost=108 size=899100 card=49950)
49950 TABLE ACCESS FULL TABLE_2 (cr=181 pr=0 pw=0 time=775 us cost=53size=199800 card=49950)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=760 us cost=53size=700000 card=50000)
********************************************************************************
SQL ID : 9d9mgx3cnuxy9
select/*1null2*/ c from table_1 where notexists (select null from table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 299 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.06 0.05 0 299 0 50
Misses in library cache during parse: 1
Optimizermode: ALL_ROWS
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI (cr=299 pr=0 pw=0 time=3 uscost=87 size=899100 card=49950)
49950 INDEX FAST FULL SCAN IDX_TABLE2_A (cr=117 pr=0 pw=0 time=792 us cost=32size=199800 card=49950)(object id 80213)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=764 us cost=53size=700000 card=50000)
********************************************************************************
SQL ID : cjt1rq7d6c8zx
select/*1null3*/ table_1.c
from
table_1 left outer join table_2 on(table_1.a=table_2.a) where
table_2.rowid is null
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.07 0.07 0 299 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.07 0.07 0 299 0 50
Misses in library cache during parse: 1
Optimizermode: ALL_ROWS
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 FILTER (cr=299 pr=0 pw=0 time=10us)
50000 HASH JOIN OUTER (cr=299 pr=0 pw=0 time=3798 us cost=215size=1500000 card=50000)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=780 us cost=53size=700000 card=50000)
49950 INDEX FAST FULL SCAN IDX_TABLE2_A (cr=117 pr=0 pw=0 time=792 us cost=32size=799200 card=49950)(object id 80213)
我们看到有没有not null,对于not in的反关连有很大的影响, 而对not exists没有什么影响。
改变优化模式
SQL ID : 6zr7ys7uq3xk3
select/*+all_rows*/ c from table_1 where notexists (select null from table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 299 0 50
------------- -------- ---------- ---------- -------------------- ----------
total 3 0.06 0.05 0 299 0 50
Misses in library cache during parse: 1
Optimizermode: ALL_ROWS
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI (cr=299 pr=0 pw=0 time=3 us cost=87 size=899100card=49950)
49950 INDEX FAST FULL SCAN IDX_TABLE2_A (cr=117 pr=0 pw=0 time=789 us cost=32size=199800 card=49950)(object id 80213)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=770 us cost=53size=700000 card=50000)
********************************************************************************
SQL ID : awfphtf1k8c55
select/*+first_rows*/ c
from
table_1where not exists (select null fromtable_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.48 0.47 0 844 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.48 0.47 0 844 0 50
Misses in library cache during parse: 1
Optimizermode: FIRST_ROWS
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 NESTED LOOPS ANTI (cr=844 pr=0 pw=0 time=11 us cost=50072size=899100 card=49950)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=1007 us cost=53size=700000 card=50000)
49950 INDEX RANGE SCAN IDX_TABLE2_A (cr=662 pr=0 pw=0 time=0 us cost=1size=200 card=50)(object id 80213)
********************************************************************************
SQL ID : 8jq5gkjyz41qp
select/*+rule*/ c
from
table_1 where not exists (select null from table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.57 0.58 0 50841 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.57 0.58 0 50841 0 50
Misses in library cache during parse: 1
Optimizermode: RULE
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 FILTER (cr=50841 pr=0 pw=0time=15 us)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=1109 us)
49950 INDEX RANGE SCAN IDX_TABLE2_A (cr=50659 pr=0 pw=0 time=0 us)(object id80213)
********************************************************************************
SQL ID : 0q6g5gw0h7ayv
select/*+choose*/ c from table_1 where notexists (select null from table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------------- -------- ---------- -------------------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.04 0.05 0 299 0 50
------------- -------- ---------- -------------------- ---------- ----------
total 3 0.04 0.05 0 299 0 50
Misses in library cache during parse: 1
Optimizermode: CHOOSE
Parsinguser id: 92
Rows Row Source Operation
------- ---------------------------------------------------
50 HASH JOIN RIGHT ANTI (cr=299 pr=0 pw=0 time=4 us cost=87 size=899100card=49950)
49950 INDEX FAST FULL SCAN IDX_TABLE2_A (cr=117 pr=0 pw=0 time=789 us cost=32size=199800 card=49950)(object id 80213)
50000 TABLE ACCESS FULL TABLE_1 (cr=182 pr=0 pw=0 time=755 us cost=53size=700000 card=50000)
我们看到 /*+all_rows*/是最好的优化模式,而采用rule模式后,则程序的效率大大下降,采用first_rows模式,效率也有所下降。
通过上面的测试,反关连最佳的语句为
select/*+all_rows*/ c from table_1 where notexists (select null from table_2 where a= table_1.a)
在table_2的反关连字段上,建立索引。
在反关连时,主表(table_1)总是全表扫描。
如果要采用not in 反关连,则table_2表上的关联字段不但要有索引,而且必须有not null约束。
[10g EE]
[9i EE]
反关连的其他问题
反关连的集合问题
with temp_table_2 as
(select to_number(column_value) a fromtable(str_to_table('1,2,3,4,5,6',',')))
select c
from table_1
where not exists (select null from temp_table_2where a= table_1.a)
CREATE OR REPLACE FUNCTION str_to_table(
string_in IN VARCHAR2 ,
delimiter_in IN VARCHAR2DEFAULT ','
) RETURN myTable IS
v_wkg_str VARCHAR2 ( 32767 ) :=string_in || delimiter_in;
v_pos PLS_INTEGER ;
nt_return myTable := myTable();
i_count pls_integer;
BEGIN
LOOP
v_pos :=INSTR(v_wkg_str,delimiter_in);
EXIT WHEN NVL(v_pos, 0 ) = 0 or v_wkg_str=delimiter_in;
if TRIM (SUBSTR(v_wkg_str, 1 ,v_pos-1 )) is not null then
nt_return. EXTEND ;
nt_return(nt_return. LAST ) := TRIM(SUBSTR(v_wkg_str, 1 ,v_pos- 1 ));
end if;
v_wkg_str := SUBSTR(v_wkg_str,v_pos+1 );
END LOOP ;
RETURN nt_return;
END ;
selectto_number(column_value) a from table(str_to_table('1,2,3,4,5,6',','));
A |
1 |
2 |
3 |
4 |
5 |
6 |
我们知道 在in 或者 exists的时候
with temp_table_2 as
(select /*+cardinality(10)*/to_number(column_value) a from table(str_to_table('1,2,3,4,5,6',',')))
select c
from table_1
where exists (select null from temp_table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.03 0.02 0 182 0 6
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.03 0.02 0 272 0 6
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 92
Rows Row Source Operation
------- ---------------------------------------------------
6 HASH JOIN RIGHT SEMI (cr=182pr=0 pw=0 time=18 us cost=83 size=16 card=1)
6 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=2 us)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=856 us cost=53 size=700000 card=50000)
with temp_table_2 as
(select /*+cardinality(m,10)*/to_number(column_value) a from table(str_to_table('1,2,3,4,5,6',',')) m)
select c
from table_1
where a in (select a from temp_table_2where a= table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.42 0.41 0 182 0 0
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.42 0.42 0 272 0 0
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
0 FILTER (cr=182 pr=0 pw=0 time=0us)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=868 us cost=54 size=700000 card=50000)
0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=0 us)
with temp_table_2 as
(select to_number(column_value) a from table(str_to_table('1,2,3,4,5,6',',')) m)
select c
from table_1
where a in (select a from temp_table_2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.03 0.01 0 182 0 6
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.03 0.02 0 272 0 6
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
6 HASH JOIN RIGHT SEMI (cr=182 pr=0 pw=0 time=10 us cost=83 size=16card=1)
6 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=2 us)
50000 TABLE ACCESS FULL TABLE_1(cr=182 pr=0 pw=0 time=751 us cost=53 size=700000 card=50000)
with temp_table_2 as
(select /*+cardinality(m,10)*/to_number(column_value) a from table(str_to_table('1,2,3,4,5,6',',')) m)
select c
from table_1
where a in (select a from temp_table_2)/**/
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.01 0.01 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 9 0 6
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.01 0.01 0 99 0 6
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
6 NESTED LOOPS (cr=9 pr=0 pw=0time=42 us)
6 NESTED LOOPS (cr=8 pr=0 pw=0 time=25 us cost=40 size=160card=10)
6 SORT UNIQUE (cr=0 pr=0 pw=0time=2 us)
6 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=2 us)
6 INDEX RANGE SCANIDX_TABLE_1_A (cr=8 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 80214)
6 TABLE ACCESS BY INDEX ROWIDTABLE_1 (cr=1 pr=0 pw=0 time=0 us cost=2 size=14 card=1)
with temp_table_2 as
(select /*+cardinality(m,10)*/ to_number(column_value) a fromtable(str_to_table('1,2,3,4,5,6',',')) m)
select c
from table_1
where exists (select null from temp_table_2 where a= table_1.a) /**/
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.01 0.00 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 9 0 6
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.01 0.00 0 99 0 6
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
6 NESTED LOOPS (cr=9 pr=0 pw=0time=42 us)
6 NESTED LOOPS (cr=8 pr=0 pw=0 time=26 us cost=40 size=160card=10)
6 SORT UNIQUE (cr=0 pr=0 pw=0time=2 us)
6 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=1 us)
6 INDEX RANGE SCANIDX_TABLE_1_A (cr=8 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 80214)
6 TABLE ACCESS BY INDEX ROWIDTABLE_1 (cr=1 pr=0 pw=0 time=0 us cost=2 size=14 card=1)
with temp_table_2 as
(
select /*+cardinality(m,10)*/to_number(column_value) a from table(str_to_table('1,2,3,4,5,6',',')) m
)
select c
from table_1 a, temp_table_2 b
where a.a=b.a /**/
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 90 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 9 0 6
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.00 0.00 0 99 0 6
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
6 NESTED LOOPS (cr=9 pr=0 pw=0time=44 us)
6 NESTED LOOPS (cr=8 pr=0 pw=0 time=27 us cost=49 size=160card=10)
6 COLLECTION ITERATOR PICKLERFETCH STR_TO_TABLE (cr=0 pr=0 pw=0 time=2 us)
6 INDEX RANGE SCAN IDX_TABLE_1_A (cr=8 pr=0pw=0 time=0 us cost=1 size=0 card=1)(object id 80214)
6 TABLE ACCESS BY INDEX ROWIDTABLE_1 (cr=1 pr=0 pw=0 time=0 us cost=2 size=14 card=1)
但是 not in 或者 not exists 则不需要/*+cardinality(m,10)*/。
反关连副表(table_2)为视图
select /*view union all*/ a,c
from
table_1 where not exists
(select null from table_2 where a= table_1.a
union all
select null from table_3 where a= table_1.a
)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 2 0.00 0.00 0 2 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 302 3.09 2.85 0 121626 0 30100
------- ------ -------- ---------- ---------- -------------------- ----------
total 306 3.09 2.86 0 121628 0 30100
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
30000 FILTER (cr=90917 pr=0 pw=0time=11533 us)
50000 TABLE ACCESS FULL TABLE_1(cr=481 pr=0 pw=0 time=1249 us cost=54 size=989440 card=49472)
20000 UNION-ALL (cr=90436 pr=0 pw=0 time=0 us)
10000 INDEX RANGE SCANIDX_TAB2_A (cr=50173 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 80218)
10000 INDEX RANGE SCANIDX_TAB3_A (cr=40263 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 80219)
select /*view null*/ a,c
from
table_1 where a not in
(select a from table_2
union all
select a from table_3
)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 32.50 32.54 0 969450 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 32.50 32.54 0 969452 0 100
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
100 FILTER (cr=969450 pr=0 pw=0time=3704 us)
20100 TABLE ACCESS FULL TABLE_1(cr=73 pr=0 pw=0 time=410 us cost=54 size=989440 card=49472)
20000 UNION-ALL (cr=969377 pr=0 pw=0 time=0 us)
10000 TABLE ACCESS FULL TABLE_2(cr=588250 pr=0 pw=0 time=0 us cost=12 size=4 card=1)
10000 TABLE ACCESS FULL TABLE_3(cr=381127 pr=0 pw=0 time=0 us cost=23 size=4 card=1)
从上述分析来看,对于
select a,c
from
table_1 where not exists
(select null from table_2 where a= table_1.a
union all
select null from table_3 where a= table_1.a
)
的反关连查询,在11gEE版本中,not in 与 not exists在效率上存在很大的差异,not exists的效率比not in 要高出许多。
union 还是 unionall
我们知道,在这样的查询中 ,采用union 还是 union all,不会影响最终的结果。但是是采用union 还是 union all,效率又会不同,请看一下测试:
select /*view union p1*/ a,c
from
table_1 where not exists
(select null from table_2 where a= table_1.a
union
select null from table_3 where a= table_1.a
)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.50 1.50 0 40773 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 1.50 1.50 0 40775 0 100
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
Rows Row Source Operation
------- ---------------------------------------------------
100 FILTER (cr=40773 pr=0 pw=0time=56 us)
20100 TABLE ACCESS FULL TABLE_1(cr=73 pr=0 pw=0 time=435 us cost=54 size=989440 card=49472)
20000 SORT UNIQUE (cr=40700 pr=0pw=0 time=0 us cost=4 size=8 card=2)
30000 UNION-ALL (cr=40700 pr=0 pw=0 time=285605 us)
10000 INDEX RANGE SCANIDX_TAB2_A (cr=20294 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 80218)
20000 INDEX RANGE SCANIDX_TAB3_A (cr=20406 pr=0 pw=0 time=0 us cost=1 size=4 card=1)(object id 80219)
********************************************************************************
select /*view union all p1*/ a,c
from
table_1 where not exists
(select null from table_2 where a= table_1.a
union all
select null from table_3 where a= table_1.a
)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.73 0.72 0 30709 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.73 0.73 0 30711 0 100
Misses inlibrary cache during parse: 1
Optimizermode: ALL_ROWS
Parsing userid: 92
结果表明 使用union all比使用union 的语句,效率要更高。
从上面语句的执行计划来看,副数据集合如果是多于一个表的数据集合的集合运算的结果,无论是union 还是 union all,都没有用到反关连(执行计划中没有anti的字样出现)。
但是我们知道存在着 , 因此,我们用下面的语句改写成下列语句
select a,c
from table_1 where not exists (select null from table_2 where a=table_1.a)
intersect
select a,c
from table_1 where not exists (select null from table_3 where a=table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.09 0.09 0 440 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.10 0.09 0 442 0 100
察看执行计划
explain plan for
select a,c
from table_1 where not exists (select null from table_2 where a=table_1.a)
intersect
select a,c
from table_1 where not exists (select null from table_3 where a=table_1.a)
select * from table(dbms_xplan.display());
Plan hash value: 2650233949
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29472 | 1615K| | 620 (46)| 00:00:08 |
| 1 | INTERSECTION | | | | | | |
| 2 | SORT UNIQUE | | 39472 | 925K| 2488K| 342 (2)| 00:00:05 |
|* 3 | HASH JOIN RIGHT ANTI | | 39472 | 925K| | 62 (2)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN|IDX_TAB2_A | 10000 | 40000 | | 8 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TABLE_1 | 49472 | 966K| | 53 (0)| 00:00:01 |
| 6 | SORT UNIQUE | | 29472 | 690K| 1864K| 279 (2)| 00:00:04 |
|* 7 | HASH JOIN RIGHTANTI | | 29472 | 690K| | 68 (2)| 00:00:01 |
| 8 | INDEX FAST FULL SCAN|IDX_TAB3_A | 20000 | 80000 | | 14 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | TABLE_1 | 49472 | 966K| | 53 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
而原有的语句的执行计划为
explain plan for
select a,c
from
table_1 where not exists
(select null from table_2 where a=table_1.a
union all
select null from table_3 where a=table_1.a
)
select * from table(dbms_xplan.display());
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 49440 | 965K| 3147 (1)| 00:00:38 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL|TABLE_1 | 49472 | 966K| 54 (2)| 00:00:01 |
| 3 | UNION-ALL | | | | | |
|* 4 | INDEX RANGE SCAN|IDX_TAB2_A | 1 | 4 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN|IDX_TAB3_A | 1 | 4 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------
如果 副数据集合(如table_2table_3)能够方便合并的话,可以事先计算好集合运算的结果,存放在新的表(如 table_4),在用not exists反关连运算,这样效率会更高。
select a,c
from table_1
where not exists (select null from table_4 where a= table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.03 0.02 0 107 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.03 0.02 0 108 0 100
explain plan for
select /*new table*/ c
from table_1
where not exists (select null from table_4 where a= table_1.a)
select * from table(dbms_xplan.display());
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29472 | 690K| 66 (2)| 00:00:01 |
|* 1 | HASH JOIN RIGHTANTI| | 29472 | 690K| 66 (2)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TABLE_4 | 20000 | 80000 | 12 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TABLE_1 | 49472 | 966K| 53 (0)| 00:00:01 |
--------------------------------------------------------------------------------
在in 与 exists运算的时候,到底是选择in 还是exists的效率高,这与主数据集与副数据集合的多寡有一定的关系。那not in 与not exists的效率与与主数据集与副数据集合的多寡是否也有一定的关系,在11版本中作如下测试
drop table table_1 purge; --主数据集表
drop table table_2 purge; --副数据集表
create table table_1
as select
cast(rownum as int) a,
cast(rownum+10 as int) b,
cast(dbms_random.string('i',10) asvarchar2(10)) c
from dual connect by level<=500
主表有500条记录
create table table_2
as select
cast(rownum*2 as int) a,
cast(rownum*2+10 as int) b,
cast(dbms_random.string('i',10) asvarchar2(10)) c
from dual connect by level<=10000
主表有10000条记录
create index idx_tab2_a on table_2(a);
select c from table_1 where not exists (select nullfrom table_2 where a= table_1.a)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.01 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 31 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.01 0.00 0 32 0 100
Rows Row Source Operation
------- ---------------------------------------------------
100 HASH JOIN ANTI (cr=31 pr=0 pw=0 time=3 us cost=12 size=24 card=1)
500 TABLE ACCESS FULL TABLE_1(cr=4 pr=0 pw=0 time=8 us cost=3 size=10000 card=500)
10000 INDEX FAST FULL SCAN IDX_TAB2_A(cr=27 pr=0 pw=0 time=150 us cost=8 size=40000 card=10000)(object id 80225)
select c from table_1 where a not in (select a fromtable_2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 42 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.00 0.00 0 43 0 100
explain plan for select c from table_1 where not exists (select null from table_2 where a=table_1.a)
select * fromtable(dbms_xplan.display());
------------------------------------------------------------------------------------
| Id | Operation |Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 12 (9)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 1 | 24 | 12 (9)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TABLE_1 | 500 | 10000 | 3 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN|IDX_TAB2_A | 10000 | 40000 | 8 (0)| 00:00:01 |
------------------------------------------------------------------------------------
explain plan for
select c from table_1 where a not in (select a from table_2);
select * fromtable(dbms_xplan.display());
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 16 (7)| 00:00:01 |
|* 1 | HASH JOIN ANTI NA | | 1 | 24 | 16 (7)| 00:00:01 |
| 2 | TABLE ACCESS FULL| TABLE_1| 500 | 10000 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE_2 |10000 | 40000 | 12 (0)| 00:00:01 |
------------------------------------------------------------------------------
修改副表的反关连列的属性
alter table table_2 modify a not null;
explain plan for
select c from table_1 where a not in (select a from table_2)
select * fromtable(dbms_xplan.display());
------------------------------------------------------------------------------------
| Id | Operation |Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | 12 (9)| 00:00:01 |
|* 1 | HASH JOIN ANTI SNA | | 1 | 24 | 12 (9)| 00:00:01 |
| 2 | TABLE ACCESS FULL | TABLE_1 | 500 | 10000 | 3 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN|IDX_TAB2_A | 10000 | 40000 | 8 (0)| 00:00:01 |
------------------------------------------------------------------------------------
select c
from table_1
where a not in (select a from table_2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 31 0 100
------- ------ -------- ---------- ---------- -------------------- ----------
total 3 0.00 0.00 0 32 0 100
结果表明,主表只有少量数据(500),但是副表有较多的数据(1000),not exists还是表现较好的性能。
小节小结,
本小节讨论了oracle的反关连的一些知识。
Oracle常用not in, not exists来实现反关连,本小节还介绍了一项用外关连来实现反关连的一种写法。
Oracle提供了反关连的优化算法,(是否真正地使用了反关连,可以通过执行计划观察是否用到anti-join)。避开了其它一些数据库在实现反关连查询时无法避免的低效率的问题。Anti-join是oracle的CBO中的一个连接方式,因此,要对反关连中涉及到的对象进行分析。
Oracle的反关连是oracle发展比较快的一项技术,虽然在oracle的新特性中并没有对此进行说明,通过测试,我们感觉到9i到11g,not in,notexists的效率也不尽相同,使用not in的时候,注意反关连的副集合的连接列应该有not null的约束(在反关连列中如果存在null值,总是返回0条记录,这一般不是我们所需要的业务逻辑)。在9i版本中,采用all_rows的优化模式+反关连列索引化,且有not null的约束,这时候not in表现出较好的效率; 而在11g版本中,采用all_rows优化模式的not exists是最优性能的反连接查询表现出较高的效率。
如果副集合的连接列的数据不是来自于表而是来自于临时的集合,在用in、 exists的时候,需要增加提示/*+cardinality(t,n)*/(在9i以前的版本中最好增加rownum>0),而在使用反关连not in,not exists的时候,则不需要这些技巧。
当副集合的连接列的数据不是来自于表,而是来自于多个表的集合,直接采用not in 与not exists,并不能用到oracle的反关连技术,在11g版本中,not exists比not in表现出较高的效率,本节给出一个根据集合的运算公式,改写成多个当副集合的连接列的数据来自于表的反关连,最后在对这些返回集合进行集合运算,这样又能采用到oracle的反关连的优化算法。
变化主数据集与副数据集的比例,看到notin与not esists对这个数据特征变化不是象in 与 exists那样敏感。在本节中,没有例举更多的数据特征的例子,在具体版本,具体数据特征的情况下,最好在测试后,再决定是采用not in 还是 not exists。
实际上in not in exists noexists的执行计划受诸多如主副数据特征、一些系统参数 如sort_area,db_block_buffer,hash_zrea_size、还有数据库的版本等因素的影响,在具体环境里,到底采用哪一种方式,建议在多对比测试后,在选择合适的写法。
start with:
create table t1 as select * fromall_objects where rownum <= 5000;
create table t2 as select * fromall_objects where rownum <= 4950;
create index t2_idx on t2(object_id);
Index created.
select count(*)
from t1 rbo
where object_id not in ( select object_id fromt2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 8.63 8.53 129066 197295 0 1
------- ------ -------- ---------- ---------- -------------------- ----------
total 4 8.64 8.53 129066 197295 0 1
select count(*)
from t1 rbo
where NOT EXISTS (select null from t2 wheret2.object_id = rbo.object_id )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.13 83 10075 0 1
------- ------ -------- ---------- ---------- -------------------- ----------
total 4 0.09 0.13 83 10075 0 1
select count(*)
from t1, t2 rbo
where t1.object_id = rbo.object_id(+)
and rbo.object_id IS NULL
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.11 72 5087 0 1
------- ------ -------- ---------- ---------- -------------------- ----------
total 4 0.09 0.11 72 5087 0 1
so, it looks "good"right? better than the alternatives inRBO, yes. but, analyze and
select count(*)
from t1 cbo
where object_id not in ( select object_id fromt2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.06 0.08 0 91 0 1
------- ------ -------- ---------- ---------- -------------------- ----------
total 4 0.06 0.12 0 91 0 1
select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- -------------------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.08 0.10 0 91 0 1
------- ------ -------- ---------- ---------- -------------------- ----------
total 4 0.08 0.11 0 91 0 1
but I find the not in to be infinitelyeasier to code and alot more "informational" (clear,easy
to understand)
not in(select a from t1 union allselect b a from t2);
(a,b) not in (select m,n from t2);
/*+ HASH_AJ */
/*+ MERGE_AJ */
the "hint" seems to be notdocumented, but the anti join lives on
create table t1 ( x int not null, y int );
Table created.
create table t2 ( x int not null, y int );
Table created.
exec dbms_stats.set_table_stats( user, 'T1',numrows=>1000000, numblks=> 10000 );
PL/SQLprocedure successfully completed.
exec dbms_stats.set_table_stats( user, 'T2',numrows=>1000000, numblks=> 10000 );
PL/SQLprocedure successfully completed.
@plan "select * from t1 where x not in(select x from t2 )";
delete from plan_table;
4 rows deleted.
explain plan for &1;
old 1: explain plan for &1
new 1: explain plan for select * from t1 where x not in (select x from t2 )
Explained.
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
Plan hashvalue: 629543484
--------------------------------------------------------------------------
| Id | Operation | Name |Rows | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 39 | | 8525 (2)|
|* 1 | HASH JOIN RIGHT ANTI| | 1 | 39 | 23M| 8525 (2)|
| 2 | TABLE ACCESS FULL | T2 | 1000K| 12M| | 2742 (2)|
| 3 | TABLE ACCESS FULL | T1 | 1000K| 24M| | 2742 (2)|
--------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
1 - access("X"="X")
15 rows selected.
原创文章,如果转载,请标注作者:田文 CSDN地址:http://blog.csdn.net/tiwen818