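nested loops join
The driven table is accessed once for every row the driving table returns. The driving order matters, no sort is needed, and there is no restriction on the join condition.
Index the filter column of the driving table and the join column of the driven table.
hints: use_nl()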
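The access pattern described above can be modeled in a few lines of Python. This is only a sketch of the idea, not Oracle's implementation: the function name nested_loops_join and the scaled-down in-memory tables (100 and 1,000 rows) are assumptions made for illustration.

```python
def nested_loops_join(driving_rows, driven_rows, driving_key, driven_key):
    """Join two lists of dicts; return (joined_rows, driven_scans)."""
    joined = []
    driven_scans = 0
    for outer in driving_rows:
        driven_scans += 1  # one full pass over the driven table per driving row
        for inner in driven_rows:
            if outer[driving_key] == inner[driven_key]:
                joined.append((outer, inner))
    return joined, driven_scans


# Hypothetical, scaled-down stand-ins for t1 and t2.
t1 = [{"id": i, "n": i} for i in range(1, 101)]
t2 = [{"id": i, "t1_id": i} for i in range(1, 1001)]

all_rows, scans_all = nested_loops_join(t1, t2, "id", "t1_id")
t1_two = [r for r in t1 if r["n"] in (17, 19)]
_, scans_two = nested_loops_join(t1_two, t2, "id", "t1_id")
t1_none = [r for r in t1 if r["n"] == 99999999]
_, scans_none = nested_loops_join(t1_none, t2, "id", "t1_id")

print(scans_all, scans_two, scans_none)
```

The number of passes over the driven table follows the number of rows the driving table returns after its filter, including zero passes when the filter matches nothing.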
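merge sort join (sort merge join)
The driving table and the driven table are each accessed at most once. The driving order does not matter. Both inputs must be sorted (SORT_AREA_SIZE). It cannot be used when the join condition is <> or like.
An index on the join column can eliminate the sort for one of the two tables.
hints: use_merge()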
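As a rough illustration of why both inputs are read once and must be sorted, here is a minimal Python sketch of a merge join on equality. The function name sort_merge_join and the sample rows are assumptions for illustration; Oracle's real implementation differs in many details.

```python
def sort_merge_join(left_rows, right_rows, left_key, right_key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left_rows, key=lambda r: r[left_key])
    right = sorted(right_rows, key=lambda r: r[right_key])
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and left[i][left_key] == lk:
                for k in range(j, j_end):
                    joined.append((left[i], right[k]))
                i += 1
            j = j_end
    return joined


# Hypothetical sample rows, deliberately unsorted and with duplicate keys.
left_sample = [{"id": 2}, {"id": 1}, {"id": 2}]
right_sample = [{"t1_id": 2}, {"t1_id": 3}, {"t1_id": 2}, {"t1_id": 1}]

pairs = sort_merge_join(left_sample, right_sample, "id", "t1_id")
swapped = sort_merge_join(right_sample, left_sample, "t1_id", "id")
print(len(pairs), len(swapped))
```

Swapping the two inputs does the same amount of work, which matches the statement that this join has no driving order.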
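hash join
The driving table and the driven table are each accessed at most once. The driving order matters. No sort is needed, but memory is consumed to build the hash table (HASH_AREA_SIZE). It cannot be used when the join condition is <>, >, < or like.
Indexes play no special role in this join; index columns behave the same as in single-table access.
hints: use_hash()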
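The build-then-probe idea behind a hash join can be sketched as follows. The function name hash_join and the scaled-down tables are assumptions for illustration only; this shows the technique, not Oracle's internals.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the driving input, then probe it once per driven row."""
    table = defaultdict(list)
    for row in build_rows:  # the driving (build) input is read once
        table[row[build_key]].append(row)
    joined = []
    for row in probe_rows:  # the driven (probe) input is read once
        for match in table.get(row[probe_key], []):
            joined.append((match, row))
    return joined, len(table)


# Hypothetical, scaled-down stand-ins for t1 and t2.
t1 = [{"id": i, "n": i} for i in range(1, 101)]
t2 = [{"id": i, "t1_id": i} for i in range(1, 1001)]

t1_filtered = [r for r in t1 if r["n"] == 19]
rows_small_build, keys_small_build = hash_join(t1_filtered, t2, "id", "t1_id")
rows_big_build, keys_big_build = hash_join(t2, t1_filtered, "t1_id", "id")
print(len(rows_small_build), keys_small_build, keys_big_build)
```

Both orders return the same join result, but the hash table holds one key when the filtered t1 is the build input and 1,000 keys when t2 is, which is why the driving order matters for memory use.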
Experimental verification:
First, create two tables, t1 and t2, and load them with 100 and 100,000 rows respectively, inserted in random order:
drop table t1 cascade constraints purge;
drop table t2 cascade constraints purge;
create table t1(
  id number not null,
  n number,
  contents varchar2(4000)
);
create table t2(
  id number not null,
  t1_id number not null,
  n number,
  contents varchar2(4000)
);
execute dbms_random.seed(0);
insert into t1 select rownum, rownum, dbms_random.string('a',50) from dual connect by level <= 100 order by dbms_random.random;
commit;
insert into t2 select rownum, rownum, rownum, dbms_random.string('b',50) from dual connect by level <= 100000 order by dbms_random.random;
commit;
select count(1) from t1;
select count(1) from t2;
set linesize 1000
alter session set statistics_level = all;
-- Run this after each query below to display its actual execution statistics:
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id;
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n in(17,19);
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 99999999;

select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n in(17,19);
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 99999999;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and 1 = 2;

select /*+ ordered use_merge(t2) */ * from t1, t2 where t1.id = t2.t1_id;
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t2) use_nl(t1) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;

select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t2) use_hash(t1) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;

select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t2) use_merge(t1) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
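As for the driving order of table joins, a widely repeated claim online is that the small table should be the driving table. The experiments above show that this description is inaccurate.
A more accurate statement is: for nested loops join and hash join, access the smaller result set first and the larger result set afterwards (what matters is the size of the result set each table contributes to the specific SQL, not the size of the table itself). For merge sort join, the efficiency is the same whichever table is accessed first.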
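A back-of-the-envelope model makes the difference concrete. The sketch below assumes scaled-down tables (100 and 1,000 rows), no indexes, and a hypothetical cost measure: rows of the driven table read for nested loops, and entries in the hash table for hash join. It is not how Oracle costs a plan.

```python
# Hypothetical, scaled-down stand-ins for t1 and t2.
T1 = [{"id": i, "n": i} for i in range(1, 101)]
T2 = [{"t1_id": i} for i in range(1, 1001)]


def nested_loops_rows_read(driving_rows, driven_rows):
    """Rows of the driven table read: one full pass per driving row."""
    return len(driving_rows) * len(driven_rows)


# t1 is a small table AND returns a small result set once n = 19 is applied.
t1_filtered = [r for r in T1 if r["n"] == 19]

# leading(t1): one driving row, so t2 is scanned once.
nl_t1_first = nested_loops_rows_read(t1_filtered, T2)
# leading(t2): every t2 row drives a full scan of t1 (the filter is applied during that scan).
nl_t2_first = nested_loops_rows_read(T2, T1)

# For hash join the driving input is the one held in memory as a hash table.
hash_entries_t1_first = len(t1_filtered)
hash_entries_t2_first = len(T2)

print(nl_t1_first, nl_t2_first)
print(hash_entries_t1_first, hash_entries_t2_first)
```

What drives the numbers is the single row left after the filter on t1, not the fact that t1 holds only 100 rows.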
Nested loops join: no sort.
Hash join: consumes memory to build the hash table.
Merge sort join: requires a sort.
Developers should take care not to select unnecessary columns, because they are carried into the sort:
select /*+ leading(t2) use_merge(t1) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t2) use_merge(t1) */ t1.id from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
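To see why the select list affects the sort, the sketch below measures how many characters travel through the sort when every column of a t2-like table is kept versus only the join column. The function name sort_for_merge, the scaled-down table and the character count as a stand-in for sort memory are all assumptions for illustration.

```python
def sort_for_merge(rows, join_key, selected_columns):
    """Sort rows on the join key, carrying only the selected columns along."""
    columns = [join_key] + [c for c in selected_columns if c != join_key]
    projected = sorted(
        (tuple(row[c] for c in columns) for row in rows),
        key=lambda r: r[0],
    )
    # Rough proxy for sort work area usage: total characters carried through the sort.
    payload = sum(len(str(value)) for r in projected for value in r)
    return projected, payload


# Hypothetical, scaled-down stand-in for t2 with a 50-character contents column.
t2 = [{"id": i, "t1_id": i, "n": i, "contents": "b" * 50} for i in range(1, 1001)]

_, payload_all_columns = sort_for_merge(t2, "t1_id", ["id", "t1_id", "n", "contents"])
_, payload_key_only = sort_for_merge(t2, "t1_id", [])
print(payload_all_columns, payload_key_only)
```

The rows and the join key are identical in both runs; only the width of what is sorted changes.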
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id <> t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id > t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id < t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id like t2.t1_id and t1.n = 19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id <> t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id > t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id < t2.t1_id and t1.n = 19;
select /*+ leading(t1) use_merge(t2) */ * from t1, t2 where t1.id like t2.t1_id and t1.n = 19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
The results above show that join conditions which can use neither hash join nor merge sort join are executed as nested loops join.
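The restrictions listed in these notes can be written down as a small lookup. This is a simplified rule of thumb taken from the notes, covering only the operators tested above; it is not the optimizer's actual decision logic, and the names HASH_OPS, MERGE_OPS and usable_join_methods are hypothetical.

```python
# Operators each join method can handle, as summarized in these notes.
HASH_OPS = {"="}
MERGE_OPS = {"=", "<", ">"}


def usable_join_methods(op):
    """Return the join methods that can evaluate a join condition using `op`."""
    methods = ["nested loops"]  # nested loops has no restriction on the condition
    if op in MERGE_OPS:
        methods.append("merge sort")
    if op in HASH_OPS:
        methods.append("hash")
    return methods


for op in ["=", "<>", ">", "<", "like"]:
    print(op, usable_join_methods(op))
```

For <> and like only nested loops remains, so a use_hash or use_merge hint on such a condition cannot be honored.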
Create an index on the filter column of the driving table and on the join column of the driven table.
create index idx_t1_n on t1(n);
create index idx_t2_t1_id on t2(t1_id);
select /*+ leading(t1) use_nl(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
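The effect of these two indexes on a nested loops join can be modeled with dictionaries standing in for the indexes. This is a sketch under stated assumptions: build_index and nested_loops_with_index are hypothetical names, the tables are scaled down, and a Python dict is only a loose analogy for an Oracle B-tree index.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def nested_loops_with_index(driving_rows, driven_rows, driven_index, driving_key):
    """For each driving row, fetch only the matching driven rows via the index."""
    joined = []
    driven_rows_read = 0
    for outer in driving_rows:
        for position in driven_index.get(outer[driving_key], []):
            driven_rows_read += 1
            joined.append((outer, driven_rows[position]))
    return joined, driven_rows_read


# Hypothetical, scaled-down stand-ins for t1 and t2.
t1 = [{"id": i, "n": i} for i in range(1, 101)]
t2 = [{"id": i, "t1_id": i} for i in range(1, 1001)]

idx_t1_n = build_index(t1, "n")              # plays the role of idx_t1_n
idx_t2_t1_id = build_index(t2, "t1_id")      # plays the role of idx_t2_t1_id

driving = [t1[p] for p in idx_t1_n.get(19, [])]  # filter t1 through its index
joined, driven_rows_read = nested_loops_with_index(driving, t2, idx_t2_t1_id, "id")
print(len(driving), len(joined), driven_rows_read)
```

Without the index on t2(t1_id), the single driving row would still force a full pass over t2; with it, only the matching row is read.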
select /*+ leading(t1) use_hash(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
select * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
When a query has no suitable index, Oracle will generally choose hash join.
create index idx_t1_id on t1(id);
select /*+ ordered use_merge(t2) */ * from t1, t2 where t1.id = t2.t1_id and t1.n = 19;
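In Oracle 10g, an index on the join column can eliminate the sort for one of the tables in a merge sort join. (Even when the join columns of both tables are indexed, only one table's sort is eliminated.)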
Note: this article is a summary of notes on the table join chapter of 《收获,不止Oracle》.