Oracle学习笔记(二)——嵌套循环连接

连接方式
Oracle的表连接方式有以下三种:
  • 嵌套循环链接(Nested Loops Join)
  • 哈希链接(Hash Join)
  • 排序合并(Merge Sort Join)
连接涉及到两个表A和B,通俗的讲嵌套循环链接相当于遍历A中的每一条记录(满足A表条件),然后再在B表中遍历记录直至找到匹配的记录,等同于两层循环。而哈希链接和排序合并,可以看作是先各自处理自身的记录(排序或hash),处理完成后再批量的进行匹配。哈希链接和排序合并链接适合处理返回大量数据的操作,大吞吐量的查询对于嵌套循环来说效率不会太高。

准备工作是建立两个表test_join_t1和test_join_t2,两个表分别有记录100条和10000条。

create table test_join_t1 as select rownum as col1, rownum + 1 as col2, rownum + 2 as col3 from dual connect by level <= 100;
create table test_join_t2 as select rownum as col1, rownum + 1 as col2, rownum + 2 as col3 from dual connect by level <= 10000;
嵌套循环连接
嵌套循环连接,在一般的应用中占有绝大多数的比例。首先执行最基本的连接查询语句
select
    *
from   
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1;

执行计划
----------------------------------------------------------
Plan hash value: 1262213046
-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |   100 |  7800 |    11  (10)| 00:00:01 |
|*  1 |  HASH JOIN         |              |   100 |  7800 |    11  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| TEST_JOIN_T1 |   100 |  3900 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| TEST_JOIN_T2 | 10000 |   380K|     8   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
从执行计划中可以看出,Oracle优化器选择了Hash Join。并没有走嵌套循环链接,这里我们不深究Oracle如何做出执行计划选择。为了验证循环嵌套链接,给SQL添加HINT /*+ leading(t1) use_nl(t2) */。HINT的含义是test_join_t1表作为启动表使用循环嵌套方式进行表连接。
select
/*+ leading(t1) use_nl(t2) */
    *
from   
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1;

Plan hash value: 3805247585
---------------------------------------------------------------------------------------------
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |      1 |        |    100 |00:00:00.10 |    2717 |
|   1 |  NESTED LOOPS      |              |      1 |    100 |    100 |00:00:00.10 |    2717 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   2 |   TABLE ACCESS FULL| TEST_JOIN_T1 |      1 |    100 |    100 |00:00:00.01 |      10 |
|*  3 |   TABLE ACCESS FULL| TEST_JOIN_T2 |    100 |      1 |    100 |00:00:00.10 |    2707 |
---------------------------------------------------------------------------------------------
更改HINT,让test_join_t2表作为驱动表
select
/*+ leading(t2) use_nl(t1) */
    *
from
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1;

Plan hash value: 2586695766
---------------------------------------------------------------------------------------------
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time  | Buffers |

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |      1 |        |    100 |00:00:00.18 |   30041 |
|   1 |  NESTED LOOPS      |              |      1 |    100 |    100 |00:00:00.18 |   30041 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   2 |   TABLE ACCESS FULL| TEST_JOIN_T2 |      1 |  10000 |  10000 |00:00:00.01 |      34 |
|*  3 |   TABLE ACCESS FULL| TEST_JOIN_T1 |  10000 |      1 |    100 |00:00:00.17 |   30007 |
---------------------------------------------------------------------------------------------
先介绍一下执行计划中的各个字段含义:
  • Starts:表访问次数
  • E-Rows:Oracle预估返回条数
  • A-Rows:Oracle真实返回记录条数
  • A-Time:真实执行时间
通过对比以上两次执行计划可以得出一下结论:
  • 驱动表只被访问一次
  • 被驱动表访问的次数跟驱动表的记录数一致(应该是驱动表返回记录数一致,后面会再做测试)
接下来让t1表作为驱动表加上where限制条件col2在[10,50)范围内
select
/*+ leading(t1) use_nl(t2) */
    *
from   
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1 and
t1.col2 >= 10 and
t1.col2 < 50;

Plan hash value: 3805247585
---------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time  | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |      1 |        |     40 |00:00:00.04 |    1089 |
|   1 |  NESTED LOOPS      |              |      1 |     40 |     40 |00:00:00.04 |    1089 |
|*  2 |   TABLE ACCESS FULL| TEST_JOIN_T1 |      1 |     40 |     40 |00:00:00.01 |       6 |
|*  3 |   TABLE ACCESS FULL| TEST_JOIN_T2 |     40 |      1 |     40 |00:00:00.04 |    1083 |
---------------------------------------------------------------------------------------------
从执行计划可以看出,驱动表t1只被访问1次,被驱动表t2被访问40次,与驱动表真实返回的记录数一致。修改上面的sql,驱动表仍然为t1,对被驱动表t2添加限制条件col2的范围为[10,50)
select
/*+ leading(t1) use_nl(t2) */
    *
from
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1 and
t2.col2 >= 10 and
t2.col2 < 50;

Plan hash value: 3805247585
---------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| Id  | Operation          | Name         | Starts | E-Rows | A-Rows |   A-Time  | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |      1 |        |     40 |00:00:00.08 |    2709 |
|   1 |  NESTED LOOPS      |              |      1 |     40 |     40 |00:00:00.08 |    2709 |
|   2 |   TABLE ACCESS FULL| TEST_JOIN_T1 |      1 |    100 |    100 |00:00:00.01 |       6 |
|*  3 |   TABLE ACCESS FULL| TEST_JOIN_T2 |    100 |      1 |     40 |00:00:00.08 |    2703 |
---------------------------------------------------------------------------------------------
从执行计划可以看出,驱动表t1访问次数仍然是1,被驱动表访问次数为100,而非限制条件所限定的40条。因此,可以得出嵌套循环的特点:
  • 驱动表只被访问1次
  • 被驱动表访问的次数为驱动表的“实际返回记录数”,被驱动表添加查询条件是不会影响被访问次数的。

接下来在t2表的col2上建立索引,重新执行上面的sql看执行计划如何
create  index index_t2_col2 on test_join_t2(col2);

select
/*+ leading(t1) use_nl(t2) */
    *
from
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1 and
t2.col2 >= 10 and
t2.col2 < 50;

Plan hash value: 3428653669
-----------------------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows|   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |     40|00:00:00.01 |      24 |      2| 
|   1 |  NESTED LOOPS                |               |      1 |        |     40|00:00:00.01 |      24 |      2| 
|   2 |   NESTED LOOPS               |               |      1 |     40 |   4000|00:00:00.01 |      20 |      2| 
|   3 |    TABLE ACCESS FULL         | TEST_JOIN_T1  |      1 |    100 |    100|00:00:00.01 |       6 |      0|
|*  4 |    INDEX RANGE SCAN          | INDEX_T2_COL2 |    100 |     45 |   4000|00:00:00.01 |      14 |      2|
|*  5 |   TABLE ACCESS BY INDEX ROWID| TEST_JOIN_T2  |   4000 |      1 |     40|00:00:00.01 |       4 |      0|
----------------------------------------------------------------------------------------------------------------
可以看出,t2表确实走了索引采用INDEX RANGE SCAN进行范围扫描。驱动表t1仍然只被访问1次,被驱动表范围扫描执行了100次,从执行计划可以看出索引被访问了100次,每次扫描完索引后又根据rowid回表读数,这样t2表就被访问了4000次。大大多于了未建索引时的100次。当然这里是因为我们使用了HINT,要求Oracle将t1作为驱动表。接下来将t2表作为驱动表看执行计划如何

select
/*+ leading(t2) use_nl(t1) */
    *
from
    test_join_t1 t1, test_join_t2 t2
where
t1.col1 = t2.col1 and
t2.col2 >= 10 and
t2.col2 < 50;

Plan hash value: 2941912635
--------------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows|   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |     40|00:00:00.01 |     132 |
|   1 |  NESTED LOOPS                |               |      1 |     40 |     40|00:00:00.01 |     132 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST_JOIN_T2  |      1 |     40 |     40|00:00:00.01 |       9 |
|*  3 |    INDEX RANGE SCAN          | INDEX_T2_COL2 |      1 |     45 |     40|00:00:00.01 |       5 |
|*  4 |   TABLE ACCESS FULL          | TEST_JOIN_T1  |     40 |      1 |     40|00:00:00.01 |     123 |
--------------------------------------------------------------------------------------------------------
总结

对于驱动表t2先走利用了索引获取到最终的记录,然后再进入循环访问被驱动表t1。经过上面的测试,可以看出嵌套循环就是由两层for循环操作构成:
  1. 查询驱动表记录获取满足查询条件的所有记录
  2. 遍历第一步得到的驱动表记录,带入链接条件值获取满足链接条件的所有被驱动表记录
  3. 遍历第二步中得到的被驱动表记录,判断是否满足查询条件,如果是则返回结果,如果不是则跳过记录
到此可以总结一下嵌套循环的主要特点以及适用场景:
  • 嵌套循环有驱动表和被驱动表的概念,驱动顺序不同执行计划差异非常大
  • 驱动表只被访问一次,被驱动表被访问多次。嵌套循环访问表的次数直接受驱动表的返回记录数的影响。因此应当让实际返回记录数(A-Rows)小的表作为驱动表,返回记录数大的表作为被驱动表。
  • 在驱动表的查询条件上建立索引可以改善查询效率
  • 在连接条件上建立索引也可以改善查询效率
  • 在被驱动表查询条件建立索引对查询效率的影响要视情况而定,不一定带来好处




你可能感兴趣的:(数据库)