The optimizer usesnested loop joins when joining small number of rows, with a good drivingcondition between the two tables. You drive from the outer loop to the innerloop, so the order of tables in the execution plan is important.
The outer loop isthe driving row source. It produces a set of rows for driving the joincondition. The row source can be a table accessed using an index scan or a fulltable scan. Also, the rows can be produced from any other operation. Forexample, the output from a nested loop join can be used as a row source for anothernested loop join.
The inner loop isiterated for every row returned from the outer loop, ideally by an index scan.If the access path for the inner loop is not dependent on the outer loop, thenyou can end up with a Cartesian product; for every iteration of the outer loop,the inner loop produces the same set of rows. Therefore, you should use other joinmethods when two independent row sources are joined together.
The following noworkload system statistics will be used:
SELECT
PNAME,
PVAL1
FROM
SYS.AUX_STATS$
ORDER BY
PNAME;
PNAME PVAL1
--------------- ----------
CPUSPEED
CPUSPEEDNW 2116.57559
DSTART
DSTOP
FLAGS 1
IOSEEKTIM 10
IOTFRSPEED 4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM
STATUS
First the tables. Wewill start simple, with one table(T3) having 100 rows and another table (T4) having 10 rows:
CREATE TABLE T3 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
CREATE TABLE T4 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
INSERT INTO
T3
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=100;
INSERT INTO
T4
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=10;
COMMIT;
CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);
CREATE INDEX IND_T4_C1 ON T4(C1);
CREATE INDEX IND_T4_C2 ON T4(C2);
CREATE INDEX IND_T4_C3 ON T4(C3);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
We are able to easily produce an example where the “smallest”table is selected as the driving table (note that I had to add a hinttospecifya nested loop joinin several of these examples):
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT /*+ USE_NL(T3 T4) */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1;
Execution Plan
----------------------------------------------------------
Plan hash value: 567778651
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 420 | 13 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 420 | 13 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T4 | 10 | 210 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T3_C1 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."C1"="T4"."C1")
If we stop at that point, we could declare quite simply that theoptimizer selects the smaller table as the driving table. But wait aminute, take a look at this example where the optimizer selected the largesttable as the driving table:
SELECT /*+ USE_NL(T3 T4) */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1
AND T3.C2=1;
Execution Plan
----------------------------------------------------------
Plan hash value: 4214127300
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 42 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 42 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T3_C2 | 1 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IND_T4_C1 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | T4 | 1 | 21 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."C2"=1)
5 - access("T3"."C1"="T4"."C1")
The above execution plans were generated on 11.2.0.2, whichsometimes differs a bit from older Oracle Database release versions when nestedloops joins are used (note the two nested loops joins),however we areable to hint the optimizer to generate the older style nested loops join:
SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2465588182
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 420 | 13 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 1 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 10 | 420 | 13 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T4 | 10 | 210 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T3_C1 | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."C1"="T4"."C1")
SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1
AND T3.C2=1;
Execution Plan
----------------------------------------------------------
Plan hash value: 3446668716
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 42 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | T4 | 1 | 21 | 1 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 42 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T3_C2 | 1 | | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IND_T4_C1 | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."C2"=1)
5 - access("T3"."C1"="T4"."C1")
We found one case where the larger table was selected as thedriving table, so the books and articles that simply state absolutely that thesmallest tablewill bethe driving table are not completelycorrect. Maybe the larger table is only selected as the driving tablewhen both tables are small? Let’s test that theory by creating a coupleof more tables:
SET AUTOTRACE OFF
CREATE TABLE T1 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
CREATE TABLE T2 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
INSERT INTO
T1
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=1000000;
INSERT INTO
T2
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=100000;
COMMIT;
CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);
CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
CREATE INDEX IND_T2_C3 ON T2(C3);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
The above script created table T1 with 1,000,000 rows and table T2with 100,000 rows. We will now use queries that are similar to those thatwere used with the 100 and 10 row tables.
The smaller table (T2) as the driving table:
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C1=T2.C1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2610346857
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 4687K| 300K (1)| 01:00:12 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 100K| 4687K| 300K (1)| 01:00:12 |
| 3 | TABLE ACCESS FULL | T2 | 100K| 2343K| 889 (1)| 00:00:11 |
|* 4 | INDEX RANGE SCAN | IND_T1_C1 | 1 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 24 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."C1"="T2"."C1")
The larger table as the driving table:
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C1=T2.C1
AND T1.C2 BETWEEN 1 AND 10000;
Execution Plan
----------------------------------------------------------
Plan hash value: 2331401024
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10001 | 468K| 11353 (1)| 00:02:17 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10001 | 468K| 11353 (1)| 00:02:17 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 10001 | 234K| 348 (0)| 00:00:05 |
|* 4 | INDEX RANGE SCAN | IND_T1_C2 | 10001 | | 25 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IND_T2_C1 | 1 | | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 24 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."C2">=1 AND "T1"."C2"<=10000)
5 - access("T1"."C1"="T2"."C1")
So, what is happening? Is it simply the case that it is theexpected number of rows that will be returned from each table that determineswhich table will be the driving table? Let’s test:
SET AUTOTRACE OFF
SELECT
COUNT(*)
FROM
T1
WHERE
T1.C1 BETWEEN 890000 AND 1000000;
COUNT(*)
----------
110001
SELECT
COUNT(*)
FROM
T2
WHERE
T2.C2 BETWEEN 900000 AND 1000000;
COUNT(*)
----------
100000
The above shows that if we specifyT1.C1 BETWEEN890000 AND 1000000in the WHERE clause there will be 110,001 rows from the largertable that match the criteria. If we specifyT2.C2 BETWEEN 900000 AND 1000000in the WHERE clause there will be 100,000 rows from the smallertable that match the criteria. If we execute the following query, which tablewill be the driving table, the 10 times larger T1 table where we are retrieving110,001 rows or the smaller T2 table where we are retrieving 100,000 rows?
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C3=T2.C3
AND T1.C1 BETWEEN 890000 AND 1000000
AND T2.C2 BETWEEN 900000 AND 1000000;
This is the result that I received, which seems to demonstratethat it is not just the size of the tables, nor is it the number of expectedrows to be returned from the tables, that determines which table will be thedriving table:
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11M| 503M| 11M (1)| 37:03:27 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 11M| 503M| 11M (1)| 37:03:27 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 110K| 2578K| 3799 (1)| 00:00:46 |
|* 4 | INDEX RANGE SCAN | IND_T1_C1 | 110K| | 248 (1)| 00:00:03 |
|* 5 | INDEX RANGE SCAN | IND_T2_C3 | 100 | | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS BY INDEX ROWID | T2 | 100 | 2400 | 101 (0)| 00:00:02 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T1"."C1">=890000 AND "T1"."C1"<=1000000)
5 - access("T1"."C3"="T2"."C3")
6 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
110,001 rows from T1 is still somewhat close in number to the100,000 rows from T2, so let’s try an experiment selecting 992,701 rows fromT1:
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C3=T2.C3
AND T1.C1 BETWEEN 7300 AND 1000000
AND T2.C2 BETWEEN 900000 AND 1000000;
Execution Plan
----------------------------------------------------------
Plan hash value: 3718770616
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99M| 4544M| 100M (1)|334:20:23 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 99M| 4544M| 100M (1)|334:20:23 |
|* 3 | TABLE ACCESS FULL | T1 | 992K| 22M| 8835 (1)| 00:01:47 |
|* 4 | INDEX RANGE SCAN | IND_T2_C3 | 100 | | 1 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 2400 | 101 (0)| 00:00:02 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T1"."C1">=7300 AND "T1"."C1"<=1000000)
4 - access("T1"."C3"="T2"."C3")
5 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
As shown above, table T1 is still the driving table in the nestedloops join. Let’s test retrieving 993,001 rows from T1:
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C3=T2.C3
AND T1.C1 BETWEEN 7000 AND 1000000
AND T2.C2 BETWEEN 900000 AND 1000000;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99M| 4545M| 100M (1)|334:26:13 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 99M| 4545M| 100M (1)|334:26:13 |
|* 3 | TABLE ACCESS FULL | T2 | 100K| 2343K| 889 (1)| 00:00:11 |
|* 4 | INDEX RANGE SCAN | IND_T1_C3 | 1000 | | 3 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 993 | 23832 | 1003 (0)| 00:00:13 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
4 - access("T1"."C3"="T2"."C3")
5 - filter("T1"."C1">=7000 AND "T1"."C1"<=1000000)
As shown above, table T2 is now the driving table for the nestedloops join. So, there must be other factors beyond table (or betterworded row source) size and the number of rows that will be retrieved from thetables. You might be wondering if the CLUSTERING_FACTOR of the indexesalso plays a role in determining which table is the driving table:
SET AUTOTRACE OFF
SELECT
TABLE_NAME,
INDEX_NAME,
CLUSTERING_FACTOR,
NUM_ROWS
FROM
USER_INDEXES
WHERE
TABLE_NAME IN ('T1','T2')
ORDER BY
TABLE_NAME,
INDEX_NAME;
TABLE_NAME INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
---------- ---------- ----------------- ----------
T1 IND_T1_C1 32259 1000000
T1 IND_T1_C2 32259 1000000
T1 IND_T1_C3 1000000 1000000
T2 IND_T2_C1 3226 100000
T2 IND_T2_C2 3226 100000
T2 IND_T2_C3 100000 100000
I suggested (without checking) in the OTN thread that theCLUSTERING_FACTOR of the index on columns C2 would be higher than theCLUSTERING_FACTOR of the index on columns C1 because of the reverse (descending)order in which the C2 column values were inserted into the tables.Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of theC1 and C2 columns to be the same values, and set the CLUSTERING_FACTOR ofcolumn C3 to be the same as the number of rows in the table. Maybe one ofthe readers of this blog article can explain what happened to theCLUSTERING_FACTOR.
So, the answer to the OP’s question is not as simple as “theSmaller Table is the Driving Table” or”the Larger Table is the DrivingTable”, but that there are other factors involved. I think that it mightbe time for a third read through of the book “Cost-Based OracleFundamentals”. In the mean time, anyone care to share more insight (yeswe could look inside a 10053 trace file, but there must be at least one othertip that can be provided without referencing such a trace file).