A common sight in databases is a table that contains the start and end values of a range, usually dates. One or both of the dates are typically part of the primary key; sometimes they are the entire key. Some examples:

History tables hold the date a row became effective (`START_DATE`) and the date it was superseded (`END_DATE`). The primary key of a history table is typically a non-unique identifier combined with the `START_DATE`, with an unenforced constraint that the start and end dates of rows for any given identifier are both contiguous and non-overlapping.
Range tables are tables where the start and end values of the range form the entire primary key. For example, a table to determine the delivery method depending on volume ordered:
```
LOWER_BOUND UPPER_BOUND DELIVERY_METHOD
----------- ----------- --------------------
          0          12 Mail
         13         144 Courier
        145       10000 Curtain-side truck
      10001       50000 Semi-trailer
      50001    99999999 Ship
```
Like history tables, range tables typically have an unenforced constraint that the lower and upper bounds of successive ranges are both contiguous and non-overlapping.
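Because this non-overlap/contiguity constraint cannot be declared to the database, it has to be checked by code, typically a periodic validation job. As a rough illustration (the function and the tuple representation here are our own, not part of the article's schema), such a check might look like:

```python
def check_ranges(ranges):
    """Check the unenforced constraint on a range table: successive
    ranges must be non-overlapping and contiguous.
    `ranges` is a list of (lower_bound, upper_bound) tuples."""
    problems = []
    ordered = sorted(ranges)
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            problems.append(f"overlap: ({lo1},{hi1}) and ({lo2},{hi2})")
        elif lo2 != hi1 + 1:
            problems.append(f"gap: ({lo1},{hi1}) and ({lo2},{hi2})")
    return problems

# The delivery-method bands shown above pass both checks:
bands = [(0, 12), (13, 144), (145, 10000), (10001, 50000), (50001, 99999999)]
print(check_ranges(bands))                 # []
print(check_ranges([(0, 12), (10, 20)]))   # reports an overlap
```

In SQL, the same check is usually written with the `LAG` or `LEAD` analytic functions, comparing each row's bounds with those of its neighbour.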
A common feature of History and Range Tables is the way they are queried:
| History Table | Range Table |
|---|---|
| `SELECT * FROM cust_hist WHERE cust_num = :a AND :b BETWEEN cust_eff_date AND cust_exp_date` | `SELECT * FROM delivery_method WHERE :b BETWEEN lower_bound AND upper_bound` |
A `BETWEEN` clause is internally expanded out into separate `<=` / `>=` clauses. For example:
```sql
SELECT *
FROM   delivery_method
WHERE  lower_bound <= :b
AND    upper_bound >= :b
```
The problem is that an index can only scan on one column with a range predicate (`<`, `>`, `LIKE`, `BETWEEN`). So even if an index contained both the `lower_bound` and `upper_bound` columns, the index scan will return all of the rows matching `lower_bound <= :b`, and then filter out the rows that do not match `upper_bound >= :b`.
In the case where the sought value is somewhere in the middle, the range scan will return half of the rows in the table in order to find a single row. In the worst case where the most commonly sought rows are at the top (highest values), the index scan will process almost every row in the table for every lookup.
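The scale of the problem can be seen with a hypothetical table of 10,000 contiguous single-value ranges (our own assumption for illustration, not the article's data). Counting the index entries that satisfy the access predicate `lower_bound <= :b` shows how the scan grows with the sought value; a sorted list stands in for the B-tree here:

```python
import bisect

# Hypothetical index on lower_bound over 10,000 contiguous ranges
# (0-0, 1-1, ..., 9999-9999): a sorted list stands in for the B-tree.
lowers = list(range(10000))

def entries_scanned(b):
    """Count index entries satisfying the access predicate
    lower_bound <= b; every one of them is read from the index and
    only then filtered on upper_bound >= b."""
    return bisect.bisect_right(lowers, b)

print(entries_scanned(5000))   # mid-range value: about half the table
print(entries_scanned(9999))   # top-range value: every entry
```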
This type of behaviour can usually be detected in Explain Plan's Access Predicates and Filter Predicates. Access Predicates are clauses used to find rows in a structure (such as an index); non-matching rows are not read. Filter Predicates are clauses used to filter rows after they have been found but before they are projected to the next step of the plan. Explain Plan produces confusing results with unbounded range scans (like we are doing here). Note how both clauses appear as both access predicates and filter predicates:
```
  1  SELECT *
  2  FROM   delivery_method
  3* WHERE  :b BETWEEN lower_bound AND upper_bound

-------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z))
       filter("DELIVERY_METHOD"."LOWER_BOUND"<=TO_NUMBER(:Z) AND
              "DELIVERY_METHOD"."UPPER_BOUND">=TO_NUMBER(:Z))
```
Explain Plan appears to be telling us that `lower_bound <= :b` is used as both an Access Predicate and a Filter Predicate, when in fact it is only an Access Predicate. Similarly, `upper_bound >= :b` appears to be both an Access and a Filter Predicate, when in fact it is only a Filter Predicate. More distressingly, the Performance Tuning manual also seems to claim that we can scan on the second column of an index after a range predicate on the first column:
The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:
- col1 = :b1
- col1 < :b1
- col1 > :b1
- AND combination of the preceding conditions for leading columns in the index
- `col1 like 'ASD%'` (wild-card searches should not be in a leading position; the condition `col1 like '%ASD'` does not result in a range scan)
To dispel any doubts, a test is in order to demonstrate that the Access Predicate on the second column of the index does not result in reduced IO:
```
SQL> create table delivery_method (
  2     lower_bound      number(*)
  3    ,upper_bound      number(*)
  4    ,delivery_method  varchar2(50)
  5  );

Table created.

SQL> insert /*+append*/ into delivery_method
  2  select rownum
  3  ,      1
  4  ,      dbms_random.string('A', 25)
  5  from   dual
  6  connect by level <= 100000
  7  /

100000 rows created.

SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.
```
```
SQL> set autotrace on
SQL> select /*+index(delivery_method)*/ *
  2  from   delivery_method
  3  where  lower_bound >= 0
  4  and    upper_bound <> 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND"<>1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
```
```
SQL> select /*+index(delivery_method)*/ *
  2  from   delivery_method
  3  where  lower_bound >= 0
  4  and    upper_bound > 1
  5  /

no rows selected

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)
       filter("DELIVERY_METHOD"."LOWER_BOUND">=0 AND
              "DELIVERY_METHOD"."UPPER_BOUND">1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        132  consistent gets
        132  physical reads
          0  redo size
        417  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
```
Even though the second query used an access predicate of `upper_bound > 1`, which should filter out EVERY row (every row contains the value 1), it still processed the same amount of IO as the filter. To get a level playing field, I performed some additional steps, not shown here, to flush the buffer cache. I also pasted the `DBMS_XPLAN` output instead of the inferior `AUTOTRACE` plan. Your results will be no less compelling, but may differ slightly if you repeat these steps.

The problem is similar but less serious for the History Table. By prefixing the index with the identifier (`cust_num` in the example above), Oracle will range-scan the index to return the rows with a matching identifier and `cust_eff_date` before filtering rows with a non-matching `cust_exp_date`. Unlike the range table, where we could inadvertently scan the entire table, the cost of this scan will never exceed the number of rows that match the non-unique identifier.
Despite the tips below that help to avoid scanning non-matching rows for range-based queries, it is sometimes unavoidable to scan every row. Ensure that both the start and the end columns of the range are included in the same index; this allows Oracle to filter out non-matching rows as they are retrieved from the index, without having to look up the table. Note in the example below that the filter predicate is performed against the `TABLE ACCESS` step when the index does not contain both dates.
```
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound)
  3  /

Index created.

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |    53 |     2  (50)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| DELIVERY_METHOD    |     1 |    53 |     2  (50)|
|*  2 |   INDEX RANGE SCAN          | DELIVERY_METHOD_PK |     1 |       |     3  (34)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("DELIVERY_METHOD"."UPPER_BOUND">=3000)
   2 - access("DELIVERY_METHOD"."LOWER_BOUND"<=3000)
```
Looking at the TK*Prof output of this query, we can see that the index passed 301 rows back to the `TABLE ACCESS` step, which filtered out 300 of them to leave a single row.
```
select /*+index(delivery_method)*/ *
from   delivery_method
where  3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.00          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326005)
```
By adding both range columns to the index, Oracle still reads the 301 rows, but the filtering is done in the index; only one row is retrieved from the table.
```
SQL> create index delivery_method_pk
  2  on delivery_method(lower_bound, upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from   delivery_method
where  3000 BETWEEN lower_bound AND upper_bound

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.04          3          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.04          3          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1   INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326178)
```
Note the trap in interpretation here: according to TK*Prof, it appears as if a single row was processed by the `INDEX RANGE SCAN`. In fact, a single row was *returned* from the index range scan; it actually processed 301 rows and filtered out 300 of them, but TK*Prof does not show this. When Oracle performs an Access and a Filter in a single step, it is impossible to tell without further testing how many rows were accessed and how many were filtered.
When ranges are known to be non-overlapping, queries of the style shown above will not return more than a single row. Unfortunately, non-overlapping ranges cannot be enforced by a database constraint or a unique index, so Oracle resolves the query as if it can return more than one row.
One tuning technique for single-row selects on non-overlapping ranges involves the use of `ROWNUM = 1` to tell Oracle that only a single row is required.
```sql
SELECT *
FROM   delivery_method
WHERE  :b BETWEEN lower_bound AND upper_bound
AND    ROWNUM = 1
```
But this is not enough. `ROWNUM = 1` will short-circuit the query so that it does not go looking for subsequent rows, but this assumes that a matching row is found at the beginning of the search. If we have to scan and filter out a large number of non-matching rows first, then there is no saving at all.
An index on `(lower_bound, upper_bound)` is stored in ascending order of `lower_bound`. Remember that the `BETWEEN` predicate is expanded out to an index access predicate (`lower_bound <= :b`) and a filter predicate (`upper_bound >= :b`): if the ranges are non-overlapping, then of all the rows with `lower_bound <= :b`, only the last one scanned will have `upper_bound >= :b`.
```
select /*+index(delivery_method)*/ *
from   delivery_method
where  3000 BETWEEN lower_bound AND upper_bound
and    rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
    301    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326010)
```
Note that 301 rows were still scanned in the index. In this case, the `ROWNUM = 1` is useless because we don't find a match for the filter predicate until the last row. To make `ROWNUM = 1` effective, we need to ensure that the matching row is the *first* one scanned, not the last. This is possible with a descending index scan (the `/*+INDEX_DESC(..)*/` hint), but it is simpler and more robust to simply reverse the order of columns in the index. If `upper_bound` precedes `lower_bound` in the index, then `upper_bound >= :b` becomes the Access Predicate and `lower_bound <= :b` becomes the Filter Predicate.

Of all the non-overlapping ranges with `upper_bound >= :b`, the one that matches `lower_bound <= :b` will be the first one scanned. This time, the `ROWNUM = 1` will short-circuit after the first row is read and returned, preventing a scan of all of the remaining rows with `upper_bound >= :b`.
```
SQL> create index delivery_method_pk
  2  on delivery_method(upper_bound)
  3  /

Index created.

select /*+index(delivery_method)*/ *
from   delivery_method
where  3000 BETWEEN lower_bound AND upper_bound
and    rownum = 1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          3          3          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          3          3          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY
      1   TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
      1    INDEX RANGE SCAN DELIVERY_METHOD_PK (object id 1326016)
```
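The scan-order argument can be sketched outside the database. In this illustrative Python model (our own simplification, not Oracle internals), an ordered list stands in for the index and a loop with an early return stands in for the `ROWNUM = 1` short-circuit:

```python
# Non-overlapping delivery bands and a sought value, as in the article.
ranges = [(0, 12), (13, 144), (145, 10000), (10001, 50000), (50001, 99999999)]
b = 3000

def scan_rows(index_rows, access, filt):
    """Walk index entries in order, stopping at the first row that
    passes both predicates (the ROWNUM = 1 short-circuit).
    Returns (entries_scanned, matching_row)."""
    scanned = 0
    for row in index_rows:
        if not access(row):
            break            # the access predicate ends the range scan
        scanned += 1
        if filt(row):
            return scanned, row
    return scanned, None

# Index on (lower_bound): access on lower_bound <= b; the match comes last.
asc = sorted(ranges)
print(scan_rows(asc, lambda r: r[0] <= b, lambda r: r[1] >= b))
# (3, (145, 10000)): three entries scanned before the match

# Index on (upper_bound): the scan starts at the first entry with
# upper_bound >= b, so the match comes first and ROWNUM = 1 pays off.
start = [r for r in sorted(ranges, key=lambda r: r[1]) if r[1] >= b]
print(scan_rows(start, lambda r: r[1] >= b, lambda r: r[0] <= b))
# (1, (145, 10000)): a single entry scanned
```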
But watch out for:

- This relies on the unenforced constraint that ranges are non-overlapping: if two ranges do overlap, `ROWNUM = 1` will silently return just one of the matching rows.

The same technique can be applied to History Tables by indexing the non-unique identifier followed by the `end_date`, rather than the non-unique identifier followed by the `start_date`.
The `ROWNUM = 1` technique above does not extend to table joins where a lookup is required on the Range Table. Consider the following SQL:
```sql
SELECT *
FROM   orders o
JOIN   delivery_method m
ON     o.order_amt BETWEEN m.lower_bound AND m.upper_bound
```
The purpose of this query is to look up the `delivery_method` table for every `order`. Adding `ROWNUM = 1` to this query would result in only one order being returned, rather than limiting the lookup to the first matching row. As it stands, though, the join query suffers the same problem as described above: one column is used to scan the index, the other is used to filter the results (notwithstanding the misleading Explain Plan output).
```
-----------------------------------------------------------
| Id  | Operation                    | Name               |
-----------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |
|   1 |  TABLE ACCESS BY INDEX ROWID | DELIVERY_METHOD    |
|   2 |   NESTED LOOPS               |                    |
|   3 |    TABLE ACCESS FULL         | ORDERS             |
|*  4 |    INDEX RANGE SCAN          | DELIVERY_METHOD_PK |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("O"."ORDER_AMT"<="M"."UPPER_BOUND" AND
              "O"."ORDER_AMT">="M"."LOWER_BOUND")
       filter("O"."ORDER_AMT">="M"."LOWER_BOUND" AND
              "O"."ORDER_AMT"<="M"."UPPER_BOUND")
```
If we can be certain that ranges are non-overlapping, then this SQL can be tuned by adding a sub-query. We make the assumption that if a matching row exists, it will be the one with the lowest `upper_bound` that is greater than or equal to `o.order_amt` (remember from above that `upper_bound` is now placed first in the index).
```sql
SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound AND m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )
```
The sub-query exploits a feature of the cost-based optimizer whereby the `min()` or `max()` of an index range can be resolved by simply picking the first or last row respectively from the index range. It is not necessary to scan the entire range.
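The effect of the sub-query can be mimicked with a binary search over the delivery bands (an illustrative model of ours, not Oracle internals): instead of scanning every entry with `upper_bound >= o.order_amt`, we jump straight to the first one, just as the `INDEX RANGE SCAN (MIN/MAX)` does:

```python
import bisect

# Delivery bands keyed on upper_bound, mirroring the index with
# upper_bound first; the names and layout here are illustrative.
bands = [(12, 0, 'Mail'),
         (144, 13, 'Courier'),
         (10000, 145, 'Curtain-side truck'),
         (50000, 10001, 'Semi-trailer'),
         (99999999, 50001, 'Ship')]
uppers = [u for u, _, _ in bands]

def delivery_method(order_amt):
    """MIN(upper_bound) subject to upper_bound >= order_amt is simply
    the first index entry at or after order_amt: one probe, not a scan."""
    i = bisect.bisect_left(uppers, order_amt)
    if i < len(bands) and bands[i][1] <= order_amt:  # filter on lower_bound
        return bands[i][2]
    return None

print(delivery_method(3000))   # Curtain-side truck
print(delivery_method(100))    # Courier
```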
The `o.order_amt BETWEEN m.lower_bound AND m.upper_bound` clause is no longer used in the index access; it is merely a filter clause. The `delivery_method` row that is joined to each order is accessed by the subquery predicate `AND m.upper_bound = (...)` using an index scan on the `upper_bound` column. In the example below, it was necessary to add hints to force a Nested Loops join, but the result was a 300-fold performance improvement.
Without sub-query:

```
SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603    206.44     213.85         61     218928
------- ------  -------- ---------- ---------- ----------
total      605    206.44     213.85         61     218928

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows      Row Source Operation
--------  ---------------------------------------------------
    9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
50184810   NESTED LOOPS
   10000    TABLE ACCESS FULL ORDERS
50174809    INDEX RANGE SCAN DELIVERY_METHOD_PK
```

With sub-query:

```
SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.01       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch      603      0.42       0.77         78      50936
------- ------  -------- ---------- ---------- ----------
total      605      0.43       0.77         78      50936

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
   9021  TABLE ACCESS BY INDEX ROWID DELIVERY_METHOD
  20001   NESTED LOOPS
  10000    TABLE ACCESS FULL ORDERS
  10000    INDEX RANGE SCAN DELIVERY_METHOD_PK
```
Keen observers will note that the plan shown above with the sub-query is incorrect. This appears to be a bug in SQL*Trace on the database used for this demo. I checked the trace file and the TK*Prof output several times to make sure. The actual plan used by the optimizer is shown below.
```
--------------------------------------------------------------------------------------------
| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                    |    15 |  1575 | 12614   (4)|
|   1 |  TABLE ACCESS BY INDEX ROWID    | DELIVERY_METHOD    |     1 |    35 |     3  (34)|
|   2 |   NESTED LOOPS                  |                    |    15 |  1575 | 12614   (4)|
|   3 |    TABLE ACCESS FULL            | ORDERS             |  6054 |   413K|    19  (37)|
|*  4 |    INDEX RANGE SCAN             | DELIVERY_METHOD_PK |     1 |       |     2  (50)|
|   5 |     SORT AGGREGATE              |                    |     1 |     5 |            |
|   6 |      FIRST ROW                  |                    |   500 |  2500 |     3  (34)|
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| DELIVERY_METHOD_PK |    20 |       |     3  (34)|
--------------------------------------------------------------------------------------------
```
In this plan, the Nested Loops join executes steps 4-7 for each row returned from the `ORDERS` table (step 3). Steps 6 and 7 ensure that only a single row is scanned in each iteration of the loop.
But watch out for:

- The `MIN/MAX` index scan is a feature of the Cost Based Optimizer: make sure the `OPTIMIZER_MODE` / `OPTIMIZER_GOAL` session parameter is not set to `RULE`, and do not use the `/*+RULE*/` hint.

High volume joins to smaller range tables are somewhat easier to tune. Since we will be joining a lot of rows, we will also be reading a lot of rows from the Range Table. Under these circumstances, the cost of reading every row in the range table is just a small proportion of the cost of the query, provided we do it just once.
In the example above, the Nested Loops join is performing repeated scans of the Range Table. By changing the Nested Loops join to a Sort-Merge join, the Range Table is scanned just once, sorted, and joined to a sorted copy of the inner table. For the example below, `DELIVERY_METHOD` has been rebuilt with just 10 rows instead of 10,000. Curiously, TK*Prof shows the correct plan for the sub-query on the smaller table. When I saw this, I rebuilt the larger table and tried again, but received the wrong plan in TK*Prof again.
Nested Loops sub-query:

```
SELECT /*+ordered use_nl(m)*/ *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound
AND    m.upper_bound = (
           SELECT min(m1.upper_bound)
           FROM   delivery_method m1
           WHERE  m1.upper_bound >= o.order_amt
       )

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.23       0.41         42      30597
------- ------  -------- ---------- ---------- ----------
total       36      0.23       0.41         42      30597

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  FILTER
    491   NESTED LOOPS
  10000    TABLE ACCESS FULL ORDERS
    491    TABLE ACCESS FULL DELIVERY_METHOD
    491   SORT AGGREGATE
    491    FIRST ROW
    491     INDEX RANGE SCAN (MIN/MAX) DELIVERY_METHOD_PK
```

Sort-Merge:

```
SELECT *
FROM   orders o
,      delivery_method m
WHERE  o.order_amt BETWEEN m.lower_bound and m.upper_bound

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute      1      0.00       0.00          0          0
Fetch       34      0.14       0.15         40         43
------- ------  -------- ---------- ---------- ----------
total       36      0.14       0.16         40         43

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 33

Rows     Row Source Operation
-------  ---------------------------------------------------
    491  MERGE JOIN
     10   SORT JOIN
     10    TABLE ACCESS FULL DELIVERY_METHOD
    491   FILTER
  97708    SORT JOIN
  10000     TABLE ACCESS FULL ORDERS
```
Oracle will often choose a Sort-Merge join automatically for a non-equijoin. If not, a `/*+USE_MERGE(..)*/` hint may be required.
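The one-pass behaviour that makes the Sort-Merge attractive can be sketched in Python. This is a simplified model of ours, not Oracle's algorithm: it additionally relies on the bands being sorted and non-overlapping, so the band list can be walked forward only, visiting each row at most once:

```python
def band_join(order_amts, bands):
    """Merge-style band join. `bands` is a sorted list of
    (lower_bound, upper_bound, method) tuples with non-overlapping
    ranges; the orders are sorted once, then the band list is walked
    forward only, so every band row is visited at most once."""
    out, i = [], 0
    for amt in sorted(order_amts):
        while i < len(bands) and bands[i][1] < amt:
            i += 1           # this band ends below amt: never needed again
        if i < len(bands) and bands[i][0] <= amt:
            out.append((amt, bands[i][2]))
    return out

bands = [(0, 12, 'Mail'), (13, 144, 'Courier'), (145, 10000, 'Curtain-side truck')]
print(band_join([3000, 100, 5], bands))
# [(5, 'Mail'), (100, 'Courier'), (3000, 'Curtain-side truck')]
```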
But watch out for:

- The Sort-Merge join helps here only because this is a non-equijoin; the `BETWEEN` clause on the date range columns merely filters the results. For equi-joins, Hash Joins are usually a much faster option.

For single-row lookups, use `ROWNUM = 1` or sub-queries to exploit non-overlapping ranges. Always check the Access Predicates and Filter Predicates of Explain Plan carefully to determine which columns are contributing to a Range Scan and which columns are merely filtering the returned rows. Be sceptical if the same clause is shown in both.
Reposted from: http://www.orafaq.com/node/1798