Subquery Unnesting
Subqueries are nested when they appear in the WHERE clause of the parent statement. When Oracle Database evaluates a statement with a nested subquery, it must evaluate the subquery portion multiple times and may overlook some efficient access paths or joins.
Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the optimizer to consider them together when evaluating access paths and joins. The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
Assuming no restrictions exist, the optimizer automatically unnests some (but not all) of the following nested subqueries:
- Uncorrelated IN subqueries
- IN and EXISTS correlated subqueries, as long as they do not contain aggregate functions or a GROUP BY clause
You can enable extended subquery unnesting by instructing the optimizer to unnest additional types of subqueries:
- You can unnest an uncorrelated NOT IN subquery by specifying the HASH_AJ or MERGE_AJ hint in the subquery.
- You can unnest other subqueries by specifying the UNNEST hint in the subquery.
PARAMETER AND HINT
Subquery unnesting did not happen by default in Oracle 8i. In 8i the hidden parameter "_unnest_subquery" defaults to false whereas it defaults to true in 9i, 10G and 11G.
Subquery unnesting is part from query transformations. Query transformation was based on heuristic rules in 9i. From 10G R1 however by default query transformations are cost based since "_optimizer_cost_based_transformation" defaults to linear. ( there are bugs related to cost based query transformation causing huge performance degradation )
We can avoid that subquery unnesting happens at session level using the underscore "_unnest_subquery" parameter
SQL> alter session set "_unnest_subquery"=false;
We can also control subquery nested by UNNEST/NO_UNNEST hint.
We can avoid that subquery unnesting happens at statement level using the hint OPT_PARAM('_unnest_subquery','false')
select /*+ GATHER_PLAN_STATISTICS OPT_PARAM('_unnest_subquery','false') */ * from t_order_or or1 where
cr_fk = 'EUR' and or_totalamount =
(select max(or_totalamount) from t_order_or or2 where or2.cr_fk = or1.cr_fk );
We can avoid that subquery unnesting AND OTHER QUERY TRANSFORMATIONS happens at statement level using the NO_QUERY_TRANSFORMATION hint
select /*+ GATHER_PLAN_STATISTICS NO_QUERY_TRANSFORMATION */ * from t_order_or or1 where
cr_fk = 'EUR' and or_totalamount =
(select max(or_totalamount) from t_order_or or2 where or2.cr_fk = or1.cr_fk );
Terminology
Any sub-query block in a query statement may be called a subquery; however, we use the term subquery for a sub-query block that appears in the WHERE, SELECT and HAVING clauses. Some Oracle documentation uses the term "nested subquery" for what we refer to as a subquery. A sub-query block that appears in the FROM clause is called a view or derived table.
There are many ways to classify a subquery. The main categorization comes from the way a subquery is used in SQL statements. A WHERE clause subquery belongs to one of the following types: SINGLE-ROW, EXISTS, NOT EXISTS, ANY, or ALL. A single-row subquery must return at most one row, whereas the other types of subquery can return zero or more rows. ANY and ALL subqueries are used with relational comparison operators: =, >,>=, <, <=, and <>. In SQL, the set operator IN is used as a shorthand for =ANY and the set operator NOT IN is used as a shorthand for <>ALL.
Query A shows an example of a correlated EXISTS subquery
A.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE EXISTS (SELECT 1
FROM sales S
WHERE S.quantity_sold > 1000 and
S.cust_id = C.cust_id);
A column that appears in a subquery is called a correlated column, if it comes from a table not defined by the subquery. The subquery in A is correlated, as it refers to a correlated column, C.cust_id, which comes from, customers, a table not defined by the subquery. The predicate, S.cust_id = C.cust_id, is called a correlating condition or a correlated predicate.
Consider query B, which contains an uncorrelated ANY subquery. Note that queries B and A are semantically equivalent.
B.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
FROM sales S
WHERE S.quantity_sold > 1000);
The subquery in B is uncorrelated, as it does not refer to a correlated column. "C.cust_id = ANY S.cust_id" in B is called a connecting condition.
Subquery Evaluation
A NOT EXISTS subquery evaluates to TRUE, if it returns no rows. The ANY/ALL subquery returns a set of values, and the predicate containing the ANY/ALL subquery will evaluate to TRUE, if it is satisfied. For example, at least one S.cust_id values must match C.cust_id in the connecting condition of the ANY subquery of query B.
Note that in Oracle, a non-unnested ANY and ALL subquery is converted into a correlated EXISTS and NOT EXISTS subquery respectively.
When a correlated subquery is not unnested, the subquery is evaluated multiple times, for each row of the outer tables, substituting the values of correlated columns (e.g., customer.cust_id in A). Thus, table accesses and joins inside the subquery are repeatedly performed with each invocation and join orders involving subquery tables and outer query tables cannot be explored. This type of evaluation also inhibits parallelization.
XA shows the execution plan for query A. Here subquery unnesting has been disabled. Observe that the text of the non-unnested subquery filter is displayed in the predicate dump at the bottom of the plan. As the execution plan shows, the subquery will be evaluated multiple (i.e., 50K) times for each outer row of CUSTOMERS table.
XA.
Execution Plan
----------------------------------------------------------
Plan hash value: 1012411109
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 2428K (1)| 08:05:41 | | |
|* 1 | FILTER | | | | | | | |
| 2 | TABLE ACCESS FULL | CUSTOMERS | 55500 | 975K| 405 (1)| 00:00:05 | | |
| 3 | PARTITION RANGE ALL | | 1 | 8 | 54 (0)| 00:00:01 | 1 | 28 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES | 1 | 8 | 54 (0)| 00:00:01 | 1 | 28 |
| 5 | BITMAP CONVERSION TO ROWIDS | | | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX | | | | | 1 | 28 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "SALES" "S" WHERE "S"."CUST_ID"=:B1 AND
"S"."QUANTITY_SOLD">1000))
4 - filter("S"."QUANTITY_SOLD">1000)
6 - access("S"."CUST_ID"=:B1)
EXISTS and ANY Subqueries
Semi-join is generally used for unnesting EXISTS and ANY subqueries. However, in some cases where duplicate rows are not relevant, inner join can also be used to unnest EXISTS and ANY subqueries. Here we represent semi-join by the following non-standard syntax: T1.x S= T2.y, where T1 is the left table and T2 is the right table of the semi-join. The semantics of semi-join is the following: A row of T1 is returned as soon as T1.x finds a match with any value of T2.y without searching for further matches.
Consider the previously shown query A. Unnesting of the subquery in A produces query C, where the body of the subquery has been merged into the outer query. Here the correlating condition has been turned into a join predicate; customers and sales become the left and right tables respectively in the semi-join.
C.
SELECT C.cust_last_name, C.country_id
FROM customers C, sales S
WHERE S.quantity_sold > 1000 and
C.cust_id S= S.cust_id;
The execution plan of C is shown below as XC. Note the difference between the costs of the plan XC and the plan XA; recall that XA was generated by disabling unnesting. Clearly, the plan with unnesting (XC) is much more optimal; the cost has come down from 2428K to 896. (The query B also produces the same plan as XC.)
XC.
Execution Plan
----------------------------------------------------------
Plan hash value: 3983182601
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 896 (2)| 00:00:11 | | |
|* 1 | HASH JOIN SEMI | | 2 | 52 | 896 (2)| 00:00:11 | | |
| 2 | PARTITION RANGE ALL| | 1 | 8 | 490 (2)| 00:00:06 | 1 | 28 |
|* 3 | TABLE ACCESS FULL | SALES | 1 | 8 | 490 (2)| 00:00:06 | 1 | 28 |
| 4 | TABLE ACCESS FULL | CUSTOMERS | 55500 | 975K| 405 (1)| 00:00:05 | | |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."CUST_ID"="C"."CUST_ID")
3 - filter("S"."QUANTITY_SOLD">1000)
Now consider query D, which contains an uncorrelated ANY subquery that has two tables.
D.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
FROM sales S, products P
WHERE P.prod_id = S.prod_id and
P.prod_list_price > 105);
The subquery in D can be unnested by using a semi-join; however, the inner join of the tables in the subquery, sales and products must take place before the semi-join is performed. Therefore, an inline view needs to be generated in order to enforce the join order. The query E shows the unnesting transformation of D. Here the subquery is decorrelated and converted into an inline view, which becomes the right table in the semi-join; and the correlated predicate is turned into a join predicate.
E.
SELECT C.cust_last_name, C.country_id
FROM customers C,
(SELECT S.cust_id as s_cust_id
FROM sales S, products P
WHERE P.prod_id = S.prod_id and P.prod_list_price > 105) VW
WHERE C.cust_id S= VW.s_cust_id;
XE shows the execution plan of E. It has an optimizer-generated inline view named VW_SQ_1. Of the three available join methods (i.e., nested-loop, hash, and sort-merge), the hash method was selected by the optimizer to do the semi-join.
XE
Execution Plan
----------------------------------------------------------
Plan hash value: 4030918423
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7059 | 213K| | 1987 (1)| 00:00:24 | | |
|* 1 | HASH JOIN SEMI | | 7059 | 213K| 1632K| 1987 (1)| 00:00:24 | | |
| 2 | TABLE ACCESS FULL | CUSTOMERS | 55500 | 975K| | 405 (1)| 00:00:05 | | |
| 3 | VIEW | VW_NSO_1 | 849K| 10M| | 493 (2)| 00:00:06 | | |
|* 4 | HASH JOIN | | 849K| 14M| | 493 (2)| 00:00:06 | | |
|* 5 | TABLE ACCESS FULL | PRODUCTS | 67 | 603 | | 3 (0)| 00:00:01 | | |
| 6 | PARTITION RANGE ALL| | 918K| 8075K| | 486 (2)| 00:00:06 | 1 | 28 |
| 7 | TABLE ACCESS FULL | SALES | 918K| 8075K| | 486 (2)| 00:00:06 | 1 | 28 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CUST_ID"="CUST_ID")
4 - access("P"."PROD_ID"="S"."PROD_ID")
5 - filter("P"."PROD_LIST_PRICE">105)
NOT EXISTS Subqueries
Anti-join is always used for unnesting NOT EXISTS and ALL subqueries. We represent anti-join by the following non-standard syntax: T1.x A= T2.y, where T1 is the left table and T2 is the right table of the anti-join. The semantics of anti-join is the following: A row of T1 is rejected as soon as T1.x finds a match with any value of T2.y. A row of T1 is returned, only if T1.x does not match with any value of T2.y.
Consider query F, which has a NOT EXISTS subquery containing two tables.
F.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE NOT EXISTS (SELECT 1
FROM sales S, products P
WHERE P.prod_id = S.prod_id and
P.prod_min_price > 90 and
S.cust_id = C.cust_id);
The subquery in F can be unnested by using an anti-join; however, the inner join of the tables in the subquery, sales and products must take place before the anti-join is performed. An inline view is generated in order to enforce the join order. This unnesting produces query G; here the inline view becomes the right table of anti-join.
G.
SELECT C.cust_last_name, C.country_id
FROM customers C,
(SELECT S.cust_id AS s_cust_id
FROM sales S, products P
WHERE P.prod_id = S.prod_id and
P.prod_min_price > 90) VW
WHERE C.cust_id A= VW.s_cust_id;
The execution plan of G is shown as XG. Of the three join methods (i.e., nested-loop, hash and sort-merge), the hash method was selected by the optimizer to do the anti-join.
XG.
Execution Plan
----------------------------------------------------------
Plan hash value: 1110507518
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 55500 | 1680K| | 2000 (1)| 00:00:24 | | |
|* 1 | HASH JOIN ANTI | | 55500 | 1680K| 1632K| 2000 (1)| 00:00:24 | | |
| 2 | TABLE ACCESS FULL | CUSTOMERS | 55500 | 975K| | 405 (1)| 00:00:05 | | |
| 3 | VIEW | VW_SQ_1 | 859K| 10M| | 493 (2)| 00:00:06 | | |
|* 4 | HASH JOIN | | 859K| 14M| | 493 (2)| 00:00:06 | | |
|* 5 | TABLE ACCESS FULL | PRODUCTS | 67 | 603 | | 3 (0)| 00:00:01 | | |
| 6 | PARTITION RANGE ALL| | 918K| 8075K| | 486 (2)| 00:00:06 | 1 | 28 |
| 7 | TABLE ACCESS FULL | SALES | 918K| 8075K| | 486 (2)| 00:00:06 | 1 | 28 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ITEM_1"="C"."CUST_ID")
4 - access("P"."PROD_ID"="S"."PROD_ID")
5 - filter("P"."PROD_MIN_PRICE">90)
Single-Row Aggregated Subqueries
Consider query H that contains an aggregated, correlated, single-row subquery.
H.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C
WHERE C.cust_credit_limit < (SELECT SUM(S.amount_sold)
FROM sales S
WHERE S.cust_id = C.cust_id);
Doing aggregation for multiple values of equality correlation predicate is like doing aggregation and group-by on the local column, and then joining the view with the outer table on the group-by columns. The subquery in H is unnested by decorrelating it and converting it into a group-by view, which is inner joined with the outer table, customer; here both the correlating and connecting conditions have been turned into join predicates. The transformed query is shown as I.
I.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C,
SELECT SUM(S.amount_sold) AS sum_amt, S.cust_id
FROM sales S
GROUP BY S.cust_id) VW
WHERE C.cust_credit_limit < VW.sum_amt and
C.cust_id = VW.cust_id;
XI shows the execution plan of I.
XI.
Execution Plan
----------------------------------------------------------
Plan hash value: 1157114491
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 353 | 22592 | 921 (4)| 00:00:12 | | |
|* 1 | HASH JOIN | | 353 | 22592 | 921 (4)| 00:00:12 | | |
| 2 | VIEW | | 7059 | 179K| 515 (7)| 00:00:07 | | |
| 3 | HASH GROUP BY | | 7059 | 70590 | 515 (7)| 00:00:07 | | |
| 4 | PARTITION RANGE ALL| | 918K| 8973K| 489 (2)| 00:00:06 | 1 | 28 |
| 5 | TABLE ACCESS FULL | SALES | 918K| 8973K| 489 (2)| 00:00:06 | 1 | 28 |
| 6 | TABLE ACCESS FULL | CUSTOMERS | 55500 | 2059K| 405 (1)| 00:00:05 | | |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CUST_ID"="VW"."CUST_ID")
filter("C"."CUST_CREDIT_LIMIT"<"VW"."SUM_AMT")
Validity of Unnesting
Every subquery, before it can be unnested, goes through a set of validity checks. The optimizer decisions to unnest or not to unnest a subquery can be overridden by specifying an appropriate hint, but the validity requirements cannot be, since unnesting in such cases would not guarantee a semantically equivalent query.
In the following, we enumerate some important checks that currently invalidate subquery unnesting. Note that this list of checks is by no means exhaustive.
- Subqueries that are correlated to non-parent; for example, subquery SQ3 is contained by SQ2 (parent of SQ3) and SQ2 in turn is contained by SQ1 and SQ3 is correlated to tables defined in SQ1.
- A group-by subquery is correlated; in this case, unnesting implies doing join after group-by. Changing the given order of the two operations may not be always legal.
- Connecting or correlating conditions are not well-formed (e.g., they contains a mix of local and correlated columns on either side of the predicate) and the subquery requires inline view generation, as predicates of this kind do not allow separating out view columns and outer table columns.
- For disjunctive subqueries, the outer columns in the connecting or correlating conditions are not the same.
- Using view-merging transformation, Oracle may merge the group-by or distinct inline view generated during unnesting, and therefore the execution plan may not show any view even when a view is expected.
Summary
In these posts we have tried to illustrate the basic ideas behind unnesting of different types of subquery by showing simple example queries. Oracle can handle far more complex queries - query statements with multiple subqueries at one or more levels, multiple tables, correlated and connecting conditions containing inequality predicates and expressions, subqueries that contain set operators, subqueries with group-by and COUNT aggregate function, ALL subqueries containing nullable columns in its connecting condition, and subqueries in disjunction.
If unnesting of a subquery does not require generation of an inline view, then the subquery is always unnested, because this unnesting provides a more efficient execution plan by allowing more join methods and join orders. If the local column (e.g., S.cust_id in A) in the correlating predicate of the subquery has an index on it, then the subquery evaluation becomes akin to doing index-based nested-loop join; and thus, in some cases, not unnesting may be more efficient than unnesting that generates an inline view and enables only sort-merge and hash join of the view with outer tables. Therefore, in Oracle, subquery unnesting which generates inline views, is done based on cost under the cost-based query transformation framework.
参考至:https://blogs.oracle.com/optimizer/entry/optimizer_transformations_subquery_unesting_part_1
https://blogs.oracle.com/optimizer/entry/optimizer_transformations_subquery_unesting_part_2
http://www.online-database.eu/index.php/sql-tuning/177-subquery-unnesting
http://docs.oracle.com/cd/E11882_01/server.112/e41084/queries008.htm#SQLRF52358
本文原创,转载请注明出处、作者
如有错误,欢迎指正