Optimizer Transformations: Subquery Unnesting part 1

We have received a ton of requests for more information on Oracle Optimizer Transformations so we thought we would put together a series of blog posts describing the most commonly used transformations. You should expect to see two blog posts a month on this over the next quarter.

Before we begin we should explain what we mean by transformation? When a SQL statement is parsed, the Optimizer will try to "transform" or rewrite the SQL statement into a semantically equivalent SQL statement that can be processed more efficiently. We first discussed the concepts of transformations in a blog post in June 2008 called why are some of tables in my query missing in my plan. This original post dealt with just one of the new transformations introduced in Oracle Database 11g called table elimination. Part one of our new series will deal with subquery unnesting. I must give credit to Rafi, one of the senior Optimizer developers, who provided the content for this topic.


Subquery Unnesting
Subquery unnesting is an optimization that converts a subquery into a join in the outer query and allows the optimizer to consider subquery table(s) during access path, join method, and join order selection. Unnesting either merges the subquery into the body of the outer query or turns it into an inline view.

Without unnesting, the subquery is evaluated multiple times, for each row of the outer table, and thus many efficient access paths and join methods cannot be considered. Here we will discuss the unnesting of ANY and EXISTS subqueries, which appear in the WHERE clause.
Terminology
Any sub-query block in a query statement may be called a subquery; however, we use the term subquery for a sub-query block that appears in the WHERE, SELECT and HAVING clauses. Some Oracle documentation uses the term "nested subquery" for what we refer to as a subquery. A sub-query block that appears in the FROM clause is called a view or derived table.

There are many ways to classify a subquery. The main categorization comes from the way a subquery is used in SQL statements. A WHERE clause subquery belongs to one of the following types: SINGLE-ROW, EXISTS, NOT EXISTS, ANY, or ALL. A single-row subquery must return at most one row, whereas the other types of subquery can return zero or more rows. ANY and ALL subqueries are used with relational comparison operators: =, >,>=, <, <=, and <>. In SQL, the set operator IN is used as a shorthand for =ANY and the set operator NOT IN is used as a shorthand for <>ALL.

Query A shows an example of a correlated EXISTS subquery.

A.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE EXISTS (SELECT 1
                          FROM sales S
                          WHERE S.quantity_sold > 1000 and
                                        S.cust_id = C.cust_id);

A column that appears in a subquery is called a correlated column, if it comes from a table not defined by the subquery. The subquery in A is correlated, as it refers to a correlated column, C.cust_id, which comes from, customers, a table not defined by the subquery. The predicate, S.cust_id = C.cust_id, is called a correlating condition or a correlated predicate.

Consider query B, which contains an uncorrelated ANY subquery. Note that queries B and A are semantically equivalent.

B.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
                                       FROM sales S
                                       WHERE S.quantity_sold > 1000);

The subquery in B is uncorrelated, as it does not refer to a correlated column. "C.cust_id = ANY S.cust_id" in B is called a connecting condition.
Subquery Evaluation
A NOT EXISTS subquery evaluates to TRUE, if it returns no rows. The ANY/ALL subquery returns a set of values, and the predicate containing the ANY/ALL subquery will evaluate to TRUE, if it is satisfied. For example, at least one S.cust_id values must match C.cust_id in the connecting condition of the ANY subquery of query B.  

Note that in Oracle, a non-unnested ANY and ALL subquery is converted into a correlated EXISTS and NOT EXISTS subquery respectively.

When a correlated subquery is not unnested, the subquery is evaluated multiple times, for each row of the outer tables, substituting the values of correlated columns (e.g., customer.cust_id in A). Thus, table accesses and joins inside the subquery are repeatedly performed with each invocation and join orders involving subquery tables and outer query tables cannot be explored. This type of evaluation also inhibits parallelization.

XA shows the execution plan for query A. Here subquery unnesting has been disabled. Observe that the text of the non-unnested subquery filter is displayed in the predicate dump at the bottom of the plan. As the execution plan shows, the subquery will be evaluated multiple (i.e., 50K) times for each outer row of CUSTOMERS table.

XA.

 
EXISTS and ANY Subqueries
Semi-join is generally used for unnesting EXISTS and ANY subqueries. However, in some cases where duplicate rows are not relevant, inner join can also be used to unnest EXISTS and ANY subqueries. Here we represent semi-join by the following non-standard syntax: T1.x S= T2.y, where T1 is the left table and T2 is the right table of the semi-join. The semantics of semi-join is the following: A row of T1 is returned as soon as T1.x finds a match with any value of T2.y without searching for further matches.

Consider the previously shown query A. Unnesting of the subquery in A produces query C, where the body of the subquery has been merged into the outer query. Here the correlating condition has been turned into a join predicate; customers and sales become the left and right tables respectively in the semi-join.

C.
SELECT C.cust_last_name, C.country_id
FROM customers C, sales S
WHERE S.quantity_sold > 1000 and
C.cust_id S= S.cust_id;

The execution plan of C is shown below as XC. Note the difference between the costs of the plan XC and the plan XA; recall that XA was generated by disabling unnesting. Clearly, the plan with unnesting (XC) is much more optimal; the cost has come down from 5006K to 2300. (The query B also produces the same plan as XC.)
XC.

 

Now consider query D, which contains an uncorrelated ANY subquery that has two tables.
D.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
                                       FROM sales S, products P
                                       WHERE P.prod_id = S.prod_id and
                                                      P.prod_list_price > 105);

The subquery in D can be unnested by using a semi-join; however, the inner join of the tables in the subquery, sales and products must take place before the semi-join is performed. Therefore, an inline view needs to be generated in order to enforce the join order. The query E shows the unnesting transformation of D. Here the subquery is decorrelated and converted into an inline view, which becomes the right table in the semi-join; and the correlated predicate is turned into a join predicate.

E.
SELECT C.cust_last_name, C.country_id
FROM customers C,
          (SELECT S.cust_id as s_cust_id
           FROM    sales S, products P
           WHERE P.prod_id = S.prod_id and P.prod_list_price > 105) VW
WHERE C.cust_id S= VW.s_cust_id;

XE shows the execution plan of E. It has an optimizer-generated inline view named VW_SQ_1. Of the three available join methods (i.e., nested-loop, hash, and sort-merge), the hash method was selected by the optimizer to do the semi-join.

XE

 

Look out for part 2 of our blog on Subquery Unnesting,  which will discuss NOT EXISTS subqueries, single-row aggregated subqueries, as well as the validity checks performed before unnesting is allowed.

你可能感兴趣的:(Oracle,SQL,调优)