oracle学习笔记10: 子查询因子化

with子句最有用的特性之一就是消除复杂的sql查询,当查询中包含大量的表和数据列时,想要搞清楚查询中的数据流就变得很困难。使用子查询因子化,只要一个个查询就可以将一些较复杂的部分移到主查询之外,从而使得查询更易于理解。

下面使用pivot运算符生成一个交叉数据分析报告。最里层的查询在sales表的关键列上创建了一系列的聚合,而接下来的最外层查询只是提供了在pivot运算符中出现的列的列名,从而生成了每种产品不同渠道和季度的最终销售值。

--没有进行子查询因子化的交叉数据分析查询
SELECT *
FROM (
 SELECT /*+ gather_plan_statistics */
 product,channel,quarter,country,quantity_sold 
 FROM (
  SELECT pr.prod_name product,co.country_name country,sa.channel_id channel, Substr(t.calendar_quarter_desc,6,2) quarter,
  SUM(sa.amount_sold) amount_sold,
  SUM(sa.quantity_sold) quantity_sold
  FROM sh.sales sa
  JOIN sh.times t ON t.time_id=sa.time_id
  JOIN sh.customers cu ON cu.cust_id=sa.cust_id
  JOIN sh.countries co ON co.country_id=cu.country_id
  JOIN sh.products pr ON pr.prod_id=sa.prod_id
  GROUP BY 
  pr.prod_name,co.country_name,sa.channel_id,Substr(t.calendar_quarter_desc,6,2)
 ) 
) PIVOT (
  SUM(quantity_sold) FOR (channel,quarter) IN
  (
   (5,'02') AS catolog_q2,
   (4,'01') AS internet_q1,
   (4,'04') AS internet_q4,
   (2,'02') AS partners_q2,
   (9,'03') AS tele_q3
  )
)
ORDER BY product,country;

使用with子句将这个查询分解为易于理解的字节级大小的块。使用with子句建立3个因子化子查询来进行了重写,分别命名为sales_countries、top_sales 、sales_rpt 子查询。sales_countries指的是销售所发生的国家,top_sales收集销售数据,而sales_rpt子查询对这些数据进行聚合。

--进行子查询因子化的交叉表
WITH sales_countries AS (
 SELECT /*+ gather_plan_statistics */
 cu.cust_id,co.country_name
 FROM sh.countries co, sh.customers cu
 WHERE cu.country_id=co.country_id
),
top_sales AS
(
 SELECT p.prod_name,sc.country_name,sa.channel_id,
 t.calendar_quarter_desc,sa.amount_sold,sa.quantity_sold
 FROM sh.sales sa
 JOIN sh.times t ON t.time_id=sa.time_id
 JOIN sh.customers c ON c.cust_id = sa.cust_id
 JOIN sales_countries sc ON sc.cust_id = c.cust_id
 JOIN sh.products p ON p.prod_id = sa.prod_id
),
sales_rpt AS
(
 SELECT ts.prod_name product,
 ts.country_name country,
 ts.channel_id channel,
 SUBSTR(ts.calendar_quarter_desc,6,2) quarter,
 SUM(amount_sold) amount_sold,
 SUM(quantity_sold) quantity_sold
 FROM top_sales ts
 GROUP BY ts.prod_name,
 ts.country_name,
 ts.channel_id,
 SUBSTR(ts.calendar_quarter_desc,6,2)
)
SELECT * FROM 
(
 SELECT product, channel,quarter,country,quantity_sold
 FROM sales_rpt
) PIVOT (
 SUM(quantity_sold)
 FOR (channel,quarter) IN
 (
     (5,'02') AS catalog_q2,
     (4,'01') AS internet_q1,
     (4,'04') AS internet_q4,
     (2,'02') AS partners_q2,
     (9,'03') AS tele_q3
 )
)
ORDER BY product,country;

用with定义PL/SQL函数

Oracle 12c中引入了一种特性,可以使用with子句声明并定义pl/sql函数和存储过程。在定义之后,可以在声明这个子句的查询中引用该pl/sql函数。

WITH 
  FUNCTION calc_markup(p_markup NUMBER,p_price NUMBER) RETURN NUMBER
  IS
  BEGIN
         RETURN p_markup*p_price;
    END;
  SELECT t.prod_name,
  t.prod_list_price cur_price,
  calc_markup(.05,t.prod_list_price) mup5,
  ROUND(t.prod_list_price+calc_markup(0.10,t.prod_list_price),2) new_price 
  FROM sh.products t;
/
SELECT prod_name,cur_price,mup5,new_price
FROM (
  WITH 
    FUNCTION calc_markup(p_markup NUMBER,p_price NUMBER) RETURN NUMBER
    IS
    BEGIN
           RETURN p_markup*p_price;
      END;
      SELECT t.prod_name,
      t.prod_list_price cur_price,
      calc_markup(.05,t.prod_list_price) mup5,
      ROUND(t.prod_list_price+calc_markup(0.10,t.prod_list_price),2) new_price  
      FROM sh.products t
) WHERE cur_price<1000
AND new_price>1000;
SELECT /*+ WITH_PLSQL */ prod_name,cur_price,mup5,new_price
FROM (
  WITH 
    FUNCTION calc_markup(p_markup NUMBER,p_price NUMBER) RETURN NUMBER
    IS
    BEGIN
           RETURN p_markup*p_price;
      END;
      SELECT t.prod_name,
      t.prod_list_price cur_price,
      calc_markup(.05,t.prod_list_price) mup5,
      ROUND(t.prod_list_price+calc_markup(0.10,t.prod_list_price),2) new_price  
      FROM sh.products t
) WHERE cur_price<1000
AND new_price>1000;
/

必须使用斜线/运行这个语名,类似执行一个匿名的PL/SQL块。

SQL优化

当一个SQL查询被设计或修改以利用子查询因子化时,在优化器为查询建立执行计划时,可能将因子化的子查询作为临时表来处理。

sqlplus scott/scott@orcl

命令行登陆语法 sqlplus username/password@servername as sysdba
set autotrace on显示计划信息和查询的数据
set autotrace traceonly只显示计划信息

SQL> set autotrace traceonly
SQL> --with和materialize
SQL> WITH cust AS (
  2       SELECT /*+ materialize gather_plan_statistics */
  3       t.cust_income_level,
  4       a.country_name
  5       FROM sh.customers t
  6       JOIN sh.countries a ON a.country_id=t.country_id
  7  )
  8  SELECT  c.country_name,cust_income_level,COUNT(c.country_name) country_cust_count
  9  FROM cust c
 10  HAVING COUNT(country_name) >
 11  (
 12         SELECT COUNT(*) * .01 FROM cust c2
 13  )
 14  OR COUNT(cust_income_level) >
 15  (
 16         SELECT MEDIAN(income_income_count)
 17         FROM (
 18              SELECT cust_income_level,COUNT(*)* .25 income_income_count
 19              FROM cust
 20              GROUP BY cust_income_level
 21         )
 22  )
 23  GROUP BY country_name,cust_income_level
 24  ORDER BY 1,2;

已选择35行。


执行计划
----------------------------------------------------------
Plan hash value: 3455850065

--------------------------------------------------------------------------------------------------------

| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT           |                           |    20 |   620 |   499   (2)| 00:00:06 |

|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |          |

|   2 |   LOAD AS SELECT           |                           |       |       |           |          |

|*  3 |    HASH JOIN               |                           | 55500 |  2222K|   410   (1)| 00:00:05 |

|   4 |     TABLE ACCESS FULL      | COUNTRIES                 |    23 |   345 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL      | CUSTOMERS                 | 55500 |  1409K|   406   (1)| 00:00:05 |

|*  6 |   FILTER                   |                           |       |       |            |          |

|   7 |    SORT GROUP BY           |                           |    20 |   620 |    89   (5)| 00:00:02 |

|   8 |     VIEW                   |                           | 55500 |  1680K|    86   (2)| 00:00:02 |

|   9 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6607_61C9C4 | 55500 |  1680K|    86   (2)| 00:00:02 |

|  10 |    SORT AGGREGATE          |                           |     1 |       |           |          |

|  11 |     VIEW                   |                           | 55500 |       |    86   (2)| 00:00:02 |

|  12 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6607_61C9C4 | 55500 |  1680K|    86   (2)| 00:00:02 |

|  13 |    SORT GROUP BY           |                           |     1 |    13 |            |          |

|  14 |     VIEW                   |                           |    12 |   156 |    89   (5)| 00:00:02 |

|  15 |      SORT GROUP BY         |                           |    12 |   252 |    89   (5)| 00:00:02 |

|  16 |       VIEW                 |                           | 55500 |  1138K|    86   (2)| 00:00:02 |

|  17 |        TABLE ACCESS FULL   | SYS_TEMP_0FD9D6607_61C9C4 | 55500 |  1680K|    86   (2)| 00:00:02 |

--------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
   6 - filter(COUNT("COUNTRY_NAME")> (SELECT COUNT(*)*.01 FROM  (SELECT /*+ CACHE_TEMP_TABLE

              ("T1") */ "C0" "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM "SYS"."SYS_TEMP_0FD9D6607_61C9C4"

              "T1") "C2") OR COUNT("CUST_INCOME_LEVEL")> (SELECT PERCENTILE_CONT(0.500000) WITHIN GROUP (

              ORDER BY "INCOME_INCOME_COUNT") FROM  (SELECT "CUST_INCOME_LEVEL"
              "CUST_INCOME_LEVEL",COUNT(*)*.25 "INCOME_INCOME_COUNT" FROM  (SELECT /*+ CACHE_TEMP_TABLE

              ("T1") */ "C0" "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM "SYS"."SYS_TEMP_0FD9D6607_61C9C4"

              "T1") "CUST" GROUP BY "CUST_INCOME_LEVEL") "from$_subquery$_006"))


统计信息
----------------------------------------------------------
          4  recursive calls
        314  db block gets
       2382  consistent gets
        303  physical reads
        600  redo size
       1916  bytes sent via SQL*Net to client
        438  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         35  rows processed

需要测试才能得到最优查询性能可以通过一个管理层需要的报表说明。这个报告必须按照照国家和收入水平显示消费者的分布情况,并且只显示那些占总消费比例等于或超过1%的国家和收入水平的数据。如果某个收入水平范围的消费者数目等于或超过该收入水平范围的总消费者数的25%,这样的国家和收入水平也需要被收入到报告中。前一个查询中因子化的子查询cust被保留了下来,新的内容是having子句中的子查询,这是用来保证执行报告所规定的规则的。
执行这个sql语句时,所有一切都像你预想的那样。然后检查执行计划发现customers和countries表的联结经过了一个临时表转换(TEMP TABLE TRANSFORMATION ),接下来的查询都会用到这个临时表sys_temp_of。到止前为止,如果怀疑所选择的执行计划是不是合理,可以使用materialized和inline提示测试。

SQL> --with和inline
SQL> WITH cust AS (
  2       SELECT /*+ inline gather_plan_statistics */
  3       t.cust_income_level,
  4       a.country_name
  5       FROM sh.customers t
  6       JOIN sh.countries a ON a.country_id=t.country_id
  7  )
  8  SELECT  c.country_name,cust_income_level,COUNT(c.country_name) country_cust_count
  9  FROM cust c
 10  HAVING COUNT(country_name) >
 11  (
 12         SELECT COUNT(*) * .01 FROM cust c2
 13  )
 14  OR COUNT(cust_income_level) >
 15  (
 16         SELECT MEDIAN(income_income_count)
 17         FROM (
 18              SELECT cust_income_level,COUNT(*)* .25 income_income_count
 19              FROM cust
 20              GROUP BY cust_income_level
 21         )
 22  )
 23  GROUP BY country_name,cust_income_level
 24  ORDER BY 1,2;

已选择35行。


执行计划
----------------------------------------------------------
Plan hash value: 1412345716

---------------------------------------------------------------------------------------

| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Tim
e     |

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT       |              |    20 |   820 |   413   (2)| 00:00:05 |

|*  1 |  FILTER                |              |       |       |            |      |

|   2 |   SORT GROUP BY        |              |    20 |   820 |   413   (2)| 00:00:05 |

|*  3 |    HASH JOIN           |              | 55500 |  2222K|   410   (1)| 00:00:05 |

|   4 |     TABLE ACCESS FULL  | COUNTRIES    |    23 |   345 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL  | CUSTOMERS    | 55500 |  1409K|   406   (1)| 00:00:05 |

|   6 |   SORT AGGREGATE       |              |     1 |    10 |            |      |

|*  7 |    HASH JOIN           |              | 55500 |   541K|   408   (1)| 00:00:05 |

|   8 |     INDEX FULL SCAN    | COUNTRIES_PK |    23 |   115 |     1   (0)| 00:00:01 |

|   9 |     TABLE ACCESS FULL  | CUSTOMERS    | 55500 |   270K|   406   (1)| 00:00:05 |

|  10 |   SORT GROUP BY        |              |     1 |    13 |            |      |

|  11 |    VIEW                |              |    12 |   156 |   411   (2)| 00:00:05 |

|  12 |     SORT GROUP BY      |              |    12 |   372 |   411   (2)| 00:00:05 |

|* 13 |      HASH JOIN         |              | 55500 |  1680K|   408   (1)| 00:00:05 |

|  14 |       INDEX FULL SCAN  | COUNTRIES_PK |    23 |   115 |     1   (0)| 00:00:01 |

|  15 |       TABLE ACCESS FULL| CUSTOMERS    | 55500 |  1409K|   406   (1)| 00:00:05 |

---------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)> (SELECT COUNT(*)*.01 FROM "SH"."COUNTRIES"
              "A","SH"."CUSTOMERS" "T" WHERE "A"."COUNTRY_ID"="T"."COUNTRY_ID") OR

              COUNT("T"."CUST_INCOME_LEVEL")> (SELECT PERCENTILE_CONT(0.500000)WITHIN GROUP

              ( ORDER BY "INCOME_INCOME_COUNT") FROM  (SELECT "T"."CUST_INCOME_LEVEL"

              "CUST_INCOME_LEVEL",COUNT(*)*.25 "INCOME_INCOME_COUNT" FROM "SH"."COUNTRIES"

              "A","SH"."CUSTOMERS" "T" WHERE "A"."COUNTRY_ID"="T"."COUNTRY_ID" GROUP BY

              "T"."CUST_INCOME_LEVEL") "from$_subquery$_006"))
   3 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
   7 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
  13 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")


统计信息
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       4382  consistent gets
          1  physical reads
          0  redo size
       1916  bytes sent via SQL*Net to client
        438  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         35  rows processed

从执行计划中可以看出,对customers表进行了3次全扫描,对COUNTRIES进行了一次全扫描。两次执行cust子查询的只需要COUNTRIES_PK索引中的信息,因此对索引而不是表进行了1次全扫描,节省了少量的时间和资源。

清除共享池
alter system flush shared_pool;
清除缓冲区
alter system flush buffer_cache;

测试查询改变的影响

在前面,报告中需要的是任何国家一定收入层次的人员等于或超过该收入层次所有人员的25%。如果被要求如果某个收入层次的数目大于该收入层次总消费数的中间值,则将该收入层次也包括到报告中。

修改后的查询收入inline

SQL> WITH cust AS
  2   (SELECT /*+ inline gather_plan_statistics */    --查询国家的收入等级及对应国家
  3     t.cust_income_level, a.country_name
  4      FROM sh.customers t
  5      JOIN sh.countries a
  6        ON a.country_id = t.country_id
  7    ),
  8  median_income_set AS
  9   (SELECT /*+ inline */
 10     cust_income_level, COUNT(*) income_level_count --某个收入层次的数目大于该收入层次的中间数
 11      FROM cust
 12     GROUP BY cust_income_level
 13    HAVING COUNT(cust_income_level) > (SELECT MEDIAN(income_level_count) income_level_count
 14                                        FROM (SELECT cust_income_level,
 15                                                     COUNT(*) income_level_count
 16                                                FROM cust
 17                                               GROUP BY cust_income_level)))
 18  SELECT country_name,
 19         cust_income_level,
 20         COUNT(country_name) country_cust_count
 21    FROM cust c
 22  HAVING COUNT (country_name) > (SELECT COUNT(*) * .01 FROM cust c2) OR cust_income_level IN (SELECT mis.cust_income_level
 23                                                                                                FROM median_income_set mis)
 24   GROUP BY country_name, cust_income_level;

已选择123行。


执行计划
----------------------------------------------------------
Plan hash value: 1635819209

----------------------------------------------------------------------------------------

| Id  | Operation               | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT        |              |    20 |   820 |   413   (2)| 00:00:05 |

|*  1 |  FILTER                 |              |       |       |            |       |

|   2 |   HASH GROUP BY         |              |    20 |   820 |   413   (2)| 00:00:05 |

|*  3 |    HASH JOIN            |              | 55500 |  2222K|   410   (1)| 00:00:05 |

|   4 |     TABLE ACCESS FULL   | COUNTRIES    |    23 |   345 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL   | CUSTOMERS    | 55500 |  1409K|   406   (1)| 00:00:05 |

|   6 |   SORT AGGREGATE        |              |     1 |    10 |            |       |

|*  7 |    HASH JOIN            |              | 55500 |   541K|   408   (1)| 00:00:05 |

|   8 |     INDEX FULL SCAN     | COUNTRIES_PK |    23 |   115 |     1   (0)| 00:00:01 |

|   9 |     TABLE ACCESS FULL   | CUSTOMERS    | 55500 |   270K|   406   (1)| 00:00:05 |

|* 10 |   FILTER                |              |       |       |            |       |

|  11 |    HASH GROUP BY        |              |     1 |    31 |   411   (2)| 00:00:05 |

|* 12 |     HASH JOIN           |              | 55500 |  1680K|   408   (1)| 00:00:05 |

|  13 |      INDEX FULL SCAN    | COUNTRIES_PK |    23 |   115 |     1   (0)| 00:00:01 |

|  14 |      TABLE ACCESS FULL  | CUSTOMERS    | 55500 |  1409K|   406   (1)| 00:00:05 |

|  15 |    SORT GROUP BY        |              |     1 |    13 |            |       |

|  16 |     VIEW                |              |    12 |   156 |   411   (2)| 00:00:05 |

|  17 |      SORT GROUP BY      |              |    12 |   372 |   411   (2)| 00:00:05 |

|* 18 |       HASH JOIN         |              | 55500 |  1680K|   408   (1)| 00:00:05 |

|  19 |        INDEX FULL SCAN  | COUNTRIES_PK |    23 |   115 |     1   (0)| 00:00:01 |

|  20 |        TABLE ACCESS FULL| CUSTOMERS    | 55500 |  1409K|   406   (1)| 00:00:05 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)> (SELECT COUNT(*)*.01 FROM "SH"."COUNTRIES"
              "A","SH"."CUSTOMERS" "T" WHERE "A"."COUNTRY_ID"="T"."COUNTRY_ID") OR  EXISTS

              (SELECT 0 FROM "SH"."COUNTRIES" "A","SH"."CUSTOMERS" "T" WHERE
              "A"."COUNTRY_ID"="T"."COUNTRY_ID" GROUP BY "T"."CUST_INCOME_LEVEL" HAVING

              "T"."CUST_INCOME_LEVEL"=:B1 AND COUNT("T"."CUST_INCOME_LEVEL")> (SELECT

              PERCENTILE_CONT(0.500000) WITHIN GROUP ( ORDER BY "INCOME_LEVEL_COUNT") FROM

              (SELECT "T"."CUST_INCOME_LEVEL" "CUST_INCOME_LEVEL",COUNT(*)
              "INCOME_LEVEL_COUNT" FROM "SH"."COUNTRIES" "A","SH"."CUSTOMERS" "T" WHERE

              "A"."COUNTRY_ID"="T"."COUNTRY_ID" GROUP BY "T"."CUST_INCOME_LEVEL")

              "from$_subquery$_005")))
 3 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
 7 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
 10 - filter("T"."CUST_INCOME_LEVEL"=:B1 AND COUNT("T"."CUST_INCOME_LEVEL")>
              (SELECT PERCENTILE_CONT(0.500000) WITHIN GROUP ( ORDER BY "INCOME_LEVEL_COUNT")

              FROM  (SELECT "T"."CUST_INCOME_LEVEL" "CUST_INCOME_LEVEL",COUNT(*)


              "INCOME_LEVEL_COUNT" FROM "SH"."COUNTRIES" "A","SH"."CUSTOMERS" "T" WHERE

              "A"."COUNTRY_ID"="T"."COUNTRY_ID" GROUP BY "T"."CUST_INCOME_LEVEL")

              "from$_subquery$_005"))
  12 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
  18 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")


统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      23362  consistent gets
          0  physical reads
          0  redo size
       5460  bytes sent via SQL*Net to client
        504  bytes received via SQL*Net from client
         10  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        123  rows processed

SQL>

修改后的查询收入materialize

增加了1次全表扫描和索引扫描,下面是允许临时表转换查询的性能输出:

SQL> WITH cust AS
  2   (SELECT /*+ materialize gather_plan_statistics */    --查询国家的收入等级及对应国家
  3     t.cust_income_level, a.country_name
  4      FROM sh.customers t
  5      JOIN sh.countries a
  6        ON a.country_id = t.country_id
  7    ),
  8  median_income_set AS
  9   (SELECT /*+ inline */
 10     cust_income_level, COUNT(*) income_level_count --某个收入层次的数目大于该收入层次的中间数
 11      FROM cust
 12     GROUP BY cust_income_level
 13    HAVING COUNT(cust_income_level) > (SELECT MEDIAN(income_level_count) income_level_count
 14                                        FROM (SELECT cust_income_level,
 15                                                     COUNT(*) income_level_count
 16                                                FROM cust
 17                                               GROUP BY cust_income_level)))
 18  SELECT country_name,
 19         cust_income_level,
 20         COUNT(country_name) country_cust_count
 21    FROM cust c
 22  HAVING COUNT (country_name) > (SELECT COUNT(*) * .01 FROM cust c2) OR cust_income_level IN (SELECT mis.cust_income_level
 23                                                                                                FROM median_income_set mis)
 24   GROUP BY country_name, cust_income_level;

已选择123行。


执行计划
----------------------------------------------------------
Plan hash value: 2452612

--------------------------------------------------------------------------------------------------------

| Id  | Operation                  | Name                      | Rows  | Bytes |
 Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT           |                           |    20 |   620 |  499   (2)| 00:00:06 |

|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |           |          |

|   2 |   LOAD AS SELECT           |                           |       |       |            |          |

|*  3 |    HASH JOIN               |                           | 55500 |  2222K|   410   (1)| 00:00:05 |

|   4 |     TABLE ACCESS FULL      | COUNTRIES                 |    23 |   345 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL      | CUSTOMERS                 | 55500 |  1409K|   406   (1)| 00:00:05 |

|*  6 |   FILTER                   |                           |       |       |            |          |

|   7 |    HASH GROUP BY           |                           |    20 |   620 |    89   (5)| 00:00:02 |

|   8 |     VIEW                   |                           | 55500 |  1680K|    86   (2)| 00:00:02 |

|   9 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6607_644975 | 55500 |  1680K|    86   (2)| 00:00:02 |

|  10 |    SORT AGGREGATE          |                           |     1 |       |            |          |

|  11 |     VIEW                   |                           | 55500 |       |    86   (2)| 00:00:02 |

|  12 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6607_644975 | 55500 |  1680K|    86   (2)| 00:00:02 |

|* 13 |    FILTER                  |                           |       |       |            |          |

|  14 |     HASH GROUP BY          |                           |     1 |    21 |    89   (5)| 00:00:02 |

|  15 |      VIEW                  |                           | 55500 |  1138K|    86   (2)| 00:00:02 |

|  16 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6607_644975 | 55500 |  1680K|    86   (2)| 00:00:02 |

|  17 |     SORT GROUP BY          |                           |     1 |    13 |            |          |

|  18 |      VIEW                  |                           |    12 |   156 |    89   (5)| 00:00:02 |

|  19 |       SORT GROUP BY        |                           |    12 |   252 |    89   (5)| 00:00:02 |

|  20 |        VIEW                |                           | 55500 |  1138K|    86   (2)| 00:00:02 |

|  21 |         TABLE ACCESS FULL  | SYS_TEMP_0FD9D6607_644975 | 55500 |  1680K|    86   (2)| 00:00:02 |

--------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."COUNTRY_ID"="T"."COUNTRY_ID")
   6 - filter(COUNT("COUNTRY_NAME")> (SELECT COUNT(*)*.01 FROM  (SELECT /*+ CACH E_TEMP_TABLE

              ("T1") */ "C0" "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM "SYS"."SYS_TEMP_0FD9D6607_644975"

              "T1") "C2") OR  EXISTS (SELECT 0 FROM  (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0"

              "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM "SYS"."SYS_TEMP_0FD9D6607_644975" "T1") "CUST"

              GROUP BY "CUST_INCOME_LEVEL" HAVING "CUST_INCOME_LEVEL"=:B1 AND COUNT("CUST_INCOME_LEVEL")>

              (SELECT PERCENTILE_CONT(0.500000) WITHIN GROUP ( ORDER BY "INCOME_LEVEL_COUNT") FROM  (SELECT

              "CUST_INCOME_LEVEL" "CUST_INCOME_LEVEL",COUNT(*) "INCOME_LEVEL_COUNT" FROM  (SELECT /*+

              CACHE_TEMP_TABLE ("T1") */ "C0" "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM

              "SYS"."SYS_TEMP_0FD9D6607_644975" "T1") "CUST" GROUP BY "CUST_INCOME_LEVEL")

              "from$_subquery$_005")))
  13 - filter("CUST_INCOME_LEVEL"=:B1 AND COUNT("CUST_INCOME_LEVEL")> (SELECT
              PERCENTILE_CONT(0.500000) WITHIN GROUP ( ORDER BY "INCOME_LEVEL_COUNT") FROM  (SELECT

              "CUST_INCOME_LEVEL" "CUST_INCOME_LEVEL",COUNT(*) "INCOME_LEVEL_COUNT" FROM  (SELECT /*+

              CACHE_TEMP_TABLE ("T1") */ "C0" "CUST_INCOME_LEVEL","C1" "COUNTRY_NAME" FROM

              "SYS"."SYS_TEMP_0FD9D6607_644975" "T1") "CUST" GROUP BY "CUST_INCOME_LEVEL")

              "from$_subquery$_005"))


统计信息
----------------------------------------------------------
        138  recursive calls
        317  db block gets
       6379  consistent gets
        303  physical reads
       1520  redo size
       5460  bytes sent via SQL*Net to client
        504  bytes received via SQL*Net from client
         10  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        123  rows processed

因为在查询修改后的版本中增加了扫描次数,逻辑IO的支出更明显了。在这个查询中允许oracle进行表转换,将散列联结的结果写入到磁盘中的一张临时表中然后在查询中多次重用的效率就明显更高。

寻找其他优化机会

计算产口各个销售渠道的成本找出2000年所生的每种产品的平均,最小和最大成本。但下面的查询不仅阅读起来困难并且难以修改,而且在某种程度上效率也是不高的。

SQL>  --用来计算成本的老sql语句
SQL> SELECT /*+ gather_plan_statistics */
  2  SUBSTR(prod_name,1,30) prod_name,
  3  channel_desc,
  4  (
  5   SELECT AVG(c2.unit_cost) AS avg_cost FROM sh.costs c2
  6   WHERE c2.prod_id=c.prod_id AND c2.channel_id=c.channel_id
  7   AND c2.time_id BETWEEN to_date('01/01/2000','mm/dd/yyyy')
  8   AND to_date('12/31/2000','mm/dd/yyyy')
  9  ),
 10  (
 11   SELECT MIN(c2.unit_cost) AS min_cost FROM sh.costs c2
 12   WHERE c2.prod_id=c.prod_id AND c2.channel_id=c.channel_id
 13   AND c2.time_id BETWEEN to_date('01/01/2000','mm/dd/yyyy')
 14   AND to_date('12/31/2000','mm/dd/yyyy')
 15  ),
 16  (
 17   SELECT MAX(c2.unit_cost) AS max_cost FROM sh.costs c2
 18   WHERE c2.prod_id=c.prod_id AND c2.channel_id=c.channel_id
 19   AND c2.time_id BETWEEN to_date('01/01/2000','mm/dd/yyyy')
 20   AND to_date('12/31/2000','mm/dd/yyyy')
 21  )
 22  FROM (
 23   SELECT DISTINCT pr.prod_id,pr.prod_name,ch.channel_id,ch.channel_desc
 24   FROM  sh.channels ch,sh.products pr,sh.costs co
 25   WHERE ch.channel_id=co.channel_id
 26   AND co.prod_id=pr.prod_id
 27   AND co.time_id BETWEEN to_date('01/01/2000','mm/dd/yyyy')
 28   AND to_date('12/31/2000','mm/dd/yyyy')
 29  ) c
 30  ORDER BY prod_name,channel_desc;

已选择216行。


执行计划
----------------------------------------------------------
Plan hash value: 1877279774

------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |

------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |                | 20640 |  1310K|     |   640   (1)| 00:00:08 |       |       |

|   1 |  SORT AGGREGATE                     |                |     1 |    20 |     |            |          |       |       |

|   2 |   PARTITION RANGE ITERATOR          |                |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| COSTS          |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|   4 |     BITMAP CONVERSION TO ROWIDS     |                |       |       |     |            |          |       |       |

|*  5 |      BITMAP INDEX SINGLE VALUE      | COSTS_PROD_BIX |       |       |     |            |          |    13 |    16 |

|   6 |  SORT AGGREGATE                     |                |     1 |    20 |     |            |          |       |       |

|   7 |   PARTITION RANGE ITERATOR          |                |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|*  8 |    TABLE ACCESS BY LOCAL INDEX ROWID| COSTS          |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|   9 |     BITMAP CONVERSION TO ROWIDS     |                |       |       |     |            |          |       |       |

|* 10 |      BITMAP INDEX SINGLE VALUE      | COSTS_PROD_BIX |       |       |     |            |          |    13 |    16 |

|  11 |  SORT AGGREGATE                     |                |     1 |    20 |     |            |          |       |       |

|  12 |   PARTITION RANGE ITERATOR          |                |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|* 13 |    TABLE ACCESS BY LOCAL INDEX ROWID| COSTS          |    96 |  1920 |     |    17   (0)| 00:00:01 |    13 |    16 |

|  14 |     BITMAP CONVERSION TO ROWIDS     |                |       |       |     |            |          |       |       |

|* 15 |      BITMAP INDEX SINGLE VALUE      | COSTS_PROD_BIX |       |       |     |            |          |    13 |    16 |

|  16 |  SORT ORDER BY                      |                | 20640 |  1310K|1632K|   640   (1)| 00:00:08 |       |       |

|  17 |   VIEW                              |                | 20640 |  1310K|     |   316   (2)| 00:00:04 |       |       |

|  18 |    HASH UNIQUE                      |                | 20640 |  1169K|1384K|   316   (2)| 00:00:04 |       |       |

|* 19 |     HASH JOIN                       |                | 20640 |  1169K|     |    25   (8)| 00:00:01 |       |       |

|  20 |      TABLE ACCESS FULL              | PRODUCTS       |    72 |  2160 |     |     3   (0)| 00:00:01 |       |       |

|* 21 |      HASH JOIN                      |                | 20640 |   564K|     |    21   (5)| 00:00:01 |       |       |

|  22 |       TABLE ACCESS FULL             | CHANNELS       |     5 |    65 |     |     3   (0)| 00:00:01 |       |       |

|  23 |       PARTITION RANGE ITERATOR      |                | 20640 |   302K|     |    17   (0)| 00:00:01 |    13 |    16 |

|* 24 |        TABLE ACCESS FULL            | COSTS          | 20640 |   302K|     |    17   (0)| 00:00:01 |    13 |    16 |

------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("C2"."CHANNEL_ID"=:B1 AND "C2"."TIME_ID"<=TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

   5 - access("C2"."PROD_ID"=:B1)
   8 - filter("C2"."CHANNEL_ID"=:B1 AND "C2"."TIME_ID"<=TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

  10 - access("C2"."PROD_ID"=:B1)
  13 - filter("C2"."CHANNEL_ID"=:B1 AND "C2"."TIME_ID"<=TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

  15 - access("C2"."PROD_ID"=:B1)
  19 - access("CO"."PROD_ID"="PR"."PROD_ID")
  21 - access("CH"."CHANNEL_ID"="CO"."CHANNEL_ID")
  24 - filter("CO"."TIME_ID"<=TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))



统计信息
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      29642  consistent gets
          0  physical reads
          0  redo size
      14092  bytes sent via SQL*Net to client
        570  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        216  rows processed

将begin_date和end_date列放入单独的查询bookends 开始,只留下需要设定值的地方。产品的数据被放在prodmaster子查询中。尽管这几段sql语句放在子查询也可以实现其功能,但将它们移到因子化的子查询中大大地增强了sql语句整体的可读性。
平均、最大和最小成本的计算被一个称为cost_compare的子查询取代。最后,加入了联结prodmaster和cost_compare子查询的sql语句。

SQL> --使用with子句进行重构后的老sql语句
SQL> WITH bookends AS
  2   (SELECT to_date('01/01/2000', 'mm/dd/yyyy') begin_date,
  3           to_date('12/31/2000', 'mm/dd/yyyy') end_date
  4      FROM dual),
  5  prodmaster AS
  6   (SELECT DISTINCT pr.prod_id, pr.prod_name, ch.channel_id, ch.channel_desc
  7      FROM sh.channels ch, sh.products pr, sh.costs co
  8     WHERE ch.channel_id = co.channel_id
  9       AND co.prod_id = pr.prod_id
 10       AND co.time_id BETWEEN (SELECT begin_date FROM bookends) AND
 11           (SELECT end_date FROM bookends)),
 12  cost_compare AS
 13   (SELECT c2.prod_id,
 14           c2.channel_id,
 15           AVG(c2.unit_cost) avg_cost,
 16           MIN(c2.unit_cost) min_cost,
 17           MAX(c2.unit_cost) max_cost
 18      FROM sh.costs c2
 19     WHERE c2.time_id BETWEEN (SELECT begin_date FROM bookends) AND
 20           (SELECT end_date FROM bookends)
 21     GROUP BY c2.prod_id, c2.channel_id)
 22  SELECT /*+ gather_plan_statistics */
 23   SUBSTR(pm.prod_name, 1, 30) prod_name,
 24   pm.channel_desc,
 25   cc.avg_cost,
 26   cc.min_cost,
 27   cc.max_cost
 28    FROM prodmaster pm
 29    JOIN cost_compare cc
 30      ON cc.prod_id = pm.prod_id
 31     AND cc.channel_id = pm.channel_id
 32   ORDER BY pm.prod_id, pm.channel_id;

已选择216行。


执行计划
----------------------------------------------------------
Plan hash value: 2361085328

----------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

----------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                          |                |   138 | 12696 |    83   (5)| 00:00:01 |       |       |

|   1 |  MERGE JOIN                               |                |   138 | 12696 |    83   (5)| 00:00:01 |       |       |

|   2 |   SORT JOIN                               |                |   205 |  9430 |    44   (5)| 00:00:01 |       |       |

|   3 |    VIEW                                   |                |   205 |  9430 |    44   (5)| 00:00:01 |       |       |

|   4 |     HASH UNIQUE                           |                |   205 | 11890 |    44   (5)| 00:00:01 |       |       |

|*  5 |      HASH JOIN                            |                |   205 | 11890 |    39   (3)| 00:00:01 |       |       |

|   6 |       TABLE ACCESS FULL                   | PRODUCTS       |    72 |  2160 |     3   (0)| 00:00:01 |       |       |

|   7 |       MERGE JOIN                          |                |   205 |  5740 |    36   (3)| 00:00:01 |       |       |

|   8 |        TABLE ACCESS BY INDEX ROWID        | CHANNELS       |     5 |65 |     2   (0)| 00:00:01 |       |       |

|   9 |         INDEX FULL SCAN                   | CHANNELS_PK    |     5 |   |     1   (0)| 00:00:01 |       |       |

|* 10 |        SORT JOIN                          |                |   205 |  3075 |    34   (3)| 00:00:01 |       |       |

|  11 |         PARTITION RANGE ITERATOR          |                |   205 |  3075 |    33   (0)| 00:00:01 |   KEY |   KEY |

|  12 |          TABLE ACCESS BY LOCAL INDEX ROWID| COSTS          |   205 |  3075 |    33   (0)| 00:00:01 |   KEY |   KEY |

|  13 |           BITMAP CONVERSION TO ROWIDS     |                |       |   |            |          |       |       |

|* 14 |            BITMAP INDEX RANGE SCAN        | COSTS_TIME_BIX |       |   |            |          |   KEY |   KEY |

|  15 |             FAST DUAL                     |                |     1 |   |     2   (0)| 00:00:01 |       |       |

|  16 |             FAST DUAL                     |                |     1 |   |     2   (0)| 00:00:01 |       |       |

|* 17 |   SORT JOIN                               |                |   145 |  6670 |    39   (6)| 00:00:01 |       |       |

|  18 |    VIEW                                   |                |   145 |  6670 |    38   (3)| 00:00:01 |       |       |

|  19 |     HASH GROUP BY                         |                |   145 |  2900 |    38   (3)| 00:00:01 |       |       |

|  20 |      PARTITION RANGE ITERATOR             |                |   205 |  4100 |    33   (0)| 00:00:01 |   KEY |   KEY |

|  21 |       TABLE ACCESS BY LOCAL INDEX ROWID   | COSTS          |   205 |  4100 |    33   (0)| 00:00:01 |   KEY |   KEY |

|  22 |        BITMAP CONVERSION TO ROWIDS        |                |       |   |            |          |       |       |

|* 23 |         BITMAP INDEX RANGE SCAN           | COSTS_TIME_BIX |       |   |            |          |   KEY |   KEY |

|  24 |          FAST DUAL                        |                |     1 |   |     2   (0)| 00:00:01 |       |       |

|  25 |          FAST DUAL                        |                |     1 |   |     2   (0)| 00:00:01 |       |       |

----------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("CO"."PROD_ID"="PR"."PROD_ID")
  10 - access("CH"."CHANNEL_ID"="CO"."CHANNEL_ID")
       filter("CH"."CHANNEL_ID"="CO"."CHANNEL_ID")
  14 - access("CO"."TIME_ID">= (SELECT TO_DATE(' 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') FROM "SYS"."DUAL"

              "DUAL") AND "CO"."TIME_ID"<= (SELECT TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') FROM "SYS"."DUAL"

              "DUAL"))
  17 - access("CC"."PROD_ID"="PM"."PROD_ID" AND "CC"."CHANNEL_ID"="PM"."CHANNEL_ID")

       filter("CC"."CHANNEL_ID"="PM"."CHANNEL_ID" AND "CC"."PROD_ID"="PM"."PROD_ID")

  23 - access("C2"."TIME_ID">= (SELECT TO_DATE(' 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') FROM "SYS"."DUAL"

              "DUAL") AND "C2"."TIME_ID"<= (SELECT TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') FROM "SYS"."DUAL"

              "DUAL"))


统计信息
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7436  consistent gets
          0  physical reads
          0  redo size
      13596  bytes sent via SQL*Net to client
        570  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        216  rows processed

将子查询因子化应用到pl/sql中

例子:

  • 只列出至少在3个不同的年份中都采购了产品的消费者
  • 按照产品类别分组统计每个消费者的购买总额
    用常规的pl/sql获取所需的数据
    查询满足标准的所有消费者并将他们的ID保存在一张临时表中
    然后在新保存的消费者ID中执行循环并找到他们,加起来,将这些信息插入到另一张临时表中。然后再将得到的结果与customers和products表联结以生成报告。
--用PL/SQL生成消费者报告
BEGIN
  EXECUTE IMMEDIATE 'drop table cust3year';
  EXECUTE IMMEDIATE 'drop table sales3year';
  EXCEPTION 
    WHEN OTHERS THEN
      NULL;
END;
/
create global temporary table cust3year(cust_id number);
CREATE GLOBAL TEMPORARY TABLE sales3year(
       cust_id NUMBER,
       prod_category VARCHAR2(50),
       total_sale NUMBER
)
/
BEGIN
  EXECUTE IMMEDIATE 'truncate table cust3year';
  EXECUTE IMMEDIATE 'truncate table sales3year';
  INSERT INTO cust3year
  SELECT cust_id--,count(cust_years) year_count
  FROM (
       SELECT DISTINCT cust_id,TRUNC(time_id,'YEAR') cust_years
       FROM sh.sales
  ) 
  GROUP BY cust_id
  HAVING COUNT(cust_years)>=3;
  --SELECT * FROM cust3year;
  FOR crec IN (SELECT cust_id FROM  cust3year)
    LOOP
      INSERT INTO sales3year 
      SELECT sa.cust_id,p.prod_category,SUM(co.unit_cost*sa.quantity_sold)
      FROM sh.sales sa
      JOIN sh.products p ON p.prod_id=sa.prod_id
      JOIN sh.costs co ON co.prod_id=sa.prod_id AND co.time_id=sa.time_id
      JOIN sh.customers cu ON cu.cust_id=sa.cust_id
      WHERE crec.cust_id=sa.cust_id
      GROUP BY sa.cust_id,p.prod_category; 
    END LOOP;
END;
/    
SELECT c3.cust_id,c.cust_last_name,c.cust_first_name,s3.prod_category,s3.total_sale FROM sales3year s3
JOIN cust3year c3 ON s3.cust_id=c3.cust_id
JOIN sh.customers c ON c.cust_id=s3.cust_id
ORDER BY 1,4;

上面是一段很好的PL/SQL程序块,如果考虑子查询因子化,还可以改进。首先将消费者ID的部分放到with子句中,接下来再利用子查询的结果生成报告所需的销售,产品和消费者信息就可以了。

--使用with子句生成消费者报告
WITH cust3year AS(
 SELECT cust_id
  FROM (
       SELECT DISTINCT cust_id,TRUNC(time_id,'YEAR') cust_years
       FROM sh.sales
  ) 
  GROUP BY cust_id
  HAVING COUNT(cust_years)>=3
),
sales3year AS (
  SELECT sa.cust_id,p.prod_category,SUM(co.unit_cost*sa.quantity_sold) AS total_sale
        FROM sh.sales sa
        JOIN sh.products p ON p.prod_id=sa.prod_id
        JOIN sh.costs co ON co.prod_id=sa.prod_id AND co.time_id=sa.time_id
        JOIN sh.customers cu ON cu.cust_id=sa.cust_id
        WHERE sa.cust_id IN (SELECT cust_id FROM cust3year)
        GROUP BY sa.cust_id,p.prod_category
)
SELECT c3.cust_id,c.cust_last_name,c.cust_first_name,s3.prod_category,s3.total_sale FROM sales3year s3
JOIN cust3year c3 ON s3.cust_id=c3.cust_id
JOIN sh.customers c ON c.cust_id=s3.cust_id
ORDER BY 1,4;

WITH custyear AS
 (SELECT sa.cust_id, EXTRACT(YEAR FROM time_id) sales_year
    FROM sh.sales sa
   WHERE EXTRACT(YEAR FROM time_id) BETWEEN 1998 AND 2002
   GROUP BY sa.cust_id, EXTRACT(YEAR FROM time_id)),
cust3year AS
 (SELECT DISTINCT c3.cust_id
    FROM (SELECT cust_id, COUNT(*) OVER(PARTITION BY cust_id) year_count
            FROM custyear) c3
   WHERE c3.year_count >= 3)
SELECT c.cust_id,
       c.cust_last_name,
       c.cust_first_name,
       p.prod_category,
       SUM(co.unit_price * sa.quantity_sold) AS total_sale
  FROM cust3year c3
  JOIN sh.sales sa
    ON sa.cust_id = c3.cust_id
  JOIN sh.products p
    ON p.prod_id = sa.prod_id
  JOIN sh.costs co
    ON co.prod_id = sa.prod_id
   AND co.time_id = sa.time_id
  JOIN sh.customers c
    ON c.cust_id = c3.cust_id
 GROUP BY c.cust_id, c.cust_last_name, c.cust_first_name, p.prod_category
 ORDER BY c.cust_id;

extract()函数将年份从日期中提取出来并转化为整形值以简化年份的比较

子查询因子化可以用来更好地组织一些查询,在某些情况下甚至可以用来作为性能调优的工具。学会使用它就等于在你的oracle工具箱中添加了一个新工具。

递归子查询

递归子查询因子化 recursive subquery factoring RSF

--基本的connect by 
SELECT LPAD(' ', LEVEL * 2 - 1, ' ') || emp.emp_last_name emp_last_name,
       emp.emp_first_name,
       emp.employee_id,
       emp.mgr_last_name,
       emp.mgr_first_name,
       emp.manager_id,
       emp.department_name
  FROM (SELECT /*+ inline gather_plan_statistics */
         e.last_name       emp_last_name,
         e.first_name      emp_first_name,
         e.employee_id,
         d.department_id,
         e.manager_id,
         d.department_name,
         es.last_name      mgr_last_name,
         es.first_name     mgr_first_name
          FROM hr.employees e
          LEFT OUTER JOIN hr.departments d
            ON e.department_id = d.department_id
          LEFT OUTER JOIN hr.employees es
            ON es.employee_id = e.manager_id) emp
CONNECT BY PRIOR emp.employee_id = emp.manager_id
 START WITH emp.manager_id IS NULL
 ORDER SIBLINGS BY emp.emp_last_name;

内嵌视图emp用来与employee和department表进行联结,然后将一个数据集提供给select ... connect by语句。用prior运算符来将当前的employee_id与另一行中的manager_id列值匹配。反复的这么做就建立了一个递归查询。
start with子句是用来指引从manager_id为空的那一行开始。level伪列保存了递归的深度值,使得可以通过一个简单的方法来输出进行缩进,从而可以直观地看出组织层次结构。

RSF示例

--基本的递归子查询因子化
WITH emp AS
 (SELECT /*+ inline gather_plan_statistics */
   e.last_name, e.first_name, e.employee_id, e.manager_id, d.department_name
    FROM hr.employees e
    LEFT OUTER JOIN hr.departments d
      ON e.department_id = d.department_id),
emp_recurse(last_name,
first_name,
employee_id,
manager_id,
department_name,
lvl) AS
 (SELECT e.last_name AS last_name,
         e.first_name AS first_name,
         e.employee_id AS employee_id,
         e.manager_id AS manager_id,
         e.department_name AS department_name,
         1 AS lvl
    FROM emp e
   WHERE e.manager_id IS NULL
  UNION ALL
  SELECT emp.last_name AS last_name,
         emp.first_name AS first_name,
         emp.employee_id AS employee_id,
         emp.manager_id AS manager_id,
         emp.department_name AS department_name,
         empr.lvl + 1 AS lvl
    FROM emp
    JOIN emp_recurse empr
      ON empr.employee_id = emp.manager_id)
search DEPTH FIRST BY last_name SET order1      
SELECT LPAD(' ', lvl * 2 - 1, ' ') || er.last_name last_name,
       er.first_name,
       er.department_name
  FROM emp_recurse er;

递归的with子句需要两个查询块:定位点成员和递归成员。这两个子查询块必须通过集合运算符union all结合到一起。定位点成员是union all之前的查询,而递归成员是其后面的查询。递归子查询必须引用定义子查询,这样就进行了递归。

RSF的限制条件

RSF的使用比connect by要灵活得多,但是,它的使用也有一些限制:

  • distinct关键字或group by子句
  • model子句
  • 聚合函数,但在select列表中可以使用分析函数
  • 引用query_name的子查询
  • 引用query_name作为右表的外联结

与connect by的不同点

与connect by相比较,rsf查询返回的列必须在查询暄义中声明,如emp_recurse(last_name,first_name,employee_id,manager_id,department_name,lvl)
search depth first,默认的搜索是breadth first,这通常不是一个层级型查询所想要的输出。breadth first搜索在返回任何子数据行之前返回每一层级上的兄弟数据行。指定search depth first将会按照层级的顺序返回数据行。search子句中的set order1部分将order1伪列的值设置为数据行返回的顺序值。

类型 名称 用途
函数 sys_connect_by_path 返回当前数据行的所有祖先
运算符 connect_by_root 返回根数据行的值
运算符 prior 用来表明层级型查询,在递归子查询中不需要
伪列 connect_by_iscycle 在层级中检测循环
参数 nocycle connect by的参数,与connect_by_iscycle一起使用
伪列 connect_by_isleaf 标识叶子数据行
伪列 level 用来表明层级中的深度

level伪列

--level伪列
SELECT LPAD(' ', LEVEL * 2 - 1, ' ') || e.last_name last_name, LEVEL
  FROM hr.employees e
CONNECT BY PRIOR e.employee_id = e.manager_id
 START WITH e.manager_id IS NULL
 ORDER SIBLINGS BY e.last_name;

在层级型查询中经常被用来实现输出缩进,使得层级看起来很直观。;

--创建lvl列
 WITH emp_recurse(employee_id,manager_id,last_name,lvl) AS (
      SELECT e.employee_id,NULL,e.last_name,1 AS lvl
      FROM hr.employees e
      WHERE e.manager_id IS NULL
      UNION ALL
      SELECT e1.employee_id,e1.manager_id,e1.last_name,e2.lvl+1 AS lvl
      FROM hr.employees e1
      JOIN emp_recurse e2 ON e2.employee_id=e1.manager_id  
 )
 search DEPTH FIRST BY last_name SET last_time_order
 SELECT LPAD(' ', r.lvl * 2 - 1, ' ') || r.last_name last_name,r.lvl
 FROM emp_recurse r
 ORDER BY last_time_order;

sys_connect_by_path函数

用来返回组成层级的直到当前的行的值。下面的列子用sys_connect_by_path函数用来建立一个冒号分隔的从根到节点的层级。

 --sys_connect_by_path
 SELECT LPAD(' ', 2 * (LEVEL - 1)) || e.last_name AS last_name,
        sys_connect_by_path(last_name, ':') path
   FROM hr.employees e
  START WITH e.manager_id IS NULL
 CONNECT BY PRIOR e.employee_id = e.manager_id
  ORDER SIBLINGS BY e.last_name;

尽管sys_connect_by_path函数不能在RSF查询中使用,你可以使用与重新产生的level伪列几乎相同的方法来复制这个函数的功能。现在不用使用计算器来计数,而是附加一个字符串值。

--建立你自己的sys_connect_by_path函数
 WITH emp_recurse(employee_id,manager_id,last_name,lvl,PATH) AS (
      SELECT e.employee_id,NULL,e.last_name,1 AS lvl,':'||to_char(e.last_name) AS path
      FROM hr.employees e
      WHERE e.manager_id IS NULL
      UNION ALL
      SELECT e1.employee_id,e1.manager_id,e1.last_name,e2.lvl+1 AS lvl,e2.path||':'||to_char(e1.last_name) AS path
      FROM hr.employees e1
      JOIN emp_recurse e2 ON e2.employee_id=e1.manager_id  
)
 search DEPTH FIRST BY last_name SET last_time_order
 SELECT LPAD(' ', r.lvl * 2 - 1, ' ') || r.last_name last_name,r.path
 FROM emp_recurse r
 ORDER BY last_time_order;
oracle学习笔记10: 子查询因子化_第1张图片
建立你自己的sys_connect_by_path函数

如果你需要将层级显示为逗号分隔的列表,sys_connect_by_path无法做到,因为sys_connect_by_path函数的问题在于输出中的第一个字符必须是冒号。

 --RSF逗号分隔的路径
 WITH emp_recurse(employee_id,manager_id,last_name,lvl,PATH) AS (
      SELECT e.employee_id,NULL,e.last_name,1 AS lvl,to_char(e.last_name) AS path
      FROM hr.employees e
      WHERE e.manager_id IS NULL
      UNION ALL
      SELECT e1.employee_id,e1.manager_id,e1.last_name,e2.lvl+1 AS lvl,e2.path||','||to_char(e1.last_name) AS path
      FROM hr.employees e1
      JOIN emp_recurse e2 ON e2.employee_id=e1.manager_id  
)
 search DEPTH FIRST BY last_name SET last_time_order
 SELECT LPAD(' ', r.lvl * 2 - 1, ' ') || r.last_name last_name,r.path
 FROM emp_recurse r
 ORDER BY last_time_order;
oracle学习笔记10: 子查询因子化_第2张图片
RSF逗号分隔的路径

connect_by_root运算符

这个运算符强化了connect by语法,使得它可以返回当前行的根节点。

 --connect_by_root
 UPDATE hr.employees SET manager_id=NULL WHERE last_name='Kochhar';
 SELECT /*+ inline gather_plan_statistics */
 LEVEL,LPAD(' ',2*(LEVEL-1))||last_name last_name,first_name,
 connect_by_root last_name AS root,
 sys_connect_by_path(last_name,':') PATH
 FROM hr.employees
 WHERE connect_by_root last_name='Kochhar'
 CONNECT BY PRIOR employee_id=manager_id
 START WITH manager_id IS NULL;
oracle学习笔记10: 子查询因子化_第3张图片
connect_by_root
 --复制connect_by_root运算符功能
 WITH emp_recurse(employee_id,manager_id,last_name,lvl,path) AS (
   SELECT /*+ gather_plan_statistics */
   e.employee_id,NULL AS manager_id,
   e.last_name,1 AS lvl,
   ':'||e.last_name||':' AS path
   FROM hr.employees e
   WHERE e.manager_id IS NULL
   UNION ALL
   SELECT
   e.employee_id,e.manager_id,
   e.last_name,er.lvl+1 AS lvl,
   er.path||e.last_name||':' AS path
   FROM hr.employees e
   JOIN emp_recurse er ON er.employee_id=e.manager_id
   JOIN hr.employees e2 ON e2.employee_id=e.manager_id
 )
 search DEPTH FIRST BY last_name SET order1,
 emps AS (
   SELECT lvl,last_name,path,SUBSTR(path,2,INSTR(path,':',2)-2) root
   FROM emp_recurse
 )
 SELECT lvl,LPAD(' ',2*(lvl-1))|| last_name last_name,
 root,path FROM emps
 WHERE root='Kochhar';
oracle学习笔记10: 子查询因子化_第4张图片
复制connect_by_root运算符功能

connect_by_iscycle伪列和nocycle参数

connect_by_iscycle伪列使得在层级中检测循环变得很容易。
这里将smith设置为king的经理来故意引入了一个错误,这将导致connect by中出现错误。

 --connect by中的循环错误
 SELECT * FROM hr.employees WHERE employee_id IN (100,171);
 --将Smith设置为King的经理
 UPDATE hr.employees SET manager_id=171 WHERE employee_id=100;
 SELECT LPAD(' ',2*(LEVEL-1))|| last_name last_name,
 first_name,employee_id,LEVEL
 FROM hr.employees 
 START WITH employee_id=100
 CONNECT BY PRIOR employee_id=manager_id;
oracle学习笔记10: 子查询因子化_第5张图片
connect by中的循环错误

nocycle和connect_by_iscycle可以用来检测层级中的循环。nocycle参数可以阻止发ora-1436错误,使得所有行都要以输出。connect_by_iscycle运算符使得你可以很容易地找到导致错误发生的行。

 --通过connect_by_iscycle检测循环
 SELECT LPAD(' ',2*(LEVEL-1))|| last_name last_name,
 first_name,employee_id,LEVEL,
 connect_by_iscycle
 FROM hr.employees 
 START WITH employee_id=100
 CONNECT BY NOCYCLE PRIOR employee_id=manager_id;
oracle学习笔记10: 子查询因子化_第6张图片
通过connect_by_iscycle检测循环

connect_by_iscycle的值为1,表示smith的那一行数据导致了错误。接下来查询Smith的数据,所有一切看上去都很正常。最后,你再以Smith的员工ID寻找他所管理的所有员工,错误就是公司总裁没有经理。因此解决办法不是将这一行的manager_id设置回空值。

SELECT e.last_name, e.first_name, e.employee_id, e.manager_id
  FROM hr.employees e
 WHERE e.employee_id = 171
    OR e.manager_id = 171;
公司总裁没有经理
--在递归查询中检测循环
WITH emp(employee_id,manager_id,last_name,first_name,lvl) AS (
 SELECT e.employee_id,NULL AS manager_id,e.last_name,e.first_name,1 AS lvl
 FROM hr.employees e
 WHERE e.employee_id=100
 UNION ALL 
 SELECT e.employee_id,e.manager_id,e.last_name,e.first_name,emp.lvl+1 AS lvl
 FROM hr.employees e
 JOIN emp ON emp.employee_id=e.manager_id
)
search DEPTH FIRST BY last_name SET order1
CYCLE employee_id SET is_cycle TO '1' DEFAULT '0'
SELECT LPAD(' ',2*(lvl-1))||last_name last_name,first_name,employee_id,lvl,is_cycle 
FROM emp ORDER BY order1;

oracle学习笔记10: 子查询因子化_第7张图片
在递归查询中检测循环

注意: cycle子句让你将is_cycle列值设置为0或1的。这里只允许单值字符。这一列的名称同样是用户自定义的。检查输出,可以看到RSF中的cycle子句在指明导致数据循环的行时做得更好。出现错误的数据行很清楚地标记为King那一行,因此可以查询那一行并迅速确定错误所在。

connect_by_isleaf伪列

connect_by_isleaf用来在层级数据中识别叶子节点。

--connect_by_isleaf伪列
SELECT LPAD(' ',2*(LEVEL-1))|| e.last_name last_name,connect_by_isleaf
FROM hr.employees e
START WITH e.manager_id IS NULL
CONNECT BY PRIOR e.employee_id=e.manager_id
ORDER SIBLINGS BY e.last_name;
oracle学习笔记10: 子查询因子化_第8张图片
connect_by_isleaf伪列

RSF中要复制这一点还比较困难的。你需要在员工层级中标识出叶子节点,从定义上来说,叶子节点都不是经理。所有不是经理的行就是叶子节点。

--在递归查询中找出叶子节点
WITH leaves AS (
 SELECT e.employee_id FROM hr.employees e
 WHERE e.employee_id NOT IN (
  SELECT manager_id FROM hr.employees WHERE manager_id IS NOT NULL
 )
),
emp(manager_id,employee_id,last_name,lvl,isleaf) AS (
 SELECT e.manager_id,e.employee_id,e.last_name,1 AS lvl,0 AS isleaf
 FROM hr.employees e
 WHERE e.manager_id IS NULL
 UNION ALL
 SELECT e.manager_id,nvl(e.employee_id,NULL),e.last_name,emp.lvl+1 AS lvl, DECODE(l.employee_id,NULL,0,1) AS isleaf
 FROM hr.employees e
 JOIN emp ON emp.employee_id=e.manager_id
 LEFT OUTER JOIN leaves l ON l.employee_id=e.employee_id
)
search DEPTH FIRST BY last_name SET order1
SELECT LPAD(' ',2*(lvl-1))||last_name last_name, isleaf 
FROM emp;
oracle学习笔记10: 子查询因子化_第9张图片
在递归查询中找出叶子节点

leaves子查询被用来寻找叶子节点,然后将结果与employees表进行左外联结。leaves.employee_id列的值表时当前行是否是叶子。
别一种方法利用分析函数lead()使用lvl列的值来确定数据行是否为叶子节点。lead()函数依赖seach子句中所定的last_name_order列的值。

--使用lead()寻找叶子节点
WITH emp(manager_id,employee_id,last_name,lvl) AS (
 SELECT e.manager_id,e.employee_id,e.last_name,1 AS lvl
 FROM hr.employees e
 WHERE e.manager_id IS NULL
 UNION ALL
 SELECT e.manager_id,nvl(e.employee_id,NULL),e.last_name,emp.lvl+1 AS lvl
 FROM hr.employees e
 JOIN emp ON emp.employee_id=e.manager_id
)
search DEPTH FIRST BY last_name SET last_name_order
SELECT LPAD(' ',2*(lvl-1))||last_name last_name,lvl,
LEAD(lvl) OVER(ORDER BY last_name_order) leadlvlorder,
CASE
  WHEN (lvl-LEAD(lvl) OVER (ORDER BY last_name_order))<0
  THEN 
    0
  ELSE 1
  END isleaf
FROM emp;
oracle学习笔记10: 子查询因子化_第10张图片
使用lead()寻找叶子节点

如果search子从depth first改为breadth first,它因为依赖数据的顺序而显得有点脆弱,这样的输出有可能是不正确的,如下的运行:

--使用breadth first的lead()
WITH emp(manager_id,employee_id,last_name,lvl) AS (
 SELECT e.manager_id,e.employee_id,e.last_name,1 AS lvl
 FROM hr.employees e
 WHERE e.manager_id IS NULL
 UNION ALL
 SELECT e.manager_id,nvl(e.employee_id,NULL),e.last_name,emp.lvl+1 AS lvl
 FROM hr.employees e
 JOIN emp ON emp.employee_id=e.manager_id
)
search breadth FIRST BY last_name SET last_name_order
SELECT LPAD(' ',2*(lvl-1))||last_name last_name,lvl,
LEAD(lvl) OVER(ORDER BY last_name_order) leadlvlorder,
CASE
  WHEN (lvl-LEAD(lvl) OVER (ORDER BY last_name_order))<0
  THEN 
    0
  ELSE 1
  END isleaf
FROM emp;
oracle学习笔记10: 子查询因子化_第11张图片
使用lead()寻找叶子节点

尽管在大多数据实践中你都可以使用在递归因子化子查询中复制connect by的功能,但很多情况下,全用connect by语法更简单,在RSF中做同样的事件在多数情况下需要更多的SQL代码。connect by可以产生比RSF更好的执行计划,铖是对于相对简单的查询。

你可能感兴趣的:(oracle学习笔记10: 子查询因子化)