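SQL Tuning: Tuning Techniques

Contents

  • 1. Check the real cardinality (Rows)
  • 2. Use UNION instead of OR
  • 3. Optimizing pagination statements
    • 3.1. Single-table pagination
    • 3.2. Multi-table join pagination
  • 4. Use analytic functions to optimize self-joins
  • 5. Joining a very large table to a very small table
  • 6. Joining two very large tables
  • 7. Optimizing LIKE statements
  • 8. DBLINK optimization
  • 9. ROWID slicing of a table
  • 10. The three-part SQL splitting method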

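1. Check the real cardinality (Rows)

The Rows column in an execution plan is not a measured value. The CBO estimates it from statistics and mathematical formulas. When you read a plan, always check whether the Rows estimate for the driving table of a nested loop is accurate, and whether the Rows estimate at the plan's entry point is accurate. If the estimate for the nested-loop driving table is wrong, the plan is wrong. If the estimate at the entry point is wrong, there is no need to read further, because every later step is built on that error.

Consider the following execution plan: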

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

Plan hash value: 3215660883

-------------------------------------------------------------------------------------
| Id |Operation                     |Name                 |Rows | Bytes | Cost(%CPU)|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT              |                     |   78|  4212 | 15507  (1)|
|  1 | HASH GROUP BY                |                     |   78|  4212 | 15507  (1)|
|  2 |  NESTED LOOPS                |                     |     |       |           |
|  3 |   NESTED LOOPS               |                     | 3034|   159K| 15506  (1)|
|* 4 |    TABLE ACCESS FULL         |OPT_REF_UOM_TEMP_SDIM| 2967|   101K|   650 (14)|
|* 5 |    INDEX RANGE SCAN          |PROD_DIM_PK          |    3|       |     2  (0)|
|* 6 |   TABLE ACCESS BY INDEX ROWID|PROD_DIM             |    1|    19 |     5  (0)|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("UOM"."RELTV_CURR_QTY"=1)
   5 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID")
   6 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE('
              9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y' AND
              "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID")

22 rows selected.

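In this plan, Id=4 is the driving table of the nested loop and also the entry point of the plan. The CBO estimates that it returns only 2,967 rows. The asterisk in front of Id=4 means that a predicate is applied there: 4 - filter("UOM"."RELTV_CURR_QTY"=1).

Using the predicate listed for Id=4, we can compute the real number of rows that Id=4 should return: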

SQL> select count(*) from OPT_REF_UOM_TEMP_SDIM where "RELTV_CURR_QTY"=1;

  COUNT(*)
----------
    946432

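The manual count returns 946,432 rows, which is far from the 2,967 rows in the plan. The nested loop was chosen on the assumption of a small driving row source, so in this example the execution plan is wrong.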
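
Rather than counting by hand for every step, the estimate and the actual row count can be put side by side. The following is a minimal sketch, assuming the statement can be re-run in the current session and that the table lives in the current schema. It uses the gather_plan_statistics hint and the same dbms_xplan.display_cursor call that appears later in this article. The original join query is not shown in the source, so the example only re-runs the single-table predicate from Id=4.

```sql
-- Run the predicate from Id=4 with row-source statistics collected.
-- (In SQL*Plus, keep "set serveroutput off" so that display_cursor
--  reports this statement and not an internal dbms_output call.)
select /*+ gather_plan_statistics */ count(*)
  from opt_ref_uom_temp_sdim uom
 where uom.reltv_curr_qty = 1;

-- E-Rows is the CBO estimate, A-Rows is what each step really returned.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Assumption: the misestimate comes from stale or missing statistics.
-- If so, regathering them for the table (and its indexes) may correct it.
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'OPT_REF_UOM_TEMP_SDIM',
    cascade => true);
end;
/
```

If E-Rows and A-Rows still differ by orders of magnitude after the statistics are refreshed, the cause lies elsewhere (for example skewed data without a histogram) and needs separate investigation.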

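2. Use UNION instead of OR

When a SQL statement combines OR with a subquery, the subquery usually cannot be unnested and the plan has to use FILTER. In that case we can rewrite the SQL with UNION to eliminate the FILTER.

The OR-with-subquery form and its execution plan are as follows: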

SQL> select *
  2    from t1
  3   where owner = 'SCOTT'
  4      or object_id in (select object_id from t2);

72571 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 895956251

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  3378 |   682K|   235   (1)| 00:00:03 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 56766 |    11M|   235   (1)| 00:00:03 |
|*  3 |   TABLE ACCESS FULL| T2   |   734 |  9542 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OWNER"='SCOTT' OR  EXISTS (SELECT 0 FROM "T2" "T2" WHERE
              "OBJECT_ID"=:B1))
   3 - filter("OBJECT_ID"=:B1)

The UNION rewrite is as follows:

SQL> select * from t1 where owner='SCOTT'
  2  union
  3  select * from t1 where object_id in(select object_id from t2);

72571 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 696035008

--------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 56778 |    11M|       |  4088  (95)|
|   1 |  SORT UNIQUE         |      | 56778 |    11M|    12M|  4088  (95)|
|   2 |   UNION-ALL          |      |       |       |       |            |
|*  3 |    TABLE ACCESS FULL | T1   |    12 |  2484 |       |   234   (1)|
|*  4 |    HASH JOIN         |      | 56766 |    11M|  1800K|  1146   (1)|
|   5 |     TABLE ACCESS FULL| T2   | 73407 |   931K|       |   234   (1)|
|   6 |     TABLE ACCESS FULL| T1   | 56766 |    11M|       |   235   (1)|
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OWNER"='SCOTT')
   4 - access("OBJECT_ID"="OBJECT_ID")

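After the rewrite to UNION, the FILTER is gone. If the SQL cannot be rewritten, it has to keep the FILTER, and in that case we should create an index on the join column of the subquery table (t2.object_id).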
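
One caveat about the rewrite: UNION removes duplicate rows, so it returns the same result as the OR form only if t1 contains no fully identical rows (the row counts above match, 72571 in both cases). A sketch of two alternatives follows, both under assumptions: a UNION ALL variant that uses LNNVL to exclude rows the first branch already returned, and an index on t2.object_id for the case where the FILTER has to stay. The index name idx_t2_object_id is hypothetical.

```sql
-- Variant that keeps duplicates from t1: the second branch skips rows
-- the first branch already returned. LNNVL(owner = 'SCOTT') is true when
-- owner <> 'SCOTT' or owner is NULL.
select * from t1 where owner = 'SCOTT'
union all
select *
  from t1
 where object_id in (select object_id from t2)
   and lnnvl(owner = 'SCOTT');

-- If the statement cannot be rewritten and the FILTER stays, an index on
-- the subquery's join column lets each probe use an index lookup instead
-- of a full scan of t2 (the index name is illustrative).
create index idx_t2_object_id on t2(object_id);
```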

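3. Optimizing pagination statements

Pagination statements are the best test of whether someone really knows SQL tuning, because tuning them draws on almost every skill that SQL tuning requires.

3.1. Single-table pagination

First we create a test table, T_PAGE: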

SQL> create table t_page as select * from dba_objects;

Table created.

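Take the following SQL (no filter, only a sort). We want to paginate the result, 10 rows per page:

select * from t_page order by object_id;

Many people would use the following pagination framework (the wrong one):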

select *
  from (select t.*, rownum rn from (the SQL to paginate) t)
 where rn >= 1
   and rn <= 10;

This framework causes serious performance problems. Let us plug the SQL into the wrong framework:

SQL> select *
  2    from (select t.*, rownum rn
  3            from (select * from t_page order by object_id) t)
  4   where rn >= 1
  5     and rn <= 10;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3603170480

-----------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)|
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        | 61800 |    12M|       |  3020   (1)|
|*  1 |  VIEW                 |        | 61800 |    12M|       |  3020   (1)|
|   2 |   COUNT               |        |       |       |       |            |
|   3 |    VIEW               |        | 61800 |    12M|       |  3020   (1)|
|   4 |     SORT ORDER BY     |        | 61800 |    12M|    14M|  3020   (1)|
|   5 |      TABLE ACCESS FULL| T_PAGE | 61800 |    12M|       |   236   (1)|
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"<=10 AND "RN">=1)

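The plan shows that the SQL does a full table scan. If T_PAGE held hundreds of millions of rows, the whole table would first be sorted (SORT ORDER BY) and only then would 10 rows be taken from it, which would be a serious performance problem. So this SQL must not use a full table scan; it has to use an index scan.

The SQL has no filter, only a sort, so we can exploit the fact that an index is already sorted, that is, eliminate the SORT ORDER BY from the pagination statement. Pagination statements almost always contain a sort.

Now we create an index on the sort column object_id and append the constant 0 to it. Note that the 0 must not come first: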

SQL> create index idx_page on t_page(object_id,0);

Index created.

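Why add the constant 0 to the index? Because object_id allows NULL. Without a constant (it does not have to be 0; 1, 2, 3 or a letter works too), the index does not store rows whose object_id is NULL, yet the SQL is not written like this: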

select * from t_page where object_id is not null order by object_id;

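Because the SQL does not exclude NULLs, we have to add a constant so that the index stores NULL values; only then can the SQL use the index. Now let us look at the A-Rows execution plan with the index forced (A-Time and similar columns have been removed from the plan to fit the layout).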

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  fw6ym4n8njxqf, child number 0
-------------------------------------
select *   from (select t.*, rownum rn           from (select
      /*+ index(t_page idx_page) */                  *
 from t_page                  order by object_id) t)  where rn >= 1
and rn <= 10

Plan hash value: 3119682446

-------------------------------------------------------------------------------------
| Id |Operation                      | Name     | Starts | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT               |          |      1 |        |     10 |    1287 |
|* 1 | VIEW                          |          |      1 |  61800 |     10 |    1287 |
|  2 |  COUNT                        |          |      1 |        |  72608 |    1287 |
|  3 |   VIEW                        |          |      1 |  61800 |  72608 |    1287 |
|  4 |    TABLE ACCESS BY INDEX ROWID| T_PAGE   |      1 |  61800 |  72608 |    1287 |
|  5 |     INDEX FULL SCAN           | IDX_PAGE |      1 |  61800 |  72608 |     183 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("RN"<=10 AND "RN">=1))

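Because the SQL has no WHERE filter, forcing the index can only produce an INDEX FULL SCAN, not an INDEX RANGE SCAN. Look at the A-Rows column: the INDEX FULL SCAN read every leaf block of the index, since it returned 72,608 rows (the total number of rows in the table), and the statement used 1,287 logical reads in total (Buffers=1287). The ideal plan is an INDEX FULL SCAN that reads only one (at most a few) index leaf blocks and stops after 10 rows (A-Rows=10). Why did we not get the ideal plan? Because the pagination framework is wrong!

This is the correct pagination framework: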

select *
  from (select *
          from (select a.*, rownum rn
                  from (the SQL to paginate) a)
         where rownum <= 10)
 where rn >= 1;
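
The framework above is written for the first page. A minimal sketch of the same framework for an arbitrary page follows, assuming SQL*Plus bind variables named first_row and last_row (both names are illustrative). The inner rownum <= :last_row keeps the COUNT STOPKEY, and the outer rn >= :first_row discards the rows of earlier pages.

```sql
-- Page 3 with 10 rows per page covers rows 21 to 30.
variable first_row number
variable last_row  number
exec :first_row := 21
exec :last_row  := 30

select *
  from (select *
          from (select a.*, rownum rn
                  from (select * from t_page order by object_id) a)
         where rownum <= :last_row)   -- stop after the last row of the page
 where rn >= :first_row;              -- drop rows that belong to earlier pages
```

Later pages still read every row up to :last_row, so the cost grows with the page number; the stop key only prevents reading beyond the requested page.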

Now we plug the SQL into the correct framework, force the index, and look at the A-Rows plan (A-Time and similar columns have been removed to fit the layout):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  4vyrpd0h4w30z, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page) */
      *                           from t_page
order by object_id) a)          where rownum <= 10)  where rn >= 1

Plan hash value: 1201925926

-------------------------------------------------------------------------------------
| Id |Operation                       | Name     | Starts | E-Rows | A-Rows |Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |          |      1 |        |     10 |      5|
|* 1 | VIEW                           |          |      1 |     10 |     10 |      5|
|* 2 |  COUNT STOPKEY                 |          |      1 |        |     10 |      5|
|  3 |   VIEW                         |          |      1 |  61800 |     10 |      5|
|  4 |    COUNT                       |          |      1 |        |     10 |      5|
|  5 |     VIEW                       |          |      1 |  61800 |     10 |      5|
|  6 |     TABLE ACCESS BY INDEX ROWID| T_PAGE   |      1 |  61800 |     10 |      5|
|  7 |      INDEX FULL SCAN           | IDX_PAGE |      1 |  61800 |     10 |      3|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

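The plan shows that the SQL used an INDEX FULL SCAN, read only 10 rows (Id=7, A-Rows=10) and then stopped (Id=2, COUNT STOPKEY), using just 5 logical reads (Buffers=5). The plan exploits the sorted order of the index (there is no SORT ORDER BY) to read 10 rows, and then COUNT STOPKEY stops the SQL as soon as the rows the page needs have been fetched. This is the best execution plan.

Why does the wrong framework perform so badly? Because it has no COUNT STOPKEY (where rownum <= ...). COUNT STOPKEY means that the SQL stops running once it has read the specified number of rows.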

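This gives us a first optimization approach for pagination statements: if the statement has a sort (order by), exploit the sorted order of an index by including the order by columns in the index, and also use the COUNT STOPKEY behavior of rownum. If the statement has no sort, the COUNT STOPKEY behavior of rownum alone is enough.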
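
For the case without a sort, a minimal sketch, assuming the page may contain rows in any order: rownum is applied directly to the table, so no index on a sort column is involved. Without an order by, the rows that land on each page are not guaranteed to be the same from one execution to the next.

```sql
-- No order by: any 10 rows are acceptable, so the scan can stop after
-- 10 rows (COUNT STOPKEY) without relying on a sorted index.
select *
  from (select t.*, rownum rn
          from t_page t
         where rownum <= 10)
 where rn >= 1;
```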

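Now take the following SQL statements (note that the filter is an equality filter, and there is also an order by). We want to paginate the result, 10 rows per page: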

select * from t_page where owner = 'SCOTT' order by object_id;
select * from t_page where owner = 'SYS' order by object_id;

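The filter of the first statement, where owner='SCOTT', removes the vast majority of rows in the table. The filter of the second statement, where owner='SYS', removes about half of them.

We plug these statements into the correct pagination framework and force the index (the index on object_id, since it is the only index on t_page so far), then look at the A-Rows plans (A-Time and similar columns have been removed to fit the layout):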

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  7s4mhq8sz19da, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page) */
      *                           from t_page
where owner = 'SCOTT'                          order by object_id) a)
       where rownum <= 10)  where rn >= 1

Plan hash value: 1201925926

-------------------------------------------------------------------------------------
| Id | Operation                        |Name    | Starts | E-Rows | A-Rows |Buffers|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                 |        |      1 |        |     10 |   1273|
|* 1 |  VIEW                            |        |      1 |     10 |     10 |   1273|
|* 2 |   COUNT STOPKEY                  |        |      1 |        |     10 |   1273|
|  3 |    VIEW                          |        |      1 |     57 |     10 |   1273|
|  4 |     COUNT                        |        |      1 |        |     10 |   1273|
|  5 |      VIEW                        |        |      1 |     57 |     10 |   1273|
|* 6 |       TABLE ACCESS BY INDEX ROWID|T_PAGE  |      1 |     57 |     10 |   1273|
|  7 |        INDEX FULL SCAN           |IDX_PAGE|      1 |  61800 |  72427 |    183|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   6 - filter("OWNER"='SCOTT')

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  bn5k602hpdcq1, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page) */
      *                           from t_page
where owner = 'SYS'                          order by object_id) a)
     where rownum <= 10)  where rn >= 1

Plan hash value: 1201925926

-------------------------------------------------------------------------------------
| Id |Operation                       | Name     | Starts | E-Rows | A-Rows |Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |          |      1 |        |     10 |      5|
|* 1 | VIEW                           |          |      1 |     10 |     10 |      5|
|* 2 |  COUNT STOPKEY                 |          |      1 |        |     10 |      5|
|  3 |   VIEW                         |          |      1 |  28199 |     10 |      5|
|  4 |    COUNT                       |          |      1 |        |     10 |      5|
|  5 |     VIEW                       |          |      1 |  28199 |     10 |      5|
|* 6 |     TABLE ACCESS BY INDEX ROWID| T_PAGE   |      1 |  28199 |     10 |      5|
|  7 |       INDEX FULL SCAN          | IDX_PAGE |      1 |  61800 |     10 |      3|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   6 - filter("OWNER"='SYS')

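Both statements used an INDEX FULL SCAN. The first read 72,427 rows from the index (Id=7, A-Rows=72427), filtered most of them out during table access (Id=6), and ended up with 10 rows at a cost of 1,273 logical reads (Buffers=1273). The second read 10 rows from the index at a cost of 5 logical reads (Buffers=5). Clearly the second plan is right and the first is wrong: the 10 rows should be obtained during the index scan whenever possible.

Why does a different filter value make such a difference in efficiency? The first statement filters on owner='SCOTT', which matches very few rows in the table. The plan scans the object_id index and then visits the table to check owner='SCOTT'; because matching rows are rare, it has to read a large amount of data before finding 10 of them. The second statement filters on owner='SYS', which matches many rows, so it finds 10 matches after reading very little data.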
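
How selective each owner value is can be checked directly. A small sketch, assuming only the T_PAGE table created above (the alias row_count is illustrative):

```sql
-- Row count per owner, largest first. A value near the bottom of this list
-- is rare, so a scan in object_id order has to pass many non-matching rows
-- before it finds 10 matches for that owner.
select owner, count(*) as row_count
  from t_page
 group by owner
 order by row_count desc;
```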

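To optimize the first statement, the index scan has to find the 10 rows after reading only a few blocks. That requires the filter column (owner) to be in the index. The sort column is object_id, so we create a composite index: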

SQL> create index idx_page_ownerid on t_page(owner,object_id);

Index created.

We look at the A-Rows plan with the index (idx_page_ownerid) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  a1g16uafr05qf, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page_ownerid) */
              *                           from t_page
       where owner = 'SCOTT'                          order by
object_id) a)          where rownum <= 10)  where rn >= 1

Plan hash value: 4175643597

-------------------------------------------------------------------------------------
| Id |Operation                       |Name            |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |                |     1|      |    10|      6|
|* 1 |VIEW                            |                |     1|    10|    10|      6|
|* 2 | COUNT STOPKEY                  |                |     1|      |    10|      6|
|  3 |  VIEW                          |                |     1|    57|    10|      6|
|  4 |   COUNT                        |                |     1|      |    10|      6|
|  5 |    VIEW                        |                |     1|    57|    10|      6|
|  6 |     TABLE ACCESS BY INDEX ROWID|T_PAGE          |     1|    57|    10|      6|
|* 7 |      INDEX RANGE SCAN          |IDX_PAGE_OWNERID|     1|    57|    10|      3|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   7 - access("OWNER"='SCOTT')

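The plan shows an INDEX RANGE SCAN that read 10 rows from the index with 6 logical reads in total, so this plan is right. You may ask: could we create the index with object_id first and owner second? Let us create another index with object_id in front and owner behind.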

SQL> create index idx_page_idowner on t_page(object_id,owner);

Index created.

We look at the A-Rows plan with the index (idx_page_idowner) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  djdnfyyznp3tf, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page_idowner) */ *
                 from t_page                          where owner =
'SCOTT'                          order by object_id) a)          where
rownum <= 10)  where rn >= 1

Plan hash value: 2811585238

-------------------------------------------------------------------------------------
| Id |Operation                       |Name            |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |                |     1|      |    10|    224|
|* 1 | VIEW                           |                |     1|    10|    10|    224|
|* 2 |  COUNT STOPKEY                 |                |     1|      |    10|    224|
|  3 |   VIEW                         |                |     1|    57|    10|    224|
|  4 |    COUNT                       |                |     1|      |    10|    224|
|  5 |     VIEW                       |                |     1|    57|    10|    224|
|  6 |     TABLE ACCESS BY INDEX ROWID|T_PAGE          |     1|    57|    10|    224|
|* 7 |       INDEX FULL SCAN          |IDX_PAGE_IDOWNER|     1|   247|    10|    221|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   7 - access("OWNER"='SCOTT')
       filter("OWNER"='SCOTT')

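The plan shows an INDEX FULL SCAN that returned 10 rows from the index, but the index full scan cost 221 logical reads because it had to filter (owner='SCOTT') while scanning. The SQL used 224 logical reads in total. Compared with the plan on the object_id index (1,273 logical reads) this is an improvement, but the best plan is still the one on idx_page_ownerid (6 logical reads).

You may also ask whether an index on owner alone would do, that is, without the sort column. If the filter removes most of the data (owner='SCOTT'), leaving the sort column out of the index is acceptable, because only a few rows have to be sorted and sorting a few rows has almost no effect on performance. But if the filter removes only part of the data, so that many rows are returned (owner='SYS'), the sort column must be in the index; otherwise a large amount of data has to be sorted. In production the filter value is usually a bind variable, and we cannot control which value is passed in, so we cannot know whether many or few rows will come back. To be safe, include the sort column in the index!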

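Also note that if there are several sort columns, the index must include all of them, in the same order as in the order by clause, and you must pay attention to whether each column is ascending or descending. If the pagination statement sorts on a single column in descending order, there is no need to create a descending index; the hint index_desc makes the optimizer scan the index in descending order.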
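
A sketch of the single-column descending case, assuming the index idx_page on (object_id, 0) created earlier: the index_desc hint asks for the index to be read from its high end, so no separate descending index is created. The resulting plan is not shown in the original; the expectation that it contains INDEX FULL SCAN DESCENDING and no SORT ORDER BY should be verified with dbms_xplan.display_cursor.

```sql
-- Newest object_id first, first page of 10 rows.
select *
  from (select *
          from (select a.*, rownum rn
                  from (select /*+ index_desc(t_page idx_page) */ *
                          from t_page
                         order by object_id desc) a)
         where rownum <= 10)
 where rn >= 1;
```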

Take the following pagination statement:

select *
  from (select *
          from (select a.*, rownum rn
                  from (select *
                          from t_page
                         order by object_id, object_name desc) a)
         where rownum <= 10)
 where rn >= 1;

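The index must have object_id first and object_name second. object_name is sorted in descending order, so the index must declare object_name as descending. The SQL also has no filter, so we add a constant to the index as well. We create the following index: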

SQL> create index idx_page_idname on t_page(object_id,object_name desc,0);

Index created.

We look at the A-Rows plan with the index (idx_page_idname) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  20yk62bptjrs9, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select   /*+ index(t_page idx_page_idname)*/
            *                             from t_page
                order by object_id, object_name desc) a)          where
rownum <= 10)  where rn >= 1

Plan hash value: 445348578

-------------------------------------------------------------------------------------
| Id |Operation                       |Name           |Starts|E-Rows| A-Rows |Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |               |     1|      |     10 |      5|
|* 1 | VIEW                           |               |     1|    10|     10 |      5|
|* 2 |  COUNT STOPKEY                 |               |     1|      |     10 |      5|
|  3 |   VIEW                         |               |     1| 61800|     10 |      5|
|  4 |    COUNT                       |               |     1|      |     10 |      5|
|  5 |     VIEW                       |               |     1| 61800|     10 |      5|
|  6 |     TABLE ACCESS BY INDEX ROWID|T_PAGE         |     1| 61800|     10 |      5|
|  7 |       INDEX FULL SCAN          |IDX_PAGE_IDNAME|     1| 61800|     10 |      3|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

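If the index is created with object_name first and object_id second, the column order in the index no longer matches the order of the sort columns in the pagination statement, and forcing the index produces a SORT ORDER BY in the plan. Because the index order differs from the sort order, the rows fetched through the index still have to be sorted, and a sort shows up as SORT ORDER BY. We create the following index: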

SQL> create index idx_page_nameid on t_page(object_name,object_id,0);

Index created.

Now we look at the A-Rows plan with the index (idx_page_nameid) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  8b8nwayah0z68, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page_nameid)*/
            *                           from t_page
     order by object_id, object_name desc) a)          where rownum <=
10)  where rn >= 1

Plan hash value: 2869317785

-------------------------------------------------------------------------------------
| Id |Operation                         |Name           |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                  |               |     1|      |    10| 37397|
|* 1 | VIEW                             |               |     1|    10|    10| 37397|
|* 2 |  COUNT STOPKEY                   |               |     1|      |    10| 37397|
|  3 |   VIEW                           |               |     1| 61800|    10| 37397|
|  4 |    COUNT                         |               |     1|      |    10| 37397|
|  5 |     VIEW                         |               |     1| 61800|    10| 37397|
|  6 |      SORT ORDER BY               |               |     1| 61800|    10| 37397|
|  7 |       TABLE ACCESS BY INDEX ROWID|T_PAGE         |     1| 61800| 72608| 37397|
|  8 |        INDEX FULL SCAN           |IDX_PAGE_NAMEID|     1| 61800| 72608|   431|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

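If the index does not declare object_name as descending, SORT ORDER BY also appears in the plan, because the order in the index differs from the order in the pagination statement. We create the following index: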

SQL> create index idx_page_idname1 on t_page(object_id,object_name,0);

Index created.

We look at the A-Rows plan with the index (idx_page_idname1) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  2dsmtc9b65a7v, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_page_idname1)*/
             *                           from t_page
      order by object_id, object_name desc) a)          where rownum <=
10)  where rn >= 1

Plan hash value: 170538223

-------------------------------------------------------------------------------------
| Id |Operation                        | Name           |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                 |                |     1|      |    10|  1533|
|* 1 | VIEW                            |                |     1|    10|    10|  1533|
|* 2 |  COUNT STOPKEY                  |                |     1|      |    10|  1533|
|  3 |   VIEW                          |                |     1| 61800|    10|  1533|
|  4 |    COUNT                        |                |     1|      |    10|  1533|
|  5 |     VIEW                        |                |     1| 61800|    10|  1533|
|  6 |      SORT ORDER BY              |                |     1| 61800|    10|  1533|
|  7 |      TABLE ACCESS BY INDEX ROWID|T_PAGE          |     1| 61800| 72608|  1533|
|  8 |        INDEX FULL SCAN          |IDX_PAGE_IDNAME1|     1| 61800| 72608|   430|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

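A SORT ORDER BY in the plan of a pagination statement means that the statement is not exploiting the sorted order of an index. Such a plan is usually wrong, and the right index needs to be created.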
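
When a plan unexpectedly shows SORT ORDER BY, it helps to confirm what each index really contains. A sketch using the data dictionary views user_ind_columns and user_ind_expressions, assuming the indexes belong to the current schema. Descending columns and appended constants should be stored as expressions, so they are expected to appear under system-generated column names in the first view and with their definition in the second.

```sql
-- Column order and sort direction of every index on T_PAGE.
select index_name, column_position, column_name, descend
  from user_ind_columns
 where table_name = 'T_PAGE'
 order by index_name, column_position;

-- Definitions of expression-based key columns (for example a DESC column
-- or the constant 0).
select index_name, column_position, column_expression
  from user_ind_expressions
 where table_name = 'T_PAGE'
 order by index_name, column_position;
```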

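Now take the following SQL (note that the filter has an equality condition and a non-equality condition, and there is also an order by). We want to paginate the result, 10 rows per page: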

select * from t_page where owner = 'SYS' and object_id > 1000 order by object_name;

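How should the index be built to optimize this pagination statement? As noted above, if a pagination statement has sort columns, they must be included in the index. So we only need to combine the filter columns owner and object_id with the sort column object_name in one index.

owner is an equality filter and object_id is a non-equality filter. When creating the index, put the equality filter column and the sort column together first, and put the non-equality filter column after them: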

SQL> create index idx_ownernameid on t_page(owner,object_name,object_id);

Index created.

Let us look at the A-Rows plan with the index (idx_ownernameid) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  07z0dkm4a9qdz, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_ownernameid) */
             *                           from t_page
      where owner = 'SYS'                            and object_id >
1000                          order by object_name) a)          where
rownum <= 10)  where rn >= 1

Plan hash value: 2090516350

-------------------------------------------------------------------------------------
| Id |Operation                       |Name           |Starts|E-Rows| A-Rows |Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                |               |     1|      |     10 |    14|
|* 1 | VIEW                           |               |     1|    10|     10 |    14|
|* 2 |  COUNT STOPKEY                 |               |     1|      |     10 |    14|
|  3 |   VIEW                         |               |     1| 26937|     10 |    14|
|  4 |    COUNT                       |               |     1|      |     10 |    14|
|  5 |     VIEW                       |               |     1| 26937|     10 |    14|
|  6 |     TABLE ACCESS BY INDEX ROWID|T_PAGE         |     1| 26937|     10 |    14|
|* 7 |       INDEX RANGE SCAN         |IDX_OWNERNAMEID|     1|   254|     10 |     4|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   7 - access("OWNER"='SYS' AND "OBJECT_ID">1000)
       filter("OBJECT_ID">1000)

There is no SORT ORDER BY in the plan and only 14 logical reads, so the plan is very good. You may ask why we do not create the following index instead:

SQL> create index idx_owneridname on t_page(owner,object_id,object_name);

Index created.

We look at the A-Rows plan with the index (idx_owneridname) forced (some data omitted):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  7bm9sf2u94uxa, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(t_page idx_owneridname) */
             *                           from t_page
      where owner = 'SYS'                            and object_id >
1000                          order by object_name) a)          where
rownum <= 10)  where rn >= 1

Plan hash value: 2498002320

-------------------------------------------------------------------------------------
| Id |Operation                        |Name           |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                 |               |     1|      |    10|   1002|
|* 1 | VIEW                            |               |     1|    10|    10|   1002|
|* 2 |  COUNT STOPKEY                  |               |     1|      |    10|   1002|
|  3 |   VIEW                          |               |     1| 26937|    10|   1002|
|  4 |    COUNT                        |               |     1|      |    10|   1002|
|  5 |     VIEW                        |               |     1| 26937|    10|   1002|
|  6 |      SORT ORDER BY              |               |     1| 26937|    10|   1002|
|  7 |      TABLE ACCESS BY INDEX ROWID|T_PAGE         |     1| 26937| 29919|   1002|
|* 8 |        INDEX RANGE SCAN         |IDX_OWNERIDNAME|     1| 26937| 29919|    189|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
   8 - access("OWNER"='SYS' AND "OBJECT_ID">1000 AND "OBJECT_ID" IS NOT NULL)

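This plan contains SORT ORDER BY, which means the sorted order of the index is not used, and it needs 1,002 logical reads, so the plan is wrong. Why? The pagination statement sorts by object_name, but the index was created in the order owner, object_id, object_name. The first 5 entries in the index are: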

SQL> select *
  2    from (select rownum rn, owner, object_id, object_name
  3            from t_page
  4           where owner = 'SYS'
  5             and object_id > 1000
  6           order by owner, object_id, object_name)
  7   where rownum <= 5;

        RN OWNER  OBJECT_ID OBJECT_NAME
---------- ----- ---------- ------------
         1 SYS         1001 NOEXP$
         2 SYS         1002 EXPPKGOBJ$
         3 SYS         1003 I_OBJTYPE
         4 SYS         1004 EXPPKGACT$
         5 SYS         1005 I_ACTPACKAGE

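If we sort these 5 rows by object_name, the sort key of the paging statement, the row in position 4 (EXPPKGACT$) should come first, yet it sits in position 4 in the index. The order of the data in the index therefore cannot satisfy the ordering the paging statement requires, which produces SORT ORDER BY and makes the execution plan wrong. Why is the plan correct when the index is created in the order owner, object_name, object_id? Let us fetch the first 5 entries of that index: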

SQL> select *
  2    from (select rownum rn, owner, object_id, object_name
  3            from t_page
  4           where owner = 'SYS'
  5             and object_id > 1000
  6           order by owner,object_name,object_id)
  7   where rownum <= 5;

        RN OWNER  OBJECT_ID OBJECT_NAME
---------- ----- ---------- --------------------------------
         1 SYS        34042 /1000323d_DelegateInvocationHa
         2 SYS        44844 /1000e8d1_LinkedHashMapValueIt
         3 SYS        23397 /1005bd30_LnkdConstant
         4 SYS        19737 /10076b23_OraCustomDatumClosur
         5 SYS        45460 /100c1606_StandardMidiFileRead

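The order of the data in this index fully matches the ordering required by the paging statement, so no SORT ORDER BY is needed, and the execution plan is therefore correct.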

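We can now refine the optimization approach for paging statements. If the paging statement has a sort (order by), exploit the pre-sorted property of the index: include the order by columns in the index in the same order as they appear in the sort, and pay attention to whether the sort is ascending or descending. If the statement has filter conditions, check whether any of them are equality filters. If there are equality filters, put the equality-filter columns first, place the sort columns after them, and put the non-equality filter columns last. If there are no equality filters, put the sort columns at the front of the index and the non-equality filter columns after them, then rely on the COUNT STOPKEY behavior of rownum to optimize the paging SQL. If the paging statement has no sort, the COUNT STOPKEY behavior of rownum can be used directly.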
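The column-ordering rule can be sketched on the t_page example used earlier (equality filter on owner, range filter on object_id, sort on object_name). This is only an illustration: the index name idx_page_demo is hypothetical, and whether the optimizer actually avoids SORT ORDER BY should be confirmed from the A-Rows plan.

```sql
-- Equality-filter column first, then the sort column, then the range-filter column.
-- idx_page_demo is a hypothetical name used only for this sketch.
create index idx_page_demo on t_page(owner, object_name, object_id);

select *
  from (select *
          from (select a.*, rownum rn
                  from (select /*+ index(t_page idx_page_demo) */ *
                          from t_page
                         where owner = 'SYS'
                           and object_id > 1000
                         order by object_name) a)
         where rownum <= 10)
 where rn >= 1;
```

Within owner = 'SYS' the index entries are already ordered by object_name, so the scan can stop as soon as COUNT STOPKEY has received 10 qualifying rows.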

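To tell at a glance whether the execution plan of a paging statement is right or wrong, first check whether the statement has ORDER BY, then check whether the plan contains SORT ORDER BY. If the plan contains SORT ORDER BY, it is usually wrong.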

As an exercise, consider how the following paging statement should be indexed (hint: the SQL has no equality filter):

select *
  from (select *
          from (select a.*, rownum rn
                  from (select *
                          from t_page
                         where owner like 'SYS%'
                           and object_id > 1000
                         order by object_name) a)
         where rownum <= 10)
 where rn >= 1;
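One possible answer, following the rule stated earlier for statements without equality filters (sort column first, non-equality filter columns after it), might look like the sketch below. The index name idx_page_name_own_id is hypothetical, and the result should be checked against the real plan.

```sql
-- No equality filter: the sort column leads, the range/LIKE filter columns follow.
-- idx_page_name_own_id is a hypothetical name used only for this sketch.
create index idx_page_name_own_id on t_page(object_name, owner, object_id);

select *
  from (select *
          from (select a.*, rownum rn
                  from (select /*+ index(t_page idx_page_name_own_id) */ *
                          from t_page
                         where owner like 'SYS%'
                           and object_id > 1000
                         order by object_name) a)
         where rownum <= 10)
 where rn >= 1;
```

The intent is that the index is read in object_name order while the owner and object_id conditions are checked against the index entries, so no SORT ORDER BY step should be needed.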

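If the table being sorted in a paging statement is a partitioned table, check whether the statement scans across partitions. If it does, the index should generally be created as a global index; otherwise there is no guarantee that the paging order matches the index order. If only one partition is scanned, a local index can be used.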

We now create a table p_test, range-partitioned by object_id, and insert test data:

SQL> create table p_test(
  2  OWNER          VARCHAR2(30),
  3  OBJECT_NAME    VARCHAR2(128),
  4  SUBOBJECT_NAME VARCHAR2(30),
  5  OBJECT_ID      NUMBER,
  6  DATA_OBJECT_ID NUMBER,
  7  OBJECT_TYPE    VARCHAR2(19),
  8  CREATED        DATE,
  9  LAST_DDL_TIME  DATE,
 10  TIMESTAMP      VARCHAR2(19),
 11  STATUS         VARCHAR2(7),
 12  TEMPORARY      VARCHAR2(1),
 13  GENERATED      VARCHAR2(1),
 14  SECONDARY      VARCHAR2(1),
 15  NAMESPACE      NUMBER,
 16  EDITION_NAME   VARCHAR2(30)
 17  ) partition by range (object_id)
 18  (
 19  partition p1 values less than (10000),
 20  partition p2 values less than (20000),
 21  partition p3 values less than (30000),
 22  partition p4 values less than (40000),
 23  partition p5 values less than (50000),
 24  partition p6 values less than (60000),
 25  partition p7 values less than (70000),
 26  partition p8 values less than (80000),
 27  partition pmax values less than(maxvalue)
 28  );

Table created.

SQL> insert into p_test select * from dba_objects;

72662 rows created.

SQL> commit;

Consider the following paging statement (sorted by the range-partitioning column):

select *
  from (select *
          from (select a.*, rownum rn
                  from (select * from p_test order by object_id) a)
         where rownum <= 10)
 where rn >= 1;

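This paging statement has no filter condition, so it scans all partitions of the table. Because the sort column happens to be the range-partitioning column, and the data in range partitions increases from one partition to the next, the index can be created as a local index. If the table were LIST- or HASH-partitioned instead, we would have to create a global index, because LIST and HASH partitions are unordered.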

Now we create the local index:

SQL> create index idx_ptest_id on p_test(object_id,0) local;

Index created.

We view the execution plan with A-Rows while forcing the index (idx_ptest_id); some data is omitted:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  3rp1uz98fgggq, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(p_test idx_ptest_id) */
          *                           from p_test
   order by object_id) a)          where rownum <= 10)  where rn >= 1

Plan hash value: 1636704844

-------------------------------------------------------------------------------------
| Id |Operation                              |Name        |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                       |            |     1|      |    10|      5|
|* 1 | VIEW                                  |            |     1|    10|    10|      5|
|* 2 |  COUNT STOPKEY                        |            |     1|      |    10|      5|
|  3 |   VIEW                                |            |     1| 51888|    10|      5|
|  4 |    COUNT                              |            |     1|      |    10|      5|
|  5 |     VIEW                              |            |     1| 51888|    10|      5|
|  6 |      PARTITION RANGE ALL              |            |     1| 51888|    10|      5|
|  7 |      TABLE ACCESS BY LOCAL INDEX ROWID|P_TEST      |     1| 51888|    10|      5|
|  8 |        INDEX FULL SCAN                |IDX_PTEST_ID|     1| 51888|    10|      3|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

Consider the following paging statement (sorted by object_name):

select *
  from (select *
          from (select a.*, rownum rn
                  from (select * from p_test order by object_name) a)
         where rownum <= 10)
 where rn >= 1;

In this case we need a global index. A local index leads to SORT ORDER BY:

SQL> create index idx_ptest_name on p_test(object_name,0) local;

Index created.

Now view the execution plan with A-Rows while forcing the index (idx_ptest_name); some data is omitted:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  50hgw72gnvs83, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(p_test idx_ptest_name) */
            *                           from p_test
     order by object_name) a)          where rownum <= 10)  where rn >=1

Plan hash value: 2548872510

-------------------------------------------------------------------------------------
| Id |Operation                               |Name          |Starts|E-Rows|A-Rows|Buffers |
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                        |              |     1|      |    10|  35530 |
|* 1 | VIEW                                   |              |     1|    10|    10|  35530 |
|* 2 |  COUNT STOPKEY                         |              |     1|      |    10|  35530 |
|  3 |   VIEW                                 |              |     1| 51888|    10|  35530 |
|  4 |    COUNT                               |              |     1|      |    10|  35530 |
|  5 |     VIEW                               |              |     1| 51888|    10|  35530 |
|  6 |      SORT ORDER BY                     |              |     1| 51888|    10|  35530 |
|  7 |       PARTITION RANGE ALL              |              |     1| 51888| 72662|  35530 |
|  8 |       TABLE ACCESS BY LOCAL INDEX ROWID|P_TEST        |     9| 51888| 72662|  35530 |
|  9 |         INDEX FULL SCAN                |IDX_PTEST_NAME|     9| 51888| 72662|    392 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

Now we rebuild idx_ptest_name as a global index:

SQL> drop index idx_ptest_name;

Index dropped.

SQL> create index idx_ptest_name on p_test(object_name,0);

Index created.

View the execution plan with A-Rows while forcing the index (idx_ptest_name); some data is omitted:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  50hgw72gnvs83, child number 0
-------------------------------------
select *   from (select *           from (select a.*, rownum rn
          from (select /*+ index(p_test idx_ptest_name) */
            *                           from p_test
     order by object_name) a)          where rownum <= 10)  where rn >=1

Plan hash value: 4135902528

-------------------------------------------------------------------------------------
| Id |Operation                              |Name          |Starts|E-Rows|A-Rows|Buffers|
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                       |              |     1|      |    10|     10|
|* 1 | VIEW                                  |              |     1|    10|    10|     10|
|* 2 |  COUNT STOPKEY                        |              |     1|      |    10|     10|
|  3 |   VIEW                                |              |     1| 51888|    10|     10|
|  4 |    COUNT                              |              |     1|      |    10|     10|
|  5 |     VIEW                              |              |     1| 51888|    10|     10|
|  6 |     TABLE ACCESS BY GLOBAL INDEX ROWID|P_TEST        |     1| 51888|    10|     10|
|  7 |       INDEX FULL SCAN                 |IDX_PTEST_NAME|     1| 51888|    10|      4|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)

3.2. Optimization approach for multi-table join paging

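Paging statements over multi-table joins should be optimized using the pre-sorted property of indexes, the COUNT STOPKEY behavior of ROWNUM, and the value-passing behavior of nested loops.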

Now we create another test table, T_PAGE2:

SQL> create table t_page2 as select * from dba_objects;

Table created.

Consider the following paging statement:

select *
  from (select *
          from (select a.owner,
                       a.object_id,
                       a.subobject_name,
                       a.object_name,
                       rownum rn
                  from (select t1.owner,
                               t1.object_id,
                               t1.subobject_name,
                               t2.object_name
                          from t_page t1, t_page2 t2
                         where t1.object_id = t2.object_id
                         order by t2.object_name) a)
         where rownum <= 10)
 where rn >= 1;

The sort column of the paging statement is object_name from t_page2, so we create an index on it:

SQL> create index idx_page2_name on t_page2(object_name,0);

Index created.

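Now we force t_page2 to use the index just created and make it the driving table of the nested loop, with t_page as the driven table. Thanks to the COUNT STOPKEY behavior of rownum, the SQL stops after 10 rows have been scanned. Let us view the A-Rows execution plan with the index and the nested loop forced: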

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  g0gpgftwrfwzt, child number 0
-------------------------------------
select *   from (select *           from (select
a.owner,a.object_id,a.subobject_name,a.object_name, rownum rn
        from (select /*+ index(t2 idx_page2_name) leading(t2) use_nl(t2,t1) */
t1.owner,t1.object_id,t1.subobject_name,t2.object_name
         from t_page t1, t_page2 t2      where
t1.object_id = t2.object_id      order by
t2.object_name) a)    where rownum <= 10)  where rn >= 1

Plan hash value: 4182646763

-------------------------------------------------------------------------------------
| Id |Operation                        |Name          |Starts|E-Rows|A-Rows|Buffers |
-------------------------------------------------------------------------------------
|  0 |SELECT STATEMENT                 |              |     1|      |    10|     29 |
|* 1 | VIEW                            |              |     1|    10|    10|     29 |
|* 2 |  COUNT STOPKEY                  |              |     1|      |    10|     29 |
|  3 |   VIEW                          |              |     1| 61800|    10|     29 |
|  4 |    COUNT                        |              |     1|      |    10|     29 |
|  5 |     VIEW                        |              |     1| 61800|    10|     29 |
|  6 |      NESTED LOOPS               |              |     1| 61800|    10|     29 |
|  7 |      TABLE ACCESS BY INDEX ROWID|T_PAGE2       |     1| 66557|    10|     10 |
|  8 |        INDEX FULL SCAN          |IDX_PAGE2_NAME|     1| 66557|    10|      4 |
|  9 |      TABLE ACCESS BY INDEX ROWID|T_PAGE        |    10|     1|    10|     19 |
|*10 |        INDEX RANGE SCAN         |IDX_PAGE      |    10|     1|    10|     13 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN">=1)
   2 - filter(ROWNUM<=10)
  10 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")

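The plan shows that the driving table uses the index on the sort column and scans 10 rows, passes values to the driven table 10 times, and then the SQL stops. There are 29 logical reads in total. This execution plan is correct, and it is the optimal plan.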

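Can the paging statement above use a HASH join? With a HASH join, the result of joining the two tables is not guaranteed to be ordered, so a sort (SORT ORDER BY) is needed after the join completes. A HASH join therefore cannot be used, and for the same reason neither can a sort-merge join.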

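Why must a multi-table paging statement use nested loops? In a nested loop the driving table passes values to the driven table, so if the driving table returns its rows in order, the joined result set is in that order too, which eliminates SORT ORDER BY.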

Consider the following paging statement (the sort columns come from two tables):

select *
  from (select *
          from (select a.owner,
                       a.object_id,
                       a.subobject_name,
                       a.object_name,
                       rownum rn    
                  from (select t1.owner,
                               t1.object_id,
                               t1.subobject_name,
                               t2.object_name
                          from t_page t1, t_page2 t2
                         where t1.object_id = t2.object_id
                         order by t2.object_name ,t1.subobject_name) a)
         where rownum <= 10)
 where rn >= 1;

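Because the sort columns of this paging statement come from more than one table, sorting can only happen after the two tables have been joined, so SORT ORDER BY cannot be eliminated. The statement cannot be optimized, and the two tables can only be joined with a HASH join. To optimize it, we can talk to the business side about dropping one table's sort column, so that sorting no longer has to wait for the join to complete.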

Consider the following paging statement (sorted by a column of the outer join's subordinate table):

select *
  from (select *
          from (select a.owner,
                       a.object_id,
                       a.subobject_name,
                       a.object_name,
                       rownum rn
                  from (select t1.owner,
                               t1.object_id,
                               t1.subobject_name,
                               t2.object_name
                          from t_page t1 left join t_page2 t2
                         on t1.object_id = t2.object_id
                         order by t2.object_name) a)
         where rownum <= 10)
 where rn >= 1;

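When two tables are outer-joined with a nested loop, the driving table can only be the main (preserved) table. Here the main table is t1, but the sort column comes from t2, and in a paging statement the table being sorted should be the driving table of the nested loop. These two requirements conflict, so this paging statement cannot be optimized and t1 and t2 can only be joined with a HASH join. The only way to optimize it is to sort by a column of t1.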
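A minimal sketch of that change, assuming the business side accepts sorting by t1.object_name instead of t2.object_name (which changes the result order). The index names idx_page_sort_demo and idx_page2_join_demo are hypothetical, and the hints mirror the ones used in the inner-join example above.

```sql
-- Hypothetical indexes: one on the sort column of the main table,
-- one on the join column of the subordinate table.
create index idx_page_sort_demo on t_page(object_name, 0);
create index idx_page2_join_demo on t_page2(object_id);

select *
  from (select *
          from (select a.owner,
                       a.object_id,
                       a.subobject_name,
                       a.object_name,
                       rownum rn
                  from (select /*+ index(t1 idx_page_sort_demo) leading(t1) use_nl(t1,t2) */
                               t1.owner,
                               t1.object_id,
                               t1.subobject_name,
                               t2.object_name
                          from t_page t1 left join t_page2 t2
                            on t1.object_id = t2.object_id
                         order by t1.object_name) a)
         where rownum <= 10)
 where rn >= 1;
```

With t1 both the main table and the sorted table, it can drive the nested loop in index order, which is the condition the paragraph above describes.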

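A paging statement should also not contain keywords such as distinct, group by, max, min, avg, union, or union all. With these keywords, paging can only happen after the join has completed or all the data has been processed, so performance is poor.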

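Finally, to summarize the approach for multi-table join paging: if the statement has a sort, only one table's columns may be sorted on; make that table the driving table of the nested loop and make sure the driving table returns rows in the sort order; create indexes on the join columns of the other tables. If there is an outer join, only columns of the main table may be used as sort columns. The statement must not contain distinct, group by, max, min, avg, union, or union all, and the execution plan must not contain SORT ORDER BY.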

4. Using analytic functions to optimize self-joins

Consider the following SQL and its execution plan:

SQL> select ename,deptno,sal
  2    from emp a
  3   where sal = (select max(sal) from emp b where a.deptno = b.deptno);

Execution Plan
----------------------------------------------------------
Plan hash value: 1245077725
-------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     1 |    39 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |         |     1 |    39 |     8  (25)| 00:00:01 |
|   2 |   VIEW               | VW_SQ_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |         |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP     |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP     |    14 |   182 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SAL"="MAX(SAL)" AND "A"."DEPTNO"="ITEM_1")

This SQL returns the name, department, and salary of the highest-paid employees in each department, and it accesses the EMP table twice.

We can rewrite it equivalently with an analytic function so that EMP is accessed only once.

The analytic-function version is as follows:

SQL> select ename, deptno, sal
  2    from (select a.*, max(sal) over(partition by deptno) max_sal from emp a)
  3   where sal = max_sal;

Execution Plan
----------------------------------------------------------
Plan hash value: 4130734685
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |      |    14 |   644 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |      |    14 |   182 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   182 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SAL"="MAX_SAL")

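After the rewrite with an analytic function, the number of table scans is reduced. The larger the EMP table, the greater the performance gain.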
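The same single-pass idea can also be expressed with a ranking function. The sketch below is an alternative formulation, not the rewrite used above; it assumes sal is never null for the rows of interest (nulls last keeps null salaries from ranking first) and, like the max() over() version, it groups rows with a null deptno together, which the original correlated subquery would exclude.

```sql
-- rank() assigns 1 to every employee tied for the highest salary in a department,
-- so ties are returned just as with sal = max(sal).
select ename, deptno, sal
  from (select a.ename,
               a.deptno,
               a.sal,
               rank() over(partition by deptno order by sal desc nulls last) rk
          from emp a)
 where rk = 1;
```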

5. Optimizing joins between a very large table and a very small table

Consider the following SQL:

select * from a,b where a.object_id=b.object_id;

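Table a is 30 MB and table b is 30 GB. The join returns a large amount of data, so a HASH join should be used. Because a is the small table, a should be the driving table of the HASH JOIN and the large table b the driven table. During a HASH JOIN the driving table is placed in the PGA; since a is only 30 MB, the PGA can hold it completely. Because the driven table b is very large, parallel query must be enabled to speed up the SQL. In a parallel HASH join between a very large table and a very small table, the small (driving) table can be broadcast to all query processes while the large table is scanned in parallel in random chunks; each query process reads part of b and then performs the join. Suppose the SQL above runs with 6 parallel processes, with a broadcast and b scanned in parallel in random chunks (call the parts b1, b2, b3, b4, b5, b6). This is effectively the same as internally rewriting the SQL as follows: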

select * from a,b1 where a.object_id=b1.object_id  -- runs in parallel
union all
select * from a,b2 where a.object_id=b2.object_id  -- runs in parallel
union all
select * from a,b3 where a.object_id=b3.object_id  -- runs in parallel
union all
select * from a,b4 where a.object_id=b4.object_id  -- runs in parallel
union all
select * from a,b5 where a.object_id=b5.object_id  -- runs in parallel
union all
select * from a,b6 where a.object_id=b6.object_id; -- runs in parallel

How do we get table a to be broadcast? We add the hint pq_distribute(driving_table none,broadcast).

Now let us look at the execution plan with table a broadcast in parallel (some data is omitted from the plan for layout reasons):

SQL> explain plan for select
 /*+ parallel(6) use_hash(a,b) pq_distribute(a none,broadcast) */
  2   *
  3    from a, b
  4   where a.object_id = b.object_id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 3536517442
--------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  5064K|  1999M|      |            |
|   1 |  PX COORDINATOR         |          |       |       |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  5064K|  1999M| P->S | QC (RAND)  |
|*  3 |    HASH JOIN            |          |  5064K|  1999M| PCWP |            |
|   4 |     PX RECEIVE          |          | 74893 |    14M| PCWP |            |
|   5 |      PX SEND BROADCAST  | :TQ10000 | 74893 |    14M| P->P | BROADCAST  |
|   6 |       PX BLOCK ITERATOR |          | 74893 |    14M| PCWC |            |
|   7 |        TABLE ACCESS FULL| A        | 74893 |    14M| PCWP |            |
|   8 |     PX BLOCK ITERATOR   |          |  5064K|   999M| PCWC |            |
|   9 |      TABLE ACCESS FULL  | B        |  5064K|   999M| PCWP |            |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

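When the small table is broadcast, PX SEND BROADCAST appears in the Operation column of the plan and BROADCAST appears in the PQ Distrib column. Note: when joining two large tables, never broadcast a large table.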
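Since the choice between broadcasting and hash distribution depends on how large each table is, it can help to check the segment sizes first. The query below is a small sketch that assumes both tables are owned by the current user and are named A and B as in the example; for partitioned tables it sums all partition segments.

```sql
-- Approximate on-disk size of each table, in MB, from the data dictionary.
select segment_name, round(sum(bytes) / 1024 / 1024) size_mb
  from user_segments
 where segment_name in ('A', 'B')
 group by segment_name
 order by size_mb;
```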

6. Optimizing joins between two very large tables

Consider the following SQL:

select * from a,b where a.object_id=b.object_id;

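Table a is 4 GB and table b is 6 GB. The join returns a large amount of data, so a HASH join should be used. Because a is smaller than b, a should be the driving table of the HASH JOIN. The 4 GB driving table has to be placed in the PGA, but a work area in the PGA cannot exceed 2 GB, so the PGA cannot hold the driving table completely; part of the data spills to disk (TEMP) and an on-disk hash join takes place. We can enable parallel query to speed things up. In a parallel HASH join between two very large tables, both tables are hashed on the join column, the hashed results are placed in the PGA, and then the HASH join is performed. This kind of parallel HASH join is called a parallel HASH HASH join. Suppose the SQL above runs with 6 parallel query processes: table a is hashed on the join column and split into 6 parts (a1, a2, a3, a4, a5, a6), and table b is likewise hashed on the join column and split into 6 parts (b1, b2, b3, b4, b5, b6). Running the SQL in parallel is then equivalent to rewriting it as follows: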

select * from a1,b1 where a1.object_id=b1.object_id  -- runs in parallel
union all
select * from a2,b2 where a2.object_id=b2.object_id  -- runs in parallel
union all
select * from a3,b3 where a3.object_id=b3.object_id  -- runs in parallel
union all
select * from a4,b4 where a4.object_id=b4.object_id  -- runs in parallel
union all
select * from a5,b5 where a5.object_id=b5.object_id  -- runs in parallel
union all
select * from a6,b6 where a6.object_id=b6.object_id; -- runs in parallel

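For the SQL above, enabling parallel query avoids the on-disk hash join, because the tables are not extremely large and are split up into memory. How do we write the hint for a parallel HASH HASH join? We add the hint pq_distribute(driven_table hash,hash).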

Now let us look at the execution plan of the parallel HASH HASH join (some data is omitted from the plan for layout reasons):

SQL> explain plan for select 
/*+ parallel(6) use_hash(a,b) pq_distribute(b hash,hash) */
  2   *
  3    from a, b
  4   where a.object_id = b.object_id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 728916813
-------------------------------------------------------------------------------------
| Id | Operation               | Name     | Rows  | Bytes |TempSpc|IN-OUT|PQ Distrib|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |          |  3046M|  1174G|       |      |          |
|  1 |  PX COORDINATOR         |          |       |       |       |      |          |
|  2 |   PX SEND QC (RANDOM)   | :TQ10002 |  3046M|  1174G|       | P->S |QC (RAND) |
|* 3 |    HASH JOIN BUFFERED   |          |  3046M|  1174G|   324M| PCWP |          |
|  4 |     PX RECEIVE          |          |  9323K|  1840M|       | PCWP |          |
|  5 |      PX SEND HASH       | :TQ10000 |  9323K|  1840M|       | P->P |HASH      |
|  6 |       PX BLOCK ITERATOR |          |  9323K|  1840M|       | PCWC |          |
|  7 |        TABLE ACCESS FULL| A        |  9323K|  1840M|       | PCWP |          |
|  8 |     PX RECEIVE          |          |    20M|  4045M|       | PCWP |          |
|  9 |      PX SEND HASH       | :TQ10001 |    20M|  4045M|       | P->P |HASH      |
| 10 |       PX BLOCK ITERATOR |          |    20M|  4045M|       | PCWC |          |
| 11 |        TABLE ACCESS FULL| B        |    20M|  4045M|       | PCWP |          |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

When two tables are joined with a parallel HASH HASH join, PX SEND HASH appears in the Operation column of the plan and HASH appears in the PQ Distrib column.

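If table a is 20 GB and table b is 30 GB, even a parallel HASH HASH join will struggle to produce a result. Mapping both tables into the PGA consumes part of the PGA, the HASH JOIN itself needs more, and the PGA is simply not large enough. If we look at the wait events, we find the processes constantly waiting on DIRECT PATH READ/WRITE TEMP.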
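One way to observe this symptom is to look at what the sessions are currently waiting on. The query below is a rough sketch; it assumes the user has select privilege on v$session, and it only shows sessions that are waiting at the moment it runs.

```sql
-- Sessions currently spilling hash-join (or sort) work areas to the TEMP tablespace.
select sid, sql_id, event, state
  from v$session
 where event in ('direct path read temp', 'direct path write temp');
```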

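How can we solve the performance problem of joining two extremely large tables (tens of GB each)? We can follow the idea behind the parallel HASH HASH join and implement it by hand. The steps are shown below.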

First we create a new table p1 with the structure of table a plus a column HASH_VALUE, LIST-partitioned by HASH_VALUE:

SQL> CREATE TABLE P1(
  2  HASH_VALUE NUMBER,
  3  OWNER VARCHAR2(30),
  4  OBJECT_NAME VARCHAR2(128),
  5  SUBOBJECT_NAME VARCHAR2(30),
  6  OBJECT_ID NUMBER,
  7  DATA_OBJECT_ID NUMBER,
  8  OBJECT_TYPE VARCHAR2(19),
  9  CREATED DATE,
 10  LAST_DDL_TIME DATE,
 11  TIMESTAMP VARCHAR2(19),
 12  STATUS VARCHAR2(7),
 13  TEMPORARY VARCHAR2(1),
 14  GENERATED VARCHAR2(1),
 15  SECONDARY VARCHAR2(1),
 16  NAMESPACE NUMBER,
 17  EDITION_NAME VARCHAR2(30)
 18  )
 19     PARTITION BY  list(HASH_VALUE)
 20  (
 21  partition p0 values (0),
 22  partition p1 values (1),
 23  partition p2 values (2),
 24  partition p3 values (3),
 25  partition p4 values (4)
 26  );

Table created.

Then we create a new table p2 with the structure of table b plus a column HASH_VALUE, also LIST-partitioned by HASH_VALUE:

SQL> CREATE TABLE P2(
  2  HASH_VALUE NUMBER,
  3  OWNER VARCHAR2(30),
  4  OBJECT_NAME VARCHAR2(128),
  5  SUBOBJECT_NAME VARCHAR2(30),
  6  OBJECT_ID NUMBER,
  7  DATA_OBJECT_ID NUMBER,
  8  OBJECT_TYPE VARCHAR2(19),
  9  CREATED DATE,
 10  LAST_DDL_TIME DATE,
 11  TIMESTAMP VARCHAR2(19),
 12  STATUS VARCHAR2(7),
 13  TEMPORARY VARCHAR2(1),
 14  GENERATED VARCHAR2(1),
 15  SECONDARY VARCHAR2(1),
 16  NAMESPACE NUMBER,
 17  EDITION_NAME VARCHAR2(30)
 18  )
 19     PARTITION BY  list(HASH_VALUE)
 20  (
 21  partition p0 values (0),
 22  partition p1 values (1),
 23  partition p2 values (2),
 24  partition p3 values (3),
 25  partition p4 values (4)
 26  );

Table created.

Note that the two tables must be partitioned identically; otherwise some rows will fail to join.

We migrate the data of table a into the new table p1:

insert into p1
  select ora_hash(object_id, 4), a.* from a; -- take care to exclude rows where object_id is null
commit;

Then we migrate the data of table b into the new table p2:

insert into p2
  select ora_hash(object_id, 4), b.* from b; -- take care to exclude rows where object_id is null
commit;

The following SQL statements are the manual implementation of the parallel HASH HASH join:

select *
  from p1, p2
 where p1.object_id = p2.object_id
   and p1.hash_value = 0
   and p2.hash_value = 0;

select *
  from p1, p2
 where p1.object_id = p2.object_id
   and p1.hash_value = 1
   and p2.hash_value = 1;

select *
  from p1, p2
 where p1.object_id = p2.object_id
   and p1.hash_value = 2
   and p2.hash_value = 2;

select *
  from p1, p2
 where p1.object_id = p2.object_id
   and p1.hash_value = 3
   and p2.hash_value = 3;

select *
  from p1, p2
 where p1.object_id = p2.object_id
   and p1.hash_value = 4
   and p2.hash_value = 4;

This method relies on the ora_hash function. HASH partitioning in Oracle uses the ora_hash function as well.

ora_hash is used as follows:

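  • ora_hash(column, max_bucket): max_bucket defaults to 4,294,967,295 and can be set to any value from 0 to 4,294,967,295.
  • ora_hash(object_id,4) hashes the value of object_id and places it in one of the buckets 0, 1, 2, 3, 4. In other words, ora_hash(object_id,4) only ever produces the values 0, 1, 2, 3, and 4.
  • Once the large tables (a, b) have been split into partitioned tables (p1, p2), we only need to join the matching partitions one pair at a time. The PGA no longer runs short, which solves the efficiency problem of joining extremely large tables. How many partitions to create in a real production environment is a judgment call for each case.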
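Before loading p1 and p2, it may be worth checking how evenly ora_hash spreads the join column across the buckets, because a badly skewed bucket would bring the PGA problem back for that one partition pair. A minimal sketch against table a from the example:

```sql
-- Row count per bucket for the 5-bucket split used above (buckets 0 to 4).
select ora_hash(object_id, 4) hash_value, count(*) cnt
  from a
 where object_id is not null
 group by ora_hash(object_id, 4)
 order by hash_value;
```

If the counts differ widely, increasing the second argument (and the number of LIST partitions to match) gives smaller, more uniform pieces.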

7. Optimizing LIKE statements

We first create a test table T:

SQL> create table t as select * from dba_objects;

Table created.

Consider the following statement:

select * from t where object_name like '%SEQ%';

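Because the string must be matched with wildcards on both sides, while the root and branch blocks of an index store prefix data (that is, only a predicate such as object_name like 'SEQ%' can use the index), the query above cannot use an index.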

If the index is forced, the plan uses INDEX FULL SCAN:

SQL> create index idx_ojbname on t(object_name);

Index created.

View the execution plan with the index forced:

SQL> select /*+ index(t) */ * from t where object_name like '%SEQ%';

208 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3894507753

-------------------------------------------------------------------------------------
| Id | Operation                   |Name       | Rows  | Bytes | Cost(%CPU)|Time    |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |           |   219 | 45333 |  2214  (1)|00:00:27|
|  1 |  TABLE ACCESS BY INDEX ROWID|T          |   219 | 45333 |  2214  (1)|00:00:27|
|* 2 |   INDEX FULL SCAN           |IDX_OJBNAME|  3395 |       |   362  (1)|00:00:05|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_NAME" LIKE '%SEQ%')

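INDEX FULL SCAN uses single-block reads and performs worse than a full table scan. You may wonder whether INDEX FAST FULL SCAN could be used instead. It cannot, because INDEX FAST FULL SCAN cannot go back to the table, and the query above needs to (select *).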

We can create a table that serves as the index, as a substitute for the case where INDEX FAST FULL SCAN cannot go back to the table:

SQL> create table index_t as select object_name,rowid rid from t;

Table created.

Now rewrite the query as follows:

select *
  from t
 where rowid in (select rid from index_t where object_name like '%SEQ%');

After the rewrite, index_t and t must be joined with a nested loop, with index_t as the driving table. This makes index_t play the role of an index.

Now let us compare the autotrace execution plans of the two SQL statements:

SQL> select * from t where object_name like '%SEQ%';

208 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   135 | 27945 |   235   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    |   135 | 27945 |   235   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME" IS NOT NULL AND "OBJECT_NAME" LIKE '%SEQ%')

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
       1117  consistent gets
          0  physical reads
          0  redo size
      12820  bytes sent via SQL*Net to client
        563  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        208  rows processed

SQL> select /*+ leading(index_t@a) use_nl(index_t@a,t) */
  2   *
  3    from t
  4   where rowid in (select /*+ qb_name(a) */
  5                    rid
  6                     from index_t
  7                    where object_name like '%SEQ%');

208 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2608052908
-------------------------------------------------------------------------------------
| Id | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |         |    87 | 25839 |   140   (2)| 00:00:02 |
|  1 |  NESTED LOOPS               |         |    87 | 25839 |   140   (2)| 00:00:02 |
|  2 |   SORT UNIQUE               |         |    87 |  6786 |    95   (2)| 00:00:02 |
|* 3 |    TABLE ACCESS FULL        | INDEX_T |    87 |  6786 |    95   (2)| 00:00:02 |
|  4 |   TABLE ACCESS BY USER ROWID| T       |     1 |   219 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OBJECT_NAME" IS NOT NULL AND "OBJECT_NAME" LIKE '%SEQ%')

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        499  consistent gets
          0  physical reads
          0  redo size
      12820  bytes sent via SQL*Net to client
        563  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        208  rows processed

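Because t is small and has few columns, the improvement here may not look dramatic (consistent gets drop from 1117 to 499). The larger t is, the more pronounced the gain. This approach also requires keeping index_t in sync with t; one option is to create index_t as a materialized view that is refreshed on commit.

8. DBLINK Optimization

Consider the following two tables: a is a remote table (about 18 million rows) and b is a local table (100 rows):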

SQL> desc a@dblink
 Name          Null?    Type
 --------------------- -------- ------
 ID                     NUMBER
 NAME                   VARCHAR2(100)
 ADDRESS                VARCHAR2(100)

SQL> select count(*) from a@dblink;

  COUNT(*)
----------
  18550272

SQL> desc b
 Name          Null?    Type
 --------------------- -------- ------
 ID                     NUMBER
 NAME                   VARCHAR2(100)
 ADDRESS                VARCHAR2(100)

SQL> select count(*) from b;

  COUNT(*)
----------
       100

Now consider this SQL:

select * from a@dblink, b where a.id = b.id;

By default, the data of remote table a is shipped to the local site and joined there. The autotrace output is:

SQL> set timi on
SQL> set autot trace
SQL> select * from a@dblink, b where a.id = b.id;

25600 rows selected.

Elapsed: 00:03:13.80

Execution Plan
----------------------------------------------------------
Plan hash value: 657970699
-------------------------------------------------------------------------------------
| Id | Operation          |Name|Rows | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |    |   82| 19188 |     6  (17)| 00:00:01 |        |      |
|* 1 |  HASH JOIN         |    |   82| 19188 |     6  (17)| 00:00:01 |        |      |
|  2 |   REMOTE           |A   |   82|  9594 |     2   (0)| 00:00:01 | DBLINK | R->S |
|  3 |   TABLE ACCESS FULL|B   |  100| 11700 |     3   (0)| 00:00:01 |        |      |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."ID"="B"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   2 - SELECT "ID","NAME","ADDRESS" FROM "A" "A" (accessing 'DBLINK' )

Statistics
----------------------------------------------------------
        769  recursive calls
          1  db block gets
         15  consistent gets
      91755  physical reads
        212  redo size
    1477532  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

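Remote table a is large, so transferring its data takes a long time. Local table b is small, and the join of a and b returns few rows. We can therefore ship local table b to the remote site, join there, and send only the result set back to the local site. This requires the driving_site hint. The following SQL ships b to the remote site for the join:

select /*+ driving_site(a) */ * from a@dblink, b  where a.id = b.id;

The autotrace output is:

SQL> select /*+ driving_site(a) */ * from a@dblink, b  where a.id = b.id;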

25600 rows selected.

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
Plan hash value: 4284963264

------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Inst   |IN-OUT|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      | 20931 |  4783K| 25565   (2)|        |      |
|*  1 |  HASH JOIN             |      | 20931 |  4783K| 25565   (2)|        |      |
|   2 |   REMOTE               | B    |    82 |  9594 |     2   (0)|      ! | R->S |
|   3 |   TABLE ACCESS FULL    | A    |    19M|  2173M| 25466   (1)|   ORCL |      |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A2"."ID"="A1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   2 - SELECT "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )

Note
-----
   - fully remote statement

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
    1428836  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

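Shipping the small local table to the remote site, joining there, and returning the result takes only about 6 seconds, a huge improvement over the 3 minutes 13 seconds needed when the large table is pulled to the local site.

Now create an index on the join column of remote table a: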

SQL> create index idx_id on a(id);

Index created.

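Because b has only 100 rows, a has about 18 million rows, and the join returns about 25,000 rows, we can join a and b with nested loops, using b as the driving table and a as the driven table, accessed through the index on the join column:

SQL> select /*+ index(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;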

25600 rows selected.

Elapsed: 00:00:00.84

Execution Plan
----------------------------------------------------------
Plan hash value: 1489534455
-------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Inst   |IN-OUT|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  7614K|  1699M| 54680 (100)|        |      |
|   1 |  NESTED LOOPS      |      |  7614K|  1699M| 54680 (100)|        |      |
|   2 |   TABLE ACCESS FULL| B    |   100 | 11700 |     3   (0)|        |      |
|   3 |   REMOTE           | A    | 76146 |  8700K|     3   (0)| DBLINK | R->S |
-------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

   3 - SELECT /*+ USE_NL ("A") INDEX ("A") */ "ID","NAME","ADDRESS" FROM "A" "A"
       WHERE "ID"=:1 (accessing 'DBLINK' )

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        106  consistent gets
          0  physical reads
          0  redo size
     349986  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

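With a forced to use the index, only the rows that pass the index filter are shipped to the local site, rather than all of a. Performance improves dramatically: the SQL takes less than 1 second.

Now ship b to the remote site and force b to be the driving table of the nested loops:

SQL> select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;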

25600 rows selected.

Elapsed: 00:00:02.92

Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation                    |Name  |Rows | Bytes | Cost (%CPU)| Inst  |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE      |      |20931|  4783K| 20182   (1)|       |      |
|  1 |  NESTED LOOPS                |      |     |       |            |       |      |
|  2 |   NESTED LOOPS               |      |20931|  4783K| 20182   (1)|       |      |
|  3 |    REMOTE                    |B     |   82|  9594 |     2   (0)|     ! | R->S |
|* 4 |    INDEX RANGE SCAN          |IDX_ID|  255|       |     2   (0)|  ORCL |      |
|  5 |   TABLE ACCESS BY INDEX ROWID|A     |  255| 29835 |   246   (0)|  ORCL |      |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A2"."ID"="A1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )

Note
-----
   - fully remote statement

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
     426684  bytes sent via SQL*Net to client
      19185  bytes received via SQL*Net from client
       1708  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

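This query takes 2.9 seconds, most of it spent on network transfer: b is first shipped to the remote site, and then the join result of a and b is shipped back to the local site, so the network is crossed twice. Setting arraysize reduces the number of network round trips and hence the network overhead:

SQL> set arraysize 1000
SQL> select /*+ driving_site(a) use_nl(a,b) leading(b) */ * from a@dblink, b  where a.id = b.id;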

25600 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 557259519
-------------------------------------------------------------------------------------
| Id | Operation                    |Name  |Rows | Bytes | Cost (%CPU)|Inst  |IN-OUT|
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE      |      |20931|  4783K| 20182   (1)|      |      |
|  1 |  NESTED LOOPS                |      |     |       |            |      |      |
|  2 |   NESTED LOOPS               |      |20931|  4783K| 20182   (1)|      |      |
|  3 |    REMOTE                    |B     |   82|  9594 |     2   (0)|     !| R->S |
|* 4 |    INDEX RANGE SCAN          |IDX_ID|  255|       |     2   (0)|  ORCL|      |
|  5 |   TABLE ACCESS BY INDEX ROWID|A     |  255| 29835 |   246   (0)|  ORCL|      |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A2"."ID"="A1"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   3 - SELECT /*+ USE_NL ("A1") */ "ID","NAME","ADDRESS" FROM "B" "A1" (accessing '!' )

Note
-----
   - fully remote statement

Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
     137698  bytes sent via SQL*Net to client
        694  bytes received via SQL*Net from client
         27  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      25600  rows processed

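Note that SQL*Net roundtrips in the Statistics section drops from 1708 to 27. When a local table has to be shipped to the remote site for the join and the join result shipped back, setting arraysize is one way to tune the SQL.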
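
The drop in round trips can be sanity-checked with a little arithmetic. The sketch below assumes the first run used the SQL*Plus default arraysize of 15 (the source does not say so explicitly) and applies a rough rule of thumb, rows divided by arraysize plus one extra call, rather than an exact formula:

```sql
-- Rough estimate of fetch round trips: ceil(rows / arraysize) + 1.
-- 25600 is the row count from the output above;
-- 15 is the assumed SQL*Plus default arraysize.
select ceil(25600 / 15) + 1   as roundtrips_arraysize_15,
       ceil(25600 / 1000) + 1 as roundtrips_arraysize_1000
  from dual;
```

Under these assumptions the two estimates come out at 1708 and 27, which agrees with the SQL*Net roundtrips figures reported by autotrace.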

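If remote table a and local table b are both large and the join returns many rows, neither shipping a to the local site nor shipping b to the remote site works, because the SQL is slow either way. Instead, create a local materialized view over the dblink, refresh the data of remote table a into it, and then join locally.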
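
A minimal sketch of the materialized view approach, assuming a complete refresh on demand is acceptable for the data. The name a_mv is hypothetical and is not part of the original example:

```sql
-- Hypothetical name a_mv: a local copy of remote table a.
create materialized view a_mv
  build immediate
  refresh complete on demand
  as
  select id, name, address from a@dblink;

-- Join against the local copy; the query itself no longer touches the dblink.
select * from a_mv a, b where a.id = b.id;

-- Refresh the copy when needed ('C' = complete refresh),
-- for example from a scheduled job.
begin
  dbms_mview.refresh('A_MV', 'C');
end;
/
```

The trade-off is freshness: the local copy is only as current as its last refresh, so this suits joins that can tolerate slightly stale data.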

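If a SQL statement references several dblink sources, it is best to create a local materialized view for each of them, because transferring data among multiple dblink sources generates network traffic that causes serious performance problems.

dblinks are sometimes used for data migration. When the volume to migrate is large, a bulk cursor can be used. The following example migrates the data of a@dblink into b with a bulk cursor: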

declare
  cursor cur is
    select id, name, address from a@dblink;
  type cur_type is table of cur%rowtype index by binary_integer;
  v_cur cur_type;
begin
  open cur;
  loop
    -- fetch up to 100,000 rows per iteration
    fetch cur bulk collect
      into v_cur limit 100000;
    -- insert the whole batch in one bulk operation
    forall i in 1 .. v_cur.count
      insert into b
        (id, name, address)
      values
        (v_cur(i).id, v_cur(i).name, v_cur(i).address);
    commit;
    -- exit only after the last (possibly partial) batch has been inserted
    exit when cur%notfound or cur%notfound is null;
  end loop;
  close cur;
  commit;
end;
/

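9. Slicing a Table by ROWID

To speed up an UPDATE or DELETE on a very large partitioned table, you can run the UPDATE or DELETE on each partition separately, in different sessions. For a very large non-partitioned table, however, running the SQL in a single session can easily exhaust UNDO, and if the session is disconnected, a huge amount of data has to be rolled back from UNDO, which is a disaster.

For a non-partitioned table, we can slice the table by ROWID and run the SQL in several windows at the same time. This both speeds up execution and reduces UNDO usage.

Oracle provides the built-in function DBMS_ROWID.ROWID_CREATE() for generating ROWIDs. A non-partitioned table is a single segment, a segment consists of multiple extents, and the blocks within each extent are physically contiguous. We can therefore join the data dictionary views DBA_EXTENTS and DBA_OBJECTS and use the ROWID-generating function to build ROWIDs by hand.

For example, slice the TEST table in the SCOTT schema by extent: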

select ' and rowid between ' || '''' ||
       dbms_rowid.rowid_create(1,
                               b.data_object_id,
                               a.relative_fno,
                               a.block_id,
                               0) || '''' || ' and ' || '''' ||
       dbms_rowid.rowid_create(1,
                               b.data_object_id,
                               a.relative_fno,
                               a.block_id + blocks - 1,
                               999) || ''';'
  from dba_extents a, dba_objects b
 where a.segment_name = b.object_name
   and a.owner = b.owner
   and b.object_name = 'TEST'
   and b.owner = 'SCOTT'
 order by a.relative_fno, a.block_id;

Part of the generated output looks like this:

and rowid between 'AAASs5AAEAAB+SIAAA' and 'AAASs5AAEAAB+SPAPn';
and rowid between 'AAASs5AAEAAB+SQAAA' and 'AAASs5AAEAAB+SXAPn';
and rowid between 'AAASs5AAEAAB+SYAAA' and 'AAASs5AAEAAB+SfAPn';
and rowid between 'AAASs5AAEAAB+SgAAA' and 'AAASs5AAEAAB+SnAPn';
and rowid between 'AAASs5AAEAAB+SoAAA' and 'AAASs5AAEAAB+SvAPn';
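
To check what a generated boundary means, the decoding functions in DBMS_ROWID can be applied to it. The following is a small sketch using the first slice above; it only illustrates how the literals map back to the arguments passed to ROWID_CREATE:

```sql
-- Decode the lower and upper boundary of the first slice shown above.
select dbms_rowid.rowid_object(chartorowid('AAASs5AAEAAB+SIAAA'))       as data_object_id,
       dbms_rowid.rowid_relative_fno(chartorowid('AAASs5AAEAAB+SIAAA')) as relative_fno,
       dbms_rowid.rowid_block_number(chartorowid('AAASs5AAEAAB+SIAAA')) as first_block,
       dbms_rowid.rowid_row_number(chartorowid('AAASs5AAEAAB+SIAAA'))   as first_row,
       dbms_rowid.rowid_block_number(chartorowid('AAASs5AAEAAB+SPAPn')) as last_block,
       dbms_rowid.rowid_row_number(chartorowid('AAASs5AAEAAB+SPAPn'))   as last_row
  from dual;
```

first_row and last_row should come back as 0 and 999, the row numbers passed to ROWID_CREATE, and first_block to last_block should span exactly one extent.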

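Suppose we want to run delete test where object_id>50000000, where test has 100 million rows and 50 million of them are to be deleted. Using the ROWID slices generated above, the statement becomes: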

delete test
 where object_id > 50000000
   and rowid between 'AAASs5AAEAAB+SIAAA' and 'AAASs5AAEAAB+SPAPn';
delete test
 where object_id > 50000000
   and rowid between 'AAASs5AAEAAB+SQAAA' and 'AAASs5AAEAAB+SXAPn';
delete test
 where object_id > 50000000
   and rowid between 'AAASs5AAEAAB+SYAAA' and 'AAASs5AAEAAB+SfAPn';
delete test
 where object_id > 50000000
   and rowid between 'AAASs5AAEAAB+SgAAA' and 'AAASs5AAEAAB+SnAPn';
delete test
 where object_id > 50000000
   and rowid between 'AAASs5AAEAAB+SoAAA' and 'AAASs5AAEAAB+SvAPn';

Finally, run these statements in different windows. This speeds up the delete and also reduces UNDO usage.

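This approach requires hand-editing a large number of SQL scripts, which is a lot of work if the table has many extents. A stored procedure can simplify it.

Because the stored procedure reads the data dictionary, select privileges on the dictionary views must be granted to the user directly (privileges received through a role are not available inside a definer's-rights stored procedure):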

grant select on dba_extents to scott;

grant select on dba_objects to scott;

CREATE OR REPLACE PROCEDURE P_ROWID(RANGE NUMBER, ID NUMBER) IS
  CURSOR CUR_ROWID IS
    SELECT DBMS_ROWID.ROWID_CREATE(1,
                                   B.DATA_OBJECT_ID,
                                   A.RELATIVE_FNO,
                                   A.BLOCK_ID,
                                   0) ROWID1,
           DBMS_ROWID.ROWID_CREATE(1,
                                   B.DATA_OBJECT_ID,
                                   A.RELATIVE_FNO,
                                   A.BLOCK_ID + BLOCKS - 1,
                                   999) ROWID2
      FROM DBA_EXTENTS A, DBA_OBJECTS B
     WHERE A.SEGMENT_NAME = B.OBJECT_NAME
       AND A.OWNER = B.OWNER
       AND B.OBJECT_NAME = 'TEST'
       AND B.OWNER = 'SCOTT'
       AND MOD(A.EXTENT_ID, RANGE) = ID;
  V_SQL VARCHAR2(4000);
BEGIN
  FOR CUR IN CUR_ROWID LOOP
    V_SQL := 'delete test where object_id > 100 and rowid between :1 and :2';
    EXECUTE IMMEDIATE V_SQL
      USING CUR.ROWID1, CUR.ROWID2;
    COMMIT;
  END LOOP;
END;
/

To split the table into 6 parts, run each of the following blocks in its own window (6 windows in total):

begin
  p_rowid(6, 0);
end;
/
begin
  p_rowid(6, 1);
end;
/
begin
  p_rowid(6, 2);
end;
/
begin
  p_rowid(6, 3);
end;
/
begin
  p_rowid(6, 4);
end;
/
begin
  p_rowid(6, 5);
end;
/

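This achieves the goal of slicing the table by ROWID. In practice, adapt the stored procedure to your own needs by changing the parts that were shaded in the original, such as the owner, the table name, and the DML statement.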
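
Before launching the sessions, it can help to see how evenly MOD(EXTENT_ID, RANGE) spreads the work. A small sketch, assuming the same SCOTT.TEST table and a split into 6 as above:

```sql
-- How many extents and blocks each of the 6 sessions would process.
select mod(extent_id, 6) as slice_id,
       count(*)          as extents,
       sum(blocks)       as blocks
  from dba_extents
 where owner = 'SCOTT'
   and segment_name = 'TEST'
 group by mod(extent_id, 6)
 order by slice_id;
```

If the block counts differ widely between slices (extent sizes are not necessarily uniform), a different number of slices may balance the sessions better.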

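10. The Three-Part SQL Split Method

If the SQL to be tuned is long, split it into three parts. This makes it quick to judge whether the way the SQL is written is likely to cause performance problems. The split looks like this:

select ....part 1.... from ....part 2.... where ....part 3....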

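Between select and from, avoid scalar subqueries and user-defined functions. Either one can cause the tables inside the subquery or function to be scanned repeatedly, up to once per row returned by the main query.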
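
As an illustration, a scalar subquery can often be rewritten as an outer join. The sketch below assumes the classic SCOTT demo tables emp and dept, which do not appear in the original text, and assumes that dept.deptno is unique:

```sql
-- Scalar subquery in the select list: dept may be probed once per emp row.
select e.ename,
       (select d.dname from dept d where d.deptno = e.deptno) as dname
  from emp e;

-- Equivalent outer join (valid because dept.deptno is unique):
-- dept can now be joined once, for example with a hash join.
select e.ename, d.dname
  from emp e
  left join dept d
    on d.deptno = e.deptno;
```

The outer join keeps employees without a matching department, just as the scalar subquery returns NULL for them.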

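Between from and where, pay attention to large tables, because they easily cause performance problems. Also watch for subqueries and views: run each one on its own to see whether it is fast or slow, and tune it separately if it is slow. Check as well whether predicates can be pushed into the subquery or view, and whether view merging takes place. Finally, note whether the tables are joined with inner joins or outer joins, because with an outer join the driving table of a nested loops join cannot be changed.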

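After where, pay special attention to subqueries: you need to be able to tell whether each form of subquery can be unnested. Also check the filter conditions, and avoid applying functions to filter columns, because that prevents the index on the column from being used (unless a matching function-based index exists).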
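
For example, a function wrapped around a date column can usually be replaced by a range predicate on the bare column. This sketch again assumes the SCOTT demo table emp with an ordinary index on hiredate, neither of which comes from the original text:

```sql
-- Function on the filter column: an ordinary index on hiredate
-- generally cannot be used for a range scan.
select * from emp where trunc(hiredate) = date '1981-02-20';

-- Same rows, but the column is left bare so an index range scan is possible.
select *
  from emp
 where hiredate >= date '1981-02-20'
   and hiredate <  date '1981-02-21';
```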

Make a habit of applying the three-part split in your daily work; it greatly speeds up SQL tuning.
