An experiment on how an index's clustering factor affects whether the index gets used

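    From the theory we know that an index's clustering factor (clustering_factor) has a large influence on whether the CBO chooses to use that index. So let's first reinforce that point with the following simulated experiment: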

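    Create a test table t0403a with two columns (ID and COL1), where ID is a random integer no greater than 1000, then create an index on ID. The goal is to give this index a large clustering factor: in a table built this way, the physical order of the rows is completely unrelated to the ID values, that is, the rows are stored in random rather than sorted order.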
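    To see why random placement inflates the clustering factor, here is a minimal Python model (not Oracle code). It relies on the usual definition: walk the index entries in key order and count how often the table block changes from one entry to the next. The value of 112 rows per block is an assumption, chosen so that 1000 rows occupy the 9 blocks reported later in this post, and the seed 403 is arbitrary.

```python
import math
import random


def clustering_factor(keys, rows_per_block):
    """Count table-block changes while walking index entries in key order."""
    # Row number pos is stored in table block pos // rows_per_block.
    # The index sorts entries by key, then by physical position,
    # which stands in for the rowid here.
    entries = sorted((key, pos) for pos, key in enumerate(keys))
    changes = 0
    previous_block = None
    for _key, pos in entries:
        block = pos // rows_per_block
        if block != previous_block:
            changes += 1
            previous_block = block
    return changes


ROWS = 1000
ROWS_PER_BLOCK = 112  # assumption: makes 1000 rows fill 9 blocks

rng = random.Random(403)  # arbitrary seed, only for repeatability
# Mirrors ceil(dbms_random.value*1000) from the CREATE TABLE statement.
random_keys = [math.ceil(rng.random() * 1000) for _ in range(ROWS)]
# Mirrors "rownum id": keys stored in the same order as the index.
ordered_keys = list(range(1, ROWS + 1))

cf_random = clustering_factor(random_keys, ROWS_PER_BLOCK)
cf_ordered = clustering_factor(ordered_keys, ROWS_PER_BLOCK)
print("random ids :", cf_random)
print("ordered ids:", cf_ordered)
```

    In this model the ordered table gives a clustering factor equal to the number of blocks, while the random table gives a value close to the number of rows. The measured values further down (873 and 9) show the same pattern.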

SQL> create table t0403a as select ceil(dbms_random.value*1000) id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;


Table created.


SQL> create index ind_t0403a on t0403a(id);


Index created.


SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS',tabname=>'T0403A',estimate_percent=>100);


PL/SQL procedure successfully completed.


SQL> set autotrace on;


SQL> select * from t0403a where id<100;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

57 12aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

12 36aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

47 38aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

40 39aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

69 42aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

59 47aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

32 48aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

31 50aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

32 58aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

69 67aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

77 68aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-- output truncated to save space


83 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1941751419


----------------------------------------------------------------------------

| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |        |    97 |  5335 |     4   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T0403A |    97 |  5335 |     4   (0)| 00:00:01 |

----------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   1 - filter("ID"<100)



Statistics

----------------------------------------------------------

 1  recursive calls

 0  db block gets

22  consistent gets

 0  physical reads

 0  redo size

       6183  bytes sent via SQL*Net to client

579  bytes received via SQL*Net from client

 7  SQL*Net roundtrips to/from client

 1  sorts (memory)

 0  sorts (disk)

83  rows processed


SQL> select * from t0403a where id<6;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

5 704aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa



Execution Plan

----------------------------------------------------------

Plan hash value: 1941751419


----------------------------------------------------------------------------

| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |        |     3 |   165 |     4   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T0403A |     3 |   165 |     4   (0)| 00:00:01 |

----------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   1 - filter("ID"<6)



Statistics

----------------------------------------------------------

 0  recursive calls

 0  db block gets

 5  consistent gets

 0  physical reads

 0  redo size

750  bytes sent via SQL*Net to client

524  bytes received via SQL*Net from client

 2  SQL*Net roundtrips to/from client

 0  sorts (memory)

 0  sorts (disk)

 2  rows processed


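-- Repeated trials show that the index is used only once the predicate is narrowed to ID<5, where the query actually returns 1 row, roughly one thousandth of the table.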

SQL> select * from t0403a where id<5;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa



Execution Plan

----------------------------------------------------------

Plan hash value: 2057097983


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |     2 |   110 |     4   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T0403A     |     2 |   110 |     4   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IND_T0403A |     2 |       |     2   (0)| 00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   2 - access("ID"<5)



Statistics

----------------------------------------------------------

 1  recursive calls

 0  db block gets

 4  consistent gets

 0  physical reads

 0  redo size

641  bytes sent via SQL*Net to client

524  bytes received via SQL*Net from client

 2  SQL*Net roundtrips to/from client

 0  sorts (memory)

 0  sorts (disk)

 1  rows processed


SQL> select * from t0403a where id<6;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

5 704aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa



Execution Plan

----------------------------------------------------------

Plan hash value: 1941751419


----------------------------------------------------------------------------

| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |        |     3 |   165 |     4   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T0403A |     3 |   165 |     4   (0)| 00:00:01 |

----------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   1 - filter("ID"<6)



Statistics

----------------------------------------------------------

 0  recursive calls

 0  db block gets

 5  consistent gets

 0  physical reads

 0  redo size

750  bytes sent via SQL*Net to client

524  bytes received via SQL*Net from client

 2  SQL*Net roundtrips to/from client

 0  sorts (memory)

 0  sorts (disk)

 2  rows processed


SQL> set autotrace off

-- Check the index's clustering factor: it is 873

SQL> select clustering_factor from user_indexes where index_name='IND_T0403A';


CLUSTERING_FACTOR

-----------------

     873


-- Check how many data blocks the table's rows occupy: 9

SQL> select blocks from user_tables where table_name='T0403A';


    BLOCKS

----------

9


-- Now repeat the experiment with rows stored in an order that closely matches the index order

SQL> drop table t0403a purge;


Table dropped.


SQL> create table t0403a as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;


Table created.


SQL> create index ind_t0403a on t0403a(id);


Index created.


SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS',tabname=>'T0403A',estimate_percent=>100);


PL/SQL procedure successfully completed.


-- The index's clustering factor is now 9

SQL> select clustering_factor from user_indexes where index_name='IND_T0403A';


CLUSTERING_FACTOR

-----------------

9


-- The table size has not changed, so the rows still occupy 9 data blocks

SQL> select blocks from user_tables where table_name='T0403A';


    BLOCKS

----------

9


-- Now see how the index behaves in this case

SQL> set autotrace on;

SQL> select * from t0403a where id<100;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-- output truncated to save space


99 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 2057097983


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |    99 |  5445 |     3   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T0403A     |    99 |  5445 |     3   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IND_T0403A |    99 |       |     2   (0)| 00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   2 - access("ID"<100)



Statistics

----------------------------------------------------------

10  recursive calls

 0  db block gets

36  consistent gets

 0  physical reads

 0  redo size

       7589  bytes sent via SQL*Net to client

590  bytes received via SQL*Net from client

 8  SQL*Net roundtrips to/from client

 4  sorts (memory)

 0  sorts (disk)

99  rows processed


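-- As shown above, the index is still used when the query returns 99 rows, nearly 10% of the rows in the table.

-- Continuing to widen the predicate shows that the index is still used up to id<223, which returns 222 rows, about 22% of the rows in the table.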



SQL> select * from t0403a where id<223;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

-- output truncated to save space


222 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 2057097983


------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   222 | 12210 |     4   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T0403A     |   222 | 12210 |     4   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | IND_T0403A |   222 |       |     2   (0)| 00:00:01 |

------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   2 - access("ID"<223)



Statistics

----------------------------------------------------------

 1  recursive calls

 0  db block gets

34  consistent gets

 0  physical reads

 0  redo size

      16455  bytes sent via SQL*Net to client

678  bytes received via SQL*Net from client

16  SQL*Net roundtrips to/from client

 0  sorts (memory)

 0  sorts (disk)

222  rows processed


SQL> select * from t0403a where id<224;


ID COL1

---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa


-- output truncated to save space


223 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1941751419


----------------------------------------------------------------------------

| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |        |   223 | 12265 |     4   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T0403A |   223 | 12265 |     4   (0)| 00:00:01 |

----------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


   1 - filter("ID"<224)



Statistics

----------------------------------------------------------

 1  recursive calls

 0  db block gets

26  consistent gets

 0  physical reads

 0  redo size

      15679  bytes sent via SQL*Net to client

678  bytes received via SQL*Net from client

16  SQL*Net roundtrips to/from client

 0  sorts (memory)

 0  sorts (disk)

223  rows processed


SQL> 


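    The experiments above show that an index's clustering factor strongly affects whether the index is used. When the storage order of the table rows differs greatly from the index order, practically only queries that return a single row can use the index. Conversely, when the storage order closely matches the index order, the index can still be used in this test even when the query returns more than 20% of the rows.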


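    Why does this happen? When an index has a high clustering factor, accessing the table through it carries a high COST; once that COST exceeds the cost of a full table scan, the CBO chooses the cheaper full table scan instead.

    When costing an index range scan (IRS), the CBO uses the following formula:

  IRS COST=I/O COST + CPU COST

  where I/O COST=INDEX ACCESS I/O COST + TABLE ACCESS I/O COST

  Further:

        INDEX ACCESS I/O COST=BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)

        TABLE ACCESS I/O COST=CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)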

  

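    From this we can see that, for indexes created with the same SQL statement, IX_SEL (index selectivity) and IX_SEL_WITH_FILTERS (index selectivity with filters; see Note 1) are the same for a given predicate. But as the experiments above show, even when the index definition is identical, the clustering factor can differ enormously if the rows are stored in a different order. So the first conclusion we can draw is: the larger an index's clustering factor, the higher the COST of an index range scan on it.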
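    As a rough check of the formulas above against the plans in this post, the following Python sketch plugs in the two measured clustering factors (873 and 9). Several inputs are assumptions rather than values from the transcript: BLEVEL=1 and LEAF_BLOCKS=3 are hypothetical (picked to be consistent with the INDEX RANGE SCAN cost of 2 in the plans), the selectivity is approximated as the plan's estimated Rows divided by 1000, the same value is used for IX_SEL and IX_SEL_WITH_FILTERS, and the CPU COST term is ignored. The rule that the index wins a tie is also an assumption, based on the index plan still being chosen when both displayed costs are 4.

```python
import math


def irs_io_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    """I/O cost of an index range scan, following the formulas in the text."""
    index_io = blevel + math.ceil(leaf_blocks * selectivity)
    table_io = math.ceil(clustering_factor * selectivity)
    return index_io + table_io


BLEVEL = 1          # assumption: not shown in the transcript
LEAF_BLOCKS = 3     # assumption: not shown in the transcript
FULL_SCAN_COST = 4  # cost of TABLE ACCESS FULL in the plans
NUM_ROWS = 1000

# (label, clustering factor, estimated Rows taken from the plan)
cases = [
    ("random,  id<5", 873, 2),
    ("random,  id<6", 873, 3),
    ("ordered, id<100", 9, 99),
    ("ordered, id<223", 9, 222),
    ("ordered, id<224", 9, 223),
]

results = {}
for label, cf, est_rows in cases:
    cost = irs_io_cost(BLEVEL, LEAF_BLOCKS, cf, est_rows / NUM_ROWS)
    results[label] = cost
    # Assumption: the index plan is kept when the costs tie.
    choice = "index" if cost <= FULL_SCAN_COST else "full scan"
    print(f"{label:16s} index cost = {cost}  ->  {choice}")
```

    Under these assumptions the model gives index costs of 4, 5, 3, 4 and 5, so it switches to a full scan at id<6 for the random table and at id<224 for the ordered table, the same points where the plans above change.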

    


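 Note 1: At the time of writing this post, I could not find a clear definition of "IX_SEL_WITH_FILTERS", so I have only translated and interpreted it literally.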
