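From theory, we know that an index's clustering factor (clustering_factor) strongly influences whether the CBO chooses to use that index. To make this concrete, let's start with the following simulation:
Create a test table t0403a with two columns (ID and COL1), where ID is a random integer no greater than 1000, and then create an index on ID. The goal is to give this index a large clustering factor. Rows created this way are stored in an order that has nothing to do with their ID values, so the table data is unordered with respect to the index.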
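As a rough model, the clustering factor can be read as the number of times a walk through the index in key order has to move to a different table block. This is not Oracle's actual code. The sketch below makes two substitutions, and both are assumptions. First, it uses Python's `random` module in place of `dbms_random`. Second, it uses a hypothetical 112 rows per block, chosen only so that 1000 rows occupy 9 blocks, as in this experiment. Under these assumptions, ordered data gives a clustering factor equal to the number of blocks, while randomly ordered data gives a value close to the number of rows.

```python
import math
import random

def clustering_factor(keys, rows_per_block):
    """Model of the clustering factor.

    keys: index key values in the order the rows are stored in the table.
    rows_per_block: hypothetical number of rows per table block.
    Walks the entries in index order and counts block changes.
    """
    # Model a rowid as (block number, slot): row i lives in block i // rows_per_block.
    entries = [(key, i // rows_per_block, i % rows_per_block)
               for i, key in enumerate(keys)]
    # A B-tree index keeps entries sorted by key; duplicates are ordered by rowid.
    entries.sort()
    cf = 0
    prev_block = None
    for _key, block, _slot in entries:
        if block != prev_block:  # the next index entry points into another block
            cf += 1
            prev_block = block
    return cf

ROWS = 1000
ROWS_PER_BLOCK = 112  # assumption: ceil(1000 / 112) = 9 blocks

rng = random.Random(42)  # stands in for dbms_random (assumption)
random_ids = [math.ceil(rng.random() * 1000) for _ in range(ROWS)]
ordered_ids = list(range(1, ROWS + 1))

blocks = math.ceil(ROWS / ROWS_PER_BLOCK)
print("blocks:", blocks)
print("random ids :", clustering_factor(random_ids, ROWS_PER_BLOCK))
print("ordered ids:", clustering_factor(ordered_ids, ROWS_PER_BLOCK))
```

The exact value for the random case depends on the seed. Because any two consecutive index entries land in different blocks about 8 times out of 9, the result should sit near, though not exactly at, the 873 measured in the transcript.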
SQL> create table t0403a as select ceil(dbms_random.value*1000) id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;
Table created.
SQL> create index ind_t0403a on t0403a(id);
Index created.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS',tabname=>'T0403A',estimate_percent=>100);
PL/SQL procedure successfully completed.
SQL> set autotrace on;
SQL> select * from t0403a where id<100;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
57 12aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
12 36aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
47 38aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
40 39aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
69 42aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
59 47aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
32 48aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
31 50aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
32 58aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
69 67aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
77 68aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
--Output truncated to save space
83 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1941751419
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |97 | 5335 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T0403A |97 | 5335 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<100)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
22 consistent gets
0 physical reads
0 redo size
6183 bytes sent via SQL*Net to client
579 bytes received via SQL*Net from client
7 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
83 rows processed
SQL> select * from t0403a where id<6;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
5 704aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Execution Plan
----------------------------------------------------------
Plan hash value: 1941751419
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 165 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T0403A | 3 | 165 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<6)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
750 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
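--Further experiments showed that the index was used only at ID<5, where the query actually returns 1 row, i.e. roughly one thousandth of the table.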
SQL> select * from t0403a where id<5;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Execution Plan
----------------------------------------------------------
Plan hash value: 2057097983
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 110 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T0403A | 2 | 110 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T0403A | 2 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"<5)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
641 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from t0403a where id<6;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 433aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
5 704aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Execution Plan
----------------------------------------------------------
Plan hash value: 1941751419
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 165 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T0403A | 3 | 165 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<6)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
750 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> set autotrace off
--Check the index's clustering factor: 873
SQL> select clustering_factor from user_indexes where index_name='IND_T0403A';
CLUSTERING_FACTOR
-----------------
873
--Check the number of blocks used by the table data: 9
SQL> select blocks from user_tables where table_name='T0403A';
BLOCKS
----------
9
--Now test the case where the row storage order closely matches the index order
SQL> drop table t0403a purge;
Table dropped.
SQL> create table t0403a as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;
Table created.
SQL> create index ind_t0403a on t0403a(id);
Index created.
SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS',tabname=>'T0403A',estimate_percent=>100);
PL/SQL procedure successfully completed.
--Now the index's clustering factor is 9
SQL> select clustering_factor from user_indexes where index_name='IND_T0403A';
CLUSTERING_FACTOR
-----------------
9
--The table size is unchanged, so the data still occupies 9 blocks
SQL> select blocks from user_tables where table_name='T0403A';
BLOCKS
----------
9
--Now look at how the index behaves
SQL> set autotrace on;
SQL> select * from t0403a where id<100;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
--Output truncated to save space
99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2057097983
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 5445 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T0403A | 99 | 5445 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T0403A | 99 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"<100)
Statistics
----------------------------------------------------------
10 recursive calls
0 db block gets
36 consistent gets
0 physical reads
0 redo size
7589 bytes sent via SQL*Net to client
590 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
99 rows processed
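--As shown above, the index is still used when 99 rows are returned, almost 10% of the table.
--Continued testing showed that the index is still used up to id<223, which returns 222 rows, about 22% of the table.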
SQL> select * from t0403a where id<223;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
--Output truncated to save space
222 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2057097983
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 222 | 12210 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T0403A | 222 | 12210 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T0403A | 222 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"<223)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
34 consistent gets
0 physical reads
0 redo size
16455 bytes sent via SQL*Net to client
678 bytes received via SQL*Net from client
16 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
222 rows processed
SQL> select * from t0403a where id<224;
ID COL1
---------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2 2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3 3aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
4 4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
5 5aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
6 6aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
7 7aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
8 8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
9 9aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
10 10aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
11 11aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
--Output truncated to save space
223 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1941751419
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 223 | 12265 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T0403A | 223 | 12265 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<224)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
26 consistent gets
0 physical reads
0 redo size
15679 bytes sent via SQL*Net to client
678 bytes received via SQL*Net from client
16 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
223 rows processed
SQL>
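These experiments show that the clustering factor strongly affects whether an index is used. When the table's storage order differs greatly from the index order, the index was used only for queries returning about one row. Conversely, when the storage order closely matches the index order, the index is still used even when more than 20% of the rows are returned.
Why does this happen? For an index with a high clustering factor, the CBO estimates a high cost for reaching the table rows through it. Once that cost exceeds the cost of a full table scan, the CBO picks the cheaper full table scan instead.
When computing the cost of an index range scan (IRS), the CBO uses the following formula: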
IRS COST=I/O COST + CPU COST
where I/O COST=INDEX ACCESS I/O COST + TABLE ACCESS I/O COST
and, in more detail:
INDEX ACCESS I/O COST=BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)
TABLE ACCESS I/O COST=CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)
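The I/O part of the formula can be checked against the plans above with a small model. This is a sketch under several assumptions, none of which come from the transcript:

- CPU cost is ignored.
- The selectivity of `id < value` is taken as `(value - low) / (high - low)`.
- BLEVEL = 1 and LEAF_BLOCKS = 2 are hypothetical values. The transcript does not show them, but they agree with the index-step cost of 2 in the plans.
- The full table scan cost of 4 is read directly from the plans.
- low/high are 1/1000 for the ordered table. For the random table they are 3 (the smallest ID returned) and 1000 (assumed).

`Fraction` is used so that boundary cases such as 9 × 222/999 = 2 are not pushed over an integer by floating-point rounding.

```python
import math
from fractions import Fraction

BLEVEL = 1       # hypothetical: not shown in the transcript
LEAF_BLOCKS = 2  # hypothetical: not shown in the transcript
FTS_COST = 4     # full table scan cost shown in the plans above

def range_selectivity(value, low, high):
    """Selectivity of "col < value", assuming uniformly distributed values."""
    return Fraction(value - low, high - low)

def irs_io_cost(blevel, leaf_blocks, clustering_factor, sel):
    """BLEVEL + CEIL(#LEAF_BLOCKS*IX_SEL) + CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS).

    These queries have no extra filter predicates, so the same
    selectivity is used for both IX_SEL and IX_SEL_WITH_FILTERS.
    """
    index_cost = blevel + math.ceil(leaf_blocks * sel)
    table_cost = math.ceil(clustering_factor * sel)
    return index_cost + table_cost

def choose_plan(irs_cost, fts_cost):
    # Keep the cheaper access path; ties go to the index (model assumption).
    return "INDEX RANGE SCAN" if irs_cost <= fts_cost else "TABLE ACCESS FULL"

cases = [
    # (description, clustering_factor, low, high, bound in "id < bound")
    ("random,  id<100", 873, 3, 1000, 100),
    ("random,  id<6",   873, 3, 1000, 6),
    ("random,  id<5",   873, 3, 1000, 5),
    ("ordered, id<100",   9, 1, 1000, 100),
    ("ordered, id<223",   9, 1, 1000, 223),
    ("ordered, id<224",   9, 1, 1000, 224),
]
for name, cf, low, high, bound in cases:
    sel = range_selectivity(bound, low, high)
    cost = irs_io_cost(BLEVEL, LEAF_BLOCKS, cf, sel)
    print(f"{name}: IRS cost {cost} vs FTS {FTS_COST} -> {choose_plan(cost, FTS_COST)}")
```

Under these assumptions, the model picks the same access path as every plan in the transcript. The decisive term is CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS): with a clustering factor of 873, even a selectivity of about 0.3% adds 3 I/Os. With a clustering factor of 9, a selectivity of 22% adds only 2.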
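Thus, for the same query against indexes created with the same SQL statement, IX_SEL (index selectivity) and IX_SEL_WITH_FILTERS (index selectivity with filters, see Note 1) are the same. However, as the experiments above show, identical indexes can have very different clustering factors when the rows are stored in different orders. This gives our first corollary: the larger an index's clustering factor, the higher the cost of an index range scan on it.
Note 1: At the time of writing, I could not find a clear definition of "IX_SEL_WITH_FILTERS", so the interpretation above is based only on its literal name.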