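Today on pub I saw someone bring up sequential reads on indexes, so here is my own understanding of indexes. Index access (excluding index full scan and index fast full scan here) versus full table scan is something beginners like us often find confusing in CBO execution plans: when is the index used, and why is it sometimes not used? People in the group ask this a lot. My earlier blog posts touched on a few points about indexes, and my own understanding has grown from only knowing that "indexes are fast" to gradually learning about the clustering factor, index structure, how the data is stored, and how lookups work.
Here is a brief summary of what I understand about the clustering factor.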
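Clustering factor
The index clustering factor describes how closely the physical order of rows in the table follows the order of the index keys. The lower the clustering factor, the more tightly rows with adjacent key values are packed into the same blocks; the higher it is, the more scattered they are.
A full table scan uses multiblock reads, while an index scan reads one block at a time. When the clustering factor is high, Oracle ends up revisiting the same table blocks many times, which drives up I/O. The cost of a SQL statement is made up mainly of I/O, CPU and network, so the CBO may well pick whichever plan it estimates to be cheapest (a full table scan, for example), and that choice affects how efficiently the statement runs.
The clustering factor can also be read as the number of table block visits needed to scan the whole table through the index, which is why it matters for I/O.
How the clustering factor value is calculated:
Oracle walks the index in key order and compares the rowid of the current entry with the rowid of the previous one. If the two adjacent rowids do not point to the same data block, the clustering factor is increased by 1 (the rowid stored in an index is a restricted rowid, made up of the file number, block number and row number). The running total is the clustering factor. The cost of index access is then derived from the clustering factor multiplied by a selectivity.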
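The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle's implementation: the function name `clustering_factor` and the two sample tables are hypothetical, and each index entry is reduced to a `(file_no, block_no)` pair taken from its rowid.

```python
def clustering_factor(entries):
    """Count block changes while walking index entries in key order.

    `entries` is an iterable of (file_no, block_no) pairs, one per index
    entry, already in the order the index stores them.
    """
    factor = 0
    previous = None
    for block in entries:
        if block != previous:  # first entry, or a jump to a different block
            factor += 1
        previous = block
    return factor


# Hypothetical table: 1,000 rows, 10 rows per block, all in file 1.
rows_per_block = 10

# Rows stored in key order: adjacent keys share a block.
ordered = [(1, i // rows_per_block) for i in range(1000)]

# Same number of rows, but adjacent keys always land in different blocks.
scattered = [(1, i % 100) for i in range(1000)]

print(clustering_factor(ordered))    # 100, the number of blocks
print(clustering_factor(scattered))  # 1000, the number of rows
```

The two extremes match the pattern to look for in the statistics: a clustering factor near the table's block count means well-ordered rows, and one near num_rows means the rows are scattered relative to the index.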
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> show parameter optimizer_index_cost_adj;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_cost_adj integer 50
SQL> declare
2 begin
3 for i in 1..5 loop
4 insert into test31 select * from dba_objects order by owner;
5 commit;
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed
SQL> create index index_test31 on test31(object_id);
Index created
SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);
PL/SQL procedure successfully completed
SQL> select blocks from user_tables where table_name=upper('test31');
BLOCKS
----------
3520
SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');
CLUSTERING_FACTOR
-----------------
255260
SQL> select blocks,num_rows from user_tables where table_name=upper('test31');
BLOCKS NUM_ROWS
---------- ----------
3520 255260
--num_rows and clustering_factor are now both 255260: the rows are completely scattered relative to the index order
SQL> explain plan for select * from test31 where object_id in ('60998','8789','7889');
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3334622187
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 1302 | 9 (0
| 1 | INLIST ITERATOR | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST31 | 14 | 1302 | 9 (0
|* 3 | INDEX RANGE SCAN | INDEX_TEST31 | 14 | | 1 (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_ID"=7889 OR "OBJECT_ID"=8789 OR "OBJECT_ID"=60998)
15 rows selected
The index is used here, and the cost is relatively low.
SQL> alter system flush buffer_cache;
System altered.
SQL> select /*+index(test31 index_test31)*/ * from test31 where object_id>5000 and object_id<6500;
6950 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2819306605
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6957 | 631K| 3492 (1)| 00:00:42 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 3492 (1)| 00:00:42 |
|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7429 consistent gets
183 physical reads
0 redo size
690491 bytes sent via SQL*Net to client
5478 bytes received via SQL*Net from client
465 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6950 rows processed
--with the index forced: cost 3492, physical reads 183
SQL> alter system flush buffer_cache;
System altered.
SQL> select * from test31 where object_id>5000 and object_id<6500;
6950 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1490571929
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6957 | 631K| 696 (3)| 00:00:09 |
|* 1 | TABLE ACCESS FULL| TEST31 | 6957 | 631K| 696 (3)| 00:00:09 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_ID"<6500 AND "OBJECT_ID">5000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3990 consistent gets
3521 physical reads
0 redo size
337756 bytes sent via SQL*Net to client
5478 bytes received via SQL*Net from client
465 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6950 rows processed
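--the full table scan costs only 696, but it does 3521 physical reads.
The CBO chose the full table scan by default on cost, yet the index plan actually did far fewer physical reads than the full table scan.
In SQL tuning we usually stress reducing logical reads, that is, db block gets (current-mode reads) and consistent gets (consistent reads), because once logical reads go down, physical reads tend to follow.
In this case, though, is the forced index or the CBO's default full table scan the better fit? The full table scan is better on logical reads (3990 consistent gets against 7429), but the gap in physical reads is huge (3521 against 183). Blocks read by a full table scan are placed at the cold end of the LRU list in the buffer cache and are the first to be overwritten, so running the query again is likely to cause heavy physical reads again. Forcing the index may therefore be the better choice here. After all, I/O is an expensive operation!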
SQL> create table test32 as select * from test31;
Table created
SQL> truncate table test31;
Table truncated.
SQL> insert into test31 select * from test32 order by object_id;
--the rows in test31 are now stored in object_id order.
SQL> alter index index_test31 rebuild;
Index altered
SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);
PL/SQL procedure successfully completed
SQL> select blocks,num_rows from user_tables where table_name=upper('test31');
BLOCKS NUM_ROWS
---------- ----------
3520 255270
SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');
CLUSTERING_FACTOR
-----------------
4137
--the clustering factor (4137) is now roughly in line with blocks (3520)
SQL> explain plan for select * from test31 where object_id>5000 and object_id<6500;
Explained
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2819306605
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6957 | 631K| 66 (0)
| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 66 (0)
|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)
14 rows selected
--the CBO now chooses the index range scan by default.
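The two index range scan costs seen in this test (3492 with a clustering factor of 255260, and 66 with a clustering factor of 4137) can be roughly reproduced with a simplified I/O cost model. The sketch below is an assumption on my part, not Oracle's documented formula: it takes the index range scan cost of 9 straight from the plans, estimates the table access as ceil(clustering_factor × selectivity), scales that by optimizer_index_cost_adj (50 in this test), and ignores CPU costing. The function name `index_plan_cost` is hypothetical.

```python
def index_plan_cost(clustering_factor, rows_returned, num_rows,
                    index_scan_cost, optimizer_index_cost_adj=100):
    """Approximate cost of an index range scan plus table access by rowid.

    Simplified model (an assumption, CPU cost ignored):
      table blocks visited ~ ceil(clustering_factor * selectivity)
      selectivity          = rows_returned / num_rows
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    table_blocks = -(-clustering_factor * rows_returned // num_rows)
    table_cost = table_blocks * optimizer_index_cost_adj / 100
    return index_scan_cost + table_cost


# Figures taken from the plans and statistics in this test.
scattered = index_plan_cost(255260, 6957, 255260,
                            index_scan_cost=9, optimizer_index_cost_adj=50)
ordered = index_plan_cost(4137, 6957, 255270,
                          index_scan_cost=9, optimizer_index_cost_adj=50)
print(scattered, ordered)
```

Under these assumptions the estimates come out at 3487.5 and 65.5, close to the 3492 and 66 reported by the plans. The only input that changed between the two runs is the clustering factor, which is what tips the comparison against the full table scan cost of 696.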
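These tests show that the clustering factor is a key signal for whether the CBO will use an index: the lower the value, the more readily the CBO uses the index. Do not assume that rebuilding the index will lower the clustering factor, though; the clustering factor depends only on the order of the rows in the table. Another statistic is distinct_keys, the number of distinct key values in the index: the closer distinct_keys is to the table's num_rows, the more selective the index. You can also gather histograms on the columns so that the CBO has more statistics to work with and can choose the best execution plan!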