clustering factor索引聚簇因子和执行计划

今天在pub上看见网友提到了关于索引的顺序读,谈谈个人自己对索引的理解吧!走索引(这里的索引不包括全索引扫描和全索引快速扫描)和全表扫描一直是我们这些初学者对cbo执行计划的迷糊地方,何时走索引,为什么不走索引,群里的朋友经常问到这些问题,其实自己之前的blog也多多少少提到了索引的一些知识点,自己多索引的理解也是从最开始只知道索引效率高到现在慢慢知道clustering factor,结构,存储数据的原理,如何查找数据。

简要摘要一下clustering factor的理解。

 

Clutering factor

索引聚簇因子,也就是表中row存储的顺序,clustering factor越低,相应的rows存储越集中,相反则rows存储越分散。

全表扫描oracle采取的是多块读,而索引扫描采取的是单块读取,当clustering factor过大时,oracle会重复读取多个数据块,将导致I/O消耗,而sqlcost最重要的也就是I/Ocpunetwork,所以cbo很有可能会选择它认为cost最小的执行计划,从而影响sql执行效率。

Clustering factor也可以认为是通过索引扫描一张表需要访问的表的数据块的数量,I/O影响。

 

Clustering factore的取值是如何计算出来的:

其实也是通过index,比较row的当前rowid和前一行的rowid,如果相邻两个rowid不属于同一数据块(在index中的rowid是受限rowid,由文件号block_id和行号组成)则cluster factor增加1.然后求和clutering factor的值,而计算的索引的cost则是由clustering factor乘以某个选择性参数即时访问索引开销。

 

SQL> select * from v$version;

 

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

PL/SQL Release 10.2.0.1.0 - Production

CORE10.2.0.1.0 Production

TNS for 32-bit Windows: Version 10.2.0.1.0 - Production

NLSRTL Version 10.2.0.1.0 - Production

 

SQL> show parameter optimizer_index_cost_adj;

 

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

optimizer_index_cost_adj integer 50

SQL> declare

2 begin

3 for i in 1..5 loop

4 insert into test31 select * from dba_objects order by owner;

5 commit;

6 end loop;

7 end;

8 /

 

PL/SQL procedure successfully completed

 

SQL> create index index_test31 on test31(object_id);

 

Index created

SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);

 

PL/SQL procedure successfully completed

 

SQL> select blocks from user_tables where table_name=upper('test31');

 

BLOCKS

----------

3520

SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');

 

CLUSTERING_FACTOR

-----------------

255260

SQL> select blocks,num_rows from user_tables where table_name=upper('test31');

 

BLOCKS NUM_ROWS

---------- ----------

3520 255260

--此时num_rowsclustering_factor接近,数据已经完全分散

SQL> explain plan for select * from test31 where object_id in ('60998','8789','7889');

 

Explained

 

SQL> select * from table(dbms_xplan.display());

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 3334622187

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 14 | 1302 | 9 (0

| 1 | INLIST ITERATOR | | | |

| 2 | TABLE ACCESS BY INDEX ROWID| TEST31 | 14 | 1302 | 9 (0

|* 3 | INDEX RANGE SCAN | INDEX_TEST31 | 14 | | 1 (0

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - access("OBJECT_ID"=7889 OR "OBJECT_ID"=8789 OR "OBJECT_ID"=60998)

 

15 rows selected

 

此时走索引,这个cost相对来说较小。

SQL> alter system flush buffer_cache;

 

系统已更改。

 

SQL> select /*+index(test31 index_test31)*/ * from test31 where object_id>5000 and object_id<6500;

 

已选择6950行。

 

 

执行计划

----------------------------------------------------------

Plan hash value: 2819306605

 

--------------------------------------------------------------------------------

------------

 

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)

| Time |

 

--------------------------------------------------------------------------------

------------

 

| 0 | SELECT STATEMENT | | 6957 | 631K| 3492 (1)

| 00:00:42 |

 

| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 3492 (1)

| 00:00:42 |

 

|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)

| 00:00:01 |

 

--------------------------------------------------------------------------------

------------

 

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)

 

 

统计信息

----------------------------------------------------------

1 recursive calls

0 db block gets

7429 consistent gets

183 physical reads

0 redo size

690491 bytes sent via SQL*Net to client

5478 bytes received via SQL*Net from client

465 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

6950 rows processed

--强制走索引cost 3492,物理读183

SQL> alter system flush buffer_cache;

 

系统已更改。

 

SQL> select * from test31 where object_id>5000 and object_id<6500;

 

已选择6950行。

 

 

执行计划

----------------------------------------------------------

Plan hash value: 1490571929

 

----------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 6957 | 631K| 696 (3)| 00:00:09 |

|* 1 | TABLE ACCESS FULL| TEST31 | 6957 | 631K| 696 (3)| 00:00:09 |

----------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

1 - filter("OBJECT_ID"<6500 AND "OBJECT_ID">5000)

 

 

统计信息

----------------------------------------------------------

0 recursive calls

0 db block gets

3990 consistent gets

3521 physical reads

0 redo size

337756 bytes sent via SQL*Net to client

5478 bytes received via SQL*Net from client

465 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

6950 rows processed

--全表扫描cost只有696,但是物理读却是3521

虽然cbo按照cost默认选择了全表扫描,但是实际物理读却是index比全表扫描小很多。

虽然在sql优化中我们一直是强调降低物理读,也就是即时读db block geys和consistent reads一致性读,因为降低了逻辑度,物理读自然而然的会降下来

不过对于此种情况具体是强制走索引更适合了还是cbo默认的全表扫描了,虽然逻辑读上全表扫描会好点,但是实际的物理读我们差异巨大,由于全表扫描后buffer的存储在lru的lru末端,优先会被覆盖,如果下次再次查询还是会引起较大的物理读,此时强制索引可能会是更好地选择!毕竟I/O是很消耗资源的操作!

SQL> create table test32 as select * from test31;

 

Table created

SQL> truncate table test31;

 

表被截断。

 

SQL> insert into test31 select * from test32 order by object_id;

--让其test31有序排序。

SQL> alter index index_test31 rebuild;

 

Index altered

 

SQL> execute dbms_stats.gather_table_stats('ashuang','test31',cascade=>true);

 

PL/SQL procedure successfully completed

 

SQL> select blocks,num_rows from user_tables where table_name=upper('test31');

 

BLOCKS NUM_ROWS

---------- ----------

3520 255270

SQL> select clustering_factor from user_indexes where index_name=upper('index_test31');

 

CLUSTERING_FACTOR

-----------------

4137

--此时clustering factorblocks大致接近

SQL> explain plan for select * from test31 where object_id>5000 and object_id<6500;

 

Explained

 

SQL> select * from table(dbms_xplan.display());

 

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 2819306605

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 6957 | 631K| 66 (0)

| 1 | TABLE ACCESS BY INDEX ROWID| TEST31 | 6957 | 631K| 66 (0)

|* 2 | INDEX RANGE SCAN | INDEX_TEST31 | 6957 | | 9 (0)

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("OBJECT_ID">5000 AND "OBJECT_ID"<6500)

 

14 rows selected

--cbo此时默认选择了index range scan

 

通过以上测试可以看出clutering factor是索引是否能在cbo中执行的一个标志,其值越低越让cbo高效利用索引,不过不要以为rebuild重建索引可以降低clustering factorclustering factor只跟rows的顺序相关。还有一个distinct_keys也就是表中的不同记录的值,distinct_keys和表中的num_rows值越接近,则索引的选择性越高,可以收集列信息的直方图histogram,让cbo可以获得更多的统计分析信息,从而让cbo选择最正确的执行计划!

你可能感兴趣的:(执行计划,职场,clustering,休闲,factor,索引聚簇因子)