分析和收集统计信息
收集统计信息对于CBO来说是非常重要的,我们通过下面的列子来说明
1、创建测试环境
1)创建测试表
SQL> create table t as select * from dba_objects;
Table created
2)创建索引
SQL> create index ind_t_id on t(object_id);
Index created
3)查看表相关信息
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
NUM_ROWS AVG_ROW_LEN BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
4)查看表相关信息
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed from user_indexes b where b.table_name='T';
BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
1 112 50824 2011/12/9 16:
神马情况,怎么索引里面看到东西呢,再来一次,还是分析了,我去。再试一种方案
2、创建测试表
1)创建测试环境
SQL> drop table t;
Table dropped
SQL> create table t as select * from dba_objects where 1=2;
Table created
SQL> create index ind_t_id on t(object_id);
Index created
SQL> insert into t select * from dba_objects;
50825 rows inserted
SQL> commit;
Commit complete
2)查看相关信息
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
NUM_ROWS AVG_ROW_LEN BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed from user_indexes b where b.table_name='T';
BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
这个才是正常的执行步骤。
3、收集统计信息
SQL> exec dbms_stats.gather_table_stats(user,'t',cascade => true);
PL/SQL procedure successfully completed
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
NUM_ROWS AVG_ROW_LEN BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
50825 93 748 2011/12/9 16:
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed from user_indexes b where b.table_name='T';
BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
1 181 50825 2011/12/9 16:
3、直方图
直方图是数据分析当中的一个内容,它对CBO的影响非常大。
dbms_stats包对段表分析有三个层次:
* 表自身分析
包括表中的行数、数据块数、行长等信息。
* 列的分析
包括列值的重复数、列上的空值、数据在列上的分布情况。
* 索引的分析
包括索引叶块的数量、索引的深度、索引的聚合因子等。
直方图单指的是第二项的最后一种(数据在列上的分布情况)。oracle做直方图的时候,会将要分析的列上的数据分成很多数量相同的部分,每一个部分称为一个bucket
这样CBO就非常容易的知道这个列上的数值的分布情况,对于数据分布非常倾斜的表,做直方图是非常有用的。
举例说明
SQL> drop table t;
Table dropped
SQL> create table t as select 1 id,a.OBJECT_NAME name from all_objects a;
Table created
SQL> update t set id=99 where rownum=1;
1 row updated
SQL> create index ind_t_id on t(id);
Index created
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade => true);
PL/SQL procedure successfully completed
我们创建了一个表,它的id字段数值严重倾斜,除了一条99以外,其他全部是1。默认情况下dbms_stats会对所有的列做直方图分析,可以从这个视图上查到相关的信息:
SQL> desc user_histograms;
Name Type Nullable Default Comments
--------------------- -------------- -------- ------- -----------------------------------------
TABLE_NAME VARCHAR2(30) Y Table name
COLUMN_NAME VARCHAR2(4000) Y Column name or attribute of object column
ENDPOINT_NUMBER NUMBER Y Endpoint number
ENDPOINT_VALUE NUMBER Y Normalized endpoint value
ENDPOINT_ACTUAL_VALUE VARCHAR2(1000) Y Actual endpoint value
SQL> select a.TABLE_NAME,a.COLUMN_NAME,a.ENDPOINT_NUMBER,a.ENDPOINT_VALUE from user_histograms a where a.TABLE_NAME='T';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------------------------------------------------------------------- --------------- --------------
T ID 0 1
T NAME 0 2.450356082873
T ID 1 1
T NAME 1 6.251116591181
1)测试id分别为1和99的时候,CBO产生的执行计划的情况
SQL> set autotrace trace
SQL> select * from t where id=1;
50204 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50205 | 1323K| 67 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 50205 | 1323K| 67 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=1)
SQL> select * from t where id=99;
Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 27 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_ID | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=99)
2)现在我们将直方图信息删除,但保留表和索引的分析信息:
SQL> exec dbms_stats.delete_column_stats(user,'t','id');
PL/SQL procedure successfully completed
看看id字段的信息还在不在?
SQL> select a.TABLE_NAME,a.COLUMN_NAME,a.ENDPOINT_NUMBER,a.ENDPOINT_VALUE from user_histograms a where a.TABLE_NAME='T';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------------------------------------------------------------------- --------------- --------------
T NAME 0 2.450356082873
T NAME 1 6.251116591181
从上面看出id列的统计信息已经不在了。再看看表和索引信息仍然存在。
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
NUM_ROWS AVG_ROW_LEN BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
50205 27 236 2011/12/9 16:
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed from user_indexes b where b.table_name='T';
BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
1 99 2 2011/12/9 16:
3)看看删除后的执行计划
SQL> select * from t where id=1;
50204 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 502 | 13554 | 50 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 502 | 13554 | 50 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_ID | 201 | | 49 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=1)
SQL> select * from t where id=99;
Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 502 | 13554 | 50 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 502 | 13554 | 50 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_ID | 201 | | 49 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=99)
出问题了,都走索引了,到这里应该明白直方图的重要了吧。会导致CBO的统计信息不准确。
但是是不是所有列上建立直方图,个人觉得对于oltp系统来说最好全部都建,对于olap系统来说,考虑建,因为直方图也会消耗。