Gathering Statistics (Part 1)

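Analyzing and gathering statistics
Statistics are essential to the cost-based optimizer (CBO). The examples below show why.
1. Create the test environment
1) Create the test table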
SQL> create table t as select * from dba_objects;
 
Table created
2) Create the index

SQL> create index ind_t_id on t(object_id);
 
Index created

3) Check the table statistics
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
 
  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
4) Check the index statistics
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed  from user_indexes b where b.table_name='T';
 
    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
         1         112         50824 2011/12/9 16:
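Unexpectedly, the index already has statistics even though nothing was gathered explicitly. Repeating the test gives the same result: the index is analyzed as soon as it is created. The likely explanation is that, since Oracle 10g, CREATE INDEX gathers index statistics while it builds the index. Let's try a different approach: create the table and the index empty, then load the data.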

2. Recreate the test table

1) Create the test environment

SQL> drop table t;
 
Table dropped
 
SQL> create table t as select * from dba_objects where 1=2;
 
Table created
 
SQL> create index ind_t_id on t(object_id);
 
Index created
 
SQL> insert into t select * from dba_objects;
 
50825 rows inserted
 
SQL> commit;
 
Commit complete

2) Check the statistics

SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
 
  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
 
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed  from user_indexes b where b.table_name='T';
 
    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
 This time neither the table nor the index has statistics, which is the starting point we want for the test.
 
 3. Gather statistics
 SQL> exec dbms_stats.gather_table_stats(user,'t',cascade => true);
 
PL/SQL procedure successfully completed

SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
 
  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
     50825          93        748 2011/12/9 16:
 
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed  from user_indexes b where b.table_name='T';
 
    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
         1         181         50825 2011/12/9 16:
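
The call above leaves every parameter except cascade at its default. As a sketch, the same gather can be written with named parameters so the choices are explicit. The values shown for estimate_percent and method_opt are illustrative assumptions, not settings taken from the test above.

```sql
-- Same gather as above, with the main parameters spelled out.
-- The estimate_percent and method_opt values are illustrative choices.
begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T',
    estimate_percent => dbms_stats.auto_sample_size,  -- let Oracle pick the sample size
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',  -- Oracle decides which columns get histograms
    cascade          => true                          -- also gather statistics for the table's indexes
  );
end;
/
```

Named parameters make it easier to see later which options a given statistics run actually used.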
 
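 4. Histograms
 Histograms are one part of statistics gathering, and they have a large influence on the CBO.
 The dbms_stats package analyzes a table segment at three levels:
 * The table itself
    The number of rows, the number of data blocks, the average row length, and so on.
 * The columns
    The number of distinct values, the number of nulls, and how the data is distributed within the column.
 * The indexes
    The number of leaf blocks, the depth of the index, the clustering factor, and so on.
 A histogram refers only to the last item of the second level (how the data is distributed within the column). When Oracle builds a histogram, it divides the data in the analyzed column into parts that each hold the same number of rows; each part is called a bucket.
 This tells the CBO how the values in the column are distributed. For tables with heavily skewed data, histograms are very useful.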
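
To make the idea of buckets concrete, the query below splits a column into equally sized groups and shows the highest value in each. This is only an illustration using the NTILE analytic function; it is not how Oracle computes histograms internally, and the choice of all_objects, object_id, and 10 buckets is an assumption made for the example.

```sql
-- Illustration only: split OBJECT_ID values into 10 buckets that hold the
-- same number of rows each, and show the highest value in each bucket
-- (the bucket's endpoint).
select bucket,
       count(*)       as rows_in_bucket,
       max(object_id) as endpoint_value
from  (select object_id,
              ntile(10) over (order by object_id) as bucket
       from   all_objects)
group by bucket
order by bucket;
```

When the data is evenly spread, the endpoints are evenly spaced; with skewed data, many buckets end on the same value, which is what reveals the skew.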
 An example:

SQL> drop table t;
 
Table dropped
 
SQL> create table t as select 1 id,a.OBJECT_NAME name from all_objects a;
 
Table created
 
SQL> update t set id=99 where rownum=1;
 
1 row updated
 
SQL> create index ind_t_id on t(id);
 
Index created
 
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade => true);
 
PL/SQL procedure successfully completed

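We created a table whose id column is heavily skewed: apart from a single row with 99, every value is 1. With its default settings, dbms_stats gathers column statistics for all columns (and decides per column whether a histogram is worthwhile); the endpoint information can be queried from this view: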
SQL> desc user_histograms;
Name                  Type           Nullable Default Comments                                  
--------------------- -------------- -------- ------- -----------------------------------------
TABLE_NAME            VARCHAR2(30)   Y                Table name                                
COLUMN_NAME           VARCHAR2(4000) Y                Column name or attribute of object column
ENDPOINT_NUMBER       NUMBER         Y                Endpoint number                           
ENDPOINT_VALUE        NUMBER         Y                Normalized endpoint value                 
ENDPOINT_ACTUAL_VALUE VARCHAR2(1000) Y                Actual endpoint value   

SQL> select a.TABLE_NAME,a.COLUMN_NAME,a.ENDPOINT_NUMBER,a.ENDPOINT_VALUE  from user_histograms a where a.TABLE_NAME='T';
 
TABLE_NAME                     COLUMN_NAME                                                                      ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------------------------------------------------------------------- --------------- --------------
T                              ID                                                                                             0              1
T                              NAME                                                                                           0 2.450356082873
T                              ID                                                                                             1              1
T                              NAME                                                                                           1 6.251116591181
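
Two endpoint rows per column in user_histograms do not by themselves show what kind of histogram, if any, was built. As a quick check, assuming Oracle 10g or later where the HISTOGRAM column exists, the column-level summary can be read from user_tab_col_statistics:

```sql
-- Column-level statistics for T: distinct values, nulls, density,
-- number of buckets, and the histogram type recorded for each column.
select column_name,
       num_distinct,
       num_nulls,
       density,
       num_buckets,
       histogram
from   user_tab_col_statistics
where  table_name = 'T';
```

A HISTOGRAM value of NONE means only basic column statistics are stored for that column.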

1) Test the execution plans the CBO produces for id = 1 and id = 99
SQL> set autotrace trace
SQL> select * from t where id=1;

50204 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50205 |  1323K|    67   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    | 50205 |  1323K|    67   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)
   
   
SQL> select * from t where id=99;


Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    27 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T        |     1 |    27 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T_ID |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)

2) Now delete the column statistics for id (including its histogram), but keep the table and index statistics:
SQL> exec dbms_stats.delete_column_stats(user,'t','id');
 
PL/SQL procedure successfully completed

Is the information for the id column still there?

SQL> select a.TABLE_NAME,a.COLUMN_NAME,a.ENDPOINT_NUMBER,a.ENDPOINT_VALUE  from user_histograms a where a.TABLE_NAME='T';
 
TABLE_NAME                     COLUMN_NAME                                                                      ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ -------------------------------------------------------------------------------- --------------- --------------
T                              NAME                                                                                           0 2.450356082873
T                              NAME                                                                                           1 6.251116591181

As shown above, the statistics for the id column are gone. The table and index statistics are still there:
 
SQL> select a.NUM_ROWS,a.AVG_ROW_LEN,a.BLOCKS,a.LAST_ANALYZED from user_tables a where a.TABLE_NAME='T';
 
  NUM_ROWS AVG_ROW_LEN     BLOCKS LAST_ANALYZED
---------- ----------- ---------- -------------
     50205          27        236 2011/12/9 16:
 
SQL> select b.blevel,b.leaf_blocks,b.distinct_keys,b.last_analyzed  from user_indexes b where b.table_name='T';
 
    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- -------------
         1          99             2 2011/12/9 16:
 

3) Check the execution plans after the deletion
SQL> select * from t where id=1;

50204 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   502 | 13554 |    50   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T        |   502 | 13554 |    50   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T_ID |   201 |       |    49   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

SQL> select * from t where id=99;


Execution Plan
----------------------------------------------------------
Plan hash value: 4182247035

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   502 | 13554 |    50   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T        |   502 | 13554 |    50   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T_ID |   201 |       |    49   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)
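Now there is a problem: both queries use the index, even though id = 1 returns almost every row in the table. Without column statistics the CBO can no longer tell the two values apart and estimates 502 rows for each, about 1% of the table's 50205 rows. This shows how important the column distribution information (the histogram) is: without it, the CBO's cardinality estimates are inaccurate.
Should histograms be built on every column? In my opinion, for OLTP systems it is best to build them on all columns; for OLAP systems, consider it case by case, because gathering histograms has a cost of its own.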
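
To put back the column statistics deleted in step 2, the id column can be gathered again with an explicit histogram request. This is a sketch: the method_opt value, including the bucket count of 254, is an assumption chosen for the example rather than a setting from the test above.

```sql
-- Re-gather statistics for T and request a histogram on ID
-- with up to 254 buckets (illustrative value).
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'T',
    method_opt => 'FOR COLUMNS ID SIZE 254',
    cascade    => true
  );
end;
/
```

After this, rerunning the two queries with autotrace on shows whether the plans for id = 1 and id = 99 differ again.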
 
