Choosing between dbms_stats and analyze (Part 1)


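Oracle introduced dbms_stats a long time ago.
Does that mean we should no longer use analyze to gather statistics?
In fact, analyze still has its reasons to exist.
Our customers often ask: which one should I use to analyze my tables?
Many people use the analyze command only for compute, estimate, and generating a few histograms, then put the resulting commands into a job so that the tables are analyzed on a schedule.
Which of the two you choose largely determines how accurate and how useful the gathered statistics are.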
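
For readers who have not seen the typical analyze-in-a-job setup, here is a minimal sketch. The table name test is taken from the examples later in this post; the bucket count of 75, the 5 percent sample, and the 02:00 daily schedule are assumptions chosen only for illustration.

```sql
-- Exact statistics: reads every row, the slowest option
ANALYZE TABLE test COMPUTE STATISTICS;

-- Sample-based statistics
ANALYZE TABLE test ESTIMATE STATISTICS SAMPLE 5 PERCENT;

-- Column statistics with histograms on indexed columns
-- (75 buckets is an arbitrary illustrative value)
ANALYZE TABLE test COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 75;

-- Schedule a daily re-analysis with DBMS_JOB (run from SQL*Plus)
VARIABLE jobno NUMBER
BEGIN
  DBMS_JOB.SUBMIT(
    job       => :jobno,
    what      => 'EXECUTE IMMEDIATE ''ANALYZE TABLE test ESTIMATE STATISTICS SAMPLE 5 PERCENT'';',
    next_date => SYSDATE,
    interval  => 'TRUNC(SYSDATE) + 1 + 2/24');  -- next run: tomorrow at 02:00
  COMMIT;  -- the job becomes visible to the job queue only after commit
END;
/
```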

Here we use Oracle 9i as the example.


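Let's first look at the dbms_stats package. It contains more than 40 stored procedures. The two used most often are DBMS_STATS.GATHER_TABLE_STATS and DBMS_STATS.GATHER_INDEX_STATS.
Here are the parameters of these two procedures: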


PROCEDURE GATHER_TABLE_STATS
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
OWNNAME                        VARCHAR2                IN
TABNAME                        VARCHAR2                IN
PARTNAME                       VARCHAR2                IN     DEFAULT
ESTIMATE_PERCENT               NUMBER                  IN     DEFAULT
BLOCK_SAMPLE                   BOOLEAN                 IN     DEFAULT
METHOD_OPT                     VARCHAR2                IN     DEFAULT
DEGREE                         NUMBER                  IN     DEFAULT
GRANULARITY                    VARCHAR2                IN     DEFAULT
CASCADE                        BOOLEAN                 IN     DEFAULT
STATTAB                        VARCHAR2                IN     DEFAULT
STATID                         VARCHAR2                IN     DEFAULT
STATOWN                        VARCHAR2                IN     DEFAULT
NO_INVALIDATE                  BOOLEAN                 IN     DEFAULT


PROCEDURE GATHER_INDEX_STATS
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
OWNNAME                        VARCHAR2                IN
INDNAME                        VARCHAR2                IN
PARTNAME                       VARCHAR2                IN     DEFAULT
ESTIMATE_PERCENT               NUMBER                  IN     DEFAULT
STATTAB                        VARCHAR2                IN     DEFAULT
STATID                         VARCHAR2                IN     DEFAULT
STATOWN                        VARCHAR2                IN     DEFAULT
DEGREE                         NUMBER                  IN     DEFAULT
GRANULARITY                    VARCHAR2                IN     DEFAULT
NO_INVALIDATE                  BOOLEAN                 IN     DEFAULT
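
As a quick illustration of the second procedure, a call for a single index might look like the sketch below. The owner TEST and the index II are the ones used later in this post; the 5 percent sample is an assumption chosen for illustration.

```sql
-- Gather statistics for one index only, sampling about 5% of it
BEGIN
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname          => 'TEST',
    indname          => 'II',
    estimate_percent => 5);
END;
/
```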


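estimate_percent: this parameter corresponds to "estimate statistics sample x percent" in analyze. It is the percentage of rows to sample for the estimate. If the parameter is NULL, the statistics are computed from all rows (compute).
You can also pass DBMS_STATS.AUTO_SAMPLE_SIZE and let Oracle decide the sampling percentage.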
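
The two extremes of estimate_percent can be sketched as follows. The owner and table name TEST are the ones used later in this post; everything else relies on the parameter defaults.

```sql
-- NULL: read every row, equivalent to analyze ... compute statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TEST',
    tabname          => 'TEST',
    estimate_percent => NULL);
END;
/

-- AUTO_SAMPLE_SIZE: let Oracle pick the sampling percentage
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TEST',
    tabname          => 'TEST',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/
```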


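block_sample: a boolean that determines whether random block sampling is used instead of random row sampling.

method_opt: controls whether histograms are gathered for columns whose data is unevenly distributed. Possible values include "for all columns" or "for all indexed columns".
You can also use auto and let Oracle decide which histograms to gather.

cascade: determines whether statistics are also gathered for the indexes on the table.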
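
Putting these parameters together, a call might look like the following sketch. The specific values (5 percent, block sampling on, histograms on indexed columns with SIZE AUTO) are assumptions picked for illustration, not settings recommended by the original post.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TEST',
    tabname          => 'TEST',
    estimate_percent => 5,                                   -- sample about 5% of the data
    block_sample     => TRUE,                                -- sample whole blocks rather than rows
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO', -- Oracle decides where histograms help
    cascade          => TRUE);                               -- also gather index statistics
END;
/
```

Block sampling is cheaper than row sampling but can be less accurate when the data is physically clustered, so it is worth comparing the results against a row sample on your own data.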

Let's see how the different settings behave.


First, the table structure:

SQL> desc test;
Name Type Nullable Default Comments
---- ------------ -------- ------- --------
ID NUMBER
NAME VARCHAR2(20) Y
SEX VARCHAR2(2) Y

SQL>

The table has 20480001 rows, 78208 blocks, and is 611MB in size.
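
The post does not show how these figures were obtained. One way to check them, assuming you are connected as the owner of the table, is sketched below. The reported numbers would be consistent with an 8KB block size (78208 blocks x 8KB = 611MB), although the post does not state the block size.

```sql
-- Exact row count (a full scan, slow on a table this size)
SELECT COUNT(*) FROM test;

-- Allocated blocks and size in MB for the table segment
SELECT blocks,
       ROUND(bytes / 1024 / 1024) AS size_mb
FROM   user_segments
WHERE  segment_name = 'TEST';
```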
The indexes on the table:
SQL> /

INDEX_NAME UNIQUENES DISTINCT_KEYS SAMPLE_SIZE
------------------------------ --------- ------------- -----------
B_IDX NONUNIQUE <---bitmap (sex column)
II UNIQUE 5120000 5120000 <---btree (id column)

We start by analyzing the table with analyze ... estimate:


SQL> analyze table test estimate statistics;

Table analyzed.

Elapsed: 00:00:01.00

Then run the following query:
SQL> select * from test where id=1111111111111111;

no rows selected

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=9)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=9)
2 1 INDEX (UNIQUE SCAN) OF 'II' (UNIQUE) (Cost=2 Card=5120000)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
3 physical reads
0 redo size
318 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
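
The elapsed time, execution plan, and statistics above look like the output of SQL*Plus timing and autotrace. The post does not show the session settings, so the following is only a sketch of how to reproduce such output; it assumes a PLAN_TABLE exists and that the user has been granted the PLUSTRACE role.

```sql
SET TIMING ON       -- print the elapsed time after each statement
SET AUTOTRACE ON    -- print rows, execution plan, and statistics

SELECT * FROM test WHERE id = 1111111111111111;

SET AUTOTRACE OFF
```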


SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID 5120000 5120000

id is the primary key, yet distinct_keys here is only 5120000, while the true value is 20480001. The estimate is far from reality.
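
The contents of the @index script are not shown in the post. A query along the following lines, written here as an assumption, would produce output with the same columns from USER_INDEXES and USER_IND_COLUMNS.

```sql
-- Hypothetical reconstruction of index.sql
COLUMN table_name  FORMAT A10
COLUMN index_name  FORMAT A6
COLUMN column_name FORMAT A8

SELECT i.table_name,
       i.uniqueness,
       i.index_name,
       c.column_name,
       i.distinct_keys,
       i.sample_size
FROM   user_indexes i,
       user_ind_columns c
WHERE  c.index_name = i.index_name
AND    i.table_name = 'TEST'
ORDER  BY i.index_name;
```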

Let's also look at DBA_TAB_COLUMNS:

SQL> @d:col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 5120000 5120000 16:17:16 1
TEST NAME VARCHAR2 1 5120000 16:17:16 1
TEST SEX VARCHAR2
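
The @col script is not shown either. Based on the column headings, it is probably similar to the sketch below; the time-only format for LAST_ANALYZED and the owner filter are assumptions.

```sql
-- Hypothetical reconstruction of col.sql
SELECT table_name,
       column_name,
       data_type,
       num_distinct,
       sample_size,
       TO_CHAR(last_analyzed, 'HH24:MI:SS') AS last_analyzed,
       num_buckets AS buckets
FROM   dba_tab_columns
WHERE  owner = 'TEST'
AND    table_name = 'TEST'
ORDER  BY column_id;
```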

Next, analyze again with an explicit sample size:

SQL> analyze table test estimate statistics sample 5 percent;

Table analyzed.


SQL> select * from test where id=11111111111111;


no rows selected


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=9)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=9)
2 1 INDEX (UNIQUE SCAN) OF 'II' (UNIQUE) (Cost=2 Card=5120000)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
318 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed

SQL>

Check the statistics again:
SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID 5120000 5120000

SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 5120000 5120000 16:17:16 1
TEST NAME VARCHAR2 1 5120000 16:17:16 1
TEST SEX VARCHAR2

Nothing has changed; the statistics are the same as before.

SQL> analyze table test compute statistics;

Table analyzed.

This took more than 30 minutes.

Now look at the results:
SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20480001 20480001

SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20480001 20480001 00:18:53 1
TEST NAME VARCHAR2 2 20480001 00:18:53 1
TEST SEX VARCHAR2 2 20480001 00:18:53 1
This time the statistics match the real data exactly.

Before moving on, delete the existing statistics:
SQL> analyze table test delete statistics;

Table analyzed

SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER
TEST NAME VARCHAR2
TEST SEX VARCHAR2

SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID


There are now no statistics at all.
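
The same cleanup can also be done through the package itself. A minimal sketch, assuming the same owner and table name as the rest of this post and relying on the default behavior of cascading to column and index statistics:

```sql
-- dbms_stats counterpart of "analyze table test delete statistics"
BEGIN
  DBMS_STATS.DELETE_TABLE_STATS(
    ownname => 'TEST',
    tabname => 'TEST');
END;
/
```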


Now let's turn to dbms_stats, starting with a 1% sample:

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>1,cascade=>true)

PL/SQL procedure successfully completed


SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20559400 205594 01:36:50 1
TEST NAME VARCHAR2 1 205594 01:36:50 1
TEST SEX VARCHAR2 2 205594 01:36:50 1

SQL>

SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20585910 2058591

SQL>

The estimated number of distinct keys is 20585910, while the correct value is 20480001. The difference is small.


Next, sample at 5%:


SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>5,cascade=>true)

PL/SQL procedure successfully completed

SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20481740 1024087 02:37:35 1
TEST NAME VARCHAR2 1 1024087 02:37:35 1
TEST SEX VARCHAR2 2 1024087 02:37:35 1

SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 19941800 997090

SQL>

NUM_DISTINCT for ID is now 20481740, closer to the true value than in the 1% run.
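
To make the comparison concrete, the relative error of each estimate against the true value 20480001 can be computed directly. The numbers in the sketch below are copied from the outputs above; the query itself and the run_name labels are only an illustration.

```sql
SELECT run_name,
       estimate,
       ROUND((estimate - 20480001) / 20480001 * 100, 2) AS error_pct
FROM (
  SELECT '1% sample, NUM_DISTINCT(ID)'  AS run_name, 20559400 AS estimate FROM dual
  UNION ALL
  SELECT '1% sample, DISTINCT_KEYS(II)', 20585910 FROM dual
  UNION ALL
  SELECT '5% sample, NUM_DISTINCT(ID)',  20481740 FROM dual
  UNION ALL
  SELECT '5% sample, DISTINCT_KEYS(II)', 19941800 FROM dual
);
```

By this arithmetic the column estimate improves from roughly +0.4% to about +0.01%, while the index estimate moves from roughly +0.5% to about -2.6%, so a larger sample did not help the index figure in this particular run.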

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>null,cascade=>true)

PL/SQL procedure successfully completed

SQL> @col

TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20480001 20480001 02:41:59 1
TEST NAME VARCHAR2 2 20480001 02:41:59 1
TEST SEX VARCHAR2 2 20480001 02:41:59 1

SQL> @index

TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20480001 20480001

from:http://cc59.itpub.net/post/1845/210124
