dbms_stats和analyze的选择(一)
dbms_stats在很久以前ORACLE就推出了,
那么就意味着我们不使用analyze来分析吗?
其实.analyze还是有着它存在的理由.
我们的客户常常会问:我到底使用哪一个分析我的表呢?
很多人使用analyze命令的方式仅限于compute和estimate以及生成一些直方图.然后把生成的命令置于job以.以便定期作表分析.
对于选择这两者之一将很大程度上影响你收集的准确性以及效果.
这里我们选择oracle 9i为例
我们首先来看一下dbms_stats这包.这个包下面一共有40多个存储过程.通常使用的比较多的主要是DBMS_STATS.GATHER_TABLE_STATS 和 DBMS_STATS.GATHER_INDEX_STATS,
来看一下这两个过程的参数:
PROCEDURE GATHER_TABLE_STATS
参数名称 类型 输入/输出默认值?
------------------------------ ----------------------- ------ --------
OWNNAME VARCHAR2 IN
TABNAME VARCHAR2 IN
PARTNAME VARCHAR2 IN DEFAULT
ESTIMATE_PERCENT NUMBER IN DEFAULT
BLOCK_SAMPLE BOOLEAN IN DEFAULT
METHOD_OPT VARCHAR2 IN DEFAULT
DEGREE NUMBER IN DEFAULT
GRANULARITY VARCHAR2 IN DEFAULT
CASCADE BOOLEAN IN DEFAULT
STATTAB VARCHAR2 IN DEFAULT
STATID VARCHAR2 IN DEFAULT
STATOWN VARCHAR2 IN DEFAULT
NO_INVALIDATE BOOLEAN IN DEFAULT
PROCEDURE GATHER_INDEX_STATS
参数名称 类型 输入/输出默认值
------------------------------ ----------------------- ------ --------
OWNNAME VARCHAR2 IN
INDNAME VARCHAR2 IN
PARTNAME VARCHAR2 IN DEFAULT
ESTIMATE_PERCENT NUMBER IN DEFAULT
STATTAB VARCHAR2 IN DEFAULT
STATID VARCHAR2 IN DEFAULT
STATOWN VARCHAR2 IN DEFAULT
DEGREE NUMBER IN DEFAULT
GRANULARITY VARCHAR2 IN DEFAULT
NO_INVALIDATE BOOLEAN IN DEFAULT
estimate_percent:这个参数相当于analyze中的"estimate statistics sample x percent".为总行数的百分比来估算.如果该参数为空.则为compute
当然也可以使用DBMS_STATS.AUTO_SAMPLE_SIZE 让oracle 来决定估算百分比数
block_sample:该参数是一个boolean.在决定是否进行随机采样.
method_opt :该参数表明数据颁布不均衡的情况下是否使用直方图.可选值为"for all columns 或者for all indexed columns.
当然也可以使用auto.让oracle来决定收集
cascade :决定是否收集相关表的索引的统计信息.
我们来看看不同的值下产生的性能:
首先来看表结构:
SQL> desc test;
Name Type Nullable Default Comments
---- ------------ -------- ------- --------
ID NUMBER
NAME VARCHAR2(20) Y
SEX VARCHAR2(2) Y
SQL>
其中这张表有20480001条数据. 78208个块, 611MB大小.
索引创建情况:
SQL> /
INDEX_NAME UNIQUENES DISTINCT_KEYS SAMPLE_SIZE
------------------------------ --------- ------------- -----------
B_IDX NONUNIQUE <---bitmap(sex列)
II UNIQUE 5120000 5120000 <---btree(id 列)
我们使用analyze的estimate 来分析表
SQL> analyze table test estimate statistics;
表已分析。
已用时间: 00: 00: 01.00
然后执行下面这个查询:
SQL> select * from test where id=1111111111111111;
未选定行
已用时间: 00: 00: 00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=9)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=9)
2 1 INDEX (UNIQUE SCAN) OF 'II' (UNIQUE) (Cost=2 Card=5120000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
3 physical reads
0 redo size
318 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID 5120000 5120000
因为id是一个主键.而这里distinct_keys却只是5120000.而真实的情况却是20480001.实际情况差得很远.
我们再看一下DBA_TAB_COLUMNS
SQL> @d:col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 5120000 5120000 16:17:16 1
TEST NAME VARCHAR2 1 5120000 16:17:16 1
TEST SEX VARCHAR2
继续进行采样:
SQL> analyze table test estimate statistics sample 5 percent;
表已分析。
SQL> select * from test where id=11111111111111;
未选定行
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=9)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=3 Card=1 Bytes=9)
2 1 INDEX (UNIQUE SCAN) OF 'II' (UNIQUE) (Cost=2 Card=5120000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
318 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
同样的:
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID 5120000 5120000
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 5120000 5120000 16:17:16 1
TEST NAME VARCHAR2 1 5120000 16:17:16 1
TEST SEX VARCHAR2
没有什么变化.与开始是一样的;
SQL> analyze table test compute statistics;
表已分析。
这个过程花了30多分钟.
再来看分析结果.
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20480001 20480001
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20480001 20480001 00:18:53 1
TEST NAME VARCHAR2 2 20480001 00:18:53 1
TEST SEX VARCHAR2 2 20480001 00:18:53 1
这一次是完全匹配了.
我们先将原来的统计结果删除:
SQL> analyze table test delete statistics;
Table analyzed
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER
TEST NAME VARCHAR2
TEST SEX VARCHAR2
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX
TEST UNIQUE II ID
目前没有任何统计信息.
再来看看dbms_stats,先按1%来进行取样收集
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>1,cascade=>true)
PL/SQL procedure successfully completed
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20559400 205594 01:36:50 1
TEST NAME VARCHAR2 1 205594 01:36:50 1
TEST SEX VARCHAR2 2 205594 01:36:50 1
SQL>
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20585910 2058591
SQL>
统计结果是20585910,而正确的应该是20480001,但相差得并不多.
再来按5%进行取样:
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>5,cascade=>true)
PL/SQL procedure successfully completed
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20481740 1024087 02:37:35 1
TEST NAME VARCHAR2 1 1024087 02:37:35 1
TEST SEX VARCHAR2 2 1024087 02:37:35 1
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 19941800 997090
SQL>
20481740,已经比开始接近准确的信息了.
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TEST', tabname=>'TEST',estimate_percent=>null,cascade=>true)
PL/SQL procedure successfully completed
SQL> @col
TABLE_NAME COLUMN_N DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED BUCKETS
---------- -------- --------------- ------------ ----------- ------------- ----------
TEST ID NUMBER 20480001 20480001 02:41:59 1
TEST NAME VARCHAR2 2 20480001 02:41:59 1
TEST SEX VARCHAR2 2 20480001 02:41:59 1
SQL> @index
TABLE_NAME UNIQUENESS INDEX_ COLUMN_N DISTINCT_KEYS SAMPLE_SIZE
---------- ---------- ------ -------- ------------- -----------
TEST NONUNIQUE B_IDX SEX 2 834
TEST UNIQUE II ID 20480001 20480001
from:http://cc59.itpub.net/post/1845/210124