oracle 柱状图(histogram)
oracle中的柱状图是用于记录表中的数据分布质量情况的描述,当每次使用analyze或者dbms_stat包分析数据表及列后,该表的分布情况会呗保存在统计表
(user_tab_columns/user_histograms)里面,当多表连接时,CBO优化器会根据柱状图提供的信息评估多表连接时将产生的成本(cost)或技术(cardinality),决定是否使用该列的索引,当然,导致CBO不能选择最优执行计划的因素有多种情况,而柱状图只是协助CBO优化器选择最优的执行计划,在一个数据分布不均匀的表列上建立柱状图将有力地保证优化器做出正确合理的选择。
其他因素后面在进行探讨。 (直方图的使用不受索引的限制,可以在表的任何列上构建直方图)
1. 搜集柱状图
SQL> conn scott/tiger
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as scott
SQL> select table_name from user_tables;
TABLE_NAME
------------------------------
DEPT
EMP
BONUS
SALGRADE
SQL> exec dbms_stats.gather_table_stats(ownname=>'scott',tabname => 'dept',estimate_percent => null,method_opt => 'for all indexed columns',cascade => true);
PL/SQL procedure successfully completed
SQL> select column_name,density,num_buckets,histogram from user_tab_col_statistics where table_name='DEPT';
COLUMN_NAME DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ---------- ----------- ---------------
DEPTNO 0.125 4 HEIGHT BALANCED
DNAME 0.25 1 NONE
LOC 0.25 1 NONE
SQL> select * from dept;
DEPTNO DNAME LOC
------ -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
SQL> select * from user_tab_histograms where table_name='DEPT';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------------------ -------------------------------------------- --------------- -----------------------------
DEPT DEPTNO 1 10
DEPT DEPTNO 2 20
DEPT DEPTNO 3 30
DEPT DEPTNO 4 40
DEPT DNAME 0 3.388635500875
DEPT LOC 0 3.443005050520
DEPT DNAME 1 4.322850386777
DEPT LOC 1 4.064055440899
8 rows selected
柱状图的搜集有三个参数,for columns SIZE
N的大小;SKEWONLY
在上面柱状图搜集中,histogram字段有三个值,NONE,FREQUENCY或者HEIGHT BALANCED
a. NONE:就是没有直方图
b. FREQUENCY: 当该列的distinct值数量<=bucket数量时,为此类型。在user_tab_histograms表中记录有相关的值
c. HEIGHT BALANCED:当该列的distinct值数量>bucket数量时,为此类型。
d. density字段值的含义 Density的含义是“密度”。DENSITY值是会影响CBO判断执行计划的
2. 并不是所有柱状图信息都有存在的必要,产生直方图的成本是很高的,频繁分析一个表,若该表数据量非常大,做一次分析可能导致严重的性能问题,但是那些列的柱状图应该存在呢,建
议如下:
a. 第一次收集统计信息时,设置method_opt=>FOR ALL COLUMNS SIZE 1,这意味删除所有列上的直方图。
b. 在测试阶段或者在真实生产环境中,在调优SQL的过程中,DBA将会逐渐得知每个需要直方图信息的字段,在这些字段上人工收集统计信息,method_opt=>FOR COLUMNS SIZE AUTO
[COLUMN_NAME]
c. 在每次数据分布有所变化的时候,更新统计信息,使用method_opt=>FOR ALL COLUMNS SIZE REPEAT,这样只会收集已经存在了直方图信息的字段。
重复2,3步骤,直到系统稳定。
3. 柱状图是如何影响执行计划的,下面通过示例来查看
SQL> show user
User is "colin"
SQL> drop table tmp_liuhc_1;
Table dropped
SQL> create table tmp_liuhc_1 as select * from dba_objects;
Table created
SQL> desc tmp_liuhc_1;
Name Type Nullable Default Comments
-------------- ------------- -------- ------- --------
OWNER VARCHAR2(30) Y
OBJECT_NAME VARCHAR2(128) Y
SUBOBJECT_NAME VARCHAR2(30) Y
OBJECT_ID NUMBER Y
DATA_OBJECT_ID NUMBER Y
OBJECT_TYPE VARCHAR2(19) Y
CREATED DATE Y
LAST_DDL_TIME DATE Y
TIMESTAMP VARCHAR2(19) Y
STATUS VARCHAR2(7) Y
TEMPORARY VARCHAR2(1) Y
GENERATED VARCHAR2(1) Y
SECONDARY VARCHAR2(1) Y
SQL> select owner,count(*) from tmp_liuhc_1 group by owner;
OWNER COUNT(*)
------------------------------ ----------
MDSYS 885
TSMSYS 3
DMSYS 189
LINK 3
PUBLIC 19987
OUTLN 8
CTXSYS 339
OLAPSYS 720
HR 34
SYSTEM 454
EXFSYS 281
SCOTT 6
DBSNMP 46
ORDSYS 1669
ORDPLUGINS 10
SYSMAN 1321
OE 127
PM 26
SH 306
XDB 682
OWNER COUNT(*)
------------------------------ ----------
IX 53
BI 8
SYS 22912
WMSYS 242
SI_INFORMTN_SCHEMA 8
COLIN 6
26 rows selected
SQL> create index idx_tmp_liuhc on tmp_liuhc_1(owner);
Index created
SQL> select sysdate from dual;
SYSDATE
-----------
2011-10-30
删除柱状图信息,bucket为1时,相当于一个普通分析,没有柱状图信息,执行计划按绝大多数ID字段内容来选择走索引,删除之后刷新shared_pool
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size 1' ,cascade => true);
PL/SQL procedure successfully completed
SQL> alter system flush shared_pool;
System altered
SQL> select * from user_tab_col_statistics where column_name='OWNER';
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE
DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------
-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1 OWNER 26 4249 584442
0.03846153 0 1 2011-10-30 9: 50325 YES NO 6 NONE
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
Explained
SQL> select * From table(Dbms_Xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1936 | 175K| 57 (0
| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 1936 | 175K| 57 (0
|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 1936 | | 5 (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='COLIN')
14 rows selected
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
Explained
SQL> select * From table(Dbms_Xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1936 | 175K| 57 (0
| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 1936 | 175K| 57 (0
|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 1936 | | 5 (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
14 rows selected
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size auto' ,cascade =>
true);
PL/SQL procedure successfully completed
SQL> select * from user_tab_col_statistics where column_name='OWNER';
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE
DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------
-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1 OWNER 26 4249 584442
9.93541977 0 26 2011-10-30 9: 50325 YES NO 6 FREQUENCY
SQL>
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
Explained
SQL> select * From table(Dbms_Xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 558 | 2 (0
| 1 | TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1 | 6 | 558 | 2 (0
|* 2 | INDEX RANGE SCAN | IDX_TMP_LIUHC | 6 | | 1 (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='COLIN')
14 rows selected
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
Explained
SQL> select * From table(Dbms_Xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1961695573
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22912 | 2080K| 160 (2)| 00:00:02
|* 1 | TABLE ACCESS FULL| TMP_LIUHC_1 | 22912 | 2080K| 160 (2)| 00:00:02
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
13 rows selected
从以上可以看出,当删除柱状图时,查询SYS用户时,CBO按大多数ID字段的内容,选择走索引;当搜集柱状图后,CBO选择了正确的执行计划,走全表扫描,因为前面已经查询了,SYS用户下的表占用了决大部分。
附录:附带两张表的解释信息,此处的表DBA_TAB_COLUMNS和表user_tab_col_statistics是同样效果