ORACLE柱状图(histogram)

 

oracle 柱状图(histogram)

oracle中的柱状图是用于记录表中的数据分布质量情况的描述,当每次使用analyze或者dbms_stat包分析数据表及列后,该表的分布情况会呗保存在统计表

(user_tab_columns/user_histograms)里面,当多表连接时,CBO优化器会根据柱状图提供的信息评估多表连接时将产生的成本(cost)或技术(cardinality),决定是否使用该列的索引,当然,导致CBO不能选择最优执行计划的因素有多种情况,而柱状图只是协助CBO优化器选择最优的执行计划,在一个数据分布不均匀的表列上建立柱状图将有力地保证优化器做出正确合理的选择。

其他因素后面在进行探讨。 (直方图的使用不受索引的限制,可以在表的任何列上构建直方图)

1. 搜集柱状图

SQL> conn scott/tiger
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as scott
 
SQL> select table_name from user_tables;
 
TABLE_NAME
------------------------------
DEPT
EMP
BONUS
SALGRADE
 
SQL> exec dbms_stats.gather_table_stats(ownname=>'scott',tabname => 'dept',estimate_percent => null,method_opt => 'for all indexed columns',cascade => true);
 
PL/SQL procedure successfully completed
 
SQL> select column_name,density,num_buckets,histogram from user_tab_col_statistics where table_name='DEPT';
 
COLUMN_NAME                       DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ---------- ----------- ---------------
DEPTNO                              0.125           4 HEIGHT BALANCED
DNAME                                0.25           1 NONE
LOC                                  0.25           1 NONE

SQL> select * from dept;
 
DEPTNO DNAME          LOC
------ -------------- -------------
    10 ACCOUNTING     NEW YORK
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    40 OPERATIONS     BOSTON

SQL> select * from user_tab_histograms where table_name='DEPT';
 
TABLE_NAME                     COLUMN_NAME                 ENDPOINT_NUMBER     ENDPOINT_VALUE   ENDPOINT_ACTUAL_VALUE
------------------------------ --------------------------------------------  ---------------   -----------------------------
DEPT                           DEPTNO                             1             10
DEPT                           DEPTNO                             2             20
DEPT                           DEPTNO                             3             30
DEPT                           DEPTNO                             4             40
DEPT                           DNAME                              0    3.388635500875
DEPT                           LOC                                0   3.443005050520
DEPT                           DNAME                              1   4.322850386777
DEPT                           LOC                                1   4.064055440899
  
8 rows selected

柱状图的搜集有三个参数,for columns SIZE | REPEAT | AUTO | SKEWONLY:统计指定列的histograms.N的取值范围[1,254]; REPEAT上次统计过的histograms;AUTO由oracle决定

N的大小;SKEWONLY

在上面柱状图搜集中,histogram字段有三个值,NONE,FREQUENCY或者HEIGHT BALANCED
  a. NONE:就是没有直方图
  b. FREQUENCY: 当该列的distinct值数量<=bucket数量时,为此类型。在user_tab_histograms表中记录有相关的值
  c. HEIGHT BALANCED:当该列的distinct值数量>bucket数量时,为此类型。
  d. density字段值的含义 Density的含义是“密度”。DENSITY值是会影响CBO判断执行计划的


2. 并不是所有柱状图信息都有存在的必要,产生直方图的成本是很高的,频繁分析一个表,若该表数据量非常大,做一次分析可能导致严重的性能问题,但是那些列的柱状图应该存在呢,建

议如下:

  a. 第一次收集统计信息时,设置method_opt=>FOR ALL COLUMNS SIZE 1,这意味删除所有列上的直方图。
  b. 在测试阶段或者在真实生产环境中,在调优SQL的过程中,DBA将会逐渐得知每个需要直方图信息的字段,在这些字段上人工收集统计信息,method_opt=>FOR COLUMNS SIZE AUTO

[COLUMN_NAME]
  c. 在每次数据分布有所变化的时候,更新统计信息,使用method_opt=>FOR ALL COLUMNS SIZE REPEAT,这样只会收集已经存在了直方图信息的字段。
重复2,3步骤,直到系统稳定。

3. 柱状图是如何影响执行计划的,下面通过示例来查看

SQL> show user
User is "colin"
 
SQL> drop table tmp_liuhc_1;
 
Table dropped

SQL> create table tmp_liuhc_1 as select * from dba_objects;
 
Table created
 
SQL> desc tmp_liuhc_1;
Name           Type          Nullable Default Comments
-------------- ------------- -------- ------- --------
OWNER          VARCHAR2(30)  Y                        
OBJECT_NAME    VARCHAR2(128) Y                        
SUBOBJECT_NAME VARCHAR2(30)  Y                        
OBJECT_ID      NUMBER        Y                        
DATA_OBJECT_ID NUMBER        Y                        
OBJECT_TYPE    VARCHAR2(19)  Y                        
CREATED        DATE          Y                        
LAST_DDL_TIME  DATE          Y                        
TIMESTAMP      VARCHAR2(19)  Y                        
STATUS         VARCHAR2(7)   Y                        
TEMPORARY      VARCHAR2(1)   Y                        
GENERATED      VARCHAR2(1)   Y                        
SECONDARY      VARCHAR2(1)   Y                        
 
SQL> select owner,count(*) from tmp_liuhc_1 group by owner;
 
OWNER                            COUNT(*)
------------------------------ ----------
MDSYS                                 885
TSMSYS                                  3
DMSYS                                 189
LINK                                    3
PUBLIC                              19987
OUTLN                                   8
CTXSYS                                339
OLAPSYS                               720
HR                                     34
SYSTEM                                454
EXFSYS                                281
SCOTT                                   6
DBSNMP                                 46
ORDSYS                               1669
ORDPLUGINS                             10
SYSMAN                               1321
OE                                    127
PM                                     26
SH                                    306
XDB                                   682
 
OWNER                            COUNT(*)
------------------------------ ----------
IX                                     53
BI                                      8
SYS                                 22912
WMSYS                                 242
SI_INFORMTN_SCHEMA                      8
COLIN                                   6
 
26 rows selected

SQL>  create index idx_tmp_liuhc on tmp_liuhc_1(owner);
 
Index created
 
SQL> select sysdate from dual;
 
SYSDATE
-----------
2011-10-30

删除柱状图信息,bucket为1时,相当于一个普通分析,没有柱状图信息,执行计划按绝大多数ID字段内容来选择走索引,删除之后刷新shared_pool
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size 1' ,cascade => true);
 
PL/SQL procedure successfully completed
 
SQL> alter system flush shared_pool;
 
System altered
 
SQL>  select * from user_tab_col_statistics where column_name='OWNER';
 
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LOW_VALUE                                                        HIGH_VALUE                 

                                        DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------

-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1                    OWNER                                    26 4249                                                             584442                     

                                     0.03846153          0           1 2011-10-30 9:       50325 YES          NO                   6 NONE
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1936 |   175K|    57   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |  1936 |   175K|    57   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |  1936 |       |     5   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='COLIN')
 
14 rows selected
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1936 |   175K|    57   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |  1936 |   175K|    57   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |  1936 |       |     5   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='SYS')
 
14 rows selected
 
SQL> exec dbms_stats.gather_table_stats(ownname => 'colin',tabname => 'tmp_liuhc_1',estimate_percent => null ,method_opt => 'for all columns size auto' ,cascade =>

true);
 
PL/SQL procedure successfully completed

SQL> select * from user_tab_col_statistics where column_name='OWNER';
 
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT LOW_VALUE                                                        HIGH_VALUE                 

                                        DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLOBAL_STATS USER_STATS AVG_COL_LEN HISTOGRAM
------------------------------ ------------------------------ ------------ ---------------------------------------------------------------- --------------------------

-------------------------------------- ---------- ---------- ----------- ------------- ----------- ------------ ---------- ----------- ---------------
TMP_LIUHC_1                    OWNER                                    26 4249                                                             584442                     

                                     9.93541977          0          26 2011-10-30 9:       50325 YES          NO                   6 FREQUENCY
 
SQL>
SQL> explain plan for select * from tmp_liuhc_1 where owner='COLIN';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3774022813
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     6 |   558 |     2   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_LIUHC_1   |     6 |   558 |     2   (0
|*  2 |   INDEX RANGE SCAN          | IDX_TMP_LIUHC |     6 |       |     1   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='COLIN')
 
14 rows selected
 
SQL> explain plan for select * from tmp_liuhc_1 where owner='SYS';
 
Explained
 
SQL> select * From table(Dbms_Xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1961695573
--------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             | 22912 |  2080K|   160   (2)| 00:00:02
|*  1 |  TABLE ACCESS FULL| TMP_LIUHC_1 | 22912 |  2080K|   160   (2)| 00:00:02
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OWNER"='SYS')
 
13 rows selected

从以上可以看出,当删除柱状图时,查询SYS用户时,CBO按大多数ID字段的内容,选择走索引;当搜集柱状图后,CBO选择了正确的执行计划,走全表扫描,因为前面已经查询了,SYS用户下的表占用了决大部分。

 

附录:附带两张表的解释信息,此处的表DBA_TAB_COLUMNS和表user_tab_col_statistics是同样效果

DBA_TAB_HISTOGRAMS
该视图列出了所有表上列的柱状图信息。
 
字段名称              描述
---------------------------------------------------------
OWNER                 表的所有者
TABLE_NAME            表名
COLUMN_NAME           列名(字段名)
ENDPOINT_NUMBER       终点数值,也就是bucket的编号值
ENDPOINT_VALUE        根据该bucket的编号值,规格化的终点字段值,
                      如果字段类型是数字,则显示该字段值,
                      如果是其他类型的,则是被规格化的数值。
 
DBA_TAB_COLUMNS
该视图包含了所有表上列的描述信息。
(注意:虽然视图和簇的信息也包含在该视图中,但没有相关的柱状图信息)
 
字段名称              描述
----------------------------------------------------------
OWNER                 表的所有者
TABLE_NAME            表名
COLUMN_NAME           列名(字段名)
DATA_TYPE             字段的数据类型
DATA_LENGTH           该列定义的长度
DATA_PRECISION        NUMBER或FLOAT数值类型的精度
DATA_SCALE            定义的小数位数
NULLABLE              是否允许NULL值
COLUMN_ID             该列在表中的顺序的位置,表中第几个字段
DEFAULT_LENGTH        字段默认值的长度
DATA_DEFAULT          字段默认值
NUM_DISTINCT          该字段不同值的数量
LOW_VALUE             该字段的最小值,
                      该值是内部按16进制存储的该值的前32 bytes内容
HIGH_VALUE            该字段的最大值,
                      该值是内部按16进制存储的该值的前32 bytes内容
DENSITY               该字段的密度(不同值的比例值,
                      比如一个字段只有2个不同值,那么该字段密度为0.5)
NUM_NULLS             该字段NULL值的数量
NUM_BUCKETS           该字段柱状图中bucket的数量
LAST_ANALYZED         最近一次分析表的时间
SAMPLE_SIZE           采样数据的数量(根据分析时的百分比得出的数量,
                      例如100%分析,该采样值就是行的数量)

 

你可能感兴趣的:(SQL,TUNING)