Experiment: the effect of histograms on a skewed column


1. Statistics gathered, no histogram

analyze table &tab delete statistics;
analyze table &tab compute statistics;
user_tab_histograms:
    TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1   TAB_01_03   B                          0               1
2   TAB_01_03   B                          1           10000

user_tab_col_statistics:
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5     -- 5 distinct values
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 0.2           -- density = 1/5
NUM_NULLS 0
NUM_BUCKETS 1
LAST_ANALYZED 2010-7-4 9:51:38
SAMPLE_SIZE 11116
GLOBAL_STATS NO
USER_STATS NO
AVG_COL_LEN 2
HISTOGRAM NONE

Enable event 10053 to dump the optimizer's decisions to a trace file:
alter session set events '10053 trace name context forever, level 1';
Run:
select count(1) from tab_01_03 t where b=1;

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T
    #Rows: 11116  #Blks:  20  AvgRowLen:  9.00
Index Stats::
  Index: IDX_B  Col#: 2
    LVLS: 1  #LB: 22  #DK: 5  LB/K: 4.00  DB/K: 3.00  CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): B(NUMBER)
    AvgLen: 2.00 NDV: 5 Nulls: 0 Density: 0.2 Min: 1 Max: 10000--ndv=number of distinct values
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 2223  Computed: 2223.20  Non Adjusted: 2223.20
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 7.00  resc_cpu: 2046392
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  7.35  Resp: 7.35  Degree: 1
      Cost_io: 7.00  Cost_cpu: 2046392
      Resp_io: 7.00  Resp_cpu: 2046392
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 5.00  resc_cpu: 481257
    ix_sel: 0.2  ix_sel_with_filters: 0.2
    Cost: 5.08  Resp: 5.08  Degree: 1
  Best:: AccessPath: IndexRange  Index: IDX_B
         Cost: 5.08  Degree: 1  Resp: 5.08  Card: 2223.20  Bytes: 0
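Of the access paths considered (table scan, index fast full scan, index range scan), Oracle picks IndexRange, the cheapest at 5.08.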
Current SQL statement for this session:
select count(1) from tab_01_03 t where b=1
 
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id  | Operation          | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |         |       |       |     5 |           |
| 1   |  SORT AGGREGATE    |         |     1 |     2 |       |           |
| 2   |   INDEX RANGE SCAN | IDX_B   |  2223 |  4446 |     5 |  00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=1)
The plan confirms the choice above.
Run again with a different value:
select count(1) from tab_01_03 t where b=10000;
INDEX RANGE SCAN is still chosen:
select count(1) from tab_01_03 t where b=10000
 
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id  | Operation          | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |         |       |       |     5 |           |
| 1   |  SORT AGGREGATE    |         |     1 |     2 |       |           |
| 2   |   INDEX RANGE SCAN | IDX_B   |  2223 |  4446 |     5 |  00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=10000)

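Without a histogram, density = 1/NDV = 1/5 = 0.2, so every value of B gets the same estimate: 11116 * 0.2 = 2223.2 rows.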
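The no-histogram estimate can be reproduced by hand. Below is a minimal sketch in Python that uses only the numbers from the trace above; the function name is made up for illustration and is not an Oracle API.

```python
def cardinality_without_histogram(num_rows, num_distinct):
    """Estimated rows for `col = value` when only NDV is known."""
    density = 1 / num_distinct          # DENSITY 0.2 for NDV = 5
    return num_rows * density

# Values taken from the 10053 trace: #Rows 11116, NDV 5.
estimate = cardinality_without_histogram(11116, 5)
print(round(estimate, 2))   # the trace reports Computed: 2223.20
print(round(estimate))      # the trace reports Rounded: 2223
```

The function takes no predicate value at all, which is the point: b=1 and b=10000 cannot be told apart.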
2. Gather a histogram
exec dbms_stats.gather_table_stats(ownname=>'doteng',tabname=>'tab_01_03',
method_opt => 'for columns size auto b');
user_tab_col_statistics:
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 4.49802087081684E-5
NUM_NULLS 0
NUM_BUCKETS 5
LAST_ANALYZED 2010-7-4 10:30:55
SAMPLE_SIZE 11116
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 3
HISTOGRAM FREQUENCY
user_tab_histograms:
    TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1   TAB_01_03   B                          2               1
2   TAB_01_03   B                         13              10
3   TAB_01_03   B                        114             100
4   TAB_01_03   B                       1115            1000
5   TAB_01_03   B                      11116           10000
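Conclusion: with method_opt => 'for columns size auto b', size auto lets Oracle choose a suitable bucket count, and here the result is a frequency histogram.
When the number of histogram buckets is >= the number of distinct values in the column, Oracle uses a value-based (frequency) histogram, and each value occupies its own bucket.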
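In a frequency histogram the endpoint numbers are cumulative, so the row count of each value is the difference between consecutive endpoints. The sketch below, in Python with an illustrative function name, recovers the per-value counts from the user_tab_histograms rows shown above.

```python
# (endpoint_value, endpoint_number) rows from user_tab_histograms above.
endpoints = [(1, 2), (10, 13), (100, 114), (1000, 1115), (10000, 11116)]

def rows_per_value(endpoints):
    """Turn cumulative endpoint numbers into a row count per value."""
    counts = {}
    previous = 0
    for value, cumulative in endpoints:
        counts[value] = cumulative - previous
        previous = cumulative
    return counts

counts = rows_per_value(endpoints)
total = endpoints[-1][1]            # 11116, the last cumulative endpoint
print(counts)                       # {1: 2, 10: 11, 100: 101, 1000: 1001, 10000: 10001}
print(f"{counts[1] / total:.4e}")   # 1.7992e-04, the ix_sel the trace reports for b=1
```

For b=10000 the trace reports 10000.50 rows rather than the raw bucket size of 10001, so the optimizer evidently applies a small adjustment that this sketch does not model.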
Look at the trace:

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T
    #Rows: 11116  #Blks:  20  AvgRowLen:  6.00
Index Stats::
  Index: IDX_B  Col#: 2
    LVLS: 1  #LB: 22  #DK: 5  LB/K: 4.00  DB/K: 3.00  CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): B(NUMBER)
    AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 4.4980e-005 Min: 1 Max: 10000
    Histogram: Freq  #Bkts: 5  UncompBkts: 11116  EndPtVals: 5
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 2  Computed: 2.00  Non Adjusted: 2.00
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 7.00  resc_cpu: 2046392
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  7.35  Resp: 7.35  Degree: 1
      Cost_io: 7.00  Cost_cpu: 2046392
      Resp_io: 7.00  Resp_cpu: 2046392
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 1.00  resc_cpu: 8371
    ix_sel: 1.7992e-004  ix_sel_with_filters: 1.7992e-004
    Cost: 1.00  Resp: 1.00  Degree: 1
  Best:: AccessPath: IndexRange  Index: IDX_B
         Cost: 1.00  Degree: 1  Resp: 1.00  Card: 2.00  Bytes: 0

Compared with the previous run, the obvious difference is the cardinality estimate: Rounded: 2 versus Rounded: 2223.
select count(1) from tab_01_03 t where b=1
 
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id  | Operation          | Name    | Rows  | Bytes | Cost  | Time      |
-------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |         |       |       |     1 |           |
| 1   |  SORT AGGREGATE    |         |     1 |     3 |       |           |
| 2   |   INDEX RANGE SCAN | IDX_B   |     2 |     6 |     1 |  00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=1)
Run again: select count(1) from tab_01_03 t where b=10000

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T
    #Rows: 11116  #Blks:  20  AvgRowLen:  6.00
Index Stats::
  Index: IDX_B  Col#: 2
    LVLS: 1  #LB: 22  #DK: 5  LB/K: 4.00  DB/K: 3.00  CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): B(NUMBER)
    AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 4.4980e-005 Min: 1 Max: 10000
    Histogram: Freq  #Bkts: 5  UncompBkts: 11116  EndPtVals: 5
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 10001  Computed: 10000.50  Non Adjusted: 10000.50
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 7.00  resc_cpu: 2046392
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  7.35  Resp: 7.35  Degree: 1
      Cost_io: 7.00  Cost_cpu: 2046392
      Resp_io: 7.00  Resp_cpu: 2046392
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 20.00  resc_cpu: 2143479
    ix_sel: 0.89965  ix_sel_with_filters: 0.89965
    Cost: 20.37  Resp: 20.37  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 6.44  Degree: 1  Resp: 6.44  Card: 10000.50  Bytes: 0

 select count(1) from tab_01_03 t where b=10000
 
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation           | Name     | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |          |       |       |     6 |           |
| 1   |  SORT AGGREGATE     |          |     1 |     3 |       |           |
| 2   |   TABLE ACCESS FULL | TAB_01_03|   10K |   29K |     6 |  00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter("B"=10000)
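With the frequency histogram in place, the plan now depends on the literal. As a simplified illustration (the real optimizer does far more than this), the final choice among the paths listed in the trace amounts to taking the lowest cost. The dictionary keys are shorthand labels chosen here, not trace identifiers.

```python
def best_access_path(costs):
    """Return the (name, cost) pair with the lowest cost."""
    return min(costs.items(), key=lambda item: item[1])

# Costs copied from the two traces gathered with the frequency histogram.
costs_b_1 = {"TableScan": 6.44, "IndexFFS": 7.35, "IndexRange": 1.00}
costs_b_10000 = {"TableScan": 6.44, "IndexFFS": 7.35, "IndexRange": 20.37}

print(best_access_path(costs_b_1))       # ('IndexRange', 1.0)
print(best_access_path(costs_b_10000))   # ('TableScan', 6.44)
```

Only the index range scan cost moves between the two queries, because it is the only path whose cost depends on the estimated selectivity of the predicate.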

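3. Gather a height-balanced histogram
Point to verify:
(1) When the number of histogram buckets is smaller than the number of distinct values in the column, Oracle uses a height-balanced histogram to describe the data distribution, and each bucket holds the same number of rows.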
exec dbms_stats.gather_table_stats(ownname=>'doteng',tabname=>'tab_01_03',
method_opt => 'for columns size 3 b');
Check the statistics:
user_tab_col_statistics:
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 0.81764217339189
NUM_NULLS 0
NUM_BUCKETS 3
LAST_ANALYZED 2010-7-4 11:23:40
SAMPLE_SIZE 11116
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 3
HISTOGRAM HEIGHT BALANCED
user_tab_histograms:
    TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1   TAB_01_03   B                          3           10000
2   TAB_01_03   B                          0               1
Run select count(1) from tab_01_03 t where b=1;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T
    #Rows: 11116  #Blks:  20  AvgRowLen:  6.00
Index Stats::
  Index: IDX_B  Col#: 2
    LVLS: 1  #LB: 22  #DK: 5  LB/K: 4.00  DB/K: 3.00  CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): B(NUMBER)
    AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 0.81764 Min: 1 Max: 10000
    Histogram: HtBal  #Bkts: 3  UncompBkts: 3  EndPtVals: 2
  Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 9089  Computed: 9088.91  Non Adjusted: 9088.91
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 7.00  resc_cpu: 2046392
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  7.35  Resp: 7.35  Degree: 1
      Cost_io: 7.00  Cost_cpu: 2046392
      Resp_io: 7.00  Resp_cpu: 2046392
  Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 18.00  resc_cpu: 1946836
    ix_sel: 0.81764  ix_sel_with_filters: 0.81764
    Cost: 18.33  Resp: 18.33  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 6.44  Degree: 1  Resp: 6.44  Card: 9088.91  Bytes: 0
***************************************

select count(1) from tab_01_03 t where b=1
 
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation           | Name     | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |          |       |       |     6 |           |
| 1   |  SORT AGGREGATE     |          |     1 |     3 |       |           |
| 2   |   TABLE ACCESS FULL | TAB_01_03|  9089 |   27K |     6 |  00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter("B"=1)

Run again:
select count(1) from tab_01_03 t where b=10000;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T
    #Rows: 11116  #Blks:  20  AvgRowLen:  6.00
Index Stats::
  Index: IDX_B  Col#: 2
    LVLS: 1  #LB: 22  #DK: 5  LB/K: 4.00  DB/K: 3.00  CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#2): B(NUMBER)
    AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 0.81764 Min: 1 Max: 10000
    Histogram: HtBal  #Bkts: 3  UncompBkts: 3  EndPtVals: 2
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 9263  Computed: 9263.33  Non Adjusted: 9263.33
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 7.00  resc_cpu: 2046392
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  7.35  Resp: 7.35  Degree: 1
      Cost_io: 7.00  Cost_cpu: 2046392
      Resp_io: 7.00  Resp_cpu: 2046392
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 19.00  resc_cpu: 1988957
    ix_sel: 0.83333  ix_sel_with_filters: 0.83333
    Cost: 19.34  Resp: 19.34  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 6.44  Degree: 1  Resp: 6.44  Card: 9263.33  Bytes: 0
--- How is Density calculated?
--- Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
--- How should this line be read?
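The following does not settle these questions, but some arithmetic on the known row counts reproduces the figures in the two traces. Treat the sketch below as numerical observations only. It assumes the per-value counts implied by the frequency histogram gathered earlier (2, 11, 101, 1001, 10001), and it is not a statement of Oracle's documented algorithm.

```python
# Row counts per value, derived from the earlier frequency histogram.
counts = {1: 2, 10: 11, 100: 101, 1000: 1001, 10000: 10001}
num_rows = sum(counts.values())                      # 11116

# Observation 1: sum of squared counts over squared row count.
density = sum(c * c for c in counts.values()) / num_rows ** 2
print(round(density, 5))                             # 0.81764, the DENSITY value

# Observation 2: the b=1 estimate equals num_rows * density.
print(round(num_rows * density, 2))                  # 9088.91, the b=1 cardinality

# Observation 3: 10000 is the endpoint of all 3 buckets, and
# (3 - 0.5) / 3 matches the ix_sel reported for b=10000.
sel_popular = (3 - 0.5) / 3
print(round(sel_popular, 5))                         # 0.83333
print(round(num_rows * sel_popular, 2))              # 9263.33, the b=10000 cardinality
```

Read this way, the "prorated density" line suggests that the optimizer treated b=1 as falling outside the histogram's range and used the column density itself as the selectivity, which is why a value with 2 rows is estimated at about 9089.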
4. No statistics, no histogram
Run select count(1) from tab_01_03 t where b=10000;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TAB_01_03  Alias:  T  (NOT ANALYZED)
    #Rows: 1634  #Blks:  20  AvgRowLen:  100.00
Index Stats::
  Index: IDX_B  Col#: 2    (NOT ANALYZED)
    LVLS: 1  #LB: 25  #DK: 100  LB/K: 1.00  DB/K: 1.00  CLUF: 800.00
***************************************
SINGLE TABLE ACCESS PATH
*** 2010-07-04 11:00:14.585
** Performing dynamic sampling initial checks. **
  Column (#2): B(NUMBER)  NO STATISTICS (using defaults)
    AvgLen: 22.00 NDV: 51 Nulls: 0 Density: 0.019584
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated index stats.: IDX_B, blocks=27
** Dynamic sampling index access candidate : IDX_B
** Dynamic sampling updated table stats.: blocks=20
*** 2010-07-04 11:00:14.601
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 1 AS C1, CASE WHEN "T"."B"=10000 THEN 1 ELSE 0 END AS C2 FROM "TAB_01_03" "T") SAMPLESUB
*** 2010-07-04 11:00:14.616
** Executed dynamic sampling query:
    level : 2
    sample pct. : 100.000000
    actual sample size : 11116
    filtered sample card. : 10001
    orig. card. : 1634
    block cnt. table stat. : 20
    block cnt. for sampling: 20
    max. sample block cnt. : 64
    sample block cnt. : 20
    min. sel. est. : 0.01000000
** Using recursive dynamic sampling card. est. : 11116.000000
*** 2010-07-04 11:00:14.616
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0) FROM (SELECT /*+ NO_PARALLEL("T") INDEX("T" IDX_B) NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2, 1 AS C3  FROM "TAB_01_03" "T" WHERE "T"."B"=10000 AND ROWNUM <= 2500) SAMPLESUB
** Increasing dynamic sampling filtered
   sample card. for predicate 0 from 2500 to 10001.
*** 2010-07-04 11:00:14.616
** Executed dynamic sampling query:
    level : 2
    sample pct. : 100.000000
    actual sample size : 11116
    filtered sample card. : 10001
    filtered sample card. (index IDX_B): 2500
    orig. card. : 11116
    block cnt. table stat. : 20
    block cnt. for sampling: 20
    max. sample block cnt. : 4294967295
    sample block cnt. : 20
    min. sel. est. : 0.01000000
** Increasing dynamic sampling filtered
   sample card. for predicate 1 from 2500 to 10001.
    index IDX_B selectivity est.: 0.89969413
** Using dynamic sampling card. : 11116
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.89969413
  Table: TAB_01_03  Alias: T    
    Card: Original: 11116  Rounded: 10001  Computed: 10001.00  Non Adjusted: 10001.00
  Access Path: TableScan
    Cost:  6.44  Resp: 6.44  Degree: 0
      Cost_io: 6.00  Cost_cpu: 2587949
      Resp_io: 6.00  Resp_cpu: 2587949
  Access Path: index (index (FFS))
    Index: IDX_B
    resc_io: 8.00  resc_cpu: 617279
    ix_sel: 0.0000e+000  ix_sel_with_filters: 1
  Access Path: index (FFS)
    Cost:  8.11  Resp: 8.11  Degree: 1
      Cost_io: 8.00  Cost_cpu: 617279
      Resp_io: 8.00  Resp_cpu: 617279
  Access Path: index (AllEqRange)
    Index: IDX_B
    resc_io: 25.00  resc_cpu: 628886
    ix_sel: 0.89969  ix_sel_with_filters: 0.89969
    Cost: 25.11  Resp: 25.11  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 6.44  Degree: 1  Resp: 6.44  Card: 10001.00  Bytes: 0
With no statistics and no histogram, Oracle falls back on dynamic sampling or built-in defaults to produce its estimates.
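In this run the sample covered 100% of the table, so the sampled selectivity is simply the fraction of sampled rows that passed the predicate. A small Python check using the figures printed in the trace (the variable names are illustrative):

```python
actual_sample_size = 11116      # "actual sample size" in the trace
filtered_sample_card = 10001    # "filtered sample card." for b=10000

selectivity = filtered_sample_card / actual_sample_size
print(f"{selectivity:.8f}")                     # 0.89969413, as in the trace
print(round(actual_sample_size * selectivity))  # 10001, the Rounded cardinality
```

Because the table has only 20 blocks, the sampling query read all of it. On a larger table the sample would be partial and the estimate would be correspondingly less exact.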
The key parameter is optimizer_dynamic_sampling:
SQL> show parameter optimizer
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2

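A frequency histogram suits a skewed column; the default method_opt is for all columns size auto.
A frequency histogram puts each distinct value into its own bucket.
All of the experiments above come down to selectivity:
Statistics only: selectivity = 1 / number of distinct values
Statistics plus a frequency histogram: selectivity = rows in the value's bucket / total rows
Statistics plus a height-balanced histogram: selectivity = ???
...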
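To put the experiments side by side, the sketch below compares the rounded cardinalities from each trace with the real row counts (2 rows for b=1 and 10001 rows for b=10000, as implied by the frequency histogram). The scenario labels and the helper function are illustrative.

```python
actual = {1: 2, 10000: 10001}   # real row counts for the two literals

# Rounded cardinalities reported in the traces for each scenario.
estimates = {
    "stats only, no histogram": {1: 2223, 10000: 2223},
    "frequency histogram": {1: 2, 10000: 10001},
    "height-balanced, 3 buckets": {1: 9089, 10000: 9263},
}

def estimation_errors(estimates, actual):
    """Estimated minus actual rows, per scenario and per literal."""
    return {
        scenario: {value: est - actual[value] for value, est in by_value.items()}
        for scenario, by_value in estimates.items()
    }

errors = estimation_errors(estimates, actual)
for scenario, by_value in errors.items():
    print(f"{scenario:28} b=1: {by_value[1]:6}  b=10000: {by_value[10000]:6}")
```

Only the frequency histogram gets both literals right. The other two scenarios give both literals roughly the same estimate, so they overestimate one and underestimate the other.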

From the ITPUB blog, link: http://blog.itpub.net/21993926/viewspace-667125/. Please credit the source when reposting; otherwise legal action may be pursued.
