Experiment: the effect of histograms on a skewed column
1. Statistics gathered, no histogram
analyze table &tab delete statistics;
analyze table &tab compute statistics;
user_tab_histograms:
#  TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1  TAB_01_03   B            0                1
2  TAB_01_03   B            1                10000
user_tab_col_statistics:
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5 --- 5 distinct values
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 0.2 -- density = 1/5
NUM_NULLS 0
NUM_BUCKETS 1
LAST_ANALYZED 2010-7-4 9:51:38
SAMPLE_SIZE 11116
GLOBAL_STATS NO
USER_STATS NO
AVG_COL_LEN 2
HISTOGRAM NONE
Trace event 10053 -- dumps the optimizer's decisions
alter session set events '10053 trace name context forever ,level 1';
Run:
select count(1) from tab_01_03 t where b=1;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T
#Rows: 11116 #Blks: 20 AvgRowLen: 9.00
Index Stats::
Index: IDX_B Col#: 2
LVLS: 1 #LB: 22 #DK: 5 LB/K: 4.00 DB/K: 3.00 CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): B(NUMBER)
AvgLen: 2.00 NDV: 5 Nulls: 0 Density: 0.2 Min: 1 Max: 10000--ndv=number of distinct values
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 2223 Computed: 2223.20 Non Adjusted: 2223.20
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 7.00 resc_cpu: 2046392
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 7.35 Resp: 7.35 Degree: 1
Cost_io: 7.00 Cost_cpu: 2046392
Resp_io: 7.00 Resp_cpu: 2046392
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 5.00 resc_cpu: 481257
ix_sel: 0.2 ix_sel_with_filters: 0.2
Cost: 5.08 Resp: 5.08 Degree: 1
Best:: AccessPath: IndexRange Index: IDX_B
Cost: 5.08 Degree: 1 Resp: 5.08 Card: 2223.20 Bytes: 0
After comparing the candidate access paths listed above (TableScan, index FFS, index AllEqRange), Oracle chooses IndexRange.
Current SQL statement for this session:
select count(1) from tab_01_03 t where b=1
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 5 | |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | INDEX RANGE SCAN | IDX_B | 2223 | 4446 | 5 | 00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=1)
The plan confirms the choice above.
Run again with a different value:
select count(1) from tab_01_03 t where b=10000;
INDEX RANGE SCAN is still chosen:
select count(1) from tab_01_03 t where b=10000
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 5 | |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | INDEX RANGE SCAN | IDX_B | 2223 | 4446 | 5 | 00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=10000)
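Without a histogram, Density = 1/NDV = 1/5 = 0.2 for every value, so b=1 and b=10000 both get the same estimate of 2223 rows.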
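A minimal sketch of the arithmetic behind the two identical plans above, assuming the optimizer simply multiplies the row count by the density when there is no histogram. The function name is hypothetical; the inputs are the #Rows and NDV values from the trace.

```python
def cardinality_without_histogram(num_rows: int, num_distinct: int) -> float:
    """Estimate rows for an equality predicate using density = 1/NDV."""
    density = 1.0 / num_distinct
    return num_rows * density


# Values from the trace: #Rows = 11116, NDV = 5.
estimate = cardinality_without_histogram(11116, 5)
print(round(estimate, 2))  # compare with "Computed: 2223.20" in the trace
```

The estimate does not depend on the literal in the predicate, which is why b=1 and b=10000 get the same plan.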
2. Gathering a histogram
exec dbms_stats.gather_table_stats(ownname=>'doteng',tabname=>'tab_01_03',
method_opt => 'for columns size auto b');
user_tab_col_statistics:
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 4.49802087081684E-5
NUM_NULLS 0
NUM_BUCKETS 5
LAST_ANALYZED 2010-7-4 10:30:55
SAMPLE_SIZE 11116
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 3
HISTOGRAM FREQUENCY
user_tab_histograms:
#  TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1  TAB_01_03   B            2                1
2  TAB_01_03   B            13               10
3  TAB_01_03   B            114              100
4  TAB_01_03   B            1115             1000
5  TAB_01_03   B            11116            10000
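Conclusion: with method_opt => 'for columns size auto b', SIZE AUTO lets Oracle pick the appropriate number of buckets, and the result here is a frequency histogram.
When the number of histogram buckets >= the number of distinct values in the column, Oracle uses a value-based (frequency) histogram, and each value occupies its own bucket.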
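In a frequency histogram, endpoint_number is cumulative, so the row count for each value is the difference between consecutive endpoints. The sketch below is illustrative only; the helper name is hypothetical and the data is copied from the user_tab_histograms output above.

```python
def bucket_counts(endpoints):
    """Turn cumulative (endpoint_value, endpoint_number) pairs into per-value row counts."""
    counts = {}
    previous = 0
    for value, cumulative in sorted(endpoints, key=lambda pair: pair[1]):
        counts[value] = cumulative - previous
        previous = cumulative
    return counts


# (endpoint_value, endpoint_number) rows from user_tab_histograms.
histogram = [(1, 2), (10, 13), (100, 114), (1000, 1115), (10000, 11116)]
counts = bucket_counts(histogram)
total_rows = histogram[-1][1]  # the last cumulative endpoint is the total row count

for value, rows in counts.items():
    # rows / total_rows is the fraction of the table holding this value
    print(value, rows, rows / total_rows)
```

The skew is obvious once the counts are separated: value 1 has 2 rows while value 10000 has 10001 of the 11116 rows.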
Look at the trace output:
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T
#Rows: 11116 #Blks: 20 AvgRowLen: 6.00
Index Stats::
Index: IDX_B Col#: 2
LVLS: 1 #LB: 22 #DK: 5 LB/K: 4.00 DB/K: 3.00 CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): B(NUMBER)
AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 4.4980e-005 Min: 1 Max: 10000
Histogram: Freq #Bkts: 5 UncompBkts: 11116 EndPtVals: 5
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 2 Computed: 2.00 Non Adjusted: 2.00
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 7.00 resc_cpu: 2046392
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 7.35 Resp: 7.35 Degree: 1
Cost_io: 7.00 Cost_cpu: 2046392
Resp_io: 7.00 Resp_cpu: 2046392
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 1.00 resc_cpu: 8371
ix_sel: 1.7992e-004 ix_sel_with_filters: 1.7992e-004
Cost: 1.00 Resp: 1.00 Degree: 1
Best:: AccessPath: IndexRange Index: IDX_B
Cost: 1.00 Degree: 1 Resp: 1.00 Card: 2.00 Bytes: 0
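Compared with the previous run, the obvious difference is the cardinality estimate: Rounded: 2 here, far below Rounded: 2223 without the histogram.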
select count(1) from tab_01_03 t where b=1
============
Plan Table
============
-------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 1 | |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | INDEX RANGE SCAN | IDX_B | 2 | 6 | 1 | 00:00:01 |
-------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("B"=1)
Run again: select count(1) from tab_01_03 t where b=10000
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T
#Rows: 11116 #Blks: 20 AvgRowLen: 6.00
Index Stats::
Index: IDX_B Col#: 2
LVLS: 1 #LB: 22 #DK: 5 LB/K: 4.00 DB/K: 3.00 CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): B(NUMBER)
AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 4.4980e-005 Min: 1 Max: 10000
Histogram: Freq #Bkts: 5 UncompBkts: 11116 EndPtVals: 5
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 10001 Computed: 10000.50 Non Adjusted: 10000.50
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 7.00 resc_cpu: 2046392
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 7.35 Resp: 7.35 Degree: 1
Cost_io: 7.00 Cost_cpu: 2046392
Resp_io: 7.00 Resp_cpu: 2046392
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 20.00 resc_cpu: 2143479
ix_sel: 0.89965 ix_sel_with_filters: 0.89965
Cost: 20.37 Resp: 20.37 Degree: 1
Best:: AccessPath: TableScan
Cost: 6.44 Degree: 1 Resp: 6.44 Card: 10000.50 Bytes: 0
select count(1) from tab_01_03 t where b=10000
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 6 | |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL | TAB_01_03| 10K | 29K | 6 | 00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter("B"=10000)
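The resc_io values reported for the AllEqRange path in the traces so far happen to equal ceil(#LB × ix_sel). The sketch below only checks that observation against the numbers in this post. It is not Oracle's full costing formula, which is generally described as also involving the index level and, when the table is visited, the clustering factor. The function name is hypothetical.

```python
import math


def index_range_io_cost(leaf_blocks: int, ix_sel: float) -> int:
    """Leaf blocks expected to be visited: ceil(#LB * ix_sel)."""
    return math.ceil(leaf_blocks * ix_sel)


LEAF_BLOCKS = 22  # "#LB" of IDX_B in the trace

no_histogram = index_range_io_cost(LEAF_BLOCKS, 0.2)       # trace: resc_io 5.00
freq_b_1 = index_range_io_cost(LEAF_BLOCKS, 1.7992e-4)     # trace: resc_io 1.00
freq_b_10000 = index_range_io_cost(LEAF_BLOCKS, 0.89965)   # trace: resc_io 20.00
print(no_histogram, freq_b_1, freq_b_10000)
```

With the frequency histogram in place, the index path for b=10000 costs about 20 against 6.44 for the table scan, which is why the plan flips to TABLE ACCESS FULL.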
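3. Gathering a height-balanced histogram
Point to verify:
(1) When the number of histogram buckets is less than the number of distinct values in the column, Oracle uses a height-balanced histogram to describe the data distribution, with each bucket holding the same number of rows.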
exec dbms_stats.gather_table_stats(ownname=>'doteng',tabname=>'tab_01_03',
method_opt => 'for columns size 3 b');
Check the statistics:
user_tab_col_statistics
TABLE_NAME TAB_01_03
COLUMN_NAME B
NUM_DISTINCT 5
LOW_VALUE C102
HIGH_VALUE C302
DENSITY 0.81764217339189
NUM_NULLS 0
NUM_BUCKETS 3
LAST_ANALYZED 2010-7-4 11:23:40
SAMPLE_SIZE 11116
GLOBAL_STATS YES
USER_STATS NO
AVG_COL_LEN 3
HISTOGRAM HEIGHT BALANCED
user_tab_histograms:
#  TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE
1  TAB_01_03   B            3                10000
2  TAB_01_03   B            0                1
Run: select count(1) from tab_01_03 t where b=1;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T
#Rows: 11116 #Blks: 20 AvgRowLen: 6.00
Index Stats::
Index: IDX_B Col#: 2
LVLS: 1 #LB: 22 #DK: 5 LB/K: 4.00 DB/K: 3.00 CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): B(NUMBER)
AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 0.81764 Min: 1 Max: 10000
Histogram: HtBal #Bkts: 3 UncompBkts: 3 EndPtVals: 2
Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 9089 Computed: 9088.91 Non Adjusted: 9088.91
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 7.00 resc_cpu: 2046392
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 7.35 Resp: 7.35 Degree: 1
Cost_io: 7.00 Cost_cpu: 2046392
Resp_io: 7.00 Resp_cpu: 2046392
Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 18.00 resc_cpu: 1946836
ix_sel: 0.81764 ix_sel_with_filters: 0.81764
Cost: 18.33 Resp: 18.33 Degree: 1
Best:: AccessPath: TableScan
Cost: 6.44 Degree: 1 Resp: 6.44 Card: 9088.91 Bytes: 0
***************************************
select count(1) from tab_01_03 t where b=1
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 6 | |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL | TAB_01_03| 9089 | 27K | 6 | 00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter("B"=1)
Run again:
select count(1) from tab_01_03 t where b=10000;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T
#Rows: 11116 #Blks: 20 AvgRowLen: 6.00
Index Stats::
Index: IDX_B Col#: 2
LVLS: 1 #LB: 22 #DK: 5 LB/K: 4.00 DB/K: 3.00 CLUF: 18.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#2): B(NUMBER)
AvgLen: 3.00 NDV: 5 Nulls: 0 Density: 0.81764 Min: 1 Max: 10000
Histogram: HtBal #Bkts: 3 UncompBkts: 3 EndPtVals: 2
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 9263 Computed: 9263.33 Non Adjusted: 9263.33
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 7.00 resc_cpu: 2046392
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 7.35 Resp: 7.35 Degree: 1
Cost_io: 7.00 Cost_cpu: 2046392
Resp_io: 7.00 Resp_cpu: 2046392
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 19.00 resc_cpu: 1988957
ix_sel: 0.83333 ix_sel_with_filters: 0.83333
Cost: 19.34 Resp: 19.34 Degree: 1
Best:: AccessPath: TableScan
Cost: 6.44 Degree: 1 Resp: 6.44 Card: 9263.33 Bytes: 0
--- How is Density calculated here?
--- Using prorated density: 0.81764 of col #2 as selectivity of out-of-range value pred
--- How should this line be interpreted?
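The formulas below are not taken from Oracle documentation. They are guesses that happen to reproduce the numbers in this post, using the per-value row counts 2, 11, 101, 1001 and 10001 derived from the frequency histogram in section 2. Whether Oracle computes the values this way in general is an assumption, and the function name is hypothetical.

```python
def sum_of_squares_density(counts):
    """Sum of squared per-value row counts divided by the squared total row count."""
    total = sum(counts)
    return sum(c * c for c in counts) / (total * total)


# Per-value row counts for b = 1, 10, 100, 1000, 10000.
counts = [2, 11, 101, 1001, 10001]
total_rows = sum(counts)

density = sum_of_squares_density(counts)
print(density)               # compare with DENSITY 0.81764217339189
print(density * total_rows)  # compare with "Computed: 9088.91" for b=1

# b = 10000 is the endpoint of all 3 buckets; (3 - 0.5) / 3 matches ix_sel 0.83333.
popular_sel = (3 - 0.5) / 3
print(popular_sel * total_rows)  # compare with "Computed: 9263.33" for b=10000
```

Under this reading, b=1 is estimated from the density alone, while b=10000 is estimated from the number of buckets it spans. Both estimates are large enough that the table scan wins.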
4. No statistics, no histogram
Run: select count(1) from tab_01_03 t where b=10000;
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TAB_01_03 Alias: T (NOT ANALYZED)
#Rows: 1634 #Blks: 20 AvgRowLen: 100.00
Index Stats::
Index: IDX_B Col#: 2 (NOT ANALYZED)
LVLS: 1 #LB: 25 #DK: 100 LB/K: 1.00 DB/K: 1.00 CLUF: 800.00
***************************************
SINGLE TABLE ACCESS PATH
*** 2010-07-04 11:00:14.585
** Performing dynamic sampling initial checks. **
Column (#2): B(NUMBER) NO STATISTICS (using defaults)
AvgLen: 22.00 NDV: 51 Nulls: 0 Density: 0.019584
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated index stats.: IDX_B, blocks=27
** Dynamic sampling index access candidate : IDX_B
** Dynamic sampling updated table stats.: blocks=20
*** 2010-07-04 11:00:14.601
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 1 AS C1, CASE WHEN "T"."B"=10000 THEN 1 ELSE 0 END AS C2 FROM "TAB_01_03" "T") SAMPLESUB
*** 2010-07-04 11:00:14.616
** Executed dynamic sampling query:
level : 2
sample pct. : 100.000000
actual sample size : 11116
filtered sample card. : 10001
orig. card. : 1634
block cnt. table stat. : 20
block cnt. for sampling: 20
max. sample block cnt. : 64
sample block cnt. : 20
min. sel. est. : 0.01000000
** Using recursive dynamic sampling card. est. : 11116.000000
*** 2010-07-04 11:00:14.616
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0) FROM (SELECT /*+ NO_PARALLEL("T") INDEX("T" IDX_B) NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2, 1 AS C3 FROM "TAB_01_03" "T" WHERE "T"."B"=10000 AND ROWNUM <= 2500) SAMPLESUB
** Increasing dynamic sampling filtered
sample card. for predicate 0 from 2500 to 10001.
*** 2010-07-04 11:00:14.616
** Executed dynamic sampling query:
level : 2
sample pct. : 100.000000
actual sample size : 11116
filtered sample card. : 10001
filtered sample card. (index IDX_B): 2500
orig. card. : 11116
block cnt. table stat. : 20
block cnt. for sampling: 20
max. sample block cnt. : 4294967295
sample block cnt. : 20
min. sel. est. : 0.01000000
** Increasing dynamic sampling filtered
sample card. for predicate 1 from 2500 to 10001.
index IDX_B selectivity est.: 0.89969413
** Using dynamic sampling card. : 11116
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.89969413
Table: TAB_01_03 Alias: T
Card: Original: 11116 Rounded: 10001 Computed: 10001.00 Non Adjusted: 10001.00
Access Path: TableScan
Cost: 6.44 Resp: 6.44 Degree: 0
Cost_io: 6.00 Cost_cpu: 2587949
Resp_io: 6.00 Resp_cpu: 2587949
Access Path: index (index (FFS))
Index: IDX_B
resc_io: 8.00 resc_cpu: 617279
ix_sel: 0.0000e+000 ix_sel_with_filters: 1
Access Path: index (FFS)
Cost: 8.11 Resp: 8.11 Degree: 1
Cost_io: 8.00 Cost_cpu: 617279
Resp_io: 8.00 Resp_cpu: 617279
Access Path: index (AllEqRange)
Index: IDX_B
resc_io: 25.00 resc_cpu: 628886
ix_sel: 0.89969 ix_sel_with_filters: 0.89969
Cost: 25.11 Resp: 25.11 Degree: 1
Best:: AccessPath: TableScan
Cost: 6.44 Degree: 1 Resp: 6.44 Card: 10001.00 Bytes: 0
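With no statistics and no histogram, Oracle uses dynamic sampling or internal default values to determine the selectivity.
The key parameter is optimizer_dynamic_sampling: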
SQL> show parameter optimizer
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling integer 2
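The dynamic-sampling selectivity in the trace above can be reproduced from two of its own figures: "filtered sample card." divided by "actual sample size". This is a sketch of that division only, with a hypothetical function name. It says nothing about how Oracle chooses the sample.

```python
def sampled_selectivity(filtered_rows: int, sampled_rows: int) -> float:
    """Fraction of sampled rows that satisfy the predicate."""
    return filtered_rows / sampled_rows


# From the trace: filtered sample card. = 10001, actual sample size = 11116.
sel = sampled_selectivity(10001, 11116)
print(f"{sel:.8f}")  # compare with "single table dynamic sel. est. : 0.89969413"
print(sel * 11116)   # compare with "Computed: 10001.00"
```

Because the sample covered 100% of this small table, the estimate matches the real row count for b=10000.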
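A frequency histogram suits a column with skewed data. The default method_opt is for all columns size auto.
A frequency histogram puts each distinct value into its own bucket.
All of the experiments above come down to selectivity:
Statistics only: selectivity = 1 / number of distinct values
Statistics plus frequency histogram: selectivity = rows in the value's bucket / total rows
Statistics plus height-balanced histogram: selectivity = ???
...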
Source: ITPUB blog, link: http://blog.itpub.net/21993926/viewspace-667125/. When reposting, please credit the source; otherwise legal liability will be pursued.