1,Viewing Histograms
Column statistics may be stored as histograms. These histograms provide accurate estimates of the distribution of column data. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions.
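In other words, a column's statistics may include a histogram. A histogram describes how the column's data is actually distributed. When the data is skewed, it gives better selectivity estimates, so the optimizer can choose the correct execution plan.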
Oracle uses two types of histograms for column statistics: height-balanced histograms and frequency histograms. The type of histogram is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views (USER and DBA). This column can have values of HEIGHT BALANCED, FREQUENCY, or NONE.
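As a rough illustration of the HISTOGRAM column described above, the query below lists every column in one schema that currently has a histogram. It is a sketch rather than part of the original text: the schema name OE is borrowed from the example later in this section, and querying DBA_TAB_COL_STATISTICS assumes the session has access to the DBA views.

```sql
-- List the columns that have a histogram, with its type and bucket count.
-- 'OE' is an assumed schema name; substitute your own.
SELECT table_name, column_name, num_buckets, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'OE'
AND    histogram <> 'NONE'
ORDER  BY table_name, column_name;
```

Without access to the DBA views, the same columns are available from USER_TAB_COL_STATISTICS, which has no OWNER column.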
2,Height-Balanced Histograms
In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall.
Consider a column C with values between 1 and 100 and a histogram with 10 buckets. If the data in C is uniformly distributed, then the histogram looks similar to Figure 14-1, where the numbers are the endpoint values.
Figure 14-1 Height-Balanced Histogram with Uniform Distribution
The number of rows in each bucket is one tenth the total number of rows in the table. Four-tenths of the rows have values that are between 60 and 100 in this example of uniform distribution.
If the data is not uniformly distributed, then the histogram might look similar to Figure 14-2.
Figure 14-2 Height-Balanced Histogram with Non-Uniform Distribution
In this case, most of the rows have the value 5 for the column. Only 1/10 of the rows have values between 60 and 100.
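One way to see where such endpoint values come from is to bucket the rows by hand. The sketch below uses the NTILE analytic function on a hypothetical table t with a numeric column c (both names are assumptions, not objects from this document). It only approximates the idea of equal-height buckets; it is not how DBMS_STATS builds its histograms.

```sql
-- Hypothetical table t with a numeric column c (assumed names).
-- NTILE(10) assigns the rows, ordered by c, to 10 groups of nearly equal size;
-- the largest value in each group approximates that bucket's endpoint.
SELECT bucket,
       MAX(c)   AS endpoint_value,
       COUNT(*) AS rows_in_bucket
FROM  (SELECT c,
              NTILE(10) OVER (ORDER BY c) AS bucket
       FROM   t
       WHERE  c IS NOT NULL)
GROUP  BY bucket
ORDER  BY bucket;
```

With uniformly distributed data the endpoint values come out evenly spaced. With skewed data the same value shows up as the endpoint of several consecutive buckets, as with the value 5 described above.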
Height-balanced histograms can be viewed using the *TAB_HISTOGRAMS views, as shown in Example 14-1.
3,Viewing Height-Balanced Histogram Statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'OE', TABNAME => 'INVENTORIES',
    METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');
END;
/
SELECT column_name, num_distinct, num_buckets, histogram
FROM USER_TAB_COL_STATISTICS
WHERE table_name = 'INVENTORIES' AND column_name = 'QUANTITY_ON_HAND';
COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
QUANTITY_ON_HAND                        237          10 HEIGHT BALANCED
SELECT endpoint_number, endpoint_value
FROM USER_HISTOGRAMS
WHERE table_name = 'INVENTORIES' and column_name = 'QUANTITY_ON_HAND'
ORDER BY endpoint_number;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              0
              1             27
              2             42
              3             57
              4             74
              5             98
              6            123
              7            149
              8            175
              9            202
             10            353
In the query output, one row corresponds to one bucket in the histogram.
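To read the endpoints as ranges, each endpoint can be paired with the one before it. The following sketch does this with the LAG analytic function against the same statistics shown above; the column aliases are illustrative only.

```sql
-- Pair each endpoint with the previous one to show the range of values
-- that each bucket covers.
SELECT endpoint_number AS bucket,
       LAG(endpoint_value) OVER (ORDER BY endpoint_number) AS low_value,
       endpoint_value AS high_value
FROM   user_tab_histograms
WHERE  table_name  = 'INVENTORIES'
AND    column_name = 'QUANTITY_ON_HAND'
ORDER  BY endpoint_number;
```

For the output above, bucket 1 would run from 0 to 27 and bucket 10 from 202 to 353. The row with endpoint number 0 has no previous endpoint, so its low_value is NULL; it appears to mark the lowest column value rather than a bucket of its own.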
Demo
SQL> alter session set statistics_level=all;
Session altered.
SQL> create table his
2 as
3 select rownum id,'a'||rownum name from dba_objects
where rownum<20001;
Table created.
SQL> create index his_idx_id on his(id);
Index created.
SQL> exec dbms_stats.gather_table_stats('u01','his', -
> method_opt=>'FOR ALL INDEXED COLUMNS size 254');
-- With SIZE SKEWONLY, Oracle does not collect a histogram at this point, because it considers the data in this column to be uniformly distributed:
exec dbms_stats.gather_table_stats('u01','his', -
method_opt=>'FOR ALL INDEXED COLUMNS size skewonly');
PL/SQL procedure successfully completed.
SQL> set lin 120
SQL> select * from his where id=10;
ID NAME
---------- -----------------------------------------
10 a10
SQL> select * from table(dbms_xplan.display_cursor
(null,null,'last allstats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID 75pauyhkmn80u, child number 0
-------------------------------------
select * from his where id=10
Plan hash value: 2080926353
--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| HIS |
|* 2 | INDEX RANGE SCAN | HIS_IDX_ID |-- uses the index: correct
PLAN_TABLE_OUTPUT
--------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=10)
18 rows selected.
SQL> update his set id=10 where id<19996;
19995 rows updated.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats('u01','his', -
> method_opt=>'FOR ALL INDEXED COLUMNS size 254');
-- The histogram statistics have now been updated.
PL/SQL procedure successfully completed.
SQL> select * from his where id=10;
19995 rows selected.
SQL> select * from table(dbms_xplan.display_cursor
(null,null,'last allstats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------
SQL_ID 75pauyhkmn80u, child number 0
-------------------------------------
select * from his where id=10
Plan hash value: 2080926353
PLAN_TABLE_OUTPUT
--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| HIS |
|* 2 | INDEX RANGE SCAN | HIS_IDX_ID |-- wrong execution plan
--------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=10)
18 rows selected.
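Before concluding that the optimizer is at fault, it may help to confirm that the re-gathered statistics actually record the skew. A minimal check, assuming the session is connected as the owner of HIS (u01 in this demo):

```sql
-- Confirm that a histogram now exists on HIS.ID.
SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'HIS' AND column_name = 'ID';

-- Inspect the histogram endpoints themselves.
SELECT endpoint_number, endpoint_value
FROM   user_tab_histograms
WHERE  table_name = 'HIS' AND column_name = 'ID'
ORDER  BY endpoint_number;
```

If the statistics reflect the update, the endpoints should show the value 10 accounting for nearly all rows. That would point to the cached cursor, rather than the statistics, as the reason for the unchanged plan.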
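In principle, any of the following three actions should invalidate a cached execution plan:
a. alter system flush shared_pool;
b. running a DDL statement against an object referenced by the statement;
c. re-gathering statistics.
However, the lab above shows that re-gathering statistics did not invalidate the plan. The only remaining option is to invalidate it as follows and then check whether the optimizer makes the correct choice.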
SQL> alter system flush shared_pool;
System altered.
SQL> select * from his where id=10;
19995 rows selected.
SQL> select * from table(dbms_xplan.display_cursor
(null,null,'last allstats'));
PLAN_TABLE_OUTPUT
------------------------------------------------
SQL_ID 75pauyhkmn80u, child number 0
-------------------------------------
select * from his where id=10
Plan hash value: 4154987155
PLAN_TABLE_OUTPUT
----------------------------------
| Id | Operation | Name |
----------------------------------
|* 1 | TABLE ACCESS FULL| HIS |-- the correct choice
----------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=10)
17 rows selected.
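Flushing the shared pool discards every cached cursor, which is heavy-handed outside a test system. A narrower option, sketched below, is the no_invalidate parameter of DBMS_STATS.GATHER_TABLE_STATS: passing FALSE asks Oracle to invalidate dependent cursors as soon as the statistics are gathered. Whether the default setting of this parameter is what kept the old plan alive in the lab above is an assumption; the transcript does not show it.

```sql
-- Re-gather statistics and invalidate dependent cursors immediately,
-- instead of flushing the whole shared pool.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname       => 'U01',
    tabname       => 'HIS',
    method_opt    => 'FOR ALL INDEXED COLUMNS SIZE 254',
    no_invalidate => FALSE);
END;
/

-- Check whether the cursor was invalidated and which plan it now uses.
SELECT sql_id, child_number, plan_hash_value, invalidations, loads
FROM   v$sql
WHERE  sql_id = '75pauyhkmn80u';
```

The SQL_ID is the one reported by dbms_xplan in the transcript above. Querying V$SQL requires SELECT privilege on that view.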