Gathering Statistics for the Cost Based Optimizer (Pre 10g) (Doc ID 114671.1)
Give the available methods for gathering statistics for the Cost Based Optimizer for Database Administrators.
For recommendations on later versions see:
The cost-based optimization approach uses statistics to estimate
the cost of each execution plan. You should gather statistics periodically
for objects where the statistics become stale over time because of changing
data volumes or changes in column values.
For more information about when to gather new statistics, see
Note:44961.1 Gathering statistics frequency and strategy guidelines
There are 2 ways to collect statistics on schema objects:
1. DBMS_STATS (Oracle 8.1.5 and higher)
1.1 GATHER_DATABASE_STATS Statistics for all objects in a database
1.2 GATHER_SCHEMA_STATS Statistics for all objects in a schema
1.3 GATHER_TABLE_STATS Table, column, and index statistics
1.4 GATHER_INDEX_STATS Index statistics
2. ANALYZE or PL/SQL package with Recursive ANALYZE statement (old fashion)
2.1 ANALYZE command
2.2 DBMS_UTILITY
2.3 DBMS_DDL
There are new enhancements to dbms_stats but not to analyze.
****************************************
1. DBMS_STATS package (8.1.5 and higher)
****************************************
- Oracle Corporation strongly recommends that you use the DBMS_STATS package
rather than ANALYZE to collect optimizer statistics.
- The DBMS_STATS package can gather statistics on indexes, tables, columns,
and partitions, as well as statistics on all schema objects in a schema
or database.
- It does not gather cluster statistics, but you can use DBMS_STATS to gather
statistics on the individual tables instead of the whole cluster.
- The statistics-gathering operations can run either serially or in parallel.
Whenever possible, DBMS_STATS calls a parallel query to gather statistics
with the specified degree of parallelism;
otherwise, it calls a serial query or the ANALYZE statement.
o For example ,With 8.1.x DBMS_STATS uses recursive ANALYZE to gather statistics on
- All Index Statistics
- All Histograms
o For example ,With 9.2.x DBMS_STATS uses recursive ANALYZE to gather statistics only on
* CLUSTER INDEX
* DOMAIN INDEX
* JOIN INDEX
- By default DBMS_STATS will generate statistics for the table and not it's
indexes (By default CASCADE => FALSE).
- Global statistics Note:236935.1
o For partitioned tables and indexes, DBMS_STATS can gather separate
statistics for each partition as well as global statistics for the entire
table or index.
o Similary, for composite partitioning DBMS_STATS can gather separate
statistics for subpartitions, partitions, and the entire table or index.
o Depending on the SQL statement being optimized, the optimizer may choose
to use either the partition (or subpartition) statistics or the global
statistics. Unless the query predicate narrows the query to a single
partition, the optimizer uses the global statistics.
Because most queries are not likely to be this restrictive,
it is most important to have accurate global statistics.
o Therefore, actually gathering global statistics with the DBMS_STATS package
is highly recommended, rather than calculating them with the ANALYZE statement.
- It can collect statistics only on the "stale" objects when monitoring is
enabled. See: Note:102334.1
- In 9i and above, it can also do
o Estimate statistics with automatically determined adequate sample size,
o Create histograms only on the columns they would be useful.
1.1 DBMS_STATS.GATHER_DATABASE_STATS
====================================
Can gather statistics on all the tables and indexes in a database.
The GATHER_DATABASE_STATS procedure - some parameters - :
- estimate_percent
o NULL means compute
o The valid range is [0.000001,100].
o DBMS_STATS.AUTO_SAMPLE_SIZE so Oracle determines the
best sample size for good statistics (Recommended value 9i and above).
- block_sample
o Only pertinent when doing an estimate statistics.
o Use or not random block sampling instead of random row sampling.
o Random block sampling is more efficient,
but if the data is not randomly distributed on disk,
then the sample values may be somewhat correlated.
- method_opt
o DEFAULT (prior to 10g 'FOR ALL COLUMNS SIZE 1', starting with 10g 'FOR ALL COLUMNS SIZE AUTO')
o FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
. INDEXED can be used when predicate columns are all indexed
. HIDDEN for columns in Function-Based Indexes
. ALL when all columns appear in predicates
o FOR COLUMNS [size clause] column|attribute [size_clause]
[,column|attribute [size_clause]...],
o size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
. integer Number of histogram buckets. Valid Range [1,254]
. REPEAT Collects histograms only on the columns that
already have histograms (9i and above).
. AUTO Oracle determines the columns to collect histograms
based on data distribution and the workload of the
columns (9i and above).
. SKEWONLY Oracle determines the columns to collect histograms
based on the data distribution of the columns (9i and above).
o Note:252361.1 DBMS_STATS always sets AVG_ROW_LEN=100 or left the old value
- degree
o Degree of parallelism.
o NULL means use the table default value
o DBMS_STATS.DEFAULT_DEGREE means value based on the init parameters (9i and above).
- granularity
o Only pertinent if the table is partitioned.
o DEFAULT: Gather global- and partition-level statistics.
o GLOBAL: Gather global statistics.
o PARTITION: Gather partition-level statistics.
o SUBPARTITION: Gather subpartition-level statistics.
o ALL: Gather all (subpartition, partition, and global)stats.
o Analyze table does not replace statistics
o Global statistics - DBMS_STATS versus ANALYZE
- cascade
o Default (CASCADE => FALSE). No Index Gathering.
o Gathers statistics on the indexes as well (TRUE).
o Index statistics gathering is not parallelized prior to 9iR2
o Equivalent to gather_index_stats procedure on each of the indexes in
the database in addition to gathering table and column statistics.
o Note:159374.1 DBMS_STATS.GATHER_SCHEMA_STATS don't analyze indexes
- options
o GATHER: Default. Gathers statistics on all objects.
o GATHER AUTO: Gathers all necessary statistics automatically.
Oracle implicitly determines which objects need new statistics,
and determines how to gather those statistics (9i and above).
o GATHER STALE: Gathers statistics on stale objects as determined by
looking at the *_tab_modifications views.
o GATHER EMPTY: Gathers statistics on objects which currently have no
statistics.
o Note:102334.1 How to Automate Change Based Statistic Gathering - Monitoring Table
o Note:228186.1 Differences between GATHER STALE and GATHER AUTO
- gather_sys
o Gathers statistics on the objects owned by the 'SYS' user (TRUE)
1.2 DBMS_STATS.GATHER SCHEMA_STATS
==================================
Can gather statistics on all the tables and indexes in a schema.
The GATHER SCHEMA_STATS procedure - some parameters - :
o ownname => Schema to analyze (NULL means current schema).
o See GATHER_DATABASE_STATS except GATHER_SYS
1.3 DBMS_STATS.GATHER_TABLE_STATS
=================================
Can gather statistics on a table and its indexes.
The GATHER_TABLE_STATS procedure - some parameters - :
o ownname => Schema to analyze (NULL means current schema).
o tabname => Name of table.
o partname => Name of partition.
o See DBMS_STATS.GATHER SCHEMA_STATS
1.4 DBMS_STATS.GATHER_INDEX_STATS
=================================
Can gather statistics on an index.
The GATHER_INDEX_STATS procedure - some parameters - :
o ownname => Schema to analyze (NULL means current schema).
o indname => Name of Index.
o partname => Name of partition.
o See DBMS_STATS.GATHER SCHEMA_STATS
***********
2. ANALYZE
***********
===================
2.1 ANALYZE Command
===================
- Old fashion
- Oracle Corporation strongly recommends that you use the DBMS_STATS package
rather than ANALYZE to collect optimizer statistics.
- The ANALYZE command gathers statistics in a serial manner.
- It can be used with an ESTIMATE or COMPUTE statistics.
- If you specify ANALYZE TABLE ESTIMATE STATISTICS without a sample size,
Oracle will only gather statistics based on 1064 rows.
When using the ANALYZE command, you should always specify a sample size so that
the default does not take place.
(i.e. analyze table test estimate statistics sample size 20 percent;)
- The COMPUTE statistics option will guarantee that the optimizer has the best
statistics available in order to determine an execution path for a given query.
(i.e. analyze table test compute statistics;)
- ANALYZE collects only partition level statistics and derives the entire
table level statistics by aggregation.
- By default the ANALYZE TABLE command will generate statistics for the table
and all it's indexes provided that the FOR clause is not used.
========================
2.2 DBMS_UTILITY Package
========================
The DBMS_UTILITY package provides 2 procedures to assist in the gathering of
statistics. It generates ANALYZE command for different objects.
2.2.1. ANALYZE_SCHEMA
=====================
Gathers statistics on all the tables, clusters and indexes in a schema.
The ANALYZE_SCHEMA procedure accepts 5 arguments:
- schema => the schema to be analyzed
- method => ESTIMATE, COMPUTE or DELETE.
If ESTIMATE, then either
. estimate_rows or
. estimate_percent must be non-zero.
- estimate_rows => Number of rows to estimate.
- estimate_percent => Percentage of rows to estimate.
If estimate_rows is specified,
then this parameter is ignored.
- method_opt => [ FOR TABLE ] [ FOR ALL [INDEXED] COLUMNS] [SIZE n]
[ FOR ALL INDEXES ]
For more information on the ANALYZE_SCHEMA procedure Note:67615.1
For more information on the ANALYZE_SCHEMA procedure Note:1011835.102
2.2.2. ANALYZE_DATABASE
=======================
Gathers statistics on all the tables, clusters and indexes in a database.
The ANALYZE_DATABASE procedure accepts 4 arguments:
- method => ESTIMATE, COMPUTE or DELETE.
If ESTIMATE, then either
. estimate_rows or
. estimate_percent must be non-zero.
- estimate_rows => Number of rows to estimate.
- estimate_percent => Percentage of rows to estimate.
If estimate_rows is specified,
then this parameter is ignored.
- method_opt => [ FOR TABLE ] [ FOR ALL [INDEXED] COLUMNS] [SIZE n]
[ FOR ALL INDEXES ]
For more information on the ANALYZE_DATABASE procedure Note:67616.1
====================
2.3 DBMS_DDL Package
====================
The DBMS_DDL package provides one procedure which gathers also statistics:
2.3.1 ANALYZE_OBJECT
====================
Equivalent to SQL
"ANALYZE
TABLE|CLUSTER|INDEX [.]
[] STATISTICS
[SAMPLE [ROWS|PERCENT]]"
For more information on the ANALYZE_OBJECT procedure Note:77283.1