How to Gather Optimizer Statistics on 11g (Doc ID 749227.1)

In this Document

  Goal
  Solution
  Quick Recreate Recommendation
  Important Notes Regarding the Gathering of Optimizer Statistics
  Gathering Object statistics
  Use a large enough sample size
  Gather statistics on all objects
  Collect Column Statistics/Histograms for Skewed Data Distributions
  Gather Global Statistics for Partitioned Objects
  Gather System Statistics
  Upgrading to 11g from an earlier version
  Default Settings
  SAMPLE STATISTIC GATHERING COMMANDS
  Gathering statistics for an individual table
  Gathering statistics for all objects in a schema
  Gathering statistics for all objects in the database:
  References

APPLIES TO:

Oracle Database - Personal Edition - Version 11.1.0.6 to 11.2.0.3 [Release 11.1 to 11.2]
Oracle Database - Enterprise Edition - Version 11.1.0.6 to 11.2.0.3 [Release 11.1 to 11.2]
Oracle Database - Standard Edition - Version 11.1.0.6 to 11.2.0.3 [Release 11.1 to 11.2]
Oracle EBS Applications Performance - Version 12.1.3 to 12.1.3 [Release 12.1]
Information in this document applies to any platform.

GOAL

This document outlines the recommended method to gather a standard set of optimizer statistics for use by the Cost Based Optimizer (CBO) under Oracle 11g. For other versions, see:

Document 1226841.1 How To: Gather Statistics for the Cost Based Optimizer

 

NOTE: A presentation entitled "Best Practices for Managing Optimizer Statistics", which also addresses this subject and provides specific tips, is available in:

Document 1380043.1 Selected Performance Related Seminars from Oracle Openworld

SOLUTION

Quick Recreate Recommendation

To quickly delete and recreate the statistics on an individual table and its indexes (adding column statistics for any skewed columns), following the recommendations in this article, use:

exec dbms_stats.delete_table_stats(ownname=>'user_name',-
  tabname=>'table_name',cascade_indexes=>true);

exec dbms_stats.gather_table_stats(ownname=>'user_name',-
   tabname=>'table_name',-
   estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,-
   cascade=>true,-
   method_opt=>'for all columns size AUTO');
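
As a quick check after running the commands above, the dictionary views can be queried to confirm that fresh statistics are in place. This is a sketch only: USER_NAME and TABLE_NAME are placeholders (unquoted object names are stored in upper case in the dictionary). With AUTO_SAMPLE_SIZE on 11g, SAMPLE_SIZE would typically be expected to match NUM_ROWS for the table.

```sql
-- Table-level statistics just gathered (placeholders in upper case)
SELECT table_name, num_rows, sample_size, last_analyzed
FROM   dba_tables
WHERE  owner      = 'USER_NAME'
AND    table_name = 'TABLE_NAME';

-- Index statistics gathered through cascade=>true
SELECT index_name, num_rows, sample_size, last_analyzed
FROM   dba_indexes
WHERE  table_owner = 'USER_NAME'
AND    table_name  = 'TABLE_NAME';
```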


For an explanation of these recommendations, see below. For more usage examples, see the end of this article.
Note that from 10g onwards, statistics can be restored using the procedure described in:

Document 452011.1 * Restoring table statistics in 10G onwards
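
A minimal sketch of checking and restoring saved statistics, assuming the default statistics history retention is in place. USER_NAME and TABLE_NAME are placeholders, and the one-day offset is only an example:

```sql
-- Number of days for which old statistics are kept
SELECT dbms_stats.get_stats_history_retention FROM dual;

-- Oldest point in time to which statistics can currently be restored
SELECT dbms_stats.get_stats_history_availability FROM dual;

-- Saved versions of the statistics for one table
SELECT table_name, partition_name, stats_update_time
FROM   dba_tab_stats_history
WHERE  owner      = 'USER_NAME'
AND    table_name = 'TABLE_NAME'
ORDER  BY stats_update_time;

-- Put back the statistics as they were one day ago
exec dbms_stats.restore_table_stats(ownname=>'USER_NAME',-
  tabname=>'TABLE_NAME',-
  as_of_timestamp=>SYSTIMESTAMP - INTERVAL '1' DAY);
```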

Important Notes Regarding the Gathering of Optimizer Statistics

  • These recommendations apply to the majority of databases.
  • The recommendations aim to generate statistics that are as statistically accurate as possible. To this end, 100% sample sizes are suggested, since any reduction in sample size is always a concession to accuracy. Such 100% samples are acknowledged to be potentially time consuming, so statistics gathering needs to be planned to fit within the existing maintenance window.
  • Where possible, we recommend using preferences to standardise collection across different objects. Rather than specifying settings for each table individually, set defaults centrally at the database, schema or table level, and override them only where necessary for individual collections. Setting preferences also ensures that Automatic Statistics Collection uses the same settings. For information on how to set default parameter values for statistics gathering, see:
    Document 1493227.1 How to Change Default Parameters for Gathering Statistics in Oracle 11g
    Although the default parameter settings are appropriate for most situations, if you find non-default settings that work better for you, consider adding them as preferences to meet the requirements of your system.
  • Gathering new optimizer statistics should maintain or improve existing execution plans, but the performance of some queries may degrade. Note that from 10gR1, previous copies of statistics are kept by default for the last 30 days and can be restored if problems arise. See:
    Document 452011.1 * Restoring table statistics in 10G onwards
  • Gathering new optimizer statistics may invalidate cursors in the shared pool, so it is prudent to restrict gathering operations to periods of low database activity, such as the scheduled maintenance windows.
  • Apart from object statistics, we also recommend that statistics should be gathered on dictionary objects. See:
    Document 457926.1 How to Gather Statistics on SYS Objects and 'Fixed' Objects?
  • For very large systems, gathering statistics can be a very time consuming and resource intensive activity. In these environments, sample sizes need to be carefully controlled to ensure that gathering completes within acceptable timescales and resource constraints, and within the maintenance window. For guidance on this topic, see:
    Document 44961.1 Statistics Gathering: Frequency and Strategy Guidelines

    In these environments, it is also recommended to use change-based statistics gathering to avoid re-gathering information unnecessarily. The procedure for automatic statistics collection has changed in 11g compared to 10g. Please see:
    Document 237901.1 Gathering Schema or Database Statistics Automatically - Examples
    Document 756734.1 11g: Scheduler Maintenance Tasks or Autotasks
    Document 743507.1 Why Has the GATHER_STATS_JOB been removed in 11g?
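
Change-based gathering relies on the DML monitoring data that Oracle records for each table. A sketch of how staleness and the status of the 11g automatic statistics task might be inspected, assuming SCHEMA_NAME is a placeholder and the user can query the DBA_ views:

```sql
-- Write the in-memory DML monitoring counters to the dictionary so that
-- the queries below reflect recent activity
exec dbms_stats.flush_database_monitoring_info;

-- Tables (and partitions) whose statistics are currently marked stale
SELECT owner, table_name, partition_name, last_analyzed
FROM   dba_tab_statistics
WHERE  owner       = 'SCHEMA_NAME'
AND    stale_stats = 'YES';

-- DML recorded since the statistics were last gathered
SELECT table_owner, table_name, inserts, updates, deletes, timestamp
FROM   dba_tab_modifications
WHERE  table_owner = 'SCHEMA_NAME';

-- Is the 11g automatic statistics collection task enabled?
SELECT client_name, status
FROM   dba_autotask_client
WHERE  client_name = 'auto optimizer stats collection';
```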

Gathering Object statistics

The Cost Based Optimizer (CBO) uses statistics to determine the execution plan for a particular query. With reduced sample sizes, sampling could potentially produce different statistics due to chance groupings of data, which may be the result of differing loading methods, etc.

On 11g, it is recommended to gather statistics using scheduled statistics gathering scripts. In most cases, the default scripts provide an adequate level of sampling, taking into account the following recommendations:

  • Use a large enough sample size

    On 11g, Support suggests using the default DBMS_STATS.AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT. This generates statistics from an effective 100% sample of the table (provided this can fit within the maintenance window), even if that means statistics are gathered less frequently. If a 100% sample is not feasible, try an estimate of at least 30%. In most cases, however, performance with AUTO_SAMPLE_SIZE should be acceptable, since 11g uses a hashing algorithm to compute the statistics.
    Generally, the overall accuracy of the statistics outweighs day-to-day changes in most applications. See below for notes regarding this setting in earlier versions.
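
    A sketch of both options for a single table, with SCHEMA_NAME and TABLE_NAME as placeholders. The second call is only the fallback described above, for cases where a full pass does not fit the window:

    ```sql
    -- Preferred on 11g: let DBMS_STATS choose the sample (AUTO_SAMPLE_SIZE)
    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'TABLE_NAME',-
      estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE);

    -- Fallback: an explicit 30% sample
    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'TABLE_NAME',-
      estimate_percent=>30);
    ```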
  • Gather statistics on all objects

    Ensure all objects (tables and indexes) have stats gathered. An easy way to achieve this is to use the CASCADE parameter.
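
    A sketch for spotting tables and indexes in a schema that have never been analyzed, and for gathering statistics on only those objects (SCHEMA_NAME is a placeholder):

    ```sql
    -- Tables with no statistics
    SELECT table_name
    FROM   dba_tables
    WHERE  owner = 'SCHEMA_NAME'
    AND    last_analyzed IS NULL;

    -- Indexes with no statistics
    SELECT index_name, table_name
    FROM   dba_indexes
    WHERE  owner = 'SCHEMA_NAME'
    AND    last_analyzed IS NULL;

    -- Gather only objects that currently have no statistics,
    -- including their indexes
    exec dbms_stats.gather_schema_stats(ownname=>'SCHEMA_NAME',-
      options=>'GATHER EMPTY',-
      cascade=>TRUE);
    ```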
  • Collect Column Statistics/Histograms for Skewed Data Distributions

    Ensure that any columns with a skewed data distribution have histograms collected, at sufficient resolution, using the METHOD_OPT parameter. Generally, Support recommends the default column statistics setting of "AUTO", which means that DBMS_STATS decides which columns to add histograms to, where it believes they may help to produce a better plan. Adding a histogram only where it is known to be needed is a more conservative and more plan-stable approach than collecting column statistics on all columns.

    Document 390249.1 How To Quickly Add/Remove Column Statistics (Histograms) For A Column

    Note: If column data is very skewed and a non-100% sample is used (for example, with AUTO), some column values may be missed. If this occurs and a query references a value that is missing from the histogram, the optimizer is unlikely to chance upon accurate statistics and may produce inaccurate or erratically performing plans. In these cases, a 100% sample is the only way to guarantee accuracy. If that is not possible, you may find that removing the column statistics (i.e. using just the high and low values) gives a consistent selectivity and therefore more consistent results.

    Similarly, if statistics are not completely up to date, histograms can cause problems when the values supplied at parse time are out of range, or fall between the values of a frequency histogram (for example, if a new value has been added, or if the number of rows for a particular value has changed significantly). In these circumstances the optimizer has to make (potentially inaccurate) guesses, which can occasionally lead to poor plans. As ever, testing different settings with the application will yield the best results.

    Remember that column statistics have a limitation: only 254 buckets can be stored. If a column has an extremely large number of distinct values, with skew relating to a number of them, the column statistics may still be inaccurate in some cases. See:

    Document 212809.1 Limitations of the Oracle Cost Based Optimizer

    Additionally, it only makes sense to collect column statistics (histograms) if your application can use them. Specifically, if the application uses bind variables but their values are not peeked (for example, with _OPTIM_PEEK_USER_BINDS = FALSE), the optimizer has no values with which to look up the column data and cannot use the column statistics to improve cardinality estimates.

    In earlier versions, the default setting for the METHOD_OPT parameter was "FOR ALL COLUMNS SIZE 1", which collects only high and low values and effectively means there are no detailed column statistics. Since histograms are known to lead to a worse rather than a better plan in some cases, users moving between versions may wish to set this parameter to its pre-upgrade value initially, for stability, and later move to the post-upgrade default. See:

    Document 465787.1 How to: Manage CBO Statistics During an Upgrade from 10g or 9i into 11g
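
    A sketch of how column statistics might be inspected and adjusted for one table, assuming SCHEMA_NAME, TABLE_NAME and COLUMN_NAME are placeholders. In 11g a histogram can only be a FREQUENCY histogram when the column's distinct values fit within the 254 buckets; beyond that a HEIGHT BALANCED histogram is built, which is where the bucket limitation described above applies.

    ```sql
    -- Which columns have histograms, and of what type?
    SELECT column_name, num_distinct, num_buckets, histogram
    FROM   dba_tab_col_statistics
    WHERE  owner      = 'SCHEMA_NAME'
    AND    table_name = 'TABLE_NAME'
    ORDER  BY column_name;

    -- Add (or rebuild) a histogram on one skewed column, up to 254 buckets
    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'TABLE_NAME',-
      method_opt=>'FOR COLUMNS COLUMN_NAME SIZE 254');

    -- Remove that histogram again; SIZE 1 keeps the basic column statistics
    -- (distinct values, nulls, high and low value) without a histogram
    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'TABLE_NAME',-
      method_opt=>'FOR COLUMNS COLUMN_NAME SIZE 1');
    ```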


  • Gather Global Statistics for Partitioned Objects

    If partitions are in use, gather global statistics wherever time constraints allow. Global statistics are very important, but gathering them is often avoided because of the sizes involved and the length of time required. If 100% samples are not possible, Support recommends a minimum of 1%. Gathering with very small sample sizes (e.g. 0.001, 0.0001, 0.00001, etc.) can be very effective, but equally, a large proportion of the data will not be examined, and that data could prove decisive to the optimizer's plan choices. Note that the ESTIMATE_PERCENT parameter accepts a very flexible range of 0.000001 to 100, which allows very small sample sizes suitable for huge partitioned tables. Testing will reveal the most suitable settings for each system.

    See:
    Document 236935.1 Global statistics - An Explanation

    11g also provides functionality to collect global statistics incrementally. See:

    Oracle Database Performance Tuning Guide
    11g Release 1 (11.1)
    Part Number B28274-02
    Chapter 13 Managing Optimizer Statistics
    Section 13.3.1.3 Statistics on Partitioned Objects
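
    A sketch of both approaches for a partitioned table, assuming SCHEMA_NAME and PART_TABLE_NAME are placeholders and the 1% sample is only the minimum suggested above. As we understand the 11g documentation, incremental maintenance also requires ESTIMATE_PERCENT to be AUTO_SAMPLE_SIZE, GRANULARITY to be AUTO and the PUBLISH preference to remain TRUE (its default); please verify this against the tuning guide referenced above.

    ```sql
    -- 1) Global (table-level) statistics only, with a 1% sample
    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'PART_TABLE_NAME',-
      granularity=>'GLOBAL',-
      estimate_percent=>1);

    -- GLOBAL_STATS = YES: gathered directly on the table;
    -- NO: aggregated from the partition statistics
    SELECT table_name, global_stats, num_rows, last_analyzed
    FROM   dba_tables
    WHERE  owner      = 'SCHEMA_NAME'
    AND    table_name = 'PART_TABLE_NAME';

    -- 2) 11g incremental global statistics
    exec dbms_stats.set_table_prefs('SCHEMA_NAME', 'PART_TABLE_NAME', 'INCREMENTAL', 'TRUE');

    exec dbms_stats.gather_table_stats(ownname=>'SCHEMA_NAME',-
      tabname=>'PART_TABLE_NAME',-
      estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,-
      granularity=>'AUTO');
    ```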
  • Gather System Statistics

    Gather system statistics to reflect the CPU loading of the system and to improve the accuracy of the CBO's estimates by providing the CBO with CPU cost estimates in addition to the normal I/O cost estimates. See:
    Document 470316.1 Using Actual System Statistics (Collected CPU and IO information)
    Document 149560.1 Collect and Display System Statistics (CPU and IO) for CBO usage
    Document 153761.1 Scaling the System to Improve CBO optimizer
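
    A sketch of gathering workload system statistics over a representative period and reviewing the result. The 60-minute interval is an assumption and should be adjusted to cover a period of typical load; querying SYS.AUX_STATS$ requires suitable privileges.

    ```sql
    -- Capture workload statistics for 60 minutes (interval is in minutes)
    exec dbms_stats.gather_system_stats(gathering_mode=>'INTERVAL', interval=>60);

    -- After the interval has elapsed: the values the CBO will use
    SELECT sname, pname, pval1
    FROM   sys.aux_stats$
    WHERE  sname = 'SYSSTATS_MAIN';
    ```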
  • Upgrading to 11g from an earlier version

    For instances where you are upgrading to 11g from an earlier version, see the following document for advice on how to manage Statistics:

    Document 465787.1 How to: Manage CBO Statistics During an Upgrade from 10g or 9i into 11g
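
    Following the METHOD_OPT advice for upgrades given earlier in this article, one possible way to keep the pre-upgrade behaviour for a schema, and later revert to the 11g default, is sketched below (SCHEMA_NAME is a placeholder). SET_SCHEMA_PREFS applies the value to the tables that currently exist in the schema.

    ```sql
    -- Initially: no histograms for this schema, as in 9i
    exec dbms_stats.set_schema_prefs('SCHEMA_NAME', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');

    -- Later: remove the override so the global 11g default applies again
    exec dbms_stats.delete_schema_prefs('SCHEMA_NAME', 'METHOD_OPT');
    ```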

Default Settings

Note that the defaults for statistics gathering on different versions of Oracle are not necessarily the same, for example:

  • ESTIMATE_PERCENT: defaults:
    • 9i : 100%
    • 10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using very small estimate percentage)
    • 11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using larger estimate percentage - 100%)
  • METHOD_OPT: defaults:
    • 9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
    • 10g and 11g : "FOR ALL COLUMNS SIZE AUTO" - This setting means that DBMS_STATS decides which columns to add histograms to, where it believes they may help to produce a better plan.

In 11g, using AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT effectively gives a 100% sample and is therefore as accurate as possible for the table itself. In prior versions a 100% sample may have been impossible because of collection-time constraints; 11g, however, uses a new hashing algorithm to compute the statistics instead of sorting (in 9i and 10g the sorting was typically the "slow" part), which significantly improves collection time and resource usage. Note that column statistics are decided automatically, so a more variable sample may apply to them.

You can modify the default settings using the procedures in the following article:

Document 1493227.1 How to Change Default Parameters for Gathering Statistics in Oracle 11g 
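
As an illustration, assuming the 11g DBMS_STATS preference procedures, the current defaults can be displayed and then overridden for a single table as follows (SCHEMA_NAME and TABLE_NAME are placeholders):

```sql
-- Current database-wide defaults
SELECT dbms_stats.get_prefs('ESTIMATE_PERCENT') AS estimate_percent,
       dbms_stats.get_prefs('METHOD_OPT')       AS method_opt,
       dbms_stats.get_prefs('CASCADE')          AS cascade_setting
FROM   dual;

-- Effective value for one table (table preference if set, else global)
SELECT dbms_stats.get_prefs('METHOD_OPT', 'SCHEMA_NAME', 'TABLE_NAME')
FROM   dual;

-- Override the default for this table only
exec dbms_stats.set_table_prefs('SCHEMA_NAME', 'TABLE_NAME', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');
```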

SAMPLE STATISTIC GATHERING COMMANDS

Gathering statistics for an individual table

exec dbms_stats.gather_table_stats(  -
       ownname => 'SCHEMA_NAME', -
       tabname => 'TABLE_NAME', -
       estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -
       cascade => TRUE,  -
       method_opt => 'FOR ALL COLUMNS SIZE AUTO' );

NOTE: For a more cautious approach, as outlined in the text above, and where column statistics are known not to be beneficial, replace:

method_opt => 'FOR ALL COLUMNS SIZE AUTO'

with

method_opt => 'FOR ALL COLUMNS SIZE 1'

N.B. Replace 'SCHEMA_NAME' and 'TABLE_NAME' with the names of the schema
and table to gather statistics for, respectively (without surrounding spaces inside the quotes).
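
The same call can be written as an anonymous PL/SQL block, which may be more convenient in tools other than SQL*Plus, where exec and the trailing "-" continuation character are not available. A sketch with the same placeholders:

```sql
BEGIN
  -- Equivalent of the exec example above for an individual table
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'TABLE_NAME',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
```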

Gathering statistics for all objects in a schema

exec dbms_stats.gather_schema_stats( -
       ownname => 'SCHEMA_NAME', -
       cascade => TRUE, -
       method_opt => 'FOR ALL COLUMNS SIZE AUTO' );

N.B. Replace 'SCHEMA_NAME' with the name of the schema to gather statistics for.
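
Before gathering, it can be useful to see which objects DBMS_STATS currently considers stale. A sketch using the LIST STALE option, which returns the object list in the objlist parameter without gathering anything (SCHEMA_NAME is a placeholder; in SQL*Plus run SET SERVEROUTPUT ON first):

```sql
DECLARE
  -- Receives the objects that a GATHER STALE run would process
  l_objs DBMS_STATS.ObjectTab;
BEGIN
  -- Make the DML monitoring data current before checking staleness
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'SCHEMA_NAME',
    options => 'LIST STALE',
    objlist => l_objs);

  -- Guard against an uninitialized collection before calling COUNT
  IF l_objs IS NOT NULL THEN
    FOR i IN 1 .. l_objs.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE(l_objs(i).objtype || ' ' ||
                           l_objs(i).objname || ' ' ||
                           l_objs(i).partname);
    END LOOP;
  END IF;
END;
/
```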

Gathering statistics for all objects in the database:

exec dbms_stats.gather_database_stats( -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE AUTO' );
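
Dictionary and fixed object statistics, recommended in the notes above, are gathered with separate procedures. A minimal sketch, to be run as a suitably privileged user (for example, one with the ANALYZE ANY DICTIONARY privilege); see Document 457926.1 for guidance on timing:

```sql
-- Statistics on the data dictionary (SYS and other dictionary schemas)
exec dbms_stats.gather_dictionary_stats;

-- Statistics on fixed objects (X$ tables); usually best run while a
-- representative workload is active
exec dbms_stats.gather_fixed_objects_stats;
```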
