Statistics Gathering: Frequency and Strategy Guidelines (Doc ID 44961.1)

APPLIES TO:

Oracle Server - Enterprise Edition - Version 10.1.0.2 and later
Information in this document applies to any platform.

PURPOSE

 Provide guidelines for gathering Cost Based Optimizer (CBO) statistics.

QUESTIONS AND ANSWERS

Recommendations for Gathering CBO Statistics

Summary

  • Use individual statistic gathering commands for more control
  • Gather statistics on tables with a 5% sample
  • Gather statistics on indexes with compute
  • Add histograms where column data is known to be skewed
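
The summary above maps onto DBMS_STATS calls roughly as follows. This is a sketch only: the SCOTT schema, EMP table, EMP_PK index and DEPTNO column are placeholder names, and the exact parameters should be tuned per system.

```sql
BEGIN
  -- Table: 5% sample, basic column statistics, no histograms
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 5,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    cascade          => FALSE);

  -- Index: computed (estimate_percent => NULL means 100%)
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname          => 'SCOTT',
    indname          => 'EMP_PK',
    estimate_percent => NULL);

  -- Histogram on a column known to be skewed
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 5,
    method_opt       => 'FOR COLUMNS SIZE 254 DEPTNO');
END;
/
```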

Explanation of summary

The level to which object statistics should be collected is very much data dependent.
The goal is to read as little data as possible while still achieving an accurate sample. Different sample sizes may be required to generate figures accurate enough to produce acceptable plans.

Research has indicated that a 5% sample is generally sufficient for most tables, although a 100% sample (COMPUTE) is preferable where time allows. Gathering statistics on tables requires sorting to be done; gathering statistics on indexes does not, because the data is already sorted. Often this means that a COMPUTE on an index will perform acceptably whereas a COMPUTE on a table will not.
Column statistics in the form of histograms are only appropriate for columns whose distribution deviates from the expected uniform distribution.

Gathering statistics detail

The following article is a collection of opinions. Different systems need different levels of statistical analysis due to differences in data. However, used sensibly, these recommendations give good basic guidelines for gathering object statistics.

  • The reason for gathering statistics is to provide the CBO with the best information possible to help it choose a 'good' execution plan.
  • The accuracy of the stats depends on the sample size.
  • Even given COMPUTED stats, it is possible that CBO will not always arrive at the BEST plan for a given SQL statement. This is because the optimizer inherently makes assumptions and has only limited information available.
  • Given a production system with predictable, known queries, the 'best' execution plan for each statement is not likely to vary over time - unless the application is unusual and uses data with wildly different characteristics from day to day.
  • Given that the 'best' plan is unlikely to change, frequently gathering statistics has no benefit. It does incur costs, though.
  • How frequently you gather statistics really depends on how frequently the objects involved change and also whether the statistics have become inaccurate.

    If the data involved is static in nature (i.e. schema objects are not constantly being dropped and recreated), or if schema objects are replaced but the new objects would produce the same or similar statistics, then less frequent statistics gathering (maybe even a single gather) may be possible.

    If schemas are more changeable, then the frequency of gathering may need to be increased.
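
    One way to judge how quickly objects change is to compare the DML volume since the last gather against the table's row count. The query below is illustrative (SCOTT is a placeholder schema) and relies on table monitoring, which is enabled by default from 10g:

```sql
-- Flush in-memory monitoring data so the view is current
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- DML activity since statistics were last gathered
SELECT m.table_name,
       m.inserts, m.updates, m.deletes,
       t.num_rows, t.last_analyzed
FROM   dba_tab_modifications m
       JOIN dba_tables t
         ON t.owner = m.table_owner
        AND t.table_name = m.table_name
WHERE  m.table_owner = 'SCOTT';   -- placeholder schema
```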

    To determine the best sample size, gather statistics using different sample sizes and compare the results. The statistics should be fairly consistent once a reasonable sample size has been used; increasing the sample size beyond that point is unlikely to improve the accuracy. You can see this easily by analyzing 10 rows, 100 rows, 1000 rows, and so on. At some point the results should start to look consistent.
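
    A simple way to run this experiment (a sketch; SCOTT.EMP is a placeholder) is to re-gather at increasing sample sizes and watch the resulting figures settle. Once NUM_ROWS stops changing materially between successive samples, the sample size is large enough:

```sql
SET SERVEROUTPUT ON
DECLARE
  TYPE pct_list IS TABLE OF NUMBER;
  l_pcts pct_list := pct_list(1, 5, 10, 25);  -- example sample sizes
  l_rows dba_tables.num_rows%TYPE;
BEGIN
  FOR i IN 1 .. l_pcts.COUNT LOOP
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'SCOTT',   -- placeholder schema/table
      tabname          => 'EMP',
      estimate_percent => l_pcts(i));

    SELECT num_rows INTO l_rows
    FROM   dba_tables
    WHERE  owner = 'SCOTT' AND table_name = 'EMP';

    DBMS_OUTPUT.PUT_LINE(l_pcts(i) || '% sample -> NUM_ROWS = ' || l_rows);
  END LOOP;
END;
/
```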
  • To determine the best statistic gathering interval one should keep a history of the statistics from BEFORE and AFTER each collect. By keeping a history the user can check for varying statistics and adjust the sampling accordingly.  If the statistics remain reasonably constant then the statistic gathering activity may not be adding any value. Unfortunately, it is not possible to determine that there will be a problem without collecting the statistics.
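
    Two illustrative ways to keep such a history: query the automatic statistics history (10g onwards), or explicitly export the statistics to a user statistics table before each gather. SCOTT, EMP and the SAVED_STATS table name below are placeholders:

```sql
-- When were statistics last changed on this table?
SELECT table_name, stats_update_time
FROM   dba_tab_stats_history
WHERE  owner = 'SCOTT'
ORDER  BY stats_update_time;

-- Explicitly save a "before" copy into a user statistics table
BEGIN
  -- one-off: create the holding table (fails if it already exists)
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SCOTT', stattab => 'SAVED_STATS');
  DBMS_STATS.EXPORT_TABLE_STATS(
    ownname => 'SCOTT', tabname => 'EMP',
    stattab => 'SAVED_STATS', statid => 'BEFORE_GATHER');
END;
/
```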
  • When considering when to gather statistics, choose as quiet a period as possible. Although gathering will work, if you gather statistics on objects that are actively being updated and inserted into, you are much more likely to encounter contention than if the objects in question are quiet.
  • If the before/after statistics do vary often, then either the data profile is not predictable, or the sample size is too small to reflect the true nature of the data accurately, or (unlikely) the data is too random. In the last case the statistics are of limited use anyway, and one should think of ways to ensure that queries using the data will get at least a reasonable response.
  • As the CBO uses the stats as the basis of its cost calculations, gathering new statistics on a table MAY result in a different execution plan for some statements. This is expected behaviour and allows the CBO to adjust access paths if the data profile changes.
  • It is possible that a different execution plan may be worse than the original plan. The difference may be small or quite large. This is unavoidable: given a set of base information the CBO will choose a plan; given slightly different base information it may choose a different plan. It is unlikely that the change of plan will coincide exactly with the real point at which a change of plan would be beneficial.
  • For most production systems predictability is more important than absolute best performance. Hopefully from the above it is clear that gathering statistics can have a destabilizing effect. This is not to say it should not be done, but be aware of what may happen.
  • Most applications have several queries that form the heart of most transactions. These queries are critical in that any adverse change in execution plan could incur a high cost due to the number and frequency of usage of these statements.  

    It is a good idea to isolate such statements into a test suite where sample queries can be used to gauge whether the performance of the key statements has deteriorated badly due to a statistics collect, i.e. after gathering statistics, check that these statements still respond in acceptable times.
  • It is recommended that users should store critical statistics to allow them to revert back to a working configuration in the event of a statistics change that significantly affects application performance. From 10g onwards the database retains statistics for a default period of 31 days. The following article explains how to store statistics: 

    Document 452011.1 * Restoring Table Statistics (Oracle 10G Onwards)
    Document 117203.1  How to Use DBMS_STATS to Move Statistics to a Different Database
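
    The 31-day window is configurable. The following is an illustrative check and change of the retention; the value 60 is an example only:

```sql
-- Current retention in days (31 by default from 10g)
SELECT DBMS_STATS.GET_STATS_HISTORY_RETENTION FROM dual;

-- Oldest point in time to which statistics can be restored
SELECT DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY FROM dual;

-- Extend retention to 60 days (example value)
EXEC DBMS_STATS.ALTER_STATS_HISTORY_RETENTION(60);
```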
     
  • If gathering statistics causes critical statements to perform badly, you can revert to the pre-analyze statistics, or otherwise correct the problem, by:
    • Restoring previous statistics (for up to 31 days afterwards, by default)

      Document 117203.1  How to Use DBMS_STATS to Move Statistics to a Different Database
    • Importing previously exported statistics
    • Using database point-in-time recovery
    • Re-analyzing with a larger sample size in an attempt to generate more accurate statistics
    • Looking at the bad plan(s) to see where the 'bad' chunk of cost has been introduced
    • Using any available tuning options to correct the problem. For example:
      add hints to statements or views to correct the performance of problem queries, or selectively delete statistics.
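
    For illustration, the first two revert options look like this. SCOTT.EMP and SAVED_STATS are placeholder names, the timestamp is an example, and in practice you would run one option or the other, not both:

```sql
BEGIN
  -- Option 1: restore from the automatic history (10g onwards)
  DBMS_STATS.RESTORE_TABLE_STATS(
    ownname         => 'SCOTT',
    tabname         => 'EMP',
    as_of_timestamp => SYSTIMESTAMP - INTERVAL '1' DAY);

  -- Option 2: import a previously exported copy
  DBMS_STATS.IMPORT_TABLE_STATS(
    ownname => 'SCOTT', tabname => 'EMP',
    stattab => 'SAVED_STATS', statid => 'BEFORE_GATHER');
END;
/
```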
       
  • For any statement that does suffer a change between good and very bad plans, there is usually some element of the cost which is finely balanced, and re-gathering may tip you between the plans. Any such statement can impact the performance of a production system, but there is no easy way to identify them. Generally, it is best to use plan stability features (or a hint) on the SQL wherever there could be a fine balance between two plans, once they have been identified.

    Document 1359841.1 Plan Stability Features (Including SPM) Start Point
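
    For example, from 11g onwards SQL Plan Management can pin the known-good plan for such a statement. This is a sketch; the sql_id value is a placeholder:

```sql
SET SERVEROUTPUT ON
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Capture the currently cached plan as an accepted baseline,
  -- so a future statistics gather cannot silently switch plans
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id => 'f9u2k3h4x5y6z');  -- placeholder sql_id
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_plans);
END;
/
```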
     
  • In a DSS / warehouse environment, queries are generally NOT predictable so there is no stable environment / query set to upset by gathering statistics.
  • Although the above may sound quite dramatic, it is actually unusual for a plan to change wildly or for statistics to be finely balanced. Unusual does NOT mean it never happens. It can happen, but the likelihood is very small.

REFERENCES

NOTE:117203.1  - How to Use DBMS_STATS to Move Statistics to a Different Database
NOTE:1359841.1  - Plan Stability Features (Including SPM) Start Point
NOTE:452011.1  - * Restoring Table Statistics (Oracle 10G Onwards)
NOTE:1369591.1  - Optimizer Statistics - Central Point
