Overview
There are two major innovations in optimizer statistic collection in the Oracle Database 11g release: pending statistics and extended statistics. In addition, the Statistics Preferences feature has been enhanced so you can easily change statistics collection settings that are different from the database default settings for specific objects.
Statistics Preferences
Although automatic statistics gathering takes the burden of collecting statistics off the DBA, it isn’t perfect. Often, you have to manually collect statistics or provide non- default options for subsets of database objects for which the default statistics collection options aren’t appropriate. For example, you’d have to specify your own sample size for tables with a heavily skewed data distribution, instead of letting the database automatically determine the sample size by using the AUTO_SAMPLE_SIZE value for the estimate_percent attribute. The Statistics Preference feature enables automatic statistics gathering to function better, by allowing you to easily customize statistics collection settings for specific objects that need special treatment. Under the Statistics Preferences feature, when you execute any of the GATHER_*_STATS procedures or the database runs the automatic Optimizer Statistics Gathering task, you can override the default behavior of the procedure and the task at the object or schema level.
You can view the current settings for statistics preferences by querying the DBA_TAB_STAT_PREFS view, which has the following structure:
SQL> desc dba_tab_stat_prefs
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER NOT NULL VARCHAR2(30)
TABLE_NAME NOT NULL VARCHAR2(30)
PREFERENCE_NAME VARCHAR2(30)
PREFERENCE_VALUE VARCHAR2(1000)
You can set preferences at the table, schema, database, and global level. A preference set at the database level applies to all tables in the database, whereas the global preferences apply to all tables for which you don’t set any preferences. Preferences you set at a lower granularity level override preferences at a higher level. That is, the preferences are in decreasing order of priority in this list: table level, schema level, database level, global level. A table level preference setting, for example, overrides a database level preference setting.
You could set preferences such as estimate_percent, degree, and method_opt in Oracle Database 10g. In Oracle Database 11g, you can set the following three new options:
- publish: Determines whether the database should make the statistics it collects public by storing them in the data dictionary. This is called publishing the statistics, and it was the only option when the database gathered statistics for any object in Oracle Database 10g. In Oracle Database 11g, if you choose not to automatically make the new statistics public, but rather wait until you confirm that the new statistics are conducive to better performance, the database treats the statistics as pending statistics. You can set the values of TRUE or FALSE for the publish preference.
- stale_percent: Lets you specify the threshold level for classifying an object’s statistics as stale. The parameter uses a percentage of the rows that were modified since the database collected statistics. If, for example, the default is 10 percent for a table, you can change it to 20 percent if you want.
- incremental: Lets the database incrementally collect global statistics on partitioned tables. The two possible values for this parameter are TRUE and FALSE.
Deferred Statistics Publishing
By default, the database automatically allows the optimizer to immediately make use of the statistics that it collects. In other words, by default, once the statistics gathering is complete for the database, table, or schema, the database automatically publishes the new optimizer statistics into the dictionary tables. Oracle Database 11g introduces the concept of pending statistics, wherein you have the option to save new statistics as pending until you validate those statistics. You have the option now of publishing only those statistics that you consider are satisfactory, but not all statistics. This means that as a DBA, you can test the new statistics before publishing them for use by the optimizer. This also means that you can have two types of statistics in your database, based on their publication status: current statistics (published statistics) and pending statistics. Current or published statistics are for public use by the optimizer, and pending statistics are private statistics that you may or may not choose to make public.
In addition to permitting testing by the DBA before allowing the optimizer to use the new statistics, the concept of pending statistics provides another significant benefit. In the prior release of Oracle Database, the database could end up with inconsistent statistics when a table’s statistics were published before the statistics for its index or partition. This could occur because the statistics gathering job failed midway through its execution. In Oracle Database 11g, the entire statistics gathering job is treated as one atomic transaction. The database publishes the statistics for all of a schema’s objects at the same time. Thus, you can ensure that the statistics viewed by the optimizer are always consistent. If a statistics gathering job fails, you can resume it with the RESUME_GATHER_STATS procedure and publish the entire schema’s statistics at a single point in time after you’ve verified and tested the statistics. Determining the Status of the Statistics The DBMS_STATS package’s GET_PREFS function tells you whether the database will automatically publish optimizer statistics or not. The GET_PREFS function is new in Oracle Database 11g. The function returns the default values of various preferences, including estimate_percent, stale_percent, and others. The preference that is of interest to us with regard to publishing statistics is the preference named publish. Here’s the query to determine whether new statistics will be published automatically or not:
SQL> select dbms_stats.get_prefs('PUBLISH') publish from dual;
PUBLISH
--------
TRUE
The query returns a value of TRUE, meaning the database will automatically publish the new statistics once the statistics gathering process completes. If the query returns a value of FALSE, it means that the database will keep the new statistics in the pending status until you decide to publish them. By default, the database publishes all statistics automatically, so the default value for the previous query is TRUE. You can determine the publishing mode for a single table by executing the get_prefs function with the table parameters, as shown in the example here:
SQL> select dbms_stats.get_prefs('PUBLISH','stats','test_table') from dual;
The database stores the published statistics in the DBA_TAB_STATS and the DBA_IND_STATS views. The database stores the pending statistics in the DBA_TAB_PENDING_STATS and the DBA_IND_PENDING_STATS views.
Making Pending Statistics Public
Once you decide to keep some statistics in the pending status, you can test them against a workload to see how the unpublished statistics affect performance. If the statistics seem like they are helping performance, you can publish the statistics. If not, you can drop the statistics. In order to make pending statistics available to the optimizer, set the new initialization parameter optimizer_use_pending_ statistics to TRUE. The default value of this parameter is FALSE, meaning the optimizer doesn’t use pending statistics by default, as shown here:
SQL> show parameter optimizer_use_pending_statistics
NAME TYPE VALUE
-------------------------------- ------- ------
optimizer_use_pending_statistics boolean FALSE
In order for the session to switch to using the new pending statistics, you issue the following alter session statement:
SQL> alter session set optimizer_use_pending_statistics=TRUE;
Once you execute the alter session command shown here, when the optimizer compiles a SQL statement that refers to objects with pending statistics, it will make use of the new pending statistics.
You can change the status of all pending statistics in the database to current statistics, by executing the PUBLISH_PENDING_STATS procedure, as shown here:
SQL> exec dbms_stats.publish_pending_stats (NULL,NULL);
You can also publish the pending statistics for only a single table in the following manner:
SQL> exec dbms_stats.publish_pending_stats('HR','EMPLOYEES');
If, after testing a workload that includes a table with pending statistics, you decide you’re better off with the older statistics, delete the pending statistics by executing the DELETE_PENDING_STATS procedure, as shown here:
SQL> exec dbms_stats.delete_pending_stats('HR','EMPLOYEES');
The DELETE_PENDING_STATS procedure helps you delete any pending statistics that you haven’t published. You can also test the new pending statistics against a workload in a test database by exporting the pending statistics to another database with the EXPORT_PENDING_STATS procedure
PROCEDURE EXPORT_PENDING_STATS
参数名称 类型 输入/输出默认值?
------------------------------ ----------------------- ------ --------
OWNNAME VARCHAR2 IN DEFAULT
TABNAME VARCHAR2 IN
STATTAB VARCHAR2 IN
STATID VARCHAR2 IN DEFAULT
STATOWN VARCHAR2 IN DEFAULT
The EXPORT_PENDING_STATS procedure exports any statistics that the database has gathered and kept as pending statistics.
Partitioned Tables and Incremental Statistics
For a partitioned table, the optimizer maintains statistics both at the global level for the entire table as well as partition-level statistics for each partition. However, in most types of partitioned tables, the data remains the same in most of the older partitions, and DML changes are made only to the data in the new or more recent partitions. In Oracle Database 11g, the database collects statistics only for those partitions that show a significant change in data. The default value of the threshold for significant change is 10 percent of the rows in a partition.
In previous versions, aggregated stats were used to reduce the overhead of gathering stats on partitioned objects. This worked fine for the number of rows in the table and the high/low values for columns, but was terrible at calculating the number of distinct values for the columns.In Oracle Database 11g, the database maintains global statistics incrementally, by scanning only those partitions that have undergone significant changes and using the old statistics for all partitions that remain unchanged since the last statistics collection job.In Oracle 11g now keeps an abbreviated record of the high/low column values, called a synopsis, and stores them in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ and WRI$_OPTSTAT_SYNOPSIS$ views. This allows the new incremental stats to estimate the number of distinct values much more accurately. The end result is that you can now collect global statistics much faster on large partitioned tables because the database doesn’t have to scan the entire table to collect the statistics.
Note:The incremental statistics feature for partitioned tables doesn’t incrementally maintain histograms.If you want the database to update global table statistics by scanning only the changed partitions instead if the entire table, you must satisfy the following conditions:
- Set the incremental value for the table to TRUE.
- Se the publish value for the table to TRUE.
- Specify the auto_sample_size value for the estimate_percent attribute and the auto value for the granularity attribute when executing the GATHER_TABLE_STATS procedure to collect the statistics.
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'INCREMENTAL', 'TRUE');
-- Resetting to defaults values if you've changed the global/database preferences.
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'GRANULARITY', 'AUTO');
EXEC DBMS_STATS.set_table_prefs('MY_SCHEMA', 'MY_TABLE', 'ESTIMATE_PERCENT', DBMS_STATS.AUTO_SAMPLE_SIZE);
The stats will now be gathered incrementally for the partitioned table by issuing the basic gather command.-- Using default preferences.
EXEC DBMS_STATS.gather_table_stats('MY_SCHEMA', 'MY_TABLE');
-- Overriding preferences.
EXEC DBMS_STATS.gather_table_stats('MY_SCHEMA', 'MY_TABLE', granularity => 'AUTO', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
New Sampling Technique
Selectivity of the data in a table is of critical importance when the optimizer is figuring out an execution plan. The optimizer normally uses the number of distinct values in a column to figure out the selectivity of a predicate using a column. In Oracle Database 10g, DBAs often had a tough time meddling with the sample size factor, alternately trying to lower it if the statistic gathering job took too long to finish, and raising the sample size if the job finished quickly but was yielding bad results. Oracle Database 11g provides the best of both the worlds, by providing a row- sampling technique that uses a small sample such as 1 or 5 percent of the data but provides results that are as accurate as those from a full table scan. Simply let the estimate_percent option remain at its default value of auto_sample_ size. Oracle recommends this option when collecting statistics.
EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => DBMS_STATS.auto_sample_size);
Using the AUTO_SAMPLE_SIZE constant in previous versions got a bit of a bad reputation. The selected sample size was sometimes inappropriate, making the resulting statistics questionable. In Oracle 11g, the AUTO_SAMPLE_SIZE constant is the preferred (and default) sample size as the mechanism for determining the actual sample size has been improved. In addition, the statistics estimate based on the auto sampling are near to 100% accurate and much faster to gather than in previous versions, as described here.
EXEC DBMS_STATS.gather_table_stats('MYSCHEMA', 'BIG_TABLE', estimate_percent => DBMS_STATS.auto_sample_size);
参 考至:http://www.oracle-base.com/articles/11g/statistics-collection- enhancements-11gr1.php 《McGraw.Hill.OCP.Oracle.Database.11g.New.Features.for.Administrators.Exam.Guide.Apr.2008》
本文原创,转载请注明出处、作者
如有错误,欢迎指正