ETL系列:一种针对oracle分区表的统计信息收集策略

************************************************* ** 转发请注明原文,尊重原创 ** 原文来自: blog.csdn.net/clark_xu 徐长亮的专栏 *************************************************

oracle分区表有一维分区表和组合分区表;

组合分区呢在10g 中有两种:range-hash,range-list

11g 在组合分区功能这块有所增强,又推出了range-range,list-range,list-list,list-hash

 

统计信息对于oracle 11g的执行计划非常重要;

当单表数据量变化超过10%的时候,表的统计信息就会失效;

 

对于分区表来说:每个分区,子分区都有自己的统计信息;当单个分区(或者子分区)的数据量变化超过10%的时候,分区的统计信息也会失效;就会影响表的执行计划;查看分区表统计信息失效的语句为:

 

------------------------------------------------------------------------------
--子分区统计信息失效
select distinct owner,table_name,partition_name,subpartition_name
  from dba_tab_statistics
 where (last_analyzed is null or stale_stats = 'YES')
   and owner = 'DW'
   and partition_name is not null
   and subpartition_name is not null;
------------------------------------------------------------------------------
--表的统计信息失效
select distinct owner,table_name,partition_name
  from dba_tab_statistics
 where (last_analyzed is null or stale_stats = 'YES')
   and owner = 'DW'
   and partition_name is null
   and subpartition_name is null;
------------------------------------------------------------------------------
--分区统计信息失效 
select distinct owner,table_name,partition_name
  from dba_tab_statistics
 where (last_analyzed is null or stale_stats = 'YES')
   and owner = 'DW'
   and partition_name is not null
   AND subpartition_name is  null;


 

针对分区表,希望有这样一种收集统计信息的策略:只针对统计失效的某几个分区(或者子分区)进行统计信息收集;

替代一般的策略:只要有一个分区统计信息失效,就针对所有的分区表,子分区做统计信息收集;

后一种策略,简单,但是浪费了IO资源;

针对第一种策略,我写了如下的存储过程;

create or replace procedure P_gather_table_stats(avc_schema in varchar2 --输入需要收集统计信息的用户
                                                 ) is
  --01. stale_table_global处理不是分区表的统计信息失效
  cursor stale_table_global is
    select /*+ unnest */
    distinct owner, table_name
      from dba_tab_statistics
     where (last_analyzed is null or stale_stats = 'YES')
       and owner = avc_schema
          --排除partition_name,subpartition_name
       and partition_name is null
       and subpartition_name is null;

  --02. stale_table_part处理一维分区表的统计信息失效
  cursor stale_table_part is
    select /*+ unnest */
    distinct owner, table_name, partition_name
      from dba_tab_statistics
     where (last_analyzed is null or stale_stats = 'YES')
       and owner = avc_schema
          --确定partition_name,排除subpartition_name
       and partition_name is not null
       and subpartition_name is null;

  --03.stale_table_subpart处理二维分区表的统计信息失效
  cursor stale_table_subpart is
    select distinct owner, table_name, partition_name
      from dba_tab_statistics
     where (last_analyzed is null or stale_stats = 'YES')
       and owner = avc_schema
       and partition_name is not null
       and subpartition_name is not null;

begin

  --01.处理不是分区表的统计信息失效
  --SELECT * FROM USER_TAB_MODIFICATIONS来展示统计结果,因为信息不是实时刷新到数据字典,所以
  dbms_stats.flush_database_monitoring_info;
  for stale in stale_table_global loop
    dbms_stats.gather_table_stats(ownname          => stale.owner,
                                  tabname          => stale.table_name,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  --for all columns size repeat替换for all indexed columns
                                  method_opt  => 'for all indexed columns',
                                  degree      => 8,
                                  granularity => 'GLOBAL',
                                  cascade     => true,
                                  --force - gather statistics of table even if it is locked.
                                  force => true);
  end loop;

  --02.处理一维分区表的统计信息失效
  dbms_stats.flush_database_monitoring_info;
  for stale_part in stale_table_part loop
    dbms_stats.gather_table_stats(ownname          => stale_part.owner,
                                  tabname          => stale_part.table_name,
                                  partname         => stale_part.partition_name,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  --for all columns size repeat替换for all indexed columns
                                  method_opt  => 'for all indexed columns',
                                  degree      => 8,
                                  granularity => 'PARTITION',
                                  cascade     => true,
                                  --force - gather statistics of table even if it is locked.
                                  force => true);
  end loop;

  --03.处理二维分区表的统计信息失效
  dbms_stats.flush_database_monitoring_info;
  for stale_subpart in stale_table_subpart loop
    dbms_stats.gather_table_stats(ownname          => stale_subpart.owner,
                                  tabname          => stale_subpart.table_name,
                                  partname         => stale_subpart.partition_name,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  --for all columns size repeat替换for all indexed columns
                                  method_opt  => 'for all indexed columns',
                                  degree      => 8,
                                  granularity => 'SUBPARTITION',
                                  cascade     => true,
                                  --force - gather statistics of table even if it is locked.
                                  force => true);
  end loop;
  --异常处理,简化版
exception
    when others then
      dbms_output.put_line(substr(sqlerrm, 1, 80);
     
end;


 

参数说明:(avc_schema in varchar2 --输入需要收集统计信息的用户


我写的这个脚本,在实际的生产数据库,每天都执行,所需时间,一个小时以内;

 

注:在oracle 11g中,从oracle 表的级别看,两个因素影响工作效率:统计信息,高水位线问题;

下次分析高水位线表的处理;

你可能感兴趣的:(分区表统计信息收集)