直方图统计导致错误的执行计划

今天下午一哥们遇到个case.他说如下SQL语句

SELECT WORKITEMID
  FROM WFWIPARTICIPANT
 WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李锦');

用DBMS_STATS收集统计信息之后会走全表扫描,而用

analyze table WFWIPARTICIPANT  compute statistics ---收集统计信息会走索引

analyze table WFWIPARTICIPANT  delete statistics    ----删除统计信息也会走索引,Oracle采用动态采样

表结构如下:

SQL> desc WFWIPARTICIPANT;
Name            Type          Nullable Default Comments
--------------- ------------- -------- ------- --------
WIPARTICID      NUMBER                                 
WORKITEMID      NUMBER        Y                        
PARTICIPANTTYPE VARCHAR2(20)  Y                        
PARTICIPANT     VARCHAR2(256) Y                        
PARTICIPANT2    VARCHAR2(64)  Y                        
WORKITEMSTATE   NUMBER(2)     Y                        
PARTIINTYPE     VARCHAR2(20)  Y                        
EXTEND1         VARCHAR2(64)  Y     

并且在列partintype 上面有个索引。

我叫那位哥们把表用exp导出,然后我自己导入到我电脑测试一下,测试过程:

首先用DBMS_STATS收集统计信息:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,CASCADE=>TRUE);

PL/SQL procedure successfully completed

执行该SQL

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李锦');

已选择60行。


执行计划
----------------------------------------------------------
Plan hash value: 1708170390

-------------------------------------------------------------------------------------
| Id  | Operation         | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                 |   530 |  7420 |   103   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| WFWIPARTICIPANT |   530 |  7420 |   103   (5)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR
              "PARTICIPANT"='771' OR "PARTICIPANT"='99999' OR "PARTICIPANT"='李锦')

发现确实是走了全表扫描,于是改用ANALYZE 收集统计信息

SQL> analyze table WFWIPARTICIPANT  delete statistics;

表已分析。

SQL> analyze table WFWIPARTICIPANT  compute statistics;

表已分析。

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李锦');

已选择60行。


执行计划
----------------------------------------------------------
Plan hash value: 1217134846

---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |    71 |   852 |    64   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |                    |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| WFWIPARTICIPANT    |    71 |   852 |    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | WF_IDX_PARTICIPANT |    71 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR "PARTICIPANT"='771' OR
              "PARTICIPANT"='99999' OR "PARTICIPANT"='李锦')
用ANALYZE命令分析之后,发现确实走了索引扫描

解决问题的思路:遇到这类问题,首先应该查询索引的选择率:

SQL> select a.owner,a.index_name,a.index_type,partitioned,b.num_rows,b.distinct_keys,b.num_rows/b.distinct_keys avg_row_per_key
  2  ,b.distinct_keys/b.num_rows SELECTIVITY,b.last_analyzed,b.stale_stats from dba_indexes a,dba_ind_statistics b
  3   where a.owner=b.owner and a.index_name=b.index_name and a.index_name='WF_IDX_PARTICIPANT';

OWNER INDEX_NAME           INDEX_TYPE PARTITIONED   NUM_ROWS DISTINCT_KEYS AVG_ROW_PER_KEY SELECTIVITY LAST_ANALYZED STALE_STATS
----- -------------------- ---------- ----------- ---------- ------------- --------------- ----------- ------------- -----------
TSSA  WF_IDX_PARTICIPANT   NORMAL     NO               57820          4073 14.195924380063 0.070442753 2010-4-15 17: NO

注意观察,表一共有57820列,但是索引的列上面只有4073个不同的值,也就是说索引选择性为7%,很显然,这列数据分布不均衡。于是猜测该问题和直方图有关。现在查看列上面有没有直方图:

SQL>  select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 NONE                      1

NUM_BUCKETS为1表示该列没有直方图,恩,这里用ANALYZE命令收集的统计信息里面没有直方图信息,于是改用

DBMS_STATS收集统计信息,看看是否有直方图统计信息:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,CASCADE=>TRUE);

PL/SQL procedure successfully completed
SQL>  select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 HEIGHT BALANCED         254

注意观察,NUM_BUCKETS=254,Oracle自动的对该列收集了直方图统计信息,于是怀疑直方图的存在影响了执行计划

,现在我删掉直方图的统计信息:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,DEGREE=>16,method_opt=>'for columns size 1 PARTICIPANT',CASCADE=>TRUE);

PL/SQL procedure successfully completed
SQL> select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 NONE                      1

再次运行该查询语句

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李锦');

已选择60行。


执行计划
----------------------------------------------------------
Plan hash value: 1217134846

---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |    71 |   994 |    64   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |                    |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| WFWIPARTICIPANT    |    71 |   994 |    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | WF_IDX_PARTICIPANT |    71 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR "PARTICIPANT"='771' OR
              "PARTICIPANT"='99999' OR "PARTICIPANT"='李锦')

当删掉直通图统计信息之后,优化器选择了我们期望的访问路径

这个案例提醒我们,对表收集统计信息的时候,要写好参数,另外,我们在对访问路径做优化的时候,首先应该查看的就是索引的选择率,索引的类型,以及列上面关于直方图的统计信息,有了直方图的统计信息并不总是会给我们带来好处,当然也不总是会对我们带来坏处,具体问题具体对待。另外一个值得注意的就是,关于统计信息收集的方式,一定要写好参数,如果Oracle自动去收集了直方图统计信息,而我们不知道,这样对于性能诊断会带来麻烦的。

 

你可能感兴趣的:(oracle,sql,table,iterator,Access,statistics)