数据库性能问题:当表字段重复记录过多,索引可能失去意义

今天遇到一个DB2的问题:两条同样的SQL语句,只是where条件指定的字段的值不一样,结果访问计划中,一条SQL走了索引,另一条没有,类似如下:

SQL1: select id, name from t1 where id = 3000  and .... ->使用了索引

SQL2:select id, name from t1 where id = 10001 and ....->没有使用索引

于是研究了一下,发现原因是id = 10001的记录太多,导致DB2认为走索引并非最优执行计划。重现问题的过程如下:

$ for i in {1..10000}; do echo "$i,'aaa'" >> 1.txt; done
$ for i in {1..10000}; do echo "10001,'bbb'" >> 1.txt; done
$ db2 "create table t1(id int, name char(20))"
$ db2 "load from 1.txt of del insert into t1 nonrecoverable"
$ db2 "create index idx1 on t1(id)"
$ db2 "runstats on table t1 and indexes all"
$ db2expln -d dba -g -statement "select id, name from t1 where id = 3000" -terminal

....

Statement:
  
  select id, name 
  from t1 
  where id =3000


Section Code Page = 1208

Estimated Cost = 13.542770
Estimated Cardinality = 1.000000

Access Table Name = INST105.T1  ID = 2,4
|  Index Scan:  Name = INST105.IDX1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: ID (Ascending)
|  #Columns = 1
|  Skip Inserted Rows
|  Avoid Locking Committed Data
|  Currently Committed for Cursor Stability
|  Evaluate Predicates Before Locking for Key
|  #Key Columns = 1
|  |  Start Key: Inclusive Value
|  |  |  1: 3000 
|  |  Stop Key: Inclusive Value
|  |  |  1: 3000 
|  Data Prefetch: Sequential(1), Readahead
|  Index Prefetch: Sequential(1), Readahead
|  Lock Intents
|  |  Table: Intent Share
|  |  Row  : Next Key Share
|  Sargable Predicate(s)
|  |  Return Data to Application
|  |  |  #Columns = 2
Return Data Completion

End of section


Optimizer Plan:

        Rows   
      Operator 
        (ID)   
        Cost   
              
         1    
      RETURN  
       ( 1)   
      13.5428 
        |     
         1    
       FETCH  
       ( 2)   
      13.5428 
     /       \
    1      20000  
 IXSCAN   Table:  
  ( 3)    INST105 
 6.77555  T1      
   |     
  10001  
 Index:  
 INST105 
 IDX1    
 
$ db2expln -d dba -g -statement "select id, name from t1 where id = 10001" -terminal

....

Statement:
  
  select id, name 
  from t1 
  where id =10001


Section Code Page = 1208

Estimated Cost = 168.612518
Estimated Cardinality = 10000.000000

Access Table Name = INST105.T1  ID = 2,4
|  #Columns = 1
|  Skip Inserted Rows
|  Avoid Locking Committed Data
|  Currently Committed for Cursor Stability
|  May participate in Scan Sharing structures
|  Scan may start anywhere and wrap, for completion
|  Fast scan, for purposes of scan sharing management
|  Scan can be throttled in scan sharing management
|  Relation Scan
|  |  Prefetch: Eligible
|  Lock Intents
|  |  Table: Intent Share
|  |  Row  : Next Key Share
|  Sargable Predicate(s)
|  |  #Predicates = 1
|  |  Return Data to Application
|  |  |  #Columns = 2
Return Data Completion

End of section


Optimizer Plan:

   Rows   
 Operator 
   (ID)   
   Cost   
         
  10000  
 RETURN  
  ( 1)   
 168.613 
   |     
  10000  
 TBSCAN  
  ( 2)   
 168.613 
   |     
  20000  
 Table:  
 INST105 
 T1      

测试环境为LINUX,建了一张表,共20000条记录,前10000条记录的ID字段是不同的。后面10000条记录ID字段都为10001,所以当where条件为ID=10001时,有一半的记录都符合条件。这时候DB2认为索引效率可能比全表扫要低,就走了表扫。

你可能感兴趣的:(DB2)