在Oracle世界里,索引主要有B树索引,位图索引和全文索引等,其中最常见于OLTP系统的是B树索引,位图索引和全文索引则多用于OLAP系统中;索引就类似于一本书的目录,若只想了解某个章节,则在不需要通读全书,可以通过索引迅速的找到需要的内容;若需要通读全书,恐怕没有人会看一页文章内容,再回头看下目录这样阅读吧,这2次I/O,必然浪费时间和资源;B树索引比较适合值分布比较均匀的场合,因而普遍用于OLTP系统中;位图索引则适用于取值范围相对较小,且更改频率低的场合,相比B树索引,位图索引将占用更少的存储空间;全文索引则类似分词系统,一般用在CLOB字段或者TEXT字段上,全文索引会占用很大的储存空间,有可能索引的大小要远大于基表本身;索引会随着基表的更改而被oracle自动维护!下面就来简要的谈谈B树索引!
一:下列几种情况将不使用索引
1:使用不等于操作符
SQL> create table t1 as select * from dba_source; SQL> create index ind_t1 on t1(name); SQL> set autot trace exp stat Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("NAME"<>'STANDARD') Note Statistics |
2: 使用IS NULL或者IS NOT NULL
SQL> exec dbms_stats.gather_table_stats('HR','T1',cascade=>true); SQL> select * from t1 where name is null; Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("NAME" IS NULL) Statistics |
3:不使用基于函数的索引条件下,使用函数
SQL> select * from t1 where lower(name)='standard'; Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(LOWER("NAME")='standard') Statistics |
4:比较不匹配的数据类型
SQL> create table t2 (id varchar2(10),name varchar2(10)); SQL> begin SQL> create index ind_t2 on t2(id); SQL> exec dbms_stats.gather_table_stats('HR','T2',cascade=>true); SQL> select * from t2 where id=100; Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(TO_NUMBER("ID")=100) Statistics |
二:使用B树索引
1:新建2张表,并分别对name字段创建索引,收集表统计信息;t2表的name字段为倾斜型,t1表则相对比较均匀;
2:对t1表进行查询,可以看到,查询1使用索引ind_t1,查询2根据hint提示使用全表扫描,对比两次查询的CPU代价和一致性读,可以看出索引的优势!
SQL> set autot trace exp stat Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 476 | 55692 | 18 (0)| 00:0 | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 476 | 55692 | 18 (0)| 00:0 |* 2 | INDEX RANGE SCAN | IND_T1 | 476 | | 4 (0)| 00:0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): 2 - access("NAME"='STANDARD') Statistics SQL> select /*+full(t1)*/ * from t1 where name='STANDARD'; Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("NAME"='STANDARD') Statistics |
3:对t2表进行查询,可以看到,查询1使用索引ind_t2,查询2则使用全表扫描,因为name字段基于所以的值均为SALE,所以这个时候使用全表扫描反而更快,查询3使用hint提示使用ind_t2索引,将产生更大的开销,由此可见,B树索引适合于值分布比较均匀的场合,否则2次I/O将导致查询更慢!
SQL> set autot trace exp stat Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 107 | 4 (0)| 00:0 | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 107 | 4 (0)| 00:0 |* 2 | INDEX RANGE SCAN | IND_T2 | 1 | | 3 (0)| 00:0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): 2 - access("NAME"='T2') Statistics SQL> select * from t2 where name='SALE'; Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("NAME"='SALE') Statistics SQL> select /*+index(t2 ind_t2)*/ * from t2 where name='SALE'; Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 291K| 29M| 6850 (1)| 00:0 | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 291K| 29M| 6850 (1)| 00:0 |* 2 | INDEX RANGE SCAN | IND_T2 | 292K| | 662 (2)| 00:0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): 2 - access("NAME"='SALE') Statistics |
三:表和索引的压缩,在大数据量条件下,除了使用分区技术来裁剪I/O外,压缩也是个不错的选择,但压缩比较适合于更新不频繁的场景
1:查看T1表和IND_T1索引的DDL创建语句和存储占用情况
2:新建表t3和索引ind_t3,带压缩属性,对比表t1所占的存储,表为50M:37M,索引为9M:4M
3:分区表的本地索引压缩,需要先在对象级别上启用压缩
四:对象分析与执行计划,oracle 10g以后关于sql语句的执行,默认都是使用cbo,即基于代价的优化器,而不是基于规则的,这个代价则是有对象分析而来,也就是我们平常说的统计信息,下面来看下当统计信息未及时更新的情况下,sql执行计划的选择!
1:创建基表和索引,可以看到,当没有统计信息的时候,oracle将采取动态采样方式,也可以获得正确的执行计划
SQL> create table t4 as select 100 object_id,object_name from dba_objects; SQL> update t4 set object_id=1 where rownum=1; SQL> commit; SQL> create index ind_t4 on t4(object_id); SQL> set autot trace exp stat Execution Plan -------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter("OBJECT_ID"=100) Note Statistics SQL> select * from t4 where object_id=1; Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 79 | 2 (0)| 00:0 | 1 | TABLE ACCESS BY INDEX ROWID| T4 | 1 | 79 | 2 (0)| 00:0 |* 2 | INDEX RANGE SCAN | IND_T4 | 1 | | 1 (0)| 00:0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): 2 - access("OBJECT_ID"=1) Note Statistics |
2:删除表中所有的数据后,收集统计信息,然后再插入原先一样的数据
3:这个时候,统计信息未及时更新,认为下面的查询语句选择走索引会更优,则会产生错误的执行计划,导致查询语句缓慢!在生产环境,大数据量情况下尤为明显!
SQL> set autot trace exp stat Execution Plan -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 79 | 1 (0)| 00:0 | 1 | TABLE ACCESS BY INDEX ROWID| T4 | 1 | 79 | 1 (0)| 00:0 |* 2 | INDEX RANGE SCAN | IND_T4 | 1 | | 1 (0)| 00:0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): 2 - access("OBJECT_ID"=100) Statistics |