2013-08-07 星期三 下午
-----------------位图索引--------------------------------------
当设计一个存放海量数据的数据库时,需要注意的是,几乎很少用主键,表都是事实表,
以存放事实数据为目的,在这些大量的数据中,数据冗余是不可避免的,考虑位图索引。
位图索引(bitmap index)――从oracle7开始就已经存在了,用来解决B数索引在查询的时候遇到的一些性能问题,
重复的数据太多导致查询性能低下。
在每一个位图索引的索引键值中,都会保存指向多行的指针,这样位图索引的键值就会比B数索引少很多。
创建位图索引的语法:
SQL> create table emp as select * from scott.emp;
Table created.
SQL> create bitmap index emp_bit_ind on emp(job);
Index created.
select * from user_indexes where index_name=upper('emp_bit_ind') --位图索引
查询位图索引的结构:
analyze index emp_bit_ind validate structure;
select * from index_stats
位图索引的结构
行值 1 2 3 4 5 6 7 8 9 10 11 12 13 14
--------------------------------------------
CLERK 1 0 0 0 0 0 0 0 0 0 1 1 0 1
SALESMAN 0 1 1 0 1 0 0 0 0 1 0 0 0 0
PRESIDENT 0 0 0 0 0 0 0 0 1 0 0 0 0 0
MANAGER 0 0 0 1 0 1 1 0 0 0 0 0 0 0
ANALYST 0 0 0 0 0 0 0 1 0 0 0 0 1 0
---------------------------------------------
创建的位图索引创建了5个键值,但是对应14行数据
创建B数索引的话,要14个索引键值,14个叶子行
从上面的结果可以看出,位图索引比B树索引节省空间,位图索引非产适合于作逻辑运算操作以及聚合运算的COUNT操作
创建一个比较大的表
SQL> create table t7 as select * from all_objects;
Table created.
SQL> insert into t7 select * from t7;
40935 rows created.
SQL> commit;
Commit complete.
SQL> select count(1) from t7;
COUNT(1)
----------
81870
SQL> insert into t7 select * from t7;
81870 rows created.
SQL> commit;
Commit complete.
SQL> select count(1) from t7;
COUNT(1)
----------
163740
SQL> create bitmap index t7_bit_ind on t7(object_type); --T7上创建位图索引
Index created.
SQL> create table t8 as select * from t7;
Table created.
SQL> create index t8_bit_ind on t8(object_type); --T8上创建B树索引
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'t7',cascade=>true);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user,'t8',cascade=>true);
PL/SQL procedure successfully completed.
查看两个索引占用的存储空间:
SQL> set linesize 10000
SQL> select segment_name,bytes from user_segments where segment_type='INDEX' and (segment_name like '%T7%' or segment_name like '%T8%');
SEGMENT_NAME BYTES
--------------------------------------------------------------------------------- ----------
T7_BIT_IND 131072
T8_BIT_IND 4194304
显然位图索引比B树索引占用的空间少很多。
SQL> set autotrace trace exp
SQL> select count(1) from t7 where object_type='TABLE';
Execution Plan
----------------------------------------------------------
Plan hash value: 277922936
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | BITMAP CONVERSION COUNT | | 9115 | 91150 | 1 (0)| 00:00:01 |
|* 3 | BITMAP INDEX SINGLE VALUE| T7_BIT_IND | | | | |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
412 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(1) from t8 where object_type='TABLE';
Execution Plan
----------------------------------------------------------
Plan hash value: 368075426
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 28 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | INDEX RANGE SCAN| T8_BIT_IND | 9082 | 90820 | 28 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
1 physical reads
0 redo size
412 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
在创建有位图的索引的表上以创建索引的字段作为where条件,作count运算,从执行计划来看,
oracle选择访问位图索引代价很低。
SQL> select count(1) from t7 where object_type='TABLE' or object_type='INDEX';
Execution Plan
----------------------------------------------------------
Plan hash value: 285551327
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | INLIST ITERATOR | | | | | |
| 3 | BITMAP CONVERSION COUNT | | 18230 | 178K| 2 (0)| 00:00:01 |
|* 4 | BITMAP INDEX SINGLE VALUE| T7_BIT_IND | | | | |
-------------------------------------------------------------------------------------------
SORT AGGREGATE――排序后聚合操作
BITMAP CONVERSION COUNT――在位图索引上作count运算
BITMAP INDEX SINGLE VALUE――扫表位图索引的单值
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
412 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(1) from t8 where object_type='TABLE' or object_type='INDEX';
Execution Plan
----------------------------------------------------------
Plan hash value: 2716926154
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 53 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | INLIST ITERATOR | | | | | |
|* 3 | INDEX RANGE SCAN| T8_BIT_IND | 18164 | 177K| 53 (0)| 00:00:01 |
---------------------------------------------------------------------------------
INLIST ITERATOR――列表过滤器
相当于把这个SQL做了改写:
select count(1) from t8 where object_type in('TABLE','INDEX');
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
412 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
行值 1 2 3 4 5 6 7 8 9 10 11 12 13 14
-----------------------------------------------
TABLE 1 0 0 0 0 0 0 0 0 0 1 1 0 1
INDEX 0 1 1 0 1 0 0 0 0 1 0 0 0 0
-----------------------------------------------
OR运算 1 1 1 0 1 0 0 0 0 1 1 1 0 1
原理:由于位图索引的每个键值保存了一组对应的记录,当对不同的键值作and或者or操作的时候,
只需要对这些信息在各行的分布进行计算(实际就是0和1的位运算),得出结果。
位图索引对于重复数据进行and或者or操作,比B树索引性能更优,这是由位图索引的结构来决定的。
案例:
SQL> select count(1) from emp where job='CLERK' or JOB='MANAGER';
Execution Plan
----------------------------------------------------------
Plan hash value: 3902482824
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | BITMAP CONVERSION COUNT | | 7 | 42 | 1 (0)| 00:00:01 |
|* 3 | BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND | | | | |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("JOB"='CLERK' OR "JOB"='MANAGER')
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
411 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
行值 1 2 3 4 5 6 7 8 9 10 11 12 13 14
-----------------------------------------------
CLERK 1 0 0 0 0 0 0 0 0 0 1 1 0 1
MANAGER 0 0 0 1 0 1 1 0 0 0 0 0 0 0
-------------------------------------------------
OR 1 0 0 1 0 1 1 0 0 0 1 1 0 1
SQL> conn hr/hr
Connected.
SQL> select count(1) from emp where job='CLERK' or JOB='MANAGER';
COUNT(1)
----------
7
行值 1 2 3 4 5 6 7 8 9 10 11 12 13 14
---------------------------------------------
CLERK 1 0 0 0 0 0 0 0 0 0 1 1 0 1
MANAGER 0 0 0 1 0 1 1 0 0 0 0 0 0 0
---------------------------------------------
AND 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SQL> select count(1) from emp where job='CLERK' and JOB='MANAGER';--and操作是不符合逻辑的
COUNT(1)
----------
0
SQL> select count(1) from emp where job='CLERK' and JOB='MANAGER';
Execution Plan
----------------------------------------------------------
Plan hash value: 3353207192
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 0 (0)| |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | FILTER | | | | | |
| 3 | BITMAP CONVERSION COUNT | | 4 | 24 | 1 (0)| 00:00:01 |
|* 4 | BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND | | | | |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NULL IS NOT NULL)
4 - filter("JOB"='CLERK')
****************************************
oracle在底层对SQL做了改写:
select count(1) from emp where NULL IS NOT NULL and job='CLERK'
SQL语句在ORACLE底层都要被修改的,修改目标是不改变查询结果的逻辑,而让SQL语句性能更优。
Note
-----
- dynamic sampling used for this statement
按照改写后的看PLAN。结果是一样的。
SQL> select count(1) from emp where NULL IS NOT NULL and job='CLERK';
Execution Plan
----------------------------------------------------------
Plan hash value: 3353207192
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 0 (0)| |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | FILTER | | | | | |
| 3 | BITMAP CONVERSION COUNT | | 4 | 24 | 1 (0)| 00:00:01 |
|* 4 | BITMAP INDEX FAST FULL SCAN| EMP_BIT_IND | | | | |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NULL IS NOT NULL)
4 - filter("JOB"='CLERK')
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
410 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
------------------------------------------------
作查询操作,但是不做聚合运算。
SQL> exec dbms_stats.gather_table_stats(user,'t7',estimate_percent=>100,cascade=>true);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user,'t8',estimate_percent=>100,cascade=>true);
PL/SQL procedure successfully completed.
SQL> select * from t7 where object_type='TABLE';
528 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1637879808
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 528 | 50160 | 114 (1)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID | T7 | 528 | 50160 | 114 (1)| 00:00:02 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | T7_BIT_IND | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
138 consistent gets
0 physical reads
0 redo size
30906 bytes sent via SQL*Net to client
785 bytes received via SQL*Net from client
37 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
528 rows processed
SQL> select * from t8 where object_type='TABLE';
528 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1051944969
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 528 | 50160 | 23 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T8 | 528 | 50160 | 23 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T8_BIT_IND | 528 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
179 consistent gets
0 physical reads
0 redo size
30906 bytes sent via SQL*Net to client
785 bytes received via SQL*Net from client
37 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
528 rows processed
SQL> select /*+ index(t8 t8_bit_ind)*/ * from t8 where object_type='TABLE';
528 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1051944969
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 528 | 50160 | 23 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T8 | 528 | 50160 | 23 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T8_BIT_IND | 528 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
179 consistent gets
0 physical reads
0 redo size
30906 bytes sent via SQL*Net to client
785 bytes received via SQL*Net from client
37 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
528 rows processed
如果不做聚合运算,还是B树索引的性能比较高。
结论:位图索引在有聚合操作的时候且键值重复率高的情况下,才能体现其优势,
主要用在OLAP系统上,OLTP是不用的。