位图索引是oracle中非常重要的一种索引形式。本文通过总结有关位图索引的资料,尝试回答如下几个问题:
1:什么是位图索引?
2:位图索引适合什么场景,不适合什么场景?
3:位图索引的性能如何?
位图索引,顾名思义,与“位”有关。大家都知道,计算机中的所有信息最终都是通过“位bit”来运算的, 二进制位运算在计算机中是非常高效的。每一个二进制位都可以取值0或者1,而取值的确切含义是由具体的上下文环境决定的。在oracle位图索引中,每一个二进制位代表了某一行中索引列的取值情况。例如,学生表中性别列的位图索引结构如下:
男:0101001101
女:1010110010
在上面的位图结构中,存储了10条学生记录的性别分布情况,以“男”性别为例,从左到右的第n个二进制位代表了第n条记录是否性别为男,如果二进制位为1,代表true即性别为男,0代表false即性别不为男。以此类推,从图中可以看出,第一条记录的性别为女,第二条记录的性别为男,...第九条记录的性别为女,第十条记录的性别为男。
大家都知道,在oracle中是根据rowid来定位记录的,因此,我们需要引入start rowid和end rowid,通过start rowid ,end rowid 和二进制位的偏移,我们就可以非常快速的计算出二进制位所代表的表记录rowid。位图索引的最终逻辑结构如下图:
CREATE or replace FUNCTION ind_spc_test(rn NUMBER) RETURN NUMBER
AS
v_j NUMBER;
v_dis NUMBER;
v_bm_sp NUMBER;
v_bt_sp NUMBER;
BEGIN
FOR i IN 1 .. 10LOOP
EXECUTE immediate 'truncate table t_easy1';
EXECUTE immediate 'truncate table t_easy2';
SELECT floor(rn/(11-i)) INTO v_j FROM dual;
FOR j IN 1 .. rn LOOP
INSERT INTO t_easy1 VALUES (mod(j,v_j));
INSERT INTO t_easy2 VALUES (mod(j,v_j));
END LOOP;
commit;
select count(distinct id) into v_j from t_easy1;
EXECUTE immediate 'analyze index i_easy1 COMPUTE STATISTICS';
SELECT lEAF_BLOCKS INTO v_bt_sp FROM user_indexes where index_name='I_EASY1';
EXECUTE immediate 'analyze index i_easy2 COMPUTE STATISTICS';
SELECT LEAF_BLOCKS INTO v_bm_sp FROM user_indexes where index_name='I_EASY2';
INSERT INTO bitmap_ind_space VALUES (v_j,v_bm_sp,v_bt_sp,rn );
COMMIT;
END LOOP;
RETURN 0;
END;
SQL> select * from bitmap_ind_space order by 1;
DISTINCT_VAL BITMAP_IND_BLKS BTREE_IND_BLKS ROW_NUM
------------ --------------- -------------- ----------
10000 139 300 100000
11111 79 335 100000
12500 89 285 100000
14285 103 220 100000
16666 120 257 100000
20000 146 310 100000
25000 183 293 100000
33333 246 262 100000
50000 371 296 100000
100000 408 200 100000
SQL> create table emp_normal(empno number(10), ename varchar2(30), sal number(10));
表已创建。
Begin
For i in 1..1000000
Loop
Insert into emp_normal
values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
If mod(i, 10000) = 0 then
Commit;
End if;
End loop;
10 End;
11 /
PL/SQL 过程已成功完成。
SQL> create table emp_random as select /* +append */ * from emp_normal order by dbms_random.random;
SQL> create bitmap index bm_normal on emp_normal(empno);
索引已创建。
SQL> analyze table emp_normal compute statistics for table for all indexes for all indexed columns;
表已分析。
SQL> select index_name,clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
BM_NORMAL 1000000
SQL> set autot traceonly
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 1000
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=1000
执行计划
----------------------------------------------------------
Plan hash value: 1526426521
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_NORMAL | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=1000)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
702 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 2398
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=2398
执行计划
----------------------------------------------------------
Plan hash value: 1526426521
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_NORMAL | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=2398)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 8545
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=8545
执行计划
----------------------------------------------------------
Plan hash value: 1526426521
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_NORMAL | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=8545)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 128444
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=128444
执行计划
----------------------------------------------------------
Plan hash value: 1526426521
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_NORMAL | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO"=128444)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
704 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> drop index bm_normal;
索引已删除。
SQL> create index bt_normal on emp_normal(empno);
索引已创建。
SQL> analyze table emp_normal compute statistics for table for all indexes for all indexed columns;
表已分析。
SQL> select index_name,clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
BT_NORMAL 6210
SYS_IL0000076897C00002$$
PK_EMP 1
PK_DEPT 1
SQL> set autot traceonly
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 1000
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=1000
执行计划
----------------------------------------------------------
Plan hash value: 733975378
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP_NORMAL | 1 | 34 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=1000)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
702 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 128444
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=128444
执行计划
----------------------------------------------------------
Plan hash value: 733975378
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP_NORMAL | 1 | 34 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=128444)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
704 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select * from emp_normal where empno=&empno;
输入 empno 的值: 2398
原值 1: select * from emp_normal where empno=&empno
新值 1: select * from emp_normal where empno=2398
执行计划
----------------------------------------------------------
Plan hash value: 733975378
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP_NORMAL | 1 | 34 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPNO"=2398)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
总结如下:
BITMAP | EMPNO | B-TREE | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | |
5 | 0 | 1000 | 5 | 0 |
5 | 0 | 2398 | 5 | 0 |
5 | 0 | 8545 | 5 | 0 |
5 | 0 | 98008 | 5 | 0 |
5 | 0 | 85342 | 5 | 0 |
5 | 0 | 128444 | 5 | 0 |
5 | 0 | 858 | 5 | 0 |
对emp_random表进行实验,得出的结果与之类似,这里不再獒述。从这里可以看出,在唯一列上的等值查询,位图索引与btree索引的效率相当。
下面,我们在针对范围查询来进行测试。
SQL> create bitmap index bm_random on emp_random(empno);
索引已创建。
SQL> analyze table emp_random compute statistics for table for all indexes for all columns;
表已分析。
SQL> select index_name,clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
BM_RANDOM 1000000
SQL> set autot traceonly
SQL> select * from emp_random where empno between &range1 and &range2;
输入 range1 的值: 1
输入 range2 的值: 2300
原值 1: select * from emp_random where empno between &range1 and &range2
新值 1: select * from emp_random where empno between 1 and 2300
已选择2300行。
执行计划
----------------------------------------------------------
Plan hash value: 811843605
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2299 | 85063 | 418 (1)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_RANDOM | 2299 | 85063 | 418 (1)| 00:00:06 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO">=1 AND "EMPNO"<=2300)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
2463 consistent gets
0 physical reads
0 redo size
130225 bytes sent via SQL*Net to client
2203 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
SQL> select * from emp_random where empno between &range1 and &range2;
输入 range1 的值: 8
输入 range2 的值: 1980
原值 1: select * from emp_random where empno between &range1 and &range2
新值 1: select * from emp_random where empno between 8 and 1980
已选择1973行。
执行计划
----------------------------------------------------------
Plan hash value: 811843605
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1972 | 72964 | 366 (0)| 00:00:05 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_RANDOM | 1972 | 72964 | 366 (0)| 00:00:05 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO">=8 AND "EMPNO"<=1980)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
2114 consistent gets
0 physical reads
0 redo size
111758 bytes sent via SQL*Net to client
1961 bytes received via SQL*Net from client
133 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1973 rows processed
SQL> select * from emp_random where empno between &range1 and &range2;
输入 range1 的值: 28888
输入 range2 的值: 31850
原值 1: select * from emp_random where empno between &range1 and &range2
新值 1: select * from emp_random where empno between 28888 and 31850
已选择2963行。
执行计划
----------------------------------------------------------
Plan hash value: 811843605
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2962 | 107K| 513 (0)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID | EMP_RANDOM | 2962 | 107K| 513 (0)| 00:00:07 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("EMPNO">=28888 AND "EMPNO"<=31850)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
3172 consistent gets
0 physical reads
0 redo size
170625 bytes sent via SQL*Net to client
2687 bytes received via SQL*Net from client
199 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2963 rows processed
SQL> drop index bm_random;
索引已删除。
SQL> create index bt_random on emp_random(empno);
索引已创建。
SQL> analyze table emp_random compute statistics for table for all indexes for all columns;
表已分析。
SQL> set autot off
SQL> select index_name,clustering_factor from user_indexes;
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
BT_RANDOM 999834
SQL> set autot traceonly
SQL> select * from emp_random where empno between &range1 and &range2;
输入 range1 的值: 1
输入 range2 的值: 2300
原值 1: select * from emp_random where empno between &range1 and &range2
新值 1: select * from emp_random where empno between 1 and 2300
已选择2300行。
执行计划
----------------------------------------------------------
Plan hash value: 731629521
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2299 | 85063 | 1735 (1)| 00:00:21 |
|* 1 | TABLE ACCESS FULL| EMP_RANDOM | 2299 | 85063 | 1735 (1)| 00:00:21 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"<=2300 AND "EMPNO">=1)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
6410 consistent gets
0 physical reads
0 redo size
121081 bytes sent via SQL*Net to client
2203 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
SQL> select * from emp_random where empno between &range1 and &range2;
输入 range1 的值: 8
输入 range2 的值: 1980
原值 1: select * from emp_random where empno between &range1 and &range2
新值 1: select * from emp_random where empno between 8 and 1980
已选择1973行。
执行计划
----------------------------------------------------------
Plan hash value: 731629521
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1972 | 72964 | 1735 (1)| 00:00:21 |
|* 1 | TABLE ACCESS FULL| EMP_RANDOM | 1972 | 72964 | 1735 (1)| 00:00:21 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPNO"<=1980 AND "EMPNO">=8)
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
6388 consistent gets
0 physical reads
0 redo size
103922 bytes sent via SQL*Net to client
1961 bytes received via SQL*Net from client
133 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1973 rows processed
BITMAP | EMPNO (Range) | B-TREE | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | |
2463 | 0 | 1-2300 | 6410 | 0 |
2114 | 0 | 8-1980 | 6388 | 0 |
2572 | 0 | 1850-4250 | 6418 | 0 |
3172 | 0 | 28888-31850 | 6456 | 0 |
2762 | 0 | 82900-85478 | 6431 | 0 |
7254 | 0 | 984888-1000000 | 7254 | 0 |
从这里可以看出,位图索引要优于btree索引,这是因为btree索引的cluster factor 较大,从而优化器选择了全表扫描。即便在emp_normal 表下,即clustering factor较小时,位图索引btree索引相当的。因此在distinct cardinality 较大的情况下,范围扫描的效率位图索引也是不逊色与btree索引。
总结如下:
位图索引的查询性能经常是优于btree索引的,即便在distinct cardinality较大的情况下
位图索引不适合与dml频繁的环境
位图索引适用于DSS系统
位图索引可以进行逻辑运算,多个索引和同时在查询语句中发挥作用,这是一个非常重要的地方