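Over the past week I have been reviewing indexes. Besides going back over the concepts, I came across a good document on MOS (My Oracle Support), which I am sharing here.
Document ID: [ID 30405.1]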
This article is only concerned with B*tree indexes which are currently the most commonly used. The theory of B*tree indexes is beyond the scope of this article; for more information refer to computer science texts dealing with data structures.
In short, this note covers only B*tree indexes, currently the most common type. The theory behind B*trees is out of scope; any data-structures textbook covers it.
Format of Index Blocks
~~~~~~~~~~~~~~~~~~~~~~
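A B*tree index contains two kinds of blocks: branch blocks, which make up the upper levels, and leaf blocks, which form the lowest level of the index. A branch block holds pointers to blocks on the next lower level (either branch blocks or leaf blocks). A leaf block holds each indexed data value together with the corresponding rowid, which is used to locate the actual table row.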
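To make the two block types concrete, here is a minimal Python sketch. It is a model under simplifying assumptions, not Oracle's on-disk format: the class names, the separator convention (each separator is the smallest key in the child to its right) and the tuple standing in for a rowid are all illustrative.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LeafBlock:
    # Sorted (key, rowid) pairs. A rowid is simplified here to a
    # hypothetical (file#, block#, row#) tuple that locates the table row.
    entries: list = field(default_factory=list)

    def add(self, key, rowid):
        bisect.insort(self.entries, (key, rowid))

    def rowids_for(self, key):
        # (key,) sorts before every (key, rowid) pair, so this finds the first match.
        i = bisect.bisect_left(self.entries, (key,))
        found = []
        while i < len(self.entries) and self.entries[i][0] == key:
            found.append(self.entries[i][1])
            i += 1
        return found


@dataclass
class BranchBlock:
    # separators[i] is the smallest key reachable through children[i + 1];
    # children are pointers to lower-level blocks (branch or leaf).
    separators: list
    children: list

    def child_for(self, key):
        return self.children[bisect.bisect_right(self.separators, key)]


# A two-level example: one branch (root) block over two leaf blocks.
left, right = LeafBlock(), LeafBlock()
left.add(10, (4, 151, 0))
left.add(20, (4, 151, 1))
right.add(30, (4, 152, 3))
right.add(30, (4, 152, 0))
root = BranchBlock(separators=[30], children=[left, right])
```

A lookup goes `root.child_for(key)` to reach the leaf, then `rowids_for(key)` to get the rowids; a non-unique key can return several rowids.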
[Figure: layout of an index block]
When a B*tree index is created using the CREATE INDEX statement, the parameter PCTFREE can be specified. PCTFREE specifies the percentage of space to leave free for future updates and insertions to an index block. A value of 0 reserves no space for future inserts and updates. It allows the entire data area of the block to be filled when the index is created. If PCTFREE is not specified, it defaults to 10. This value reserves 10% of each block for updates to existing key values, and inserts of new key values.
Put simply, PCTFREE is the percentage of each index block left empty when the index is built, reserved for later inserts and updates. With 0, index entries may fill the block's whole data area; if PCTFREE is omitted, the default of 10 leaves 10% of every block free.
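The effect of PCTFREE on how densely a new index is packed comes down to simple arithmetic. The sketch below is a rough Python estimate; the 8 KB block, the 192-byte overhead and the 20-byte entry size are hypothetical figures, and applying PCTFREE to the data area only is a simplification.

```python
def entries_per_leaf(block_size, overhead, entry_size, pctfree):
    """Rough number of index entries packed into one leaf block at CREATE INDEX time.

    All inputs are hypothetical figures in bytes; real header, ITL and
    per-entry overheads vary by version and key definition.
    """
    data_area = block_size - overhead              # space left after header/ITLs
    usable = data_area * (100 - pctfree) // 100    # PCTFREE% of it stays empty at build
    return usable // entry_size


def leaf_blocks_needed(num_entries, per_leaf):
    return -(-num_entries // per_leaf)             # ceiling division


per_leaf_10 = entries_per_leaf(8192, 192, 20, pctfree=10)  # 360
per_leaf_0 = entries_per_leaf(8192, 192, 20, pctfree=0)    # 400
```

With these figures, 100,000 entries need 278 leaf blocks at PCTFREE=10 instead of 250 at PCTFREE=0: more blocks at build time, in exchange for free room that absorbs later inserts.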
Thus PCTFREE is only relevant at initial index creation. It causes optimal splitting of the B*tree in preparation for subsequent growth. The idea is to do as much splitting as possible during initial creation and avoid having to pay the penalty later during insertion into the table. This is what a high PCTFREE setting on an index gives you. However, if your inserted keys are monotonically increasing (say, a date/time field), PCTFREE=0 is best. Only the rightmost index leaf block will be inserted into, so there's no point leaving room in the other leaf blocks at creation time.
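So PCTFREE matters only when the index is created. A higher value spreads the keys over more blocks up front (more reserved space means more splitting at creation), which spares later inserts from having to split blocks. The exception is a monotonically increasing key such as a date/time column: every new entry goes into the rightmost leaf block, so the space reserved in the other leaf blocks is never used and PCTFREE=0 is the better choice.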
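The monotonically increasing case can be checked with a small Python model. It assumes fixed-size leaf blocks holding a hypothetical 10 entries each, treats each leaf's first key as its separator, and only records which leaf each new key would land in; splitting is not modeled.

```python
import bisect


def build_leaves(sorted_keys, capacity, pctfree):
    """Pack sorted keys into leaf blocks, leaving pctfree% of each block empty."""
    per_leaf = max(1, capacity * (100 - pctfree) // 100)
    return [sorted_keys[i:i + per_leaf] for i in range(0, len(sorted_keys), per_leaf)]


def target_leaf(leaves, key):
    """Index of the one leaf whose key range covers key (first keys act as separators)."""
    firsts = [leaf[0] for leaf in leaves]
    return max(0, bisect.bisect_right(firsts, key) - 1)


def unused_reserve(leaves, capacity, touched):
    """Free slots reserved by PCTFREE in leaves that never receive an insert."""
    return sum(capacity - len(leaf) for i, leaf in enumerate(leaves) if i not in touched)


# 100 existing keys 0, 10, ..., 990 packed with PCTFREE=10 (9 of 10 slots used).
leaves = build_leaves(list(range(0, 1000, 10)), capacity=10, pctfree=10)

# Monotonically increasing new keys (think of a sequence or a timestamp).
increasing = {target_leaf(leaves, k) for k in range(1000, 1050)}
# Keys scattered across the existing range, for comparison.
scattered = {target_leaf(leaves, k) for k in (5, 455, 905)}
```

Every increasing key targets the last of the 12 leaves, so the one free slot reserved in each of the other 11 leaves is never used; scattered keys, by contrast, land in leaves 0, 5 and 10.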
Following index creation, an index block can accommodate keys up to the full available data area including space for ITLs. Thus an index block will not require splitting until the available data area is fully used. The bottom line is PCTFREE is not looked at once you pass the index creation phase. One thing to remember is that each row in the index has only one correct block it can live in, based on the key value.
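That is, once the index exists, a block may fill its entire data area (including ITL space) and does not split until that space is used up; PCTFREE is no longer consulted. Each index entry has exactly one correct block, determined by its key value.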
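A rough Python model of "split only when full" and "one correct block per key". Assumptions: a hypothetical CAPACITY of 10 entries per leaf, leaves represented as sorted lists whose first keys act as separators, and a 50/50 split; the split ratio is this sketch's simplification, not Oracle's exact behavior.

```python
import bisect

CAPACITY = 10  # hypothetical number of entries that fit in one leaf block


def insert_key(leaves, key):
    """Insert key into the one leaf whose range covers it; return True if that leaf split.

    The leaf splits (50/50 in this sketch) only when the new entry no longer
    fits in the full capacity; PCTFREE plays no part after creation.
    """
    firsts = [leaf[0] for leaf in leaves]
    i = max(0, bisect.bisect_right(firsts, key) - 1)
    bisect.insort(leaves[i], key)
    if len(leaves[i]) <= CAPACITY:
        return False
    mid = len(leaves[i]) // 2
    leaves[i:i + 1] = [leaves[i][:mid], leaves[i][mid:]]
    return True


# Two leaves built with PCTFREE=10: 9 of 10 slots used in each.
leaves = [list(range(0, 90, 10)), list(range(90, 180, 10))]
first = insert_key(leaves, 5)    # fills leaf 0 to 10/10: no split
second = insert_key(leaves, 15)  # leaf 0 would need an 11th slot: split
```

The 10% left free at build time is consumed by the first insert without any split; only the next insert into the same, now full, leaf forces one.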
Inserting an index entry after index creation
~~~~~~~~~~~~~~~~
Oracle indexes are implemented as B*trees, which are always balanced.
That is, an Oracle B*tree index is never unbalanced.
In an Oracle B*tree the root of the tree is at level 0. In a very small B*tree the root block can also be a Leaf block.
So the root is at level 0, and in a very small index the root block is also the only leaf block.
In most cases, blocks on levels 0 through N-2 (where N is the height of the tree) are Branch blocks. Branch blocks do not contain data; they simply contain separators which are used to navigate from the root block to the Leaf blocks.
Put differently, every level above the leaf level is made of branch blocks. These hold no rowids, only separator keys that steer a search from the root toward the correct leaf block.
All Leaf blocks are at level N-1 in Oracle B*trees. All data stored in a B*tree is stored in the Leaf blocks.
The leaf blocks therefore form the bottom level, N-1, and hold every key and rowid stored in the index.
The definition of a 'Balanced Tree' is that all the data is on the same level, which means that the path to all data in the tree is the same length. Since all the data is stored in Leaf blocks and all the Leaf blocks are on the same level, B*trees are always balanced. There is no way to unbalance a B*tree.
In other words, because every key lives in a leaf block and every leaf block sits on the same level, the path from the root to any key has exactly the same length, so a B*tree can never become unbalanced.
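The balance argument can be checked with a small Python model that bulk-loads sorted keys bottom-up, one level at a time, so every leaf necessarily ends up at the same depth. It numbers levels as this note does (root at level 0); the fanout of 50 is a hypothetical figure that happens to give a 3-level tree for 100,000 keys, and the separator rule (smallest key beneath each child after the first) is this sketch's assumption.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Block:
    keys: list                                     # leaf: index keys; branch: separator keys
    children: list = field(default_factory=list)   # lower-level blocks; empty for a leaf

    @property
    def is_leaf(self):
        return not self.children


def smallest_key(block):
    # Follow the leftmost pointers down to a leaf.
    while not block.is_leaf:
        block = block.children[0]
    return block.keys[0]


def bulk_load(sorted_keys, fanout):
    """Build the tree bottom-up one level at a time; return the root block."""
    level = [Block(sorted_keys[i:i + fanout]) for i in range(0, len(sorted_keys), fanout)]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # Separator for each child after the first = smallest key beneath it.
        level = [Block([smallest_key(c) for c in g[1:]], g) for g in groups]
    return level[0]


def leaf_depths(block, depth=0):
    """Depth of every leaf, counting the root as level 0."""
    if block.is_leaf:
        return [depth]
    return [d for c in block.children for d in leaf_depths(c, depth + 1)]


def blocks_visited(root, key):
    """Blocks read on the way from the root to the leaf that should hold key."""
    block, visits = root, 1
    while not block.is_leaf:
        block = block.children[bisect.bisect_right(block.keys, key)]
        visits += 1
    return visits


root = bulk_load(list(range(100_000)), fanout=50)  # hypothetical fanout
depths = set(leaf_depths(root))   # a single value: every leaf is on one level
tiny = bulk_load([7], fanout=50)  # one entry: the root block is also the leaf
```

Whatever key is looked up, `blocks_visited` returns the same count for a given tree, which is the "same path length" property in code form.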
Deletion of many rows in the B*Tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose a table has 100,000 rows, and 99,999 of those rows and their index entries are deleted. How is the index kept balanced?
Put another way: once almost every entry is gone, how does the index stay balanced?
In this case the rows are deleted from the index, and the empty blocks inserted onto the index free list. These blocks are still part of the index and will need to be accessed when traversing the index.
The deleted entries are simply removed; blocks that become empty go onto the index free list but remain part of the index, and a search may still have to read them.
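A tiny Python sketch of this point, assuming leaf blocks are modeled as lists: deleting entries empties some leaves, and their positions are collected as a stand-in for the index free list while the blocks themselves stay in the structure. `delete_keys` and this free-list representation are hypothetical.

```python
def delete_keys(leaves, doomed):
    """Remove the doomed keys from every leaf.

    Returns the positions of leaves left empty: in this sketch they play the
    role of the index free list, yet they stay in `leaves`, still part of the
    index structure.
    """
    free_list = []
    for pos, leaf in enumerate(leaves):
        leaf[:] = [k for k in leaf if k not in doomed]
        if not leaf:
            free_list.append(pos)
    return free_list


# Hypothetical index: 3 leaf blocks of 10 keys each.
leaves = [list(range(n, n + 10)) for n in (0, 10, 20)]
free_list = delete_keys(leaves, doomed=set(range(25)))
```

After deleting keys 0 to 24, the first two leaves are empty and listed in `free_list`, but `leaves` still has three blocks.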
Thus if an index has one entry in it, there is only one block (a root/leaf block) in the tree. When searching the tree, only one block needs to be accessed to answer the query. Now suppose you load a B*tree with 100,000 rows and get a tree with, say, 3 levels.
So a one-entry index is a single block that is both root and leaf, and a lookup reads just that block. Loading 100,000 rows, by contrast, might produce a tree 3 levels deep.
Levels zero and one are Branch blocks used to access the data in the Leaf blocks on level 2. When querying this tree, you first access the root block, using the search key to find the correct Branch block in level one of the tree.
Here levels 0 and 1 are branch blocks leading to the leaf blocks on level 2. A query starts at the root block and uses the search key to pick the right branch block on level 1.
Next you use the search key and the Branch block to find the correct Leaf block that should contain the key being sought. So it takes three block accesses to answer the same query. Now, if 99,999 rows are deleted from the tree, the rows are removed from the index but the index is not collapsed.
From that branch block, the search key leads to the leaf block that should hold the key, so the query costs three block reads. If 99,999 rows are then deleted, their entries disappear from the index, but the index is not shrunk or merged.
In this case you still have a 3 level index to store the one row and it will still take three block accesses to access that row. The fact that we need three block accesses instead of one does not mean this tree is unbalanced. The tree is still balanced it just contains a lot of empty blocks.
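So the one remaining row still sits in a 3-level index and still takes three block reads to reach. Needing three reads instead of one does not make the tree unbalanced: it is still balanced, it just contains a lot of empty blocks.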
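The deletion scenario can be replayed in a self-contained Python model: build a 3-level tree over 100,000 keys (fanout 50, a hypothetical figure), empty every leaf except the one holding key 42, and count block reads before and after. Like the text, the sketch never collapses the tree, so the lookup still costs three reads even though 1,999 of the 2,000 leaf blocks are now empty. Class and function names are illustrative, not Oracle internals.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Block:
    keys: list                                     # leaf: index keys; branch: separator keys
    children: list = field(default_factory=list)   # empty for a leaf block

    @property
    def is_leaf(self):
        return not self.children


def first_leaf(block):
    while not block.is_leaf:
        block = block.children[0]
    return block


def bulk_load(sorted_keys, fanout):
    level = [Block(sorted_keys[i:i + fanout]) for i in range(0, len(sorted_keys), fanout)]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # Separator: smallest key under each child except the first.
        level = [Block([first_leaf(c).keys[0] for c in g[1:]], g) for g in groups]
    return level[0]


def leaves(block):
    if block.is_leaf:
        return [block]
    return [leaf for c in block.children for leaf in leaves(c)]


def blocks_visited(root, key):
    block, visits = root, 1
    while not block.is_leaf:
        block = block.children[bisect.bisect_right(block.keys, key)]
        visits += 1
    return visits


root = bulk_load(list(range(100_000)), fanout=50)  # hypothetical fanout -> 3 levels
before = blocks_visited(root, 42)

# Delete 99,999 of the 100,000 entries. Only leaf contents change: separators,
# child pointers and the tree height are left as they were (no collapsing).
for leaf in leaves(root):
    leaf.keys = [k for k in leaf.keys if k == 42]

after = blocks_visited(root, 42)
empty_leaves = sum(1 for leaf in leaves(root) if not leaf.keys)
```

`before` and `after` are both 3: the path length never changed, so the tree is still balanced, just mostly empty.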