The most commonly used index in PostgreSQL is the btree, which supports both range and equality queries.
This article introduces the entry points and interface definitions of the btree code, covering index lookup, insertion, deletion, and the cleanup of dead index data.
An index exists so that rows in the underlying table can be found faster. Index keys are small, so a large amount of index data can be read from disk in a single pass.
Some indexes store one entry per row, in one-to-one correspondence with the data; these are dense indexes. Others do not store an entry for every row, but only record summary values such as the minimum and maximum of a range; these are sparse indexes. With a dense index, such as a primary key index, the location of the matching row (or the indexed column values) is obtained directly, and the btree algorithm supports this kind of index.
With a sparse index, after the index lookup you still have to scan the relevant part of the table, or go through a secondary index, before the target rows are hit.
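As a rough illustration of the difference, here is a minimal sketch in C. These structs are hypothetical and are not PostgreSQL types; they only show that a dense entry points at an exact row, while a sparse entry only bounds a block range that still has to be scanned.
#include <stdint.h>

/* Hypothetical layouts, for illustration only -- not PostgreSQL structures. */

/* Dense index: one entry per table row, key -> exact row location. */
typedef struct DenseIndexEntry
{
    int64_t     key;            /* indexed column value */
    uint32_t    block;          /* heap block number holding the row */
    uint16_t    offset;         /* line pointer offset within the block */
} DenseIndexEntry;

/* Sparse index: one entry per block range; only bounds are kept, so the
 * blocks in the range must still be scanned to find the matching rows. */
typedef struct SparseIndexEntry
{
    int64_t     min_key;        /* smallest key in the block range */
    int64_t     max_key;        /* largest key in the block range */
    uint32_t    first_block;    /* first heap block of the range */
    uint32_t    nblocks;        /* number of blocks covered */
} SparseIndexEntry;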
To keep the code decoupled, PostgreSQL defines a structure for index access method operations; its members are a uniform set of callbacks and capability flags.
The definition for btree is shown below. All of the btree operation interface names can be found here; at run time, callers simply invoke the structure's members, which are function pointers.
/*
 * Btree handler function: return IndexAmRoutine with access method parameters
 * and callbacks.
 */
Datum
bthandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    amroutine->amstrategies = BTMaxStrategyNumber;
    amroutine->amsupport = BTNProcs;
    amroutine->amoptsprocnum = BTOPTIONS_PROC;
    amroutine->amcanorder = true;
    amroutine->amcanorderbyop = false;
    amroutine->amcanbackward = true;
    amroutine->amcanunique = true;
    amroutine->amcanmulticol = true;
    amroutine->amoptionalkey = true;
    amroutine->amsearcharray = true;
    amroutine->amsearchnulls = true;
    amroutine->amstorage = false;
    amroutine->amclusterable = true;
    amroutine->ampredlocks = true;
    amroutine->amcanparallel = true;
    amroutine->amcaninclude = true;
    amroutine->amusemaintenanceworkmem = false;
    amroutine->amsummarizing = false;
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
    amroutine->amkeytype = InvalidOid;

    amroutine->ambuild = btbuild;
    amroutine->ambuildempty = btbuildempty;
    amroutine->aminsert = btinsert;
    amroutine->ambulkdelete = btbulkdelete;
    amroutine->amvacuumcleanup = btvacuumcleanup;
    amroutine->amcanreturn = btcanreturn;
    amroutine->amcostestimate = btcostestimate;
    amroutine->amoptions = btoptions;
    amroutine->amproperty = btproperty;
    amroutine->ambuildphasename = btbuildphasename;
    amroutine->amvalidate = btvalidate;
    amroutine->amadjustmembers = btadjustmembers;
    amroutine->ambeginscan = btbeginscan;
    amroutine->amrescan = btrescan;
    amroutine->amgettuple = btgettuple;
    amroutine->amgetbitmap = btgetbitmap;
    amroutine->amendscan = btendscan;
    amroutine->ammarkpos = btmarkpos;
    amroutine->amrestrpos = btrestrpos;
    amroutine->amestimateparallelscan = btestimateparallelscan;
    amroutine->aminitparallelscan = btinitparallelscan;
    amroutine->amparallelrescan = btparallelrescan;

    PG_RETURN_POINTER(amroutine);
}
Let us first look at the basic index operations: lookup (btgettuple), insertion (btinsert), and deletion.
In the execution plan there is an index scan node, such as ExecIndexScan, which initiates the index lookup and uses the index to find tuples in the table. Going down the call path, each level does roughly the following (a minimal sketch of how a caller drives this scan interface follows the list):
- returns a table tuple; for a sparse index, a second lookup would happen at this level;
- returns a table tuple; the tid found through the index is used to fetch the row from the table and check its visibility; if the row is not visible, the scan continues with the next entry;
- returns the tid recorded in the index entry;
- searches within the index, traversing and comparing until the index entry matching the search key is hit.
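As a minimal sketch (not the executor's actual code; the helper name scan_equal_int4 is made up, the relations are assumed already opened and locked, an active snapshot is assumed pushed, and signatures follow recent PostgreSQL releases and may differ between versions), this is roughly how a caller drives a scan through the generic index access method layer, underneath which the btree callbacks registered in IndexAmRoutine (btbeginscan, btgettuple, ...) are invoked:
#include "postgres.h"

#include "access/genam.h"
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/skey.h"
#include "access/stratnum.h"
#include "storage/itemptr.h"
#include "utils/fmgroids.h"
#include "utils/snapmgr.h"

/* Hypothetical helper: scan an int4 index for entries equal to key_value. */
static void
scan_equal_int4(Relation heapRel, Relation indexRel, int32 key_value)
{
    ScanKeyData   skey;
    IndexScanDesc scan;
    ItemPointer   tid;

    /* build a scan key on the first index column: col = key_value */
    ScanKeyInit(&skey,
                1,                     /* index column number */
                BTEqualStrategyNumber, /* equality strategy */
                F_INT4EQ,              /* int4 equality proc */
                Int32GetDatum(key_value));

    scan = index_beginscan(heapRel, indexRel, GetActiveSnapshot(), 1, 0);
    index_rescan(scan, &skey, 1, NULL, 0);

    /* each call ends up in the btree's amgettuple callback, btgettuple */
    while ((tid = index_getnext_tid(scan, ForwardScanDirection)) != NULL)
    {
        /* tid identifies a heap tuple; the caller is responsible for
         * fetching it and checking visibility (cf. index_fetch_heap) */
    }

    index_endscan(scan);
}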
An index lookup breaks down into roughly two steps:
- locate the starting point, i.e. the position of the search key
- scan from that starting point and return the index entries that satisfy the conditions
The entry function for index lookup is btgettuple; its implementation is shown below.
bool
btgettuple(IndexScanDesc scan, ScanDirection dir)
{
    BTScanOpaque so = (BTScanOpaque) scan->opaque;
    bool        res;

    /* btree indexes are never lossy */
    scan->xs_recheck = false;

    /*
     * If we have any array keys, initialize them during first call for a
     * scan.  We can't do this in btrescan because we don't know the scan
     * direction at that time.
     */
    if (so->numArrayKeys && !BTScanPosIsValid(so->currPos))
    {
        /* punt if we have any unsatisfiable array keys */
        if (so->numArrayKeys < 0)
            return false;

        _bt_start_array_keys(scan, dir);
    }

    /* This loop handles advancing to the next array elements, if any */
    do
    {
        /*
         * If we've already initialized this scan, we can just advance it in
         * the appropriate direction.  If we haven't done so yet, we call
         * _bt_first() to get the first item in the scan.
         */
        if (!BTScanPosIsValid(so->currPos))
            res = _bt_first(scan, dir);
        else
        {
            /*
             * Check to see if we should kill the previously-fetched tuple.
             */
            if (scan->kill_prior_tuple)
            {
                /*
                 * Yes, remember it for later. (We'll deal with all such
                 * tuples at once right before leaving the index page.)  The
                 * test for numKilled overrun is not just paranoia: if the
                 * caller reverses direction in the indexscan then the same
                 * item might get entered multiple times. It's not worth
                 * trying to optimize that, so we don't detect it, but instead
                 * just forget any excess entries.
                 */
                if (so->killedItems == NULL)
                    so->killedItems = (int *)
                        palloc(MaxTIDsPerBTreePage * sizeof(int));
                if (so->numKilled < MaxTIDsPerBTreePage)
                    so->killedItems[so->numKilled++] = so->currPos.itemIndex;
            }

            /*
             * Now continue the scan.
             */
            res = _bt_next(scan, dir);
        }

        /* If we have a tuple, return it ... */
        if (res)
            break;
        /* ... otherwise see if we have more array keys to deal with */
    } while (so->numArrayKeys && _bt_advance_array_keys(scan, dir));

    return res;
}
Looking at INSERT, the call path is as follows, described level by level from the top down (see the btinsert sketch after this list):
- the execution entry function for the SQL INSERT statement;
- if the table has indexes, after the heap tuple has been inserted this function is called to maintain them; since there may be several indexes, it loops over each one and calls the next level down to insert the corresponding entry;
- the common index-insert interface, which dispatches to the insert routine defined by the particular index type;
- the entry function for btree index insertion (btinsert); it first assembles an index tuple and then calls the next level down to insert it;
- performs the actual insertion of the index entry, going through a sequence of steps: locating the position, checking uniqueness, and inserting the item.
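For reference, the btree entry function itself is short, roughly as it appears in src/backend/access/nbtree/nbtree.c in recent releases (details may differ between versions): it forms the index tuple from the column values and hands it to _bt_doinsert.
/* btinsert() -- insert an index tuple into a btree (sketch of nbtree.c) */
bool
btinsert(Relation rel, Datum *values, bool *isnull,
         ItemPointer ht_ctid, Relation heapRel,
         IndexUniqueCheck checkUnique,
         bool indexUnchanged,
         IndexInfo *indexInfo)
{
    bool        result;
    IndexTuple  itup;

    /* generate an index tuple from the column values */
    itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
    itup->t_tid = *ht_ctid;

    /* descend the tree and insert the entry */
    result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, heapRel);

    pfree(itup);

    return result;
}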
The overall flow of an index insertion consists of the steps above. The entry function is btinsert, while the real work is done in _bt_doinsert; its code is shown below.
bool
_bt_doinsert(Relation rel, IndexTuple itup,
             IndexUniqueCheck checkUnique, bool indexUnchanged,
             Relation heapRel)
{
    bool        is_unique = false;
    BTInsertStateData insertstate;
    BTScanInsert itup_key;
    BTStack     stack;
    bool        checkingunique = (checkUnique != UNIQUE_CHECK_NO);

    /* we need an insertion scan key to do our search, so build one */
    itup_key = _bt_mkscankey(rel, itup);

    if (checkingunique)
    {
        if (!itup_key->anynullkeys)
        {
            /* No (heapkeyspace) scantid until uniqueness established */
            itup_key->scantid = NULL;
        }
        else
        {
            checkingunique = false;
            /* Tuple is unique in the sense that core code cares about */
            Assert(checkUnique != UNIQUE_CHECK_EXISTING);
            is_unique = true;
        }
    }

    insertstate.itup = itup;
    insertstate.itemsz = MAXALIGN(IndexTupleSize(itup));
    insertstate.itup_key = itup_key;
    insertstate.bounds_valid = false;
    insertstate.buf = InvalidBuffer;
    insertstate.postingoff = 0;

search:
    stack = _bt_search_insert(rel, heapRel, &insertstate);

    if (checkingunique)
    {
        TransactionId xwait;
        uint32      speculativeToken;

        xwait = _bt_check_unique(rel, &insertstate, heapRel, checkUnique,
                                 &is_unique, &speculativeToken);

        if (unlikely(TransactionIdIsValid(xwait)))
        {
            /* Have to wait for the other guy ... */
            _bt_relbuf(rel, insertstate.buf);
            insertstate.buf = InvalidBuffer;

            if (speculativeToken)
                SpeculativeInsertionWait(xwait, speculativeToken);
            else
                XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);

            /* start over... */
            if (stack)
                _bt_freestack(stack);
            goto search;
        }

        /* Uniqueness is established -- restore heap tid as scantid */
        if (itup_key->heapkeyspace)
            itup_key->scantid = &itup->t_tid;
    }

    if (checkUnique != UNIQUE_CHECK_EXISTING)
    {
        OffsetNumber newitemoff;

        CheckForSerializableConflictIn(rel, NULL, BufferGetBlockNumber(insertstate.buf));

        newitemoff = _bt_findinsertloc(rel, &insertstate, checkingunique,
                                       indexUnchanged, stack, heapRel);
        _bt_insertonpg(rel, heapRel, itup_key, insertstate.buf, InvalidBuffer,
                       stack, itup, insertstate.itemsz, newitemoff,
                       insertstate.postingoff, false);
    }
    else
    {
        /* just release the buffer */
        _bt_relbuf(rel, insertstate.buf);
    }

    /* be tidy */
    if (stack)
        _bt_freestack(stack);
    pfree(itup_key);

    return is_unique;
}
The code flow is roughly: build an insertion scan key with _bt_mkscankey, descend to the target leaf page with _bt_search_insert, optionally check uniqueness with _bt_check_unique (waiting for a concurrent inserter and retrying if necessary), determine the exact offset with _bt_findinsertloc, and finally place the tuple on the page with _bt_insertonpg.
Updating an index is simply a delete followed by an insert, so here we look at the outline of index entry deletion.
When a tuple is deleted from a table, the data is not physically removed right away, so the corresponding index entry is not removed either. When, then, do index entries actually get deleted?
During vacuum or page pruning, each HOT chain leaves one last item behind on its page, because a HOT chain within a page corresponds to only a single index entry; that last item is kept precisely so that the index entry pointing at it can still be found and removed.
When the index is vacuumed, these dead items are used to find the corresponding index entries: the index entries are deleted first, and only afterwards are the dead items themselves reclaimed.
What is commonly described as "index performance degradation" is usually index bloat: dead tuples accumulate, the number of index entries waiting to be deleted grows, lookups become much less efficient, and index I/O increases as well.
Going down the call path for index vacuuming:
- the generic index-processing interface is called;
- the generic interface dispatches to the handler of the particular index type, in this case the btree handler;
- the btree bulk-delete interface; to guard against an abnormal exit it registers an exit callback at the start, so that cleanup happens before shared memory is released, and it then calls btvacuumscan to walk every page and clean out dead index entries (a sketch follows).
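A sketch of that bulk-delete entry point, roughly as it appears in nbtree.c in recent releases (details may differ between versions): the PG_ENSURE_ERROR_CLEANUP block is the exit-callback registration mentioned above, and btvacuumscan does the page-by-page work.
/* btbulkdelete() -- delete index entries for dead heap tuples (sketch) */
IndexBulkDeleteResult *
btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
             IndexBulkDeleteCallback callback, void *callback_state)
{
    Relation    rel = info->index;
    BTCycleId   cycleid;

    /* allocate stats if first time through, else re-use existing struct */
    if (stats == NULL)
        stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));

    /* Establish the vacuum cycle ID to use for this scan */
    /* The ENSURE stuff ensures we clean up shared memory on failure */
    PG_ENSURE_ERROR_CLEANUP(_bt_end_vacuum_callback, PointerGetDatum(rel));
    {
        cycleid = _bt_start_vacuum(rel);

        /* walk every index page, removing entries whose heap tuples are dead */
        btvacuumscan(info, stats, callback, callback_state, cycleid);
    }
    PG_END_ENSURE_ERROR_CLEANUP(_bt_end_vacuum_callback, PointerGetDatum(rel));
    _bt_end_vacuum(rel);

    return stats;
}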
Thank you very much for your support. While you are here, please leave a comment; if you found this worthwhile, a like or a bookmark is much appreciated and will keep me motivated!
Author email: [email protected]
Corrections for any errors or omissions are welcome; let's learn from each other.
Note: do not reproduce without permission!