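This section briefly introduces BufTableInsert, a function in the PostgreSQL buffer manager that is reached through the call chain ReadBuffer_common->BufferAlloc->BufTableInsert. Given a buffer tag and a buffer ID, it inserts an entry into the shared buffer lookup hash table.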
BufferDesc
Shared descriptor (state) data for a single shared buffer.
/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */
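The flags above occupy the top 10 bits of the 32-bit state word. As far as I recall, the same header (buf_internals.h) uses the low 18 bits for the reference count and the next 4 bits for the usage count. The following is a minimal standalone sketch of that layout, not PostgreSQL code: the mask values should be checked against the PostgreSQL version in use, and the helpers state_refcount, state_usagecount, state_flags and example_state are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 18 bits refcount | 4 bits usage count | 10 bits flags. */
#define BUF_REFCOUNT_ONE      1U
#define BUF_REFCOUNT_MASK     ((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK   0x003C0000U
#define BUF_USAGECOUNT_SHIFT  18
#define BUF_FLAG_MASK         0xFFC00000U

#define BM_DIRTY      (1U << 23)
#define BM_VALID      (1U << 24)
#define BM_TAG_VALID  (1U << 25)

/* Hypothetical helpers that pull the three fields out of a state word. */
static uint32_t state_refcount(uint32_t state)
{
    return state & BUF_REFCOUNT_MASK;
}

static uint32_t state_usagecount(uint32_t state)
{
    return (state & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

static uint32_t state_flags(uint32_t state)
{
    return state & BUF_FLAG_MASK;
}

/* A state word for a valid, tagged buffer pinned twice with usage count 3. */
static uint32_t example_state(void)
{
    return BM_VALID | BM_TAG_VALID
        | (3U << BUF_USAGECOUNT_SHIFT)
        | (2U * BUF_REFCOUNT_ONE);
}
```

Packing all three fields into one word is what lets a pin or unpin be a single atomic operation on state.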
/*
 * BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either.  To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    //buffer index number (from 0); identifies the corresponding buffer pool slot
    int         buf_id;         /* buffer's index number (from 0) */

    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;

    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */

    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;
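The comment above says that a backend not holding the buffer header lock may only change state with compare-and-swap, and only when BM_LOCKED is clear. A minimal sketch of that rule, written with C11 atomics rather than PostgreSQL's pg_atomic_* API, could look like this. ToyBufferDesc and toy_pin are hypothetical names, and the loop only illustrates the idea; it is not the actual PinBuffer code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BM_LOCKED          (1U << 22)
#define BUF_REFCOUNT_ONE   1U
#define BUF_REFCOUNT_MASK  ((1U << 18) - 1)

/* Hypothetical stand-in that keeps only the state field of BufferDesc. */
typedef struct
{
    _Atomic uint32_t state;
} ToyBufferDesc;

/*
 * Increment the refcount without taking the header lock.
 * The new value is installed only if the word is unchanged since it was
 * read and BM_LOCKED was clear in that value; otherwise we retry.
 */
static uint32_t toy_pin(ToyBufferDesc *buf)
{
    uint32_t old_state = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t new_state;

        if (old_state & BM_LOCKED)
        {
            /* Someone holds the header lock: re-read and try again. */
            old_state = atomic_load(&buf->state);
            continue;
        }

        new_state = old_state + BUF_REFCOUNT_ONE;

        /* On failure old_state is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&buf->state, &old_state, new_state))
            return new_state;
    }
}
```

A plain atomic add would be unsafe here because it could modify the word while the lock holder is preparing its single combined write, which is why the comment rules out atomic increment/decrement.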
BufferTag
A buffer tag identifies which disk block the buffer contains.
/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;
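Because the tag is used as a raw hash key and compared bytewise, its size and any padding matter. The sketch below models the tag with fixed-width stand-in types (ToyRelFileNode and ToyBufferTag are hypothetical; the real Oid and BlockNumber are 32-bit unsigned, and ForkNumber is an enum modelled here as int32_t). Unlike the real INIT_BUFFERTAG, the helper toy_init_tag zeroes the struct first, which is just a defensive assumption for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for RelFileNode: tablespace, database and relation identifiers. */
typedef struct
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} ToyRelFileNode;

/* Stand-in for BufferTag: five 4-byte fields, so no padding is expected. */
typedef struct
{
    ToyRelFileNode rnode;
    int32_t  forkNum;
    uint32_t blockNum;
} ToyBufferTag;

/* Zero everything first so that padding, if any existed, compares equal. */
static void toy_init_tag(ToyBufferTag *tag, uint32_t spc, uint32_t db,
                         uint32_t rel, int32_t fork, uint32_t block)
{
    memset(tag, 0, sizeof(*tag));
    tag->rnode.spcNode = spc;
    tag->rnode.dbNode = db;
    tag->rnode.relNode = rel;
    tag->forkNum = fork;
    tag->blockNum = block;
}

/* Bytewise equality, the way a memcmp-style match function sees the key. */
static int toy_tags_equal(const ToyBufferTag *a, const ToyBufferTag *b)
{
    return memcmp(a, b, sizeof(ToyBufferTag)) == 0;
}
```

With these stand-ins the struct is 20 bytes, which agrees with the keysize = 20 that gdb prints for SharedBufHash later in this post.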
HTAB
Top-level control structure for a hash table.
/*
 * Top control structure for a hashtable --- in a shared table, each backend
 * has its own copy (OK since no fields change at runtime)
 */
struct HTAB
{
    HASHHDR    *hctl;           /* => shared control information */
    HASHSEGMENT *dir;           /* directory of segment starts */
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context if default allocator used */
    char       *tabname;        /* table name (for error messages) */
    bool        isshared;       /* true if table is in shared memory */
    bool        isfixed;        /* if true, don't enlarge */

    /* freezing a shared table isn't allowed, so we can keep state here */
    bool        frozen;         /* true = no more inserts allowed */

    /* We keep local copies of these fixed values to reduce contention */
    Size        keysize;        /* hash key length in bytes */
    long        ssize;          /* segment size --- must be power of 2 */
    int         sshift;         /* segment shift = log2(ssize) */
};
/*
 * Header structure for a hash table --- contains all changeable info
 *
 * In a shared-memory hash table, the HASHHDR is in shared memory, while
 * each backend has a local HTAB struct.  For a non-shared table, there isn't
 * any functional difference between HASHHDR and HTAB, but we separate them
 * anyway to share code between shared and non-shared tables.
 */
struct HASHHDR
{
    /*
     * The freelist can become a point of contention in high-concurrency hash
     * tables, so we use an array of freelists, each with its own mutex and
     * nentries count, instead of just a single one.  Although the freelists
     * normally operate independently, we will scavenge entries from freelists
     * other than a hashcode's default freelist when necessary.
     *
     * If the hash table is not partitioned, only freeList[0] is used and its
     * spinlock is not used at all; callers' locking is assumed sufficient.
     */
    FreeListData freeList[NUM_FREELISTS];

    /* These fields can change, but not in a partitioned table */
    /* Also, dsize can't change in a shared table, even if unpartitioned */
    long        dsize;          /* directory size */
    long        nsegs;          /* number of allocated segments (<= dsize) */
    uint32      max_bucket;     /* ID of maximum bucket in use */
    uint32      high_mask;      /* mask to modulo into entire table */
    uint32      low_mask;       /* mask to modulo into lower half of table */

    /* These fields are fixed at hashtable creation */
    Size        keysize;        /* hash key length in bytes */
    Size        entrysize;      /* total user element size in bytes */
    long        num_partitions; /* # partitions (must be power of 2), or 0 */
    long        ffactor;        /* target fill factor */
    long        max_dsize;      /* 'dsize' limit if directory is fixed size */
    long        ssize;          /* segment size --- must be power of 2 */
    int         sshift;         /* segment shift = log2(ssize) */
    int         nelem_alloc;    /* number of entries to allocate at once */

#ifdef HASH_STATISTICS

    /*
     * Count statistics here.  NB: stats code doesn't bother with mutex, so
     * counts could be corrupted a bit in a partitioned table.
     */
    long        accesses;
    long        collisions;
#endif
};
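The fields max_bucket, high_mask, low_mask, ssize and sshift are what turn a hash value into a bucket, and a bucket into a position in the segment directory. The sketch below follows the scheme I understand dynahash.c to use (mask with high_mask, fall back to low_mask if the bucket is not in use yet, then split the bucket number into segment number and index). ToyHashHdr and the toy_* functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of HASHHDR with just the fields used for addressing. */
typedef struct
{
    uint32_t max_bucket;    /* ID of maximum bucket in use */
    uint32_t high_mask;     /* mask to modulo into entire table */
    uint32_t low_mask;      /* mask to modulo into lower half of table */
    long     ssize;         /* segment size, a power of 2 */
    int      sshift;        /* log2(ssize) */
} ToyHashHdr;

/* Map a hash value to a bucket number. */
static uint32_t toy_calc_bucket(const ToyHashHdr *h, uint32_t hash_val)
{
    uint32_t bucket = hash_val & h->high_mask;

    /* Buckets above max_bucket do not exist yet: fold into the lower half. */
    if (bucket > h->max_bucket)
        bucket = bucket & h->low_mask;
    return bucket;
}

/* Index into the directory (which segment holds the bucket). */
static long toy_segment_num(const ToyHashHdr *h, uint32_t bucket)
{
    return (long) (bucket >> h->sshift);
}

/* Index of the bucket inside its segment. */
static long toy_segment_ndx(const ToyHashHdr *h, uint32_t bucket)
{
    return (long) (bucket & (uint32_t) (h->ssize - 1));
}
```

Using the values gdb prints later in this post (max_bucket = 131071, high_mask = 262143, low_mask = 131071, ssize = 256, sshift = 8) and hashcode = 1398580903, this arithmetic gives bucket 42663, which is slot 167 of segment 166.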
/*
 * Per-freelist data.
 *
 * In a partitioned hash table, each freelist is associated with a specific
 * set of hashcodes, as determined by the FREELIST_IDX() macro below.
 * nentries tracks the number of live hashtable entries having those hashcodes
 * (NOT the number of entries in the freelist, as you might expect).
 *
 * The coverage of a freelist might be more or less than one partition, so it
 * needs its own lock rather than relying on caller locking.  Relying on that
 * wouldn't work even if the coverage was the same, because of the occasional
 * need to "borrow" entries from another freelist; see get_hash_entry().
 *
 * Using an array of FreeListData instead of separate arrays of mutexes,
 * nentries and freeLists helps to reduce sharing of cache lines between
 * different mutexes.
 */
typedef struct
{
    slock_t     mutex;          /* spinlock for this freelist */
    long        nentries;       /* number of entries in associated buckets */
    HASHELEMENT *freeList;      /* chain of free elements */
} FreeListData;
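To make the relation between freelists and partitions concrete, here is a small sketch of the two modulo computations involved. It assumes NUM_FREELISTS is 32 (the gdb output later in this post shows 32 freeList elements) and that the buffer mapping partition is the hash code modulo the partition count (128 in that output). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FREELISTS          32   /* assumed, matches the gdb output */
#define TOY_NUM_PARTITIONS     128  /* num_partitions in the gdb output */

/* Freelist used for a hash code: always 0 if the table is unpartitioned. */
static int toy_freelist_idx(long num_partitions, uint32_t hashcode)
{
    return num_partitions != 0 ? (int) (hashcode % NUM_FREELISTS) : 0;
}

/* Partition whose BufMappingLock the caller is expected to hold. */
static int toy_partition(uint32_t hashcode)
{
    return (int) (hashcode % TOY_NUM_PARTITIONS);
}
```

With 128 partitions and 32 freelists, each freelist serves hash codes from four different partitions, so the caller's partition lock alone cannot protect a freelist; that is the reason for the per-freelist mutex. For hashcode = 1398580903 the sketch yields freelist 7 and partition 39.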
/*
 * HASHELEMENT is the private part of a hashtable entry.  The caller's data
 * follows the HASHELEMENT structure (on a MAXALIGN'd boundary).  The hash key
 * is expected to be at the start of the caller's hash entry data structure.
 */
typedef struct HASHELEMENT
{
    struct HASHELEMENT *link;   /* link to next entry in same bucket */
    uint32      hashvalue;      /* hash function result for this entry */
} HASHELEMENT;
/* Hash table header struct is an opaque type known only within dynahash.c */
typedef struct HASHHDR HASHHDR;

/* Hash table control struct is an opaque type known only within dynahash.c */
typedef struct HTAB HTAB;

/* Parameter data structure for hash_create */
/* Only those fields indicated by hash_flags need be set */
typedef struct HASHCTL
{
    long        num_partitions; /* # partitions (must be power of 2) */
    long        ssize;          /* segment size */
    long        dsize;          /* (initial) directory size */
    long        max_dsize;      /* limit to dsize if dir size is limited */
    long        ffactor;        /* fill factor */
    Size        keysize;        /* hash key length in bytes */
    Size        entrysize;      /* total user element size in bytes */
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context to use for allocations */
    HASHHDR    *hctl;           /* location of header in shared mem */
} HASHCTL;

/* A hash bucket is a linked list of HASHELEMENTs */
typedef HASHELEMENT *HASHBUCKET;

/* A hash segment is an array of bucket headers */
typedef HASHBUCKET *HASHSEGMENT;
/*
 * Hash functions must have this signature.
 */
typedef uint32 (*HashValueFunc) (const void *key, Size keysize);

/*
 * Key comparison functions must have this signature.  Comparison functions
 * return zero for match, nonzero for no match.  (The comparison function
 * definition is designed to allow memcmp() and strncmp() to be used directly
 * as key comparison functions.)
 */
typedef int (*HashCompareFunc) (const void *key1, const void *key2,
                                Size keysize);

/*
 * Key copying functions must have this signature.  The return value is not
 * used.  (The definition is set up to allow memcpy() and strlcpy() to be
 * used directly.)
 */
typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);

/*
 * Space allocation function for a hashtable --- designed to match malloc().
 * Note: there is no free function API; can't destroy a hashtable unless you
 * use the default allocator.
 */
typedef void *(*HashAllocFunc) (Size request);
BufferLookupEnt
/* entry for buffer lookup hashtable */
typedef struct
{
    BufferTag   key;            /* Tag of a disk page */
    int         id;             /* Associated buffer ID */
} BufferLookupEnt;
The source of BufTableInsert is short. The key to reading it is the HTAB data structure, which is the type behind the global variable SharedBufHash.
/*
 * BufTableInsert
 *      Insert a hashtable entry for given tag and buffer ID,
 *      unless an entry already exists for that tag
 *
 * Returns -1 on successful insertion.  If a conflicting entry exists
 * already, returns the buffer ID in that entry.
 *
 * Caller must hold exclusive lock on BufMappingLock for tag's partition
 */
int
BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id)
{
    BufferLookupEnt *result;
    bool        found;

    Assert(buf_id >= 0);        /* -1 is reserved for not-in-table */
    Assert(tagPtr->blockNum != P_NEW);  /* invalid tag */

    //SharedBufHash is declared as: static HTAB *SharedBufHash;
    result = (BufferLookupEnt *)
        hash_search_with_hash_value(SharedBufHash,
                                    (void *) tagPtr,
                                    hashcode,
                                    HASH_ENTER,
                                    &found);

    if (found)                  /* found something already in the table */
        return result->id;

    result->id = buf_id;

    return -1;
}
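The return contract (-1 on a fresh insert, the existing buffer ID on a conflict) can be illustrated without dynahash. The toy below replaces the real hash table with a small fixed-size array and linear probing, which is a deliberately different and much simpler technique; ToyTag, ToyEnt, toy_table and toy_insert are all hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Five 4-byte fields, so the struct has no padding and memcmp is safe. */
typedef struct
{
    uint32_t spc, db, rel;
    int32_t  fork;
    uint32_t block;
} ToyTag;

typedef struct
{
    ToyTag key;     /* tag of a disk page */
    int    id;      /* associated buffer ID */
    int    used;    /* nonzero once the slot holds an entry */
} ToyEnt;

#define TOY_SLOTS 64
static ToyEnt toy_table[TOY_SLOTS];     /* zero-initialized: all slots free */

/*
 * Insert (tag, buf_id) unless the tag is already present.
 * Returns -1 on insertion, or the buffer ID already stored for the tag.
 */
static int toy_insert(const ToyTag *tag, uint32_t hashcode, int buf_id)
{
    assert(buf_id >= 0);        /* -1 is reserved for "inserted" */

    for (uint32_t i = 0; i < TOY_SLOTS; i++)
    {
        ToyEnt *e = &toy_table[(hashcode + i) % TOY_SLOTS];

        if (!e->used)
        {
            /* Free slot reached first: the tag was not in the table. */
            e->key = *tag;
            e->id = buf_id;
            e->used = 1;
            return -1;
        }
        if (memcmp(&e->key, tag, sizeof(ToyTag)) == 0)
            return e->id;       /* conflict: report the existing owner */
    }
    abort();                    /* table full; not handled in this toy */
}
```

A caller such as BufferAlloc can use a non-negative result to detect that another backend already mapped the same page and switch to that buffer instead.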
Test script: query a table.
10:01:54 (xdb@[local]:5432)testdb=# select * from t1 limit 10;
Start gdb and set a breakpoint
(gdb)
(gdb) b BufTableInsert
Breakpoint 1 at 0x875c92: file buf_table.c, line 125.
(gdb) c
Continuing.
Breakpoint 1, BufTableInsert (tagPtr=0x7fff0cba0ef0, hashcode=1398580903, buf_id=101) at buf_table.c:125
125 Assert(buf_id >= 0); /* -1 is reserved for not-in-table */
(gdb)
Input parameters
tagPtr: pointer to a BufferTag struct
hashcode=1398580903
buf_id=101
(gdb) p *tagPtr
$1 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}
Call hash_search_with_hash_value; the focus is on SharedBufHash (a pointer to HTAB)
(gdb) n
129 hash_search_with_hash_value(SharedBufHash,
SharedBufHash
(gdb) p *SharedBufHash
$2 = {hctl = 0x7f5489004380, dir = 0x7f54890046d8, hash = 0xa3bf74 <tag_hash>, match = 0x4791a0 <memcmp@plt>,
keycopy = 0x479690 <memcpy@plt>, alloc = 0x89250b <ShmemAllocNoError>, hcxt = 0x0,
tabname = 0x1fbf1d8 "Shared Buffer Lookup Table", isshared = true, isfixed = false, frozen = false, keysize = 20,
ssize = 256, sshift = 8}
(gdb)
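SharedBufHash->hctl, a HASHHDR struct
freeList is an array (32 elements in this output)
num_partitions is the number of partitions, 128 by default (the compile-time constant NUM_BUFFER_PARTITIONS)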
(gdb) p *SharedBufHash->hctl
$3 = {freeList = {{mutex = 0 '\000', nentries = 3, freeList = 0x7f5489119700}, {mutex = 0 '\000', nentries = 2,
freeList = 0x7f548912d828}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f54891418d8}, {mutex = 0 '\000',
nentries = 3, freeList = 0x7f5489155a00}, {mutex = 0 '\000', nentries = 8, freeList = 0x7f5489169a38}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f548917dc00}, {mutex = 0 '\000', nentries = 5,
freeList = 0x7f5489191cb0}, {mutex = 0 '\000', nentries = 3, freeList = 0x7f54891a5e00}, {mutex = 0 '\000',
nentries = 1, freeList = 0x7f54891b9f50}, {mutex = 0 '\000', nentries = 3, freeList = 0x7f54891ce000}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f54891e2100}, {mutex = 0 '\000', nentries = 5,
freeList = 0x7f54891f61b0}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f548920a2d8}, {mutex = 0 '\000',
nentries = 2, freeList = 0x7f548921e428}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f5489232528}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54892465d8}, {mutex = 0 '\000', nentries = 3,
freeList = 0x7f548925a700}, {mutex = 0 '\000', nentries = 3, freeList = 0x7f548926e800}, {mutex = 0 '\000',
nentries = 5, freeList = 0x7f54892828b0}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f5489296a28}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54892aaad8}, {mutex = 0 '\000', nentries = 4,
freeList = 0x7f54892bebd8}, {mutex = 0 '\000', nentries = 5, freeList = 0x7f54892d2cb0}, {mutex = 0 '\000',
nentries = 0, freeList = 0x7f54892e6e78}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f54892faf28}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f548930f000}, {mutex = 0 '\000', nentries = 4,
freeList = 0x7f54893230d8}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f54893371d8}, {mutex = 0 '\000',
nentries = 2, freeList = 0x7f548934b328}, {mutex = 0 '\000', nentries = 1, freeList = 0x7f548935f450}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54893734d8}, {mutex = 0 '\000', nentries = 3,
freeList = 0x7f5489387600}}, dsize = 512, nsegs = 512, max_bucket = 131071, high_mask = 262143, low_mask = 131071,
keysize = 20, entrysize = 24, num_partitions = 128, ffactor = 1, max_dsize = 512, ssize = 256, sshift = 8,
nelem_alloc = 51}
(gdb)
(gdb) p *SharedBufHash->hctl->freeList[0].freeList
$4 = {link = 0x7f54891196d8, hashvalue = 0}
(gdb) p *SharedBufHash->hctl->freeList[0].freeList.link
$5 = {link = 0x7f54891196b0, hashvalue = 0}
(gdb)
SharedBufHash->dir, the directory of segment starts
(gdb) p *SharedBufHash->dir
$6 = (HASHSEGMENT) 0x7f5489005700
(gdb) p **SharedBufHash->dir
$7 = (HASHBUCKET) 0x0
(gdb) p *SharedBufHash->dir[0]
$8 = (HASHBUCKET) 0x0
(gdb) p *SharedBufHash->dir[1]
$9 = (HASHBUCKET) 0x0
(gdb)
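The hash function is tag_hash
The key comparison function is memcmp@plt
The key copying function is memcpy@plt
The memory allocator is ShmemAllocNoError
The memory context is NULL
The table name is "Shared Buffer Lookup Table"
The table lives in shared memory (isshared = true)
Not fixed-size, not frozen; key length is 20 bytes, segment size is 256, segment shift is 8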
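Execute hash_search_with_hash_value and inspect the state afterwards. Compared with $3, only freeList[7] has changed: nentries went from 3 to 4 and the list head moved from 0x7f54891a5e00 to 0x7f54891a5dd8, because one free element was taken for the new entry.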
(gdb) n
128 result = (BufferLookupEnt *)
(gdb)
135 if (found) /* found something already in the table */
(gdb) p *SharedBufHash
$10 = {hctl = 0x7f5489004380, dir = 0x7f54890046d8, hash = 0xa3bf74 <tag_hash>, match = 0x4791a0 <memcmp@plt>,
keycopy = 0x479690 <memcpy@plt>, alloc = 0x89250b <ShmemAllocNoError>, hcxt = 0x0,
tabname = 0x1fbf1d8 "Shared Buffer Lookup Table", isshared = true, isfixed = false, frozen = false, keysize = 20,
ssize = 256, sshift = 8}
(gdb) p *SharedBufHash->hctl
$11 = {freeList = {{mutex = 0 '\000', nentries = 3, freeList = 0x7f5489119700}, {mutex = 0 '\000', nentries = 2,
freeList = 0x7f548912d828}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f54891418d8}, {mutex = 0 '\000',
nentries = 3, freeList = 0x7f5489155a00}, {mutex = 0 '\000', nentries = 8, freeList = 0x7f5489169a38}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f548917dc00}, {mutex = 0 '\000', nentries = 5,
freeList = 0x7f5489191cb0}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f54891a5dd8}, {mutex = 0 '\000',
nentries = 1, freeList = 0x7f54891b9f50}, {mutex = 0 '\000', nentries = 3, freeList = 0x7f54891ce000}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f54891e2100}, {mutex = 0 '\000', nentries = 5,
freeList = 0x7f54891f61b0}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f548920a2d8}, {mutex = 0 '\000',
nentries = 2, freeList = 0x7f548921e428}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f5489232528}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54892465d8}, {mutex = 0 '\000', nentries = 3,
freeList = 0x7f548925a700}, {mutex = 0 '\000', nentries = 3, freeList = 0x7f548926e800}, {mutex = 0 '\000',
nentries = 5, freeList = 0x7f54892828b0}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f5489296a28}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54892aaad8}, {mutex = 0 '\000', nentries = 4,
freeList = 0x7f54892bebd8}, {mutex = 0 '\000', nentries = 5, freeList = 0x7f54892d2cb0}, {mutex = 0 '\000',
nentries = 0, freeList = 0x7f54892e6e78}, {mutex = 0 '\000', nentries = 2, freeList = 0x7f54892faf28}, {
mutex = 0 '\000', nentries = 3, freeList = 0x7f548930f000}, {mutex = 0 '\000', nentries = 4,
freeList = 0x7f54893230d8}, {mutex = 0 '\000', nentries = 4, freeList = 0x7f54893371d8}, {mutex = 0 '\000',
nentries = 2, freeList = 0x7f548934b328}, {mutex = 0 '\000', nentries = 1, freeList = 0x7f548935f450}, {
mutex = 0 '\000', nentries = 4, freeList = 0x7f54893734d8}, {mutex = 0 '\000', nentries = 3,
freeList = 0x7f5489387600}}, dsize = 512, nsegs = 512, max_bucket = 131071, high_mask = 262143, low_mask = 131071,
keysize = 20, entrysize = 24, num_partitions = 128, ffactor = 1, max_dsize = 512, ssize = 256, sshift = 8,
nelem_alloc = 51}
(gdb) p **SharedBufHash->dir
$12 = (HASHBUCKET) 0x0
(gdb) p *SharedBufHash->dir
$13 = (HASHSEGMENT) 0x7f5489005700
(gdb) p result
$14 = (BufferLookupEnt *) 0x7f54891a5e10
(gdb) p *result
$15 = {key = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}, id = 0}
(gdb) p found
$16 = false
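The addresses printed above can be cross-checked with a little arithmetic. Before the call the head of freeList[7] was 0x7f54891a5e00; afterwards it is 0x7f54891a5dd8 and result is 0x7f54891a5e10. The sketch below assumes a 64-bit build with 8-byte maximum alignment, that the caller's entry starts at the element address plus the aligned size of HASHELEMENT, and that free elements are laid out back to back. ToyHashElement, TOY_MAXALIGN and the toy_* functions are hypothetical stand-ins, not dynahash code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to a multiple of 8; assumes 8-byte maximum alignment. */
#define TOY_MAXALIGN(len) ((((uint64_t) (len)) + 7) & ~UINT64_C(7))

/* Stand-in for HASHELEMENT: one pointer plus a 32-bit hash value. */
typedef struct ToyHashElement
{
    struct ToyHashElement *link;
    uint32_t hashvalue;
} ToyHashElement;

/* Address of the caller's entry (key first) that follows an element header. */
static uint64_t toy_element_key(uint64_t elem_addr)
{
    return elem_addr + TOY_MAXALIGN(sizeof(ToyHashElement));
}

/* Total size of one element: aligned header plus aligned user entry. */
static uint64_t toy_element_size(size_t entrysize)
{
    return TOY_MAXALIGN(sizeof(ToyHashElement)) + TOY_MAXALIGN(entrysize);
}
```

Under these assumptions the header takes 16 bytes, so the entry of the element at 0x7f54891a5e00 starts at 0x7f54891a5e10, which is the value of result. With entrysize = 24 each element is 40 (0x28) bytes, and 0x7f54891a5e00 - 0x28 = 0x7f54891a5dd8 is the new list head. The id field still reads 0 at this point because BufTableInsert has not yet executed result->id = buf_id.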
The call completes and control returns to BufferAlloc
(gdb) n
138 result->id = buf_id;
(gdb)
140 return -1;
(gdb)
141 }
(gdb)
BufferAlloc (smgr=0x204f430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, strategy=0x0,
foundPtr=0x7fff0cba0fa3) at bufmgr.c:1216
1216 if (buf_id >= 0)
(gdb)
DONE!
PG Source Code
Buffer Manager
Source: ITPUB blog, link: http://blog.itpub.net/6906/viewspace-2636576/. Please credit the source when reposting.