如下图所示,当一个postgreSQL进程读取一个元组时,需要获取表的基本信息(例如:表的oid、索引信息和统计信息等)及元组的模式信息,这些信息被分别记录在多个系统表中。通常一个表的模式信息在设定好后的变化频率很低,因此在对同一个表的多个元组操作时,每次都去读取系统表的元组来构建模式信息显然是没有必要的,这也会降低元组的操作效率。为了减少对系统表的访问,在每个进程的本地内存区域设置了两种cache,一种是用来存储系统表的元组,一种是用来存储表的基本信息,从而可以让进程更快的构建出表的基本信息和元组的模式信息。cache在某一个进程对系统表发生更改时其他的 backend 进程要能够感知到,需要有一套维护cache 一致性的机制,也就是 PG 的 InvalidMessage机制。
用户表是如何被管理的,参考:https://zhuanlan.zhihu.com/p/623283855
syscache主要用来缓存最近使用过的系统表的元组。从代码实现看,syscache就是一个catcache数组,数组的长度为系统表的个数,每一个系统表唯一的对应catcache数组的一个元素。
typedef struct catcache
{
int id; /* catcache id */
int cc_nbuckets; /* # of hash buckets in this cache */
TupleDesc cc_tupdesc; /* tuple descriptor (copied from reldesc) */
dlist_head *cc_bucket; /* hash buckets */
CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* hash function for each key */
CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* fast equal function for
* each key */
int cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
dlist_head cc_lists; /* list of CatCList structs */
int cc_ntup; /* # of tuples currently in this cache */
int cc_nkeys; /* # of keys (1..CATCACHE_MAXKEYS) */
const char *cc_relname; /* name of relation the tuples come from */
Oid cc_reloid; /* OID of relation the tuples come from */
Oid cc_indexoid; /* OID of index matching cache keys */
bool cc_relisshared; /* is relation shared across databases? */
slist_node cc_next; /* list link */
ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed key info for heap
* scans */
/*
* Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
* doesn't break ABI for other modules
*/
#ifdef CATCACHE_STATS
long cc_searches; /* total # searches against this cache */
long cc_hits; /* # of matches against existing entry */
long cc_neg_hits; /* # of matches against negative entry */
long cc_newloads; /* # of successful loads of new entry */
/*
* cc_searches - (cc_hits + cc_neg_hits + cc_newloads) is number of failed
* searches, each of which will result in loading a negative entry
*/
long cc_invals; /* # of entries invalidated from cache */
long cc_lsearches; /* total # list-searches */
long cc_lhits; /* # of matches against existing lists */
#endif
} CatCache;
在对postgres进程初始化时,会对syscache进行初始化,将查找系统表元组的关键信息写入到catcache数组的元素中。
涉及到的数据结构如下:
cacheinfo:存储所有系统表的catcache描述信息
struct cachedesc
{
Oid reloid; /* OID of the relation being cached */
Oid indoid; /* OID of index relation for this cache */
int reloidattr; /* attr number of rel OID reference, or 0 */
int nkeys; /* # of keys needed for cache lookup */
int key[4]; /* attribute numbers of key attrs */
int nbuckets; /* number of hash buckets for this cache */
};
static const struct cachedesc cacheinfo[] = {
{AggregateRelationId, /* AGGFNOID */
AggregateFnoidIndexId,
1,
{
Anum_pg_aggregate_aggfnoid,
0,
0,
0
},
16
},
...
}
catcacheheader:catcache使用cc_next字段构成一个单向链表,头部使用此结构体记录
typedef struct catcacheheader
{
slist_head ch_caches; /* head of list of CatCache structs */
int ch_ntup; /* # of tuples in all caches */
} CatCacheHeader;
初始化阶段1:使用cacheinfo初始化catcache数组
typedef struct catcache
{
...
TupleDesc cc_tupdesc; /* tuple descriptor (copied from reldesc) */
int cc_nbuckets; /* # of hash buckets in this cache */
dlist_head *cc_bucket; /* hash buckets */
int cc_keyno[CATCACHE_MAXKEYS]; /* AttrNumber of each key */
int cc_nkeys; /* # of keys (1..CATCACHE_MAXKEYS) */
Oid cc_reloid; /* OID of relation the tuples come from */
Oid cc_indexoid; /* OID of index matching cache keys */
...
}
初始化阶段2:根据对应的系统表填充catcache中元组描述信息(cc_tupdesc)、系统表名(cc_relname)和查找关键字的相关字段
typedef struct catcache
{
...
CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* hash function for each key */
CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* fast equal function for each key */
const char *cc_relname; /* name of relation the tuples come from */
bool cc_relisshared; /* is relation shared across databases? */
ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed key info for heap scans */
...
}
每个catcache元素中cc_bucket数组是一个Hash桶数组,元组的键值可以通过hash函数映射到cc_bucket数组的下标。每个hash桶都被组织成一个双向链表Dllist,其中的节点为Dlelem类型,Dlelem是一个包装过的缓存元组,其dle_val字段指向一个CatCTup形式的缓存元组。
CatCache中的缓存元组将先包装成CatCTup形式,然后再包装成Dlelem形式,最后加入到其所在的hash桶链表中。
typedef struct dlist_node dlist_node;
struct dlist_node
{
dlist_node *prev;
dlist_node *next;
};
typedef struct catctup
{
int ct_magic; /* for identifying CatCTup entries */
#define CT_MAGIC 0x57261502
uint32 hash_value; /* hash value for this tuple's keys */
Datum keys[CATCACHE_MAXKEYS];
dlist_node cache_elem; /* list member of per-bucket list */
int refcount; /* number of active references */
bool dead; /* dead but not yet removed? 标记删除*/
bool negative; /* negative cache entry? 表示实际并不存在的元组*/
HeapTupleData tuple; /* tuple management header */
struct catclist *c_list; /* containing CatCList, or NULL if none */
CatCache *my_cache; /* link to owning catcache */
} CatCTup;
在catcache查找元组有两种方式:精确匹配和部分匹配。
精确匹配由SearchCatCache函数实现:
HeapTuple
SearchCatCache(CatCache *cache,
Datum v1,
Datum v2,
Datum v3,
Datum v4);
2. 部分匹配
部分匹配由SearchCatCacheList实现:
SearchCatCacheList(CatCache *cache,
int nkeys,
Datum v1,
Datum v2,
Datum v3)
该函数返回一个CatCList数据结构,返回的所有结果通过链表的方式管理。
typedef struct catclist
{
int cl_magic; /* for identifying CatCList entries */
#define CL_MAGIC 0x52765103
uint32 hash_value; /* hash value for lookup keys */
dlist_node cache_elem; /* list member of per-catcache list */
/*
* Lookup keys for the entry, with the first nkeys elements being valid.
* All by-reference are separately allocated.
*/
Datum keys[CATCACHE_MAXKEYS];
int refcount; /* number of active references */
bool dead; /* dead but not yet removed? */
bool ordered; /* members listed in index order? */
short nkeys; /* number of lookup keys specified */
int n_members; /* number of member tuples */
CatCache *my_cache; /* link to owning catcache */
CatCTup *members[FLEXIBLE_ARRAY_MEMBER]; /* members */
} CatCList;
RelCache存放的不是元组,而是RelationData数据,每一个RelationData结构表示一个表的模式信息,这些信息由系统表元组中的信息构造而来。
typedef struct RelationData
{
RelFileNode rd_node; /* relation physical identifier */
struct SMgrRelationData *rd_smgr; /* 表的文件句柄 */
。 ...
Form_pg_class rd_rel; /* 表在pg_class系统表中对应的元组里的信息 */
TupleDesc rd_att; /* 表的元组描述符,描述了表的各个属性 */
Oid rd_id; /* relation's object id */
List *rd_indexlist; /* list of OIDs of indexes on relation */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Oid rd_oidindex; /* OID of unique index on OID, if any */
...
Form_pg_index rd_index; /* pg_index tuple describing this index */
...
} RelationData;
由于RelationData数据结构是不变的,采用了hash表维持这个结构。这个hash表也是 PG 内部应用最多最广的 hash 数据结构,其性能和稳定性在PostgreSQL 近三十年的生涯中历经磨练。这个 hash表的实现也是非常值得学习的工业级数据结。
动态hash表介绍参考:https://zhmin.github.io/posts/postgresql-dynamic-hash/
初始化阶段1:调用RelationCacheInitialize函数进行初始化,创建hash表。
初始化阶段2:将必要的系统表和系统表索引的模式加入到RelCache中,包括pg_class、pg_attribute、pg_proc、pg_type。
插入新打开的表
当打开新表时,需要把RelationData加入到RelCache中,该操作通过宏RelationCacheInsert
来实现。
查找hash表
查找hash表通过宏定义RelationIdCacheLookup
来实现,调用函数hash_search。
relation_open
RelationIdGetRelation
RelationIdCacheLookup(relationId, rd);
RelationBuildDesc -- no reldesc in the cache, RelationBuildDesc() build one and add it.
RelationBuildTupleDesc
....
RelationCacheInsert(relation);
从hash表中删除
从hash表中删除元素通过宏定义RelationCacheDelete
实现。
RelationClearRelation
RelationCacheDelete