PostgreSQL中page页结构

在PG中,磁盘存储和内存中的最小管理单位都是page,也是通常所说的block。一般PG页的大小为8K,在源码编译时可以设置。此后都不可更改,因为许多PG内存结构设计都是以此为基础的。

在一个page中,表的记录是从page的底部开始存储,然后慢慢向上涨。Page结构图如下:

PostgreSQL中page页结构_第1张图片

上图为一个page的结构,主要由5个部分组成:

    Page Header:为页头,主要存储LSN,page中空闲空间的开始offset和结束offset等。下面再展开讲。

    ItemId data:是page中表记录的索引条目。一个索引条目4个字节,由两部分组成:此记录在page中的offset和记录长度length。

    Free space:是此page中剩余可用的空间,不算标记为delete后的空间;是指完全没有被使用的空间,也相当于page中没有被分配的空间。

    Item:就是指表实际存储的记录。

    Special space: 存储索引访问方法(AM: Access Method)信息,不同的索引访问方法,内容不一样。但如果是表的page,那么这里是空的,没有任何信息。

源码在src/backend/storage/page/bufpage.c中,以下为Page的初始化:

PostgreSQL中page页结构_第2张图片

Page header 24个字节说明如下:PostgreSQL中page页结构_第3张图片

PageHeader 源码定义如下:

/*
 * disk page organization
 *
 * space management information generic to any page
 *
 *		pd_lsn		- identifies xlog record for last change to this page.
 *		pd_checksum - page checksum, if set.
 *		pd_flags	- flag bits.
 *		pd_lower	- offset to start of free space.
 *		pd_upper	- offset to end of free space.
 *		pd_special	- offset to start of special space.
 *		pd_pagesize_version - size in bytes and page layout version number.
 *		pd_prune_xid - oldest XID among potentially prunable tuples on page.
 *
 * The LSN is used by the buffer manager to enforce the basic rule of WAL:
 * "thou shalt write xlog before data".  A dirty buffer cannot be dumped
 * to disk until xlog has been flushed at least as far as the page's LSN.
 *
 * pd_checksum stores the page checksum, if it has been set for this page;
 * zero is a valid value for a checksum. If a checksum is not in use then
 * we leave the field unset. This will typically mean the field is zero
 * though non-zero values may also be present if databases have been
 * pg_upgraded from releases prior to 9.3, when the same byte offset was
 * used to store the current timelineid when the page was last updated.
 * Note that there is no indication on a page as to whether the checksum
 * is valid or not, a deliberate design choice which avoids the problem
 * of relying on the page contents to decide whether to verify it. Hence
 * there are no flag bits relating to checksums.
 *
 * pd_prune_xid is a hint field that helps determine whether pruning will be
 * useful.  It is currently unused in index pages.
 *
 * The page version number and page size are packed together into a single
 * uint16 field.  This is for historical reasons: before PostgreSQL 7.3,
 * there was no concept of a page version number, and doing it this way
 * lets us pretend that pre-7.3 databases have page version number zero.
 * We constrain page sizes to be multiples of 256, leaving the low eight
 * bits available for a version number.
 *
 * Minimum possible page size is perhaps 64B to fit page header, opaque space
 * and a minimal tuple; of course, in reality you want it much bigger, so
 * the constraint on pagesize mod 256 is not an important restriction.
 * On the high end, we can only support pages up to 32KB because lp_off/lp_len
 * are 15 bits.
 */

typedef struct PageHeaderData
{
	/* XXX LSN is member of *any* block, not only page-organized ones */
	PageXLogRecPtr pd_lsn;		/* LSN: next byte after last byte of xlog
								 * record for last change to this page */
	uint16		pd_checksum;	/* checksum */
	uint16		pd_flags;		/* flag bits, see below */
	LocationIndex pd_lower;		/* offset to start of free space */
	LocationIndex pd_upper;		/* offset to end of free space */
	LocationIndex pd_special;	/* offset to start of special space */
	uint16		pd_pagesize_version;
	TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
	ItemIdData	pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
} PageHeaderData;

typedef PageHeaderData *PageHeader;

其中,PageXLogRecPtr为一个结构体,64位。记录xlog信息的原因:

  1. 保证buffer manger WAL原则,即写日志先于写数据

  2. 脏块checkpoint时,日志先刷出到disk

/*
 * For historical reasons, the 64-bit LSN value is stored as two 32-bit
 * values.
 */
typedef struct
{
	uint32		xlogid;			/* high bits */
	uint32		xrecoff;		/* low bits */
} PageXLogRecPtr;

总的来讲,PG中页的结构大体上与Oracle的Block结构是比较类似的,都是采用向上涨的方式来存储记录。但是在小细节上还是分别比较大的。Oracle的Block中还有ITL等事务相关信息等。

你可能感兴趣的:(postgresql,数据库)