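An Introduction to PostgreSQL Table Files

Contents

  • Preface
  • Cluster directory structure
  • How table files are stored
  • Closing

Preface

This article is based on the PostgreSQL 14 source code; the demonstrations were run on CentOS 8.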


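A table that you access through SQL is physically stored on disk as files: each table corresponds to one or more files.

  • Cluster directory structure

PostgreSQL uses initdb to initialize a database cluster directory. All of the cluster's data lives under this directory, organized on disk as subdirectories and files. Let's look at how the cluster directory is laid out.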

./zptest/
├── base
│   ├── 1
│   ├── 4
│   └── 5
├── global
│   ├── 1213
│   ├── 1213_fsm
│   ├── 1213_vm
│   ├── 1214
│   ├── 1232
│   ├── 1233
│   ├── 1260
│   ├── 1260_fsm
│   ├── 1260_vm
│   ├── 1261
│   ├── 1261_fsm
│   ├── 1261_vm
│   ├── 1262
│   ├── 1262_fsm
│   ├── 1262_vm
│   ├── 2396
│   ├── 2396_fsm
│   ├── 2396_vm
│   ├── 2397
│   ├── 2671
│   ├── 2672
│   ├── 2676
│   ├── 2677
│   ├── 2694
│   ├── 2695
│   ├── 2697
│   ├── 2698
│   ├── 2846
│   ├── 2847
│   ├── 2964
│   ├── 2965
│   ├── 2966
│   ├── 2967
│   ├── 3592
│   ├── 3593
│   ├── 4060
│   ├── 4061
│   ├── 4175
│   ├── 4176
│   ├── 4177
│   ├── 4178
│   ├── 4181
│   ├── 4182
│   ├── 4183
│   ├── 4184
│   ├── 4185
│   ├── 4186
│   ├── 6000
│   ├── 6001
│   ├── 6002
│   ├── 6100
│   ├── 6114
│   ├── 6115
│   ├── pg_control
│   └── pg_filenode.map
├── pg_commit_ts
├── pg_dynshmem
├── pg_hba.conf
├── pg_ident.conf
├── pg_logical
│   ├── mappings
│   ├── replorigin_checkpoint
│   └── snapshots
├── pg_multixact
│   ├── members
│   └── offsets
├── pg_notify
├── pg_replslot
├── pg_serial
├── pg_snapshots
├── pg_stat
├── pg_stat_tmp
├── pg_subtrans
│   └── 0000
├── pg_tblspc
├── pg_twophase
├── PG_VERSION
├── pg_wal
│   ├── 000000010000000000000001
│   └── archive_status
├── pg_xact
│   └── 0000
├── postgresql.auto.conf
└── postgresql.conf

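Table files are stored under the base directory. A cluster usually holds several databases, so which subdirectory of base belongs to a given database?

Every database has an OID, and its directory is named after that OID.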

/*
 * Object ID is a fundamental type in Postgres.
 */
typedef unsigned int Oid;

postgres=# select oid, datname from pg_database order by oid;
 oid |  datname
-----+-----------
   1 | template1
   4 | template0
   5 | postgres
(3 rows)

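The default database we are connected to, postgres, has OID 5, so its files should live under base/5/. Let's verify.

postgres=# select pg_relation_filepath('pg_class');
 pg_relation_filepath
----------------------
 base/5/1259
(1 row)

The file for the pg_class table of the current database is indeed under base/5/.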
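To make the naming scheme concrete, here is a minimal sketch that splits a relative path of the kind returned by pg_relation_filepath into its parts. The helper name parse_relation_path is hypothetical and not part of PostgreSQL. It also accepts the _fsm and _vm suffixes visible in the directory listing (the free space map and visibility map files) and a trailing .N segment number; tablespace paths under pg_tblspc are deliberately not handled.

```python
import re

# Hypothetical helper, not part of PostgreSQL: splits a relative data-file
# path such as "base/5/16384.1" or "global/1213_fsm" into its components.
_PATH_RE = re.compile(
    r"^(?:base/(?P<db>\d+)|(?P<shared>global))/"
    r"(?P<filenode>\d+)(?:_(?P<fork>[a-z]+))?(?:\.(?P<segment>\d+))?$"
)


def parse_relation_path(path):
    """Return the database OID, filenode, fork and segment encoded in path."""
    m = _PATH_RE.match(path)
    if m is None:
        raise ValueError(f"unrecognized relation path: {path!r}")
    return {
        # Files under global/ are cluster-wide, so they have no database OID.
        "database_oid": int(m["db"]) if m["db"] else None,
        "filenode": int(m["filenode"]),
        # No suffix means the main data file of the relation.
        "fork": m["fork"] or "main",
        # No ".N" suffix means the first segment, numbered 0 here.
        "segment": int(m["segment"] or 0),
    }


if __name__ == "__main__":
    print(parse_relation_path("base/5/1259"))
    print(parse_relation_path("global/1213_fsm"))
```

Feeding it the path from the query above, base/5/1259, yields database OID 5 and filenode 1259, which matches the directory name under base.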

  • How table files are stored

(1) How a table file relates to the table's OID:

Every database object is identified by an OID, and tables are no exception. In general, a table's file name is the same as the table's OID, for example:

postgres=# create table test(id integer);
CREATE TABLE
postgres=# select oid from pg_class where relname='test';
  oid
-------
 16384
(1 row)

postgres=# select pg_relation_filepath('test');
 pg_relation_filepath
----------------------
 base/5/16384
(1 row)

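This correspondence can change, however, for example when VACUUM FULL rewrites the table into a new file. The file name is really the relation's filenode, which ordinary tables record in pg_class.relfilenode, so the reliable way to find the file is to query pg_relation_filepath.

For a small set of system catalogs (the shared catalogs and a few critical ones such as pg_class itself), relfilenode is 0 and the mapping from OID to filenode is maintained in the pg_filenode.map file instead.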

/*
 * The map file is critical data: we have no automatic method for recovering
 * from loss or corruption of it.  We use a CRC so that we can detect
 * corruption.  To minimize the risk of failed updates, the map file should
 * be kept to no more than one standard-size disk sector (ie 512 bytes),
 * and we use overwrite-in-place rather than playing renaming games.
 * The struct layout below is designed to occupy exactly 512 bytes, which
 * might make filesystem updates a bit more efficient.
 *
 * Entries in the mappings[] array are in no particular order.  We could
 * speed searching by insisting on OID order, but it really shouldn't be
 * worth the trouble given the intended size of the mapping sets.
 */
#define RELMAPPER_FILENAME                "pg_filenode.map"
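As a rough illustration of the 512-byte layout described in that comment, the sketch below decodes a map file image into an OID-to-filenode dictionary. Several things here are assumptions rather than content from this article: the field order (magic, entry count, 62 pairs of OID and filenode, CRC, padding) and the magic value 0x592717 are taken from my reading of relmapper.c in PostgreSQL 14, the file is assumed to be little-endian, and the CRC is not verified. The function name parse_relmap is hypothetical.

```python
import struct

# Assumptions based on relmapper.c in PostgreSQL 14; other versions may differ.
RELMAPPER_FILEMAGIC = 0x592717   # assumed version/magic value
MAX_MAPPINGS = 62                # assumed: 62 * 8 + 16 = 512 bytes
RELMAP_FILE_SIZE = 512


def parse_relmap(data):
    """Decode a pg_filenode.map image into {catalog OID: filenode}.

    Hypothetical helper: assumes a little-endian file and skips the CRC check.
    """
    if len(data) != RELMAP_FILE_SIZE:
        raise ValueError(f"expected {RELMAP_FILE_SIZE} bytes, got {len(data)}")
    magic, count = struct.unpack_from("<ii", data, 0)
    if magic != RELMAPPER_FILEMAGIC:
        raise ValueError(f"unexpected magic value: {magic:#x}")
    if not 0 <= count <= MAX_MAPPINGS:
        raise ValueError(f"implausible mapping count: {count}")
    mappings = {}
    for i in range(count):
        # Each entry is two unsigned 32-bit values: catalog OID, then filenode.
        oid, filenode = struct.unpack_from("<II", data, 8 + i * 8)
        mappings[oid] = filenode
    return mappings


if __name__ == "__main__":
    # Build a synthetic image with two entries instead of reading a real file.
    image = struct.pack("<ii", RELMAPPER_FILEMAGIC, 2)
    image += struct.pack("<IIII", 1259, 1259, 1262, 1262)
    image = image.ljust(RELMAP_FILE_SIZE, b"\0")
    print(parse_relmap(image))
```

Because the entries are in no particular order, a dictionary keyed by OID is a convenient way to look them up. Run this only against a copy of the file, never against a live cluster directory.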

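(2) Table file size:

Because operating systems limit how large a single file can be, PostgreSQL caps each table file at 1 GB by default.

Once a table grows past 1 GB, PostgreSQL creates additional files and splits the table across them. The extra files are named by appending a sequence number to the filenode: filenode.1, filenode.2, …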

/* RELSEG_SIZE is the maximum number of blocks allowed in one disk file. Thus,
   the maximum size of a single file is RELSEG_SIZE * BLCKSZ; relations bigger
   than that are divided into multiple files. RELSEG_SIZE * BLCKSZ must be
   less than your OS' limit on file size. This is often 2 GB or 4GB in a
   32-bit operating system, unless you have large file support enabled. By
   default, we make the limit 1 GB to avoid any possible integer-overflow
   problems within the OS. A limit smaller than necessary only means we divide
   a large relation into more chunks than necessary, so it seems best to err
   in the direction of a small limit. A power-of-2 value is recommended to
   save a few cycles in md.c, but is not absolutely required. Changing
   RELSEG_SIZE requires an initdb. */
#define RELSEG_SIZE 131072

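RELSEG_SIZE is the maximum number of blocks in a single table file, and BLCKSZ, the block size, defaults to 8 KB, so the maximum file size is BLCKSZ * RELSEG_SIZE = 8192 * 131072 bytes = 1 GB.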
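The arithmetic can be sketched in a few lines. Given a block number within a table, integer division by RELSEG_SIZE picks the segment file and the remainder gives the position inside it. The function name locate_block is hypothetical, and the constants are the default values quoted above; a cluster built with different settings would use other numbers.

```python
BLCKSZ = 8192          # default block size in bytes
RELSEG_SIZE = 131072   # blocks per segment file, as in the #define above


def locate_block(filenode, block_number):
    """Return (file name, byte offset) of a block within a relation.

    Hypothetical helper illustrating the segment arithmetic only.
    """
    segment, block_in_segment = divmod(block_number, RELSEG_SIZE)
    # The first segment has no suffix; later ones are filenode.1, filenode.2, ...
    name = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    return name, block_in_segment * BLCKSZ


if __name__ == "__main__":
    print(BLCKSZ * RELSEG_SIZE)           # bytes per segment file
    print(locate_block(16384, 131071))    # last block of the first file
    print(locate_block(16384, 131072))    # first block of the second file
```

With the defaults, block 131071 is the last one that fits in the file 16384, and block 131072 is the first block of 16384.1.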


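Closing

Author's email: [email protected]
If you spot any errors or omissions, please point them out so we can learn from each other.

Note: Do not repost without permission!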
