Introduction to Point-in-Time Recovery (PITR)
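Point-in-time recovery (PITR) is a capability that virtually every production database must provide.
It works by restoring an earlier physical base backup and then replaying the archived write-ahead log (WAL) on top of it.
PostgreSQL has supported PITR since version 8.0.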
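recovery.conf parameters that control the recovery target
1) Named restore point
recovery_target_name = ''  # e.g. 'daily backup 2018-01-14'
This refers to a restore point created with pg_create_restore_point(text) (available since PostgreSQL 9.1). If several restore points share the same name, recovery stops at the first one it reaches.
Because this stop point is not derived from a commit or abort record, recovery_target_inclusive has no effect here.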
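The first-match rule can be sketched as a simplified Python model. This is not PostgreSQL's recovery code: the WalRecord type, its fields and the named_stop_lsn function are hypothetical, and LSNs are plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WalRecord:
    # Hypothetical, simplified WAL record: position, record type, optional name.
    lsn: int
    kind: str        # "restore_point", "commit", "abort" or "other"
    name: str = ""   # only meaningful for restore points


def named_stop_lsn(records: List[WalRecord], target_name: str) -> Optional[int]:
    """Return the LSN where replay stops for a recovery_target_name target.

    Records are scanned in WAL order and replay stops at the FIRST restore
    point with a matching name, so later restore points with the same name
    are never reached. recovery_target_inclusive is not consulted at all.
    Returns None if the target is never found.
    """
    for rec in records:
        if rec.kind == "restore_point" and rec.name == target_name:
            return rec.lsn
    return None


# Usage: two restore points share the name "daily"; the first one wins.
wal = [
    WalRecord(100, "commit"),
    WalRecord(200, "restore_point", "daily"),
    WalRecord(300, "commit"),
    WalRecord(400, "restore_point", "daily"),
]
print(named_stop_lsn(wal, "daily"))  # 200
```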
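2) Target time
recovery_target_time = ''  # e.g. '2018-01-14 22:39:00 EST'
This is compared with the timestamp stored in commit and abort WAL records (recordXtime, i.e. xl_xact_commit_compact->xact_time) and is used together with recovery_target_inclusive. Replay stops just before the first commit or abort record whose timestamp is later than the target (true, the default) or not earlier than the target (false); that record is not applied.
If several transactions committed or aborted at exactly the target time:
with false, none of them is applied (recovery stops before the first one);
with true, all of them are applied, up to and including the last one.
If exactly one transaction committed or aborted at that time:
true includes it, false excludes it.
If no commit or abort record matches that time exactly:
true and false behave the same; replay stops just before the first commit or abort record after that time, so that transaction is not applied.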
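A minimal Python model of the time-based stop, following the PostgreSQL documentation for recovery_target_inclusive (true: stop just after the target, false: stop just before it). Integers stand in for commit timestamps, and applied_end_times is a hypothetical helper, not part of PostgreSQL.

```python
from typing import List


def applied_end_times(end_times: List[int], target: int, inclusive: bool) -> List[int]:
    """Simplified model of a recovery_target_time stop.

    end_times are the timestamps of commit/abort records in WAL order.
    Replay stops just BEFORE the first record whose time is > target
    (inclusive=True) or >= target (inclusive=False); that record and
    everything after it are not applied.
    """
    applied = []
    for t in end_times:
        stop_here = t > target if inclusive else t >= target
        if stop_here:
            break
        applied.append(t)
    return applied


# Several transactions ended at the target time 20.
print(applied_end_times([10, 20, 20, 30], 20, inclusive=True))   # [10, 20, 20]
print(applied_end_times([10, 20, 20, 30], 20, inclusive=False))  # [10]
# No transaction ended exactly at 25: both settings stop before 30.
print(applied_end_times([10, 20, 20, 30], 25, inclusive=False))  # [10, 20, 20]
```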
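3) Transaction ID (XID) target
recovery_target_xid is matched against the XID of commit and abort records (XLogRecord->xl_xid) and can be combined with recovery_target_inclusive: with true (the default), replay stops just after that transaction's commit or abort record, so the transaction is included; with false, replay stops just before that record, so the transaction is not applied.
Note in particular that the XID is matched when the transaction ends, not when the XID is assigned. XIDs are handed out in start order, not completion order, so when recovering to the commit or abort of xid=100, xid=102 may already have committed; its WAL is then replayed as well.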
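The ordering pitfall is easiest to see in a small Python model, again following the documented recovery_target_inclusive semantics. The list holds XIDs in the order their commit/abort records appear in WAL; applied_xids is a hypothetical helper.

```python
from typing import List


def applied_xids(end_order: List[int], target_xid: int, inclusive: bool) -> List[int]:
    """Simplified model of a recovery_target_xid stop.

    end_order lists XIDs in the order their commit/abort records appear
    in WAL. Replay stops at the record whose XID equals target_xid:
    just after it when inclusive is True, just before it otherwise.
    """
    applied = []
    for xid in end_order:
        if xid == target_xid:
            if inclusive:
                applied.append(xid)
            break
        applied.append(xid)
    return applied


# xid 102 started after xid 100 but finished first, so it is replayed too.
order = [101, 102, 100, 103]
print(applied_xids(order, 100, inclusive=True))   # [101, 102, 100]
print(applied_xids(order, 100, inclusive=False))  # [101, 102]
```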
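Conclusion: PITR is essential for 24/7 operations. For a very small database, simply running pg_dump more often may be more convenient, but for a large database PITR is indispensable.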
Named restore point example (the other target types will be covered in later experiments)
First take a base backup with pg_basebackup, for example:
cd /dbbak
mkdir `date +%F` ; pg_basebackup -F t -x -D /dbbak/`date +%F` -h 192.168.137.222 -p 1921 -U repl
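If the backup is scripted, the same command line can be assembled in Python. This is a sketch only: basebackup_argv and run_basebackup are hypothetical helpers that mirror the shell command above, which assumes a 9.x pg_basebackup that still accepts -x. Building the argument list does not run anything.

```python
import datetime
import os
import shlex
import subprocess
from typing import List


def basebackup_argv(backup_root: str, host: str, port: int, user: str,
                    day: datetime.date) -> List[str]:
    """Build the pg_basebackup command line used above (tar format, -x)."""
    target = os.path.join(backup_root, day.isoformat())  # same as `date +%F`
    return ["pg_basebackup", "-F", "t", "-x", "-D", target,
            "-h", host, "-p", str(port), "-U", user]


def run_basebackup(argv: List[str]) -> None:
    """Create the dated target directory, then run pg_basebackup."""
    os.makedirs(argv[argv.index("-D") + 1], exist_ok=True)
    subprocess.run(argv, check=True)


argv = basebackup_argv("/dbbak", "192.168.137.222", 1921, "repl",
                       datetime.date(2018, 1, 26))
print(shlex.join(argv))
# pg_basebackup -F t -x -D /dbbak/2018-01-26 -h 192.168.137.222 -p 1921 -U repl
```

Passing a list to subprocess.run avoids shell quoting issues; call run_basebackup(argv) only on a host that can reach the server.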
The resulting backup files:
[postgres@hgdb01 2018-01-26]$ ls -l
total 46212
-rw-rw-r--. 1 postgres postgres 19968 Jan 26 07:11 16400.tar
-rw-rw-r--. 1 postgres postgres 47299584 Jan 26 07:11 base.tar
Transaction timeline:
begin1;query1;commit1; begin2; query2_1; pg_create_restore_point(text); query2_2; commit2;
Based on the description above, after recovery with recovery_target_name the database should contain the changes of query1 but not those of query2_1 or query2_2.
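The expected outcome can be checked with a toy replay of the timeline. This is a simplified Python model, not PostgreSQL internals: the event tuples and visible_after_named_recovery are hypothetical, and begin events are omitted because they write nothing to WAL in this model.

```python
from typing import List, Optional, Tuple

# (kind, transaction id, payload): a hypothetical, simplified event log.
Event = Tuple[str, Optional[int], Optional[str]]


def visible_after_named_recovery(events: List[Event], target_name: str) -> List[str]:
    """Replay events up to the first matching restore point and return the
    queries whose transactions have a replayed commit record."""
    committed = set()
    changes = {}
    for kind, txn, payload in events:
        if kind == "restore_point" and payload == target_name:
            break                      # stop here; nothing after is replayed
        if kind == "query":
            changes.setdefault(txn, []).append(payload)
        elif kind == "commit":
            committed.add(txn)
    # A transaction whose commit was not replayed ends up rolled back.
    return [q for txn in sorted(committed) for q in changes.get(txn, [])]


timeline = [
    ("query", 1, "query1"), ("commit", 1, None),
    ("query", 2, "query2_1"),
    ("restore_point", None, "pitr_test"),
    ("query", 2, "query2_2"), ("commit", 2, None),
]
print(visible_after_named_recovery(timeline, "pitr_test"))  # ['query1']
```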
begin1;query1;commit1; -- SESSION A :
testdb=# create table pitr_test(id int, info text);
CREATE TABLE
testdb=# insert into pitr_test values (1,'test');
INSERT 0 1
begin2; query2_1; -- SESSION A :
testdb=# begin;
BEGIN
testdb=# insert into pitr_test values (2,'test');
INSERT 0 1
pg_create_restore_point(text); -- SESSION B :
testdb=# select pg_create_restore_point('pitr_test');
pg_create_restore_point
-------------------------
0/240234D8
(1 row)
query2_2; commit2; -- SESSION B :
testdb=# insert into pitr_test values (3,'test');
INSERT 0 1
testdb=# commit;
COMMIT
testdb=# select * from pitr_test;
id | info
----+------
1 | test
3 | test
(2 rows)
Switch WAL segments so they get archived:
testdb=# checkpoint;
CHECKPOINT
testdb=# select pg_xlogfile_name(pg_switch_xlog());
pg_xlogfile_name
--------------------------
000000030000000000000024
(1 row)
testdb=# checkpoint;
CHECKPOINT
testdb=# select pg_xlogfile_name(pg_switch_xlog());
pg_xlogfile_name
--------------------------
000000030000000F0000000D
(1 row)
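The 24-hex-digit names returned by pg_xlogfile_name() encode the timeline followed by the high and low parts of the segment number, each as 8 hex digits. A Python sketch, assuming the default 16 MB WAL segment size and timeline 3 (read off the file names in this post), computes the segment that holds the restore point LSN 0/240234D8; wal_file_name is a hypothetical helper, not a PostgreSQL function.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default segment size; assumed unchanged


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEG_SIZE) -> str:
    """Name of the WAL segment file containing `lsn` (format 'X/XXXXXXXX')."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    # 8 hex digits each: timeline, "log" number, segment within that log
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


print(wal_file_name(3, "0/240234D8"))  # 000000030000000000000024
```

This is why recovery to "pitr_test" needs segment 000000030000000000000024 from the archive.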
Restore from the pg_basebackup archives and the archived WAL:
Shut down the database
[postgres@hgdb01 ~] pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
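Restore the backup
Delete the contents of the data directory and the tablespace directory, then extract each backup archive into its corresponding directory (base.tar into $PGDATA, 16400.tar into the tablespace directory).
Note that after extracting the tablespace files, the symbolic link has to be created manually in $PGDATA/pg_tblspc: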
ln -s /pgtbls/tbls01 ./16400
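The same step in Python, as a sketch that refuses to overwrite an existing entry. link_tablespace is a hypothetical helper; the OID 16400 and the location /pgtbls/tbls01 come from this post, and in practice the call would be link_tablespace(os.environ["PGDATA"], "16400", "/pgtbls/tbls01").

```python
import os
from pathlib import Path


def link_tablespace(pgdata: str, tablespace_oid: str, location: str) -> Path:
    """Recreate $PGDATA/pg_tblspc/<oid> -> <location>, like `ln -s`."""
    link = Path(pgdata) / "pg_tblspc" / tablespace_oid
    link.parent.mkdir(parents=True, exist_ok=True)
    # is_symlink() also catches a dangling link that exists() would miss
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    os.symlink(location, link)  # link now points at the tablespace directory
    return link
```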
Create the pg_log directory for the server logs:
[postgres@hgdb01 ~] cd $PGDATA
[postgres@hgdb01 ~] mkdir -p pg_log
Configure $PGDATA/recovery.conf
[postgres@hgdb01 ~] cd $PGDATA
[postgres@hgdb01 ~] cp $PGHOME/share/recovery.conf.sample ./recovery.conf
vi recovery.conf
restore_command = 'cp /ssd/pg957/arch/20180126/%f %p'
recovery_target_name = 'pitr_test'
recovery_target_timeline = 'latest'
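For repeated drills, the file can be generated instead of edited by hand. A minimal Python sketch for the 9.x recovery.conf used here: write_recovery_conf is hypothetical, the archive directory is assumed to contain no spaces, and a single quote inside a value is doubled, as in postgresql.conf syntax.

```python
from pathlib import Path


def write_recovery_conf(pgdata: str, archive_dir: str, target_name: str) -> Path:
    """Write a minimal recovery.conf for a named-restore-point recovery."""
    quoted_name = target_name.replace("'", "''")  # escape quotes for the conf syntax
    lines = [
        # %f: WAL file name the server asks for; %p: where to copy it
        f"restore_command = 'cp {archive_dir}/%f %p'",
        f"recovery_target_name = '{quoted_name}'",
        "recovery_target_timeline = 'latest'",
    ]
    path = Path(pgdata) / "recovery.conf"
    path.write_text("\n".join(lines) + "\n")
    return path
```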
Start the database
[postgres@hgdb01 data]$ pg_ctl start
server starting
[postgres@hgdb01 data]$ LOG: database system was interrupted; last known up at 2018-01-26 07:11:08 CST
cp: cannot stat ‘/ssd/pg957/arch/20180126/00000004.history’: No such file or directory
LOG: starting point-in-time recovery to "pitr_test"
cp: cannot stat ‘/ssd/pg957/arch/20180126/00000003.history’: No such file or directory
LOG: restored log file "000000030000000000000023" from archive
LOG: redo starts at 0/23000028
LOG: consistent recovery state reached at 0/230000F8
LOG: database system is ready to accept read only connections
LOG: restored log file "000000030000000000000024" from archive
LOG: file "pg_clog/0000" doesn't exist, reading as zeroes
CONTEXT: xlog redo Standby/LOCK: xid 1954 db 32839 rel 40979
LOG: recovery stopping at restore point "pitr_test", time 2018-01-26 07:13:08.588656+08
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.
Check whether the recovered data matches the expected restore point:
[postgres@hgdb01 ~] psql
psql (9.5.7devel)
Type "help" for help.
testdb=# select ctid,* from pitr_test ;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(1 row)
testdb=# insert into pitr_test values(2,'new');
ERROR: cannot execute INSERT in a read-only transaction
STATEMENT: insert into pitr_test values(2,'new');
ERROR: cannot execute INSERT in a read-only transaction
testdb=#
testdb=# select pg_xlog_replay_resume();
pg_xlog_replay_resume
-----------------------
(1 row)
testdb=# LOG: redo done at 0/24023470
LOG: last completed transaction was at log time 2018-01-26 07:12:01.39043+08
cp: cannot stat ‘/ssd/pg957/arch/20180126/00000004.history’: No such file or directory
LOG: selected new timeline ID: 4
cp: cannot stat ‘/ssd/pg957/arch/20180126/00000003.history’: No such file or directory
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
testdb=# insert into pitr_test values(2,'new');
INSERT 0 1
testdb=# select ctid,* from pitr_test ;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(0,3) | 2 | new
(2 rows)
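Conclusion: the newly inserted row got ctid (0,3), which shows that the WAL of query2_1 was replayed but its transaction was rolled back. Had query2_1 not been replayed, the row with info='new' would have ctid (0,2).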
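The ctid reasoning can be illustrated with a deliberately simplified Python model of heap block 0. It is an assumption-laden sketch, not PostgreSQL's heap code: replay_heap_inserts is hypothetical, transaction labels 1 and 2 are illustrative (not real XIDs), and page pruning, VACUUM and free-space reuse are ignored.

```python
from typing import List, Set, Tuple

Row = Tuple[int, str]                 # (id, info)
Ctid = Tuple[int, int]                # (block number, line pointer)


def replay_heap_inserts(inserts: List[Tuple[int, Row]],
                        committed: Set[int]) -> Tuple[List[Tuple[Ctid, Row]], Ctid]:
    """Very simplified model of heap block 0 after WAL replay.

    Each replayed insert takes the next line pointer whether or not its
    transaction's commit record was replayed; only tuples of committed
    transactions are visible. Pruning, VACUUM and free-space reuse are ignored.
    """
    visible = [((0, slot), row)
               for slot, (txn, row) in enumerate(inserts, start=1)
               if txn in committed]
    next_ctid = (0, len(inserts) + 1)   # where the next insert would land
    return visible, next_ctid


# txn 1 inserted (1,'test') and committed; txn 2 inserted (2,'test') (query2_1)
# but its commit was never replayed.
visible, nxt = replay_heap_inserts([(1, (1, "test")), (2, (2, "test"))], {1})
print(visible)  # [((0, 1), (1, 'test'))]
print(nxt)      # (0, 3)
```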
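Also, after recovering to a named restore point the database accepts queries but not writes, because replay pauses at the target and the server stays in read-only hot standby mode (in 9.5 the default recovery_target_action is pause). To make it writable, run select pg_xlog_replay_resume();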
Reference: https://yq.aliyun.com/articles/59359
by 波罗