一、pg_rewind 简介
pg_rewind 是postgresql主丛数据库之同步数据目录的工具。
pg_rewind只复制表数据文件中更改的块;所有其他文件都被完整复制,包括配置文件。pg_rewind相对于使用pg_basebackup备份或rsync等工具的优势在于,pg_rewind不需要读取数据库中未更改的块。这使得在数据库很大且之间只有一小部分块不同的情况下,速度会快得多。
二、使用前提:
需要目标服务器在postgresql.conf 中允许wal_log_hints,
或者
在 initdb初始化集群时允许 checksums ,full_page_writes也必须为on
三、命令详解:
[atlasdb@vm1 ~]$ /usr/local/atlasdb/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.
Usage:
pg_rewind [OPTION]...
Options:
-D, --target-pgdata=DIRECTORY existing data directory to modify
--source-pgdata=DIRECTORY source data directory to synchronize with
--source-server=CONNSTR source server to synchronize with
-n, --dry-run stop before modifying anything
-P, --progress write progress messages
--debug write a lot of debug messages
-V, --version output version information, then exit
-?, --help show this help, then exit
Report bugs to .
-D directory --target-pgdata=directory
此选项指定与源同步的目标数据目录。在运行pg_rewind之前,必须干净关闭目标服务器
--source-pgdata=directory
指定要与之同步的源服务器的数据目录的文件系统路径。此选项要求干净关闭源服务器
--source-server=connstr
指定要连接到源PostgreSQL服务器的libpq连接字符串。连接必须是具有超级用户访问权限的正常(非复制)连接。此选项要求源服务器正在运行,而不是处于恢复模式。
-n --dry-run
除了实际修改目标目录之外,执行所有操作。
-P --progress
输出进展报告。
--debug
输出很多Debug的信息。如果失败时,可以用此选项定位错误原因。
例如:
pg_rewind --target-pgdata=/data/atlasdb --source-server='host=192.168.0.201 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P --debug
四、测试
环境准备
准备两台主机,安装postgresql v11.5。并配置好流复制
- vm1: 192.168.0.201 主库
- vm2: 192.168.0.202 从库
注:我这里用于测试的版本是 AtlasDB v2.7,基于 postgresql v11.5 优化的版本,显示可能不一样。
vm1、vm2环境变量设置:
export PGDATABASE=atlasdb
export PGHOST=localhost
export PGPORT=5432
export PGUSER=atlasdb
export PGHOME=/home/atlasdb/atlasdb
export PGDATA=/data/atlasdb
export PATH=$PGHOME/bin:$HOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib/:$LD_LIBRARY_PATH
vm1查询流复制情况
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select * from pg_stat_replication ;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority |
sync_state
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+
------------
3421 | 10 | atlasdb | walreceiver | 192.168.0.202 | | 33462 | 2019-12-25 18:38:34.168977+08 | | streaming | 0/3000060 | 0/3000060 | 0/3000060 | 0/3000060 | | | | 0 |
async
(1 row)
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
atlasdb=#
测试过程
主库vm1:创建测试表和测试数据
atlasdb=# create table test_tab(id int,name varchar(20),e_mail varchar(20),d_id int);
CREATE TABLE
atlasdb=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------+-------+---------+---------+-------------
public | test_tab | table | atlasdb | 0 bytes |
(1 row)
atlasdb=# insert into test_tab values(1,'hemny','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-------+--------------+------
1 | hemny | [email protected] | 10
(1 row)
atlasdb=#
从库vm2:查询数据
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
atlasdb=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------+-------+---------+------------+-------------
public | test_tab | table | atlasdb | 8192 bytes |
(1 row)
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-------+--------------+------
1 | hemny | [email protected] | 10
(1 row)
atlasdb=#
提升从库vm2为新主库
[atlasdb@vm2 ~]$ pg_ctl promote
waiting for server to promote.... done
server promoted
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
atlasdb=#
在旧主库vm1插入一条记录,模拟旧主库上的数据没有复制到新主库上
atlasdb=# insert into test_tab values(2,'hemny_pg1','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-----------+------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg1 | [email protected] | 10
(2 rows)
atlasdb=#
在新主库vm2(原从库)插入一条记录
atlasdb=# insert into test_tab values(2,'hemny_pg2','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-----------+------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg2 | [email protected] | 10
(2 rows)
atlasdb=#
将原主库变成新主库的从库
--停止原主库
[atlasdb@vm1 ~]$ pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped
--配置流复制文件和参数
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$
-- 启动数据库
[atlasdb@vm1 ~]$ pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-----------+------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg1 | [email protected] | 10
(2 rows)
这里可以看到,新主库(vm2)上的记录没有复制过来
vm1的日志信息
[atlasdb@vm1 ~]$ cat logfile
2019-12-25 18:51:16.434 CST [3441] LOG: listening on IPv4 address "0.0.0.0", port 5432
2019-12-25 18:51:16.434 CST [3441] LOG: listening on IPv6 address "::", port 5432
2019-12-25 18:51:16.438 CST [3441] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-12-25 18:51:16.453 CST [3442] LOG: database system was shut down at 2019-12-25 18:46:51 CST
2019-12-25 18:51:16.453 CST [3442] LOG: entering standby mode
2019-12-25 18:51:16.457 CST [3442] LOG: consistent recovery state reached at 0/301BA58
2019-12-25 18:51:16.457 CST [3442] LOG: invalid record length at 0/301BA58: wanted 24, got 0
2019-12-25 18:51:16.457 CST [3441] LOG: database system is ready to accept read only connections
Customer:'debug user'
Begins On:'1970-01-01 08:00:01'
Expires On:'273795839-07-29 17:46:41'
2019-12-25 18:51:16.492 CST [3446] LOG: fetching timeline history file for timeline 2 from primary server
2019-12-25 18:51:16.499 CST [3446] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2019-12-25 18:51:16.500 CST [3446] LOG: replication terminated by primary server
2019-12-25 18:51:16.500 CST [3446] DETAIL: End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:16.502 CST [3446] LOG: restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:16.503 CST [3446] LOG: replication terminated by primary server
2019-12-25 18:51:16.503 CST [3446] DETAIL: End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:21.509 CST [3446] LOG: restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:21.511 CST [3446] LOG: replication terminated by primary server
2019-12-25 18:51:21.511 CST [3446] DETAIL: End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:26.515 CST [3446] LOG: restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:26.516 CST [3446] LOG: replication terminated by primary server
2019-12-25 18:51:26.516 CST [3446] DETAIL: End of WAL reached on timeline 1 at 0/301B8B8.
vm1上停止新从库
[atlasdb@vm1 ~]$ pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped
[atlasdb@vm1 ~]$
在vm1上使得pg_rewind 同步数据库时间线
[atlasdb@vm1 ~]$ pg_rewind --target-pgdata=$PGDATA --source-server='host=192.168.0.202 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P
connected to server
servers diverged at WAL location 0/301B8B8 on timeline 1
rewinding from last common checkpoint at 0/301B810 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 52 MB (total source directory size is 76 MB)
53300/53300 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
[atlasdb@vm1 ~]$
注意,pg_rewind之后,需要重新配置流复制文件
[atlasdb@vm1 ~]$ mv $PGDATA/recovery.done $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$
注意要修改连接信息
在vm1上启动新的从库
[atlasdb@vm1 ~]$ pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-----------+------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg2 | [email protected] | 10
(2 rows)
atlasdb=#
在新主库上插入的记录已同步,原主库没有复制到丛库的记录消失,完成
验证:
新主库vm2中
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.
atlasdb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
atlasdb=# select * from pg_stat_replication ;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag |
sync_priority | sync_state
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+----------------+-----------------+
---------------+------------
3399 | 10 | atlasdb | walreceiver | 192.168.0.201 | | 38144 | 2019-12-25 18:59:59.275035+08 | | streaming | 0/30475D8 | 0/30475D8 | 0/30475D8 | 0/30475D8 | 00:00:00.006679 | 00:00:00.00858 | 00:00:00.008581 |
0 | async
(1 row)
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+-----------+------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg2 | [email protected] | 10
(2 rows)
atlasdb=# insert into test_tab values(3,'hemny_pg2_new','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+---------------+----------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg2 | [email protected] | 10
3 | hemny_pg2_new | [email protected] | 10
(3 rows)
atlasdb=#
新从库vm1中
atlasdb=# select * from test_tab ;
id | name | e_mail | d_id
----+---------------+----------------------+------
1 | hemny | [email protected] | 10
2 | hemny_pg2 | [email protected] | 10
3 | hemny_pg2_new | [email protected] | 10
(3 rows)
atlasdb=#