一、pg_rewind 简介

pg_rewind 是postgresql主丛数据库之同步数据目录的工具。

pg_rewind只复制表数据文件中更改的块;所有其他文件都被完整复制，包括配置文件。pg_rewind相对于使用pg_basebackup备份或rsync等工具的优势在于，pg_rewind不需要读取数据库中未更改的块。这使得在数据库很大且之间只有一小部分块不同的情况下，速度会快得多。

二、使用前提：

需要目标服务器在postgresql.conf 中允许wal_log_hints，
或者
在 initdb初始化集群时允许 checksums ，full_page_writes也必须为on

三、命令详解：

[atlasdb@vm1 ~]$ /usr/local/atlasdb/bin/pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...

Options:
  -D, --target-pgdata=DIRECTORY  existing data directory to modify
      --source-pgdata=DIRECTORY  source data directory to synchronize with
      --source-server=CONNSTR    source server to synchronize with
  -n, --dry-run                  stop before modifying anything
  -P, --progress                 write progress messages
      --debug                    write a lot of debug messages
  -V, --version                  output version information, then exit
  -?, --help                     show this help, then exit

Report bugs to .

-D directory --target-pgdata=directory
此选项指定与源同步的目标数据目录。在运行pg_rewind之前，必须干净关闭目标服务器

--source-pgdata=directory
指定要与之同步的源服务器的数据目录的文件系统路径。此选项要求干净关闭源服务器

--source-server=connstr
指定要连接到源PostgreSQL服务器的libpq连接字符串。连接必须是具有超级用户访问权限的正常(非复制)连接。此选项要求源服务器正在运行，而不是处于恢复模式。

-n --dry-run
除了实际修改目标目录之外，执行所有操作。

-P --progress
输出进展报告。

--debug
输出很多Debug的信息。如果失败时，可以用此选项定位错误原因。

例如：

pg_rewind --target-pgdata=/data/atlasdb --source-server='host=192.168.0.201 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P --debug

四、测试

环境准备
准备两台主机，安装postgresql v11.5。并配置好流复制

vm1： 192.168.0.201 主库
vm2： 192.168.0.202 从库
注：我这里用于测试的版本是 AtlasDB v2.7，基于 postgresql v11.5 优化的版本，显示可能不一样。

vm1、vm2环境变量设置：

export PGDATABASE=atlasdb
export PGHOST=localhost
export PGPORT=5432
export PGUSER=atlasdb
export PGHOME=/home/atlasdb/atlasdb
export PGDATA=/data/atlasdb
export PATH=$PGHOME/bin:$HOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib/:$LD_LIBRARY_PATH

vm1查询流复制情况

[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select * from pg_stat_replication ;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority |
 sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+
------------
 3421 |       10 | atlasdb | walreceiver      | 192.168.0.202 |                 |       33462 | 2019-12-25 18:38:34.168977+08 |              | streaming | 0/3000060 | 0/3000060 | 0/3000060 | 0/3000060  |           |           |            |             0 |
 async
(1 row)

atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)

atlasdb=#

测试过程
主库vm1：创建测试表和测试数据

atlasdb=# create table test_tab(id int,name varchar(20),e_mail varchar(20),d_id int);
CREATE TABLE
atlasdb=# \dt+
                      List of relations
 Schema |   Name   | Type  |  Owner  |  Size   | Description 
--------+----------+-------+---------+---------+-------------
 public | test_tab | table | atlasdb | 0 bytes | 
(1 row)

atlasdb=# insert into test_tab values(1,'hemny','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id | name  |    e_mail    | d_id 
----+-------+--------------+------
  1 | hemny | [email protected] |   10
(1 row)

atlasdb=#

从库vm2：查询数据

[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

atlasdb=# \dt+
                       List of relations
 Schema |   Name   | Type  |  Owner  |    Size    | Description 
--------+----------+-------+---------+------------+-------------
 public | test_tab | table | atlasdb | 8192 bytes | 
(1 row)

atlasdb=#  select * from test_tab ;
 id | name  |    e_mail    | d_id 
----+-------+--------------+------
  1 | hemny | [email protected] |   10
(1 row)

atlasdb=#

提升从库vm2为新主库

[atlasdb@vm2 ~]$ pg_ctl promote
waiting for server to promote.... done
server promoted
[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)

atlasdb=#

在旧主库vm1插入一条记录，模拟旧主库上的数据没有复制到新主库上

atlasdb=# insert into test_tab values(2,'hemny_pg1','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | [email protected]     |   10
  2 | hemny_pg1 | [email protected] |   10
(2 rows)

atlasdb=#

在新主库vm2（原从库）插入一条记录

atlasdb=#  insert into test_tab values(2,'hemny_pg2','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | [email protected]     |   10
  2 | hemny_pg2 | [email protected] |   10
(2 rows)

atlasdb=#

将原主库变成新主库的从库

--停止原主库
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped

--配置流复制文件和参数
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf 
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$ 

-- 启动数据库
[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select pg_is_in_recovery();                                                
 pg_is_in_recovery 
-------------------
 t
(1 row)

atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | [email protected]     |   10
  2 | hemny_pg1 | [email protected] |   10
(2 rows)

这里可以看到，新主库（vm2）上的记录没有复制过来

vm1的日志信息

[atlasdb@vm1 ~]$ cat logfile 
2019-12-25 18:51:16.434 CST [3441] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-12-25 18:51:16.434 CST [3441] LOG:  listening on IPv6 address "::", port 5432
2019-12-25 18:51:16.438 CST [3441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-12-25 18:51:16.453 CST [3442] LOG:  database system was shut down at 2019-12-25 18:46:51 CST
2019-12-25 18:51:16.453 CST [3442] LOG:  entering standby mode
2019-12-25 18:51:16.457 CST [3442] LOG:  consistent recovery state reached at 0/301BA58
2019-12-25 18:51:16.457 CST [3442] LOG:  invalid record length at 0/301BA58: wanted 24, got 0
2019-12-25 18:51:16.457 CST [3441] LOG:  database system is ready to accept read only connections
    Customer:'debug user'
   Begins On:'1970-01-01 08:00:01'
  Expires On:'273795839-07-29 17:46:41'
2019-12-25 18:51:16.492 CST [3446] LOG:  fetching timeline history file for timeline 2 from primary server
2019-12-25 18:51:16.499 CST [3446] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2019-12-25 18:51:16.500 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:16.500 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:16.502 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:16.503 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:16.503 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:21.509 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:21.511 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:21.511 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.
2019-12-25 18:51:26.515 CST [3446] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2019-12-25 18:51:26.516 CST [3446] LOG:  replication terminated by primary server
2019-12-25 18:51:26.516 CST [3446] DETAIL:  End of WAL reached on timeline 1 at 0/301B8B8.

vm1上停止新从库

[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile stop
waiting for server to shut down.... done
server stopped
[atlasdb@vm1 ~]$

在vm1上使得pg_rewind 同步数据库时间线

[atlasdb@vm1 ~]$ pg_rewind --target-pgdata=$PGDATA --source-server='host=192.168.0.202 port=5432 user=atlasdb password=atlasdb dbname=atlasdb' -P
connected to server
servers diverged at WAL location 0/301B8B8 on timeline 1
rewinding from last common checkpoint at 0/301B810 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 52 MB (total source directory size is 76 MB)
53300/53300 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
[atlasdb@vm1 ~]$

注意，pg_rewind之后，需要重新配置流复制文件

[atlasdb@vm1 ~]$ mv $PGDATA/recovery.done $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ vi $PGDATA/recovery.conf
[atlasdb@vm1 ~]$ cat $PGDATA/recovery.conf                     
standby_mode = 'on'
primary_conninfo = 'user=atlasdb password=atlasdb host=192.168.0.202 port=5432 sslmode=prefer sslcompression=0 target_session_attrs=any'
[atlasdb@vm1 ~]$

注意要修改连接信息

在vm1上启动新的从库

[atlasdb@vm1 ~]$  pg_ctl -D /data/atlasdb -l logfile start
waiting for server to start.... done
server started
[atlasdb@vm1 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

atlasdb=# select * from test_tab ; 
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | [email protected]     |   10
  2 | hemny_pg2 | [email protected] |   10
(2 rows)

atlasdb=#

在新主库上插入的记录已同步，原主库没有复制到丛库的记录消失，完成

验证：

新主库vm2中

[atlasdb@vm2 ~]$ psql
psql (11.5)
PSQL: Release 2.7.0
Connected to:
AtlasDB V2.7 Enterprise Edition Release - 64-bit Production
Type "help" for help.

atlasdb=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)

atlasdb=# select * from pg_stat_replication ;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |   flush_lag    |   replay_lag    |
 sync_priority | sync_state 
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+----------------+-----------------+
---------------+------------
 3399 |       10 | atlasdb | walreceiver      | 192.168.0.201 |                 |       38144 | 2019-12-25 18:59:59.275035+08 |              | streaming | 0/30475D8 | 0/30475D8 | 0/30475D8 | 0/30475D8  | 00:00:00.006679 | 00:00:00.00858 | 00:00:00.008581 |
             0 | async
(1 row)

atlasdb=# select * from test_tab ;
 id |   name    |      e_mail      | d_id 
----+-----------+------------------+------
  1 | hemny     | [email protected]     |   10
  2 | hemny_pg2 | [email protected] |   10
(2 rows)

atlasdb=# insert into  test_tab values(3,'hemny_pg2_new','[email protected]',10);
INSERT 0 1
atlasdb=# select * from test_tab ;
 id |     name      |        e_mail        | d_id 
----+---------------+----------------------+------
  1 | hemny         | [email protected]         |   10
  2 | hemny_pg2     | [email protected]     |   10
  3 | hemny_pg2_new | [email protected] |   10
(3 rows)

atlasdb=#

新从库vm1中

atlasdb=# select * from test_tab ; 
 id |     name      |        e_mail        | d_id 
----+---------------+----------------------+------
  1 | hemny         | [email protected]         |   10
  2 | hemny_pg2     | [email protected]     |   10
  3 | hemny_pg2_new | [email protected] |   10
(3 rows)

atlasdb=#

PostgreSQL 之 pg_rewind 详解

一、pg_rewind 简介

二、使用前提：

三、命令详解：

四、测试

在新主库上插入的记录已同步，原主库没有复制到丛库的记录消失，完成

验证：

你可能感兴趣的:(PostgreSQL 之 pg_rewind 详解)