PostgreSQL 13: deploying primary/standby streaming replication, switchover, and pg_rewind

Environment:
OS: CentOS 7
DB: PostgreSQL 13.8
Primary: 192.168.1.134
Standby: 192.168.1.135

Reference: https://www.cnblogs.com/hxlasky/p/16810443.html

################ Primary/standby deployment ######################

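  1. On the primary, create the streaming replication user
    postgres=# CREATE ROLE replica login replication encrypted password 'replica';

  2. On the primary, edit pg_hba.conf so that the standby's IP can connect as the replication user

Switch to the root user: su root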

vi /opt/pg13/data/pg_hba.conf

# replication privilege.

local   replication     all                                     trust

host    replication     all             127.0.0.1/32            trust

host    replication     all             ::1/128                 trust

# Place this entry in the IPv4 section

host    replication     replica         192.168.1.0/24          md5 ## newly added; here the whole subnet is allowed

Or allow one specific IP:

# replication privilege.

local   replication     all                                     trust

host    replication     all             127.0.0.1/32            trust

host    replication     all             ::1/128                 trust

host    replication     replica         192.168.1.135/32        md5 ## a single, specific IP

Reload the configuration afterwards; otherwise the standby's replication connection is rejected.
[postgres@host134 ~]$ pg_ctl -D /opt/pg13/data reload
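
The pg_hba.conf entry above can also be generated rather than typed by hand. The following is a minimal sketch, not part of the original procedure; the function name hba_replication_line is hypothetical, and the paths in the usage comment are the ones used in this article.

```shell
#!/usr/bin/env bash
# Build one pg_hba.conf line that lets a replication user connect from a CIDR range.
# hba_replication_line is a hypothetical helper name, not a PostgreSQL tool.
hba_replication_line() {
    local user="$1" cidr="$2" method="${3:-md5}"
    printf 'host replication %s %s %s\n' "$user" "$cidr" "$method"
}

# Possible usage on the primary (as the postgres user):
#   hba_replication_line replica 192.168.1.135/32 >> /opt/pg13/data/pg_hba.conf
#   pg_ctl -D /opt/pg13/data reload
```

pg_hba.conf fields may be separated by any amount of whitespace, so single spaces are enough here.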

3. Stop the standby
su - postgres
pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop

 

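  4. Prepare the data directory on the standby
    After installing PostgreSQL on the standby, do not initialize a cluster.
    If a data directory already exists from an earlier initialization, move it aside (or delete it) and create an empty directory with the same name.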
    su - postgres
    [postgres@host135 ~]$ cd /opt/pg13
    [postgres@host135 pg13]$ mv data bakdata
    [postgres@host135 pg13]$ mkdir data

Create the archive directory, matching the primary:

[postgres@host135 pg13]$ mkdir -p /opt/pg13/archivelog


Ownership and permissions must be correct; if they are not, fix them as root:
[root@host135 ~]# chown -R postgres:postgres /opt/pg13
[root@host135 ~]# chmod 0700 /opt/pg13/data

 

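5. On the standby, take a base backup of the primary
[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Note the -R option: it makes pg_basebackup write the standby configuration (standby.signal and primary_conninfo) for you.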

[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Password:
pg_basebackup: error: FATAL: no pg_hba.conf entry for replication connection from host "192.168.1.135", user "replica", SSL off

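Cause 1:

The primary's pg_hba.conf was modified but not reloaded. Reload it:
pg_ctl -D /opt/pg13/data reload

Cause 2:

If the connection still fails (typically with a timeout or "connection refused" rather than the pg_hba.conf error above), check whether a firewall is blocking it. On a test VM you can flush the iptables rules:

sudo iptables -F

Open port 5432 on the primary node: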

firewall-cmd --permanent --zone=public --add-port=5432/tcp

firewall-cmd --state

firewall-cmd --reload
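
To tell a firewall problem apart from a pg_hba.conf problem, it can help to test plain TCP reachability from the standby first. This is a small sketch under the assumption that bash (with its /dev/tcp feature) and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's built-in /dev/tcp pseudo-device, so it must run under bash.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Possible usage from the standby:
#   port_open 192.168.1.134 5432 && echo reachable || echo blocked
```

If the port is reachable but pg_basebackup still fails, the cause is more likely on the PostgreSQL side (pg_hba.conf, listen_addresses, or the password).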

[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Password:
32247/32247 kB (100%), 1/1 tablespace
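
The --password option above prompts interactively. For scripted runs, one option is a ~/.pgpass entry on the standby; for replication connections the database field of that file is the literal word replication. The sketch below only formats such a line; pgpass_replication_line is a hypothetical helper, and it assumes none of the values contain ':' or '\' (which would need escaping).

```shell
#!/usr/bin/env bash
# Print one ~/.pgpass line in the format hostname:port:database:username:password.
# For replication connections the database field is the literal word "replication".
# Assumption: no value contains ':' or '\', so no escaping is done here.
pgpass_replication_line() {
    local host="$1" port="$2" user="$3" password="$4"
    printf '%s:%s:replication:%s:%s\n' "$host" "$port" "$user" "$password"
}

# Possible usage on the standby, as the postgres user:
#   pgpass_replication_line 192.168.1.134 5432 replica replica >> ~/.pgpass
#   chmod 0600 ~/.pgpass     # libpq ignores the file if it is group/world readable
#   pg_basebackup -h 192.168.1.134 -p 5432 -U replica -w -X stream -Fp --progress -D /opt/pg13/data -R
```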

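pg_basebackup also copies the primary's postgresql.conf and pg_hba.conf to the standby, because they live in the data directory.
At this point both files are identical on the primary and the standby.

If the primary runs in archive mode, create the same archive directory on the standby.

6. The standby is now ready to be started with pg_ctl start
Because of -R, standby.signal has been created automatically on the standby, and under $PGDATA the primary's connection information for streaming replication has been configured for us: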

[postgres@host135 data]$ ls -1

backup_label

backup_manifest

base

current_logfiles

global

log

pg_commit_ts

pg_dynshmem

pg_hba.conf

pg_ident.conf

pg_logical

pg_multixact

pg_notify

pg_replslot

pg_serial

pg_snapshots

pg_stat

pg_stat_tmp

pg_subtrans

pg_tblspc

pg_twophase

PG_VERSION

pg_wal

pg_xact

postgresql.auto.conf

postgresql.conf

standby.signal

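The primary's pg_hba.conf and postgresql.conf are now also present in $PGDATA on the standby, so the configuration matches on both sides. Change anything that must differ on the standby, such as the port.

In addition, the primary's streaming replication connection information has been written to postgresql.auto.conf: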
[postgres@host135 data]$ more postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=replica channel_binding=disable host=192.168.1.134 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

If the backup was taken without -R, the same setup can be done by hand: create standby.signal on the standby, then edit postgresql.conf (not postgresql.auto.conf) and add the primary's connection information.
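
The manual alternative to -R can be sketched as a small function. This is an illustration under stated assumptions, not the author's script: configure_standby is a hypothetical name, and the connection string must not contain single quotes.

```shell
#!/usr/bin/env bash
# Mark a data directory as a standby by hand (roughly what pg_basebackup -R automates).
# Arguments: data directory, primary_conninfo string (assumed to contain no single quotes).
configure_standby() {
    local datadir="$1" conninfo="$2"
    # An empty standby.signal file makes PostgreSQL 12+ start in standby mode.
    touch "${datadir}/standby.signal" || return 1
    # Append the setting; when a parameter appears more than once in
    # postgresql.conf, the last occurrence is the one that takes effect.
    printf "primary_conninfo = '%s'\n" "$conninfo" >> "${datadir}/postgresql.conf"
}

# Possible usage on the standby, with the server stopped:
#   configure_standby /opt/pg13/data 'user=replica password=replica host=192.168.1.134 port=5432'
```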

7. Start the standby
pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

Possible error:
2022-10-19 10:16:25 CST [32043]: [1-1] user=,db=,app=,client=LOG: redirecting log output to logging collector process
2022-10-19 10:16:25 CST [32043]: [2-1] user=,db=,app=,client=HINT: Future log output will appear in directory "/opt/pg13/log".
2022-10-19 10:57:31 CST [3551]: [1-1] user=,db=,app=,client=FATAL: data directory "/opt/pg13/data" has invalid permissions
2022-10-19 10:57:31 CST [3551]: [2-1] user=,db=,app=,client=DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).

Fix:
Correct the ownership and permissions as root
[root@host135 ~]# chown -R postgres:postgres /opt/pg13
[root@host135 ~]# chmod 0700 /opt/pg13/data
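
The permission check from the error message can be run before starting the server. A minimal sketch, assuming GNU stat (as shipped with CentOS 7); datadir_mode_ok is a hypothetical helper name, and it checks only the mode, not the ownership.

```shell
#!/usr/bin/env bash
# Succeed only if the directory mode is 700 or 750, the two modes the
# error message above says PostgreSQL accepts for the data directory.
datadir_mode_ok() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2   # GNU stat: print octal permission bits
    [ "$mode" = "700" ] || [ "$mode" = "750" ]
}

# Possible usage:
#   datadir_mode_ok /opt/pg13/data || echo "fix with: chmod 0700 /opt/pg13/data"
```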

 

  8. On the primary, check the replication status

Connect to the database: psql -h localhost -U postgres -p 5432

postgres=# select * from pg_stat_replication;

 pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           

------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------

 2197 |    16403 | replica | walreceiver      | 192.168.88.130 |                 |       34058 | 2023-06-09 19:23:29.105932+08 |              | streaming | 0/7000060 | 0/7000060 | 0/7000060 | 0/7000060  |           |           |            |             0 | async      | 2023-06-09 19:24:59.403341+08

(1 row)
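
The LSN columns in pg_stat_replication (sent_lsn, replay_lsn and so on) are WAL positions written as two hexadecimal numbers, the high and low 32 bits. Inside the database, pg_wal_lsn_diff() computes the distance between two of them; the sketch below does the same arithmetic in bash to show what the values mean. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert an LSN such as 0/7000060 (high/low 32 bits, both hex) to a byte position.
lsn_to_bytes() {
    local hi="${1%%/*}" lo="${1##*/}"
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# Byte distance between two LSNs (first minus second), e.g. sent_lsn vs replay_lsn.
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Possible usage with values copied from pg_stat_replication:
#   lsn_diff 0/7000060 0/7000060    # 0 means the standby has replayed everything sent
```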

9. Check the processes
Standby processes

[postgres@host135 data]$ ps -ef|grep postgres

postgres  3815     1  0 10:59 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data

postgres  3816  3815  0 10:59 ?        00:00:00 postgres: logger

postgres  3817  3815  0 10:59 ?        00:00:00 postgres: startup recovering 00000001000000000000001B

postgres  3818  3815  0 10:59 ?        00:00:00 postgres: checkpointer

postgres  3819  3815  0 10:59 ?        00:00:00 postgres: background writer

postgres  3820  3815  0 10:59 ?        00:00:00 postgres: stats collector

postgres  3821  3815  0 10:59 ?        00:00:00 postgres: walreceiver streaming 0/1B000148

postgres  3864 26618  0 11:00 pts/1    00:00:00 ps -ef

postgres  3865 26618  0 11:00 pts/1    00:00:00 grep --color=auto postgres

root     26617 25114  0 09:26 pts/1    00:00:00 su - postgres

postgres 26618 26617  0 09:26 pts/1    00:00:00 -bash

Primary processes

[postgres@host134 data]$ ps -ef|grep postgres

postgres 11073     1  0 Oct18 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data

postgres 11074 11073  0 Oct18 ?        00:00:00 postgres: logger

postgres 11077 11073  0 Oct18 ?        00:00:00 postgres: checkpointer

postgres 11078 11073  0 Oct18 ?        00:00:00 postgres: background writer

postgres 11079 11073  0 Oct18 ?        00:00:00 postgres: walwriter

postgres 11080 11073  0 Oct18 ?        00:00:00 postgres: autovacuum launcher

postgres 11081 11073  0 Oct18 ?        00:00:00 postgres: archiver last was 00000001000000000000001A.00000028.backup

postgres 11082 11073  0 Oct18 ?        00:00:01 postgres: stats collector

postgres 11083 11073  0 Oct18 ?        00:00:00 postgres: logical replication launcher

postgres 11294 11073  0 Oct18 ?        00:00:00 postgres: postgres postgres 192.168.1.134(40882) idle

postgres 21407 11073  0 10:59 ?        00:00:00 postgres: walsender replica 192.168.1.135(50736) streaming 0/1B000148

Primary
[postgres@host134 20221021]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in production

Standby
[postgres@host135 bin]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in archive recovery
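
The two pg_controldata states shown above can be turned into a simple role check for scripts. This is a sketch that handles only those two states; cluster_role is a hypothetical helper name, and any other state (for example a shut-down server) is reported as unknown.

```shell
#!/usr/bin/env bash
# Read pg_controldata output on stdin and print "primary", "standby", or "unknown".
cluster_role() {
    local state
    # Keep only the value after the "Database cluster state:" label.
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}

# Possible usage:
#   pg_controldata /opt/pg13/data | cluster_role
```

On a running server, select pg_is_in_recovery(); gives the same answer from inside psql.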

10. Verify the data

Log in to the standby

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres

Password for user postgres:

psql (13.8)

Type "help" for help.

postgres=# \c db_test;

You are now connected to database "db_test" as user "postgres".

db_test=# select * from tb_test;

 id | name  |         createtime         |         modifytime         

----+-------+----------------------------+----------------------------

  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901

  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863

  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182

  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843

  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502

(5 rows)

Write on the primary:

[postgres@host134 data]$ psql -h 192.168.1.134 -U postgres

Password for user postgres:

psql (13.8)

Type "help" for help.

postgres=# \c db_test;

You are now connected to database "db_test" as user "postgres".

db_test=# select * from tb_test;

 id | name  |         createtime         |         modifytime         

----+-------+----------------------------+----------------------------

  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901

  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863

  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182

  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843

  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502

(5 rows)

db_test=# insert into tb_test(name) values('name6');

INSERT 0 1

Query on the standby:

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres

Password for user postgres:

psql (13.8)

Type "help" for help.

postgres=# \c db_test;

You are now connected to database "db_test" as user "postgres".

db_test=# select * from tb_test;

 id | name  |         createtime         |         modifytime         

----+-------+----------------------------+----------------------------

  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901

  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863

  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182

  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843

  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502

  6 | name6 | 2022-10-19 11:04:56.543939 | 2022-10-19 11:04:56.543939

(6 rows)

Try to write data on the standby
db_test=# insert into tb_test(name) values('name7');
ERROR: cannot execute INSERT in a read-only transaction

Try to switch the WAL file (force archiving) on the standby
db_test=# select pg_switch_wal();
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.

 

 

##################### Primary/standby switchover ####################

 

1. Stop the primary to simulate a failure
Run on 192.168.1.134
## Check the status
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data status
pg_ctl: server is running (PID: 24009)
/opt/pg13/bin/postgres "-D" "/opt/pg13/data"

[postgres@host134 data]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in production

## Stop the database
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop -m fast
waiting for server to shut down.... done
server stopped

 

2. Promote the standby to be the new primary and serve clients
Run on the standby, 192.168.1.135
[postgres@host135 data]$ pg_ctl promote -D /opt/pg13/data
waiting for server to promote.... done
server promoted


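Important 1: the command that turns the standby into the new primary is pg_ctl promote.
After promotion, the startup recovering and walreceiver streaming processes are gone from the process list,
and a postgres: walwriter process has appeared.

Important 2: $PGDATA/standby.signal has been removed automatically. Its absence tells PostgreSQL that this server is no longer a standby and now acts as the primary.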

 

 

3. Remove the primary_conninfo entry on the new primary
Run on 192.168.1.135

This clears the old replication settings, that is, primary_conninfo in postgresql.auto.conf.

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres -p 5432

Password for user postgres:

psql (13.8)

Type "help" for help.

postgres=# show primary_conninfo;

                      primary_conninfo                      

------------------------------------------------------------

 user=replica password=replica host=192.168.1.135 port=5432

(1 row)

postgres=# alter system set primary_conninfo='';

ALTER SYSTEM

or

alter system set primary_conninfo=default; ## removes the entry from postgresql.auto.conf; if postgresql.conf also defines the parameter, that value is used instead

Reload the configuration:

[postgres@host135 data]$ pg_ctl -D /opt/pg13/data reload

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres -p 5432

postgres=# show primary_conninfo;

 primary_conninfo 

------------------

(1 row)

 

4. Write data on the new primary
Run on 192.168.1.135

[postgres@host135 data]$ psql -h 192.168.1.135 -U hxl -d db_test -p 5432

insert into tb_test(name) values('name9');

insert into tb_test(name) values('name10');

insert into tb_test(name) values('name11');

insert into tb_test(name) values('name12');

insert into tb_test(name) values('name13');

insert into tb_test(name) values('name14');

insert into tb_test(name) values('name15');

insert into tb_test(name) values('name16');

insert into tb_test(name) values('name17');

insert into tb_test(name) values('name18');

insert into tb_test(name) values('name19');

insert into tb_test(name) values('name20');

db_test=> select * from tb_test;

 id |  name  |         createtime         |         modifytime         

----+--------+----------------------------+----------------------------

  1 | name1  | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901

  2 | name2  | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863

  3 | name3  | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182

  4 | name4  | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843

  5 | name5  | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502

  6 | name6  | 2022-10-19 11:04:56.543939 | 2022-10-19 11:04:56.543939

  7 | name7  | 2022-10-19 11:25:52.236651 | 2022-10-19 11:25:52.236651

  8 | name8  | 2022-10-20 09:21:51.977815 | 2022-10-20 09:21:51.977815

 41 | name9  | 2022-10-20 14:22:26.326255 | 2022-10-20 14:22:26.326255

 42 | name10 | 2022-10-20 14:22:26.34316  | 2022-10-20 14:22:26.34316

 43 | name11 | 2022-10-20 14:22:26.359988 | 2022-10-20 14:22:26.359988

 44 | name12 | 2022-10-20 14:22:26.433694 | 2022-10-20 14:22:26.433694

 45 | name13 | 2022-10-20 14:22:26.451945 | 2022-10-20 14:22:26.451945

 46 | name14 | 2022-10-20 14:22:26.469966 | 2022-10-20 14:22:26.469966

 47 | name15 | 2022-10-20 14:22:26.482091 | 2022-10-20 14:22:26.482091

 48 | name16 | 2022-10-20 14:22:26.498319 | 2022-10-20 14:22:26.498319

 49 | name17 | 2022-10-20 14:22:26.524554 | 2022-10-20 14:22:26.524554

 50 | name18 | 2022-10-20 14:22:26.555449 | 2022-10-20 14:22:26.555449

 51 | name19 | 2022-10-20 14:22:26.591774 | 2022-10-20 14:22:26.591774

 52 | name20 | 2022-10-20 14:22:27.587955 | 2022-10-20 14:22:27.587955

 

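5. Edit pg_hba.conf on the new primary
Run on 192.168.1.135
Edit $PGDATA/pg_hba.conf on the new primary (the old standby, 192.168.1.135) and add an entry that lets the new standby (the old primary, 192.168.1.134) connect as the replica user.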

vi /opt/pg13/data/pg_hba.conf

host replication all 192.168.1.134/32 md5

If access was already granted for the whole subnet, as below, no change is needed:
host replication replica 192.168.1.0/24 md5

A change to pg_hba.conf does not require a restart; a reload is enough
[postgres@host135 data]$ pg_ctl -D /opt/pg13/data reload
server signaled

 

6. Create $PGDATA/standby.signal on the old primary
Run on 192.168.1.134
[postgres@host134 data]$ cd /opt/pg13/data
[postgres@host134 data]$ touch standby.signal

[postgres@host134 data]$ pwd
/opt/pg13/data
[postgres@host134 data]$ ll standby.signal
-rw-rw-r-- 1 postgres postgres 0 Oct 20 14:27 standby.signal

Note: this step is critical. Without this file, the old primary would come back up as a new, independent primary, outside the replication setup.

  7. On the old primary, edit $PGDATA/postgresql.conf and add the replication entry
    Run on 192.168.1.134
    [postgres@host134 data]$ vi postgresql.conf
    Add the following line:
    primary_conninfo='user=replica password=replica host=192.168.1.135 port=5432'

Use the password and host of your own environment, for example:
primary_conninfo='user=replica password=1q2!Q@ host=192.168.88.130 port=5432'

 

  8. Start the old primary so that it becomes the new standby

Run on 192.168.1.134

[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

[postgres@host134 data]$ ps -ef|grep postgres

postgres  6975     1  2 15:34 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data

postgres  6976  6975  0 15:34 ?        00:00:00 postgres: logger

postgres  6977  6975  0 15:34 ?        00:00:00 postgres: startup recovering 000000010000000000000007

postgres  6979  6975  0 15:34 ?        00:00:00 postgres: checkpointer

postgres  6980  6975  0 15:34 ?        00:00:00 postgres: background writer

postgres  6981  6975  0 15:34 ?        00:00:00 postgres: stats collector

postgres  6982  6975  0 15:34 ?        00:00:00 postgres: walreceiver idle

The process list shows walreceiver idle, which means the old primary could not join the cluster as a standby. Check the error log:

[postgres@host134 log]$ pwd
/opt/pg13/log

[postgres@host134 log]$ tail -2f postgresql-2022-10-21.log
2022-10-21 15:36:39 CST [6982]: [25-1] user=,db=,app=,client=LOG:  primary server contains no more WAL on requested timeline 1
2022-10-21 15:36:39 CST [6977]: [28-1] user=,db=,app=,client=LOG:  new timeline 2 forked off current database system timeline 1 before current recovery point 0/70000A0

Fix: stop the server and rewind it against the new primary with pg_rewind

[postgres@host134 pg13]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop -m fast

waiting for server to shut down.... done

server stopped

[postgres@host134 pg13]$ pg_rewind -D /opt/pg13/data --source-server='host=192.168.1.135 port=5432 user=postgres dbname=postgres password=postgres'

pg_rewind: servers diverged at WAL location 0/7000000 on timeline 1

pg_rewind: error: could not open file "/opt/pg13/data/pg_wal/000000010000000000000006": No such file or directory

pg_rewind: fatal: could not find previous WAL record at 0/6000410

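pg_rewind reports that WAL file 000000010000000000000006 is missing. Copy the missing file from the archive directory into pg_wal; if pg_rewind then reports another missing WAL file, copy that one as well.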
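
The file name in the error follows from the LSN in the message. A WAL segment name is three 8-digit hex fields: the timeline, the high 32 bits of the LSN, and the segment number within that range. The sketch below assumes the default 16 MB segment size; wal_file_for_lsn is a hypothetical helper (on a running server, pg_walfile_name() does this).

```shell
#!/usr/bin/env bash
# Name of the WAL segment that contains a given LSN on a given timeline.
# Assumption: default 16 MB (0x1000000 bytes) WAL segment size.
wal_file_for_lsn() {
    local tli="$1" lsn="$2"
    local hi="${lsn%%/*}" lo="${lsn##*/}"
    # Fields: timeline, high 32 bits of the LSN, low 32 bits divided by segment size.
    printf '%08X%08X%08X\n' "$tli" "$(( 16#$hi ))" "$(( 16#$lo / 16#1000000 ))"
}

# Possible usage with the LSN from the pg_rewind message:
#   wal_file_for_lsn 1 0/6000410
```

For the message above, LSN 0/6000410 on timeline 1 maps to 000000010000000000000006, which is the file pg_rewind could not open.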

[postgres@host134 20221021]$ pwd
/opt/pg13/archivelog/20221021

[postgres@host134 20221021]$ cp 000000010000000000000006 /opt/pg13/data/pg_wal/

[postgres@host134 20221021]$ pg_rewind -D /opt/pg13/data --source-server='host=192.168.1.135 port=5432 user=postgres dbname=postgres password=postgres'

pg_rewind: servers diverged at WAL location 0/7000000 on timeline 1

pg_rewind: rewinding from last common checkpoint at 0/5000060 on timeline 1

pg_rewind: Done!

After pg_rewind, the new primary's postgresql.auto.conf and postgresql.conf have been copied over as well, so primary_conninfo in postgresql.conf has to be set again; adjust other parameters as needed.
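
pg_rewind worked here, but it only works when the target cluster was running with wal_log_hints = on or was initialized with data checksums. Both facts are recorded in pg_controldata output, so they can be checked before a switchover. This is a sketch that assumes the field labels "wal_log_hints setting:" and "Data page checksum version:" as printed by pg_controldata; rewind_prereq_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read pg_controldata output on stdin; succeed if wal_log_hints is on
# or data checksums are enabled (checksum version greater than 0).
rewind_prereq_ok() {
    awk -F': *' '
        /^wal_log_hints setting:/      { hints = ($2 == "on") }
        /^Data page checksum version:/ { sums  = ($2 + 0 > 0) }
        END { exit !(hints || sums) }
    '
}

# Possible usage on the server that may later need rewinding:
#   pg_controldata /opt/pg13/data | rewind_prereq_ok || echo "pg_rewind would refuse this cluster"
```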

 

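9. On the old primary, edit $PGDATA/postgresql.conf
Run on 192.168.1.134

Add the entry after pg_rewind. If pg_rewind was not needed, the entry was already added in step 7 and this step can be skipped.
[postgres@host134 data]$ vi postgresql.conf
Add the following line: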
primary_conninfo='user=replica password=replica host=192.168.1.135 port=5432'

 

10. Recreate standby.signal
After pg_rewind the standby.signal file is gone and has to be created again
[postgres@host134 data]$ cd /opt/pg13/data
[postgres@host134 data]$ touch standby.signal

 

11. Start the new standby
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

 

12. Verify the data
New standby
psql -h 192.168.1.134 -U hxl -d db_test -p 5432

New primary
psql -h 192.168.1.135 -U hxl -d db_test -p 5432
