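There are two common ways to switch roles between a primary and a standby. The first uses a trigger file, which is the only option on releases before 9.1. The second is the pg_ctl promote command.
Before switching, check which role each server currently holds. For ways to check, see: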
https://blog.csdn.net/m15217321304/article/details/86628843
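The trigger-file method consists of these main steps:
1) Set the trigger_file parameter in the standby's recovery.conf.
2) Shut down the primary. A clean shutdown with -m fast is recommended.
3) On the standby, create the file named by trigger_file. If promotion succeeds, recovery.conf is renamed to recovery.done.
4) On the old primary, create a recovery.conf file and edit it the same way a standby is configured.
5) Start the old primary (now the standby).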
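The steps above can be written down as a dry-run plan before touching either server. This is a minimal sketch: the function name `print_trigger_switchover_plan` and its two arguments (data directory and trigger file path) are made up for illustration, and it only prints the commands, it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the commands for a
# trigger-file switchover, in the order of the steps listed above.
# $1 = data directory (assumed identical on both hosts), $2 = trigger file path.
print_trigger_switchover_plan() {
    pgdata=$1
    trigger=$2
    printf '%s\n' \
        "old primary: pg_ctl stop -D $pgdata -m fast" \
        "standby:     touch $trigger" \
        "standby:     pg_controldata $pgdata | grep cluster" \
        "old primary: create $pgdata/recovery.conf pointing at the new primary" \
        "old primary: pg_ctl start -D $pgdata"
}
```

For the environment used in this article, the call would be `print_trigger_switchover_plan /home/postgres/pg11 /home/postgres/pg11/trigger`.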
1. Primary: IP 192.168.40.130, hostname postgres, port 5442
Standby: IP 192.168.40.131, hostname postgreshot, port 5442
2. Check the current replication status
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 43754
usesysid | 16384
usename | replica
application_name | walreceiver
client_addr | 192.168.40.131
client_hostname |
client_port | 36568
backend_start | 2019-01-24 16:01:03.500056-05
backend_xmin | 582
state | streaming
sent_lsn | 0/3033760
write_lsn | 0/3033760
flush_lsn | 0/3033760
replay_lsn | 0/3033760
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
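Before stopping the primary it can be useful to confirm that the standby is streaming and has replayed everything that was sent. The sketch below parses expanded (`psql -x`) output like the record above. The function name `standby_caught_up` is hypothetical, and it assumes a single standby; with several records the last one wins.

```shell
#!/bin/sh
# Reads `psql -x` output of pg_stat_replication on stdin.
# Succeeds (exit 0) only if state is "streaming" and replay_lsn equals sent_lsn.
standby_caught_up() {
    awk -F'|' '
        { key = $1; val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the column name
          gsub(/^[ \t]+|[ \t]+$/, "", val) } # trim the value
        key == "state"      { state = val }
        key == "sent_lsn"   { sent = val }
        key == "replay_lsn" { replay = val }
        END { exit !(state == "streaming" && sent != "" && sent == replay) }
    '
}
```

A possible use on the primary: `psql -x -c 'select * from pg_stat_replication' | standby_caught_up && echo "standby is caught up"`.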
3. Configure recovery.conf on the standby
[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$
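Note: the primary and standby authenticate through a .pgpass file; it is better not to put the password directly in recovery.conf. Changes to the parameters above take effect only after a restart.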
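As an illustration of the .pgpass approach mentioned in the note, the sketch below appends a replication entry and tightens the file mode. The function name `add_replication_pgpass` and the password passed to it are placeholders, not values from this setup. Each line of the file has the form `hostname:port:database:username:password`, and for streaming replication the database field is the literal word `replication`.

```shell
#!/bin/sh
# Appends one replication entry to a password file and restricts its mode.
# Arguments: file, host, port, user, password.
# Caveat: values containing ':' or '\' would need escaping, which this sketch skips.
add_replication_pgpass() {
    pgpass_file=$1; host=$2; port=$3; user=$4; password=$5
    printf '%s:%s:replication:%s:%s\n' "$host" "$port" "$user" "$password" >> "$pgpass_file"
    chmod 600 "$pgpass_file"   # libpq ignores the file if group or others can access it
}
```

On the standby in this article the call might look like `add_replication_pgpass "$HOME/.pgpass" 192.168.40.130 5442 replica 'your-password'`.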
4. Stop the primary
[postgres@postgres ~]$ hostname
postgres
[postgres@postgres ~]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgres ~]$
5. Promote the standby to primary
Cluster state before promotion
[postgres@postgreshot pg11]$ pg_controldata | grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgreshot pg11]$
Create the file that triggers promotion
[postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger
[postgres@postgreshot pg11]$ ls -ltr
total 140
drwx------ 4 postgres postgres 4096 Jan 24 08:03 pg_multixact
-rwx------ 1 postgres postgres 1636 Jan 24 08:03 pg_ident.conf
-rwx------ 1 postgres postgres 224 Jan 24 08:03 backup_label.old
-rwx------ 1 postgres postgres 88 Jan 24 08:03 postgresql.auto.conf
-rwx------ 1 postgres postgres 3 Jan 24 08:03 PG_VERSION
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_commit_ts
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_twophase
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_tblspc
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_serial
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_replslot
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_dynshmem
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_xact
drwx------ 2 postgres postgres 4096 Jan 24 08:03 pg_snapshots
drwx------ 2 postgres postgres 4096 Jan 24 08:15 pg_subtrans
-rwx------ 1 postgres postgres 24406 Jan 24 10:31 postgresql.conf
drwx------ 5 postgres postgres 4096 Jan 24 10:36 base
-rwx------ 1 postgres postgres 4705 Jan 24 11:40 pg_hba.conf
-rwx------ 1 postgres postgres 5923 Jan 24 20:47 recovery.done <===== renamed from recovery.conf to recovery.done
Note: the trigger file must be created at exactly the path configured in recovery.conf. After promotion, recovery.conf is renamed to recovery.done.
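One way to avoid a path mismatch is to read the path out of recovery.conf instead of typing it. This is a small sketch under the assumption that the value is single-quoted on one uncommented line, as in the listing in step 3; the function name `configured_trigger_file` is hypothetical.

```shell
#!/bin/sh
# Prints the trigger_file value from a recovery.conf-style file ($1).
# Commented-out lines (starting with '#') are not matched.
configured_trigger_file() {
    sed -n "s/^[[:space:]]*trigger_file[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}
```

The trigger file could then be created with `touch "$(configured_trigger_file /home/postgres/pg11/recovery.conf)"`.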
Cluster state after promotion
[postgres@postgreshot pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in production
[postgres@postgreshot pg11]$
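The before and after checks in this step both read the same pg_controldata line, so they can be wrapped in one helper. The sketch below is illustrative only: `cluster_role` is a made-up name, and it assumes English-language pg_controldata output.

```shell
#!/bin/sh
# Reads `pg_controldata` output on stdin and prints a coarse role:
#   "in production"       -> primary
#   "in archive recovery" -> standby
#   anything else         -> other: <state>
cluster_role() {
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case $state in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo "other: $state" ;;
    esac
}
```

For example, `pg_controldata | cluster_role` would print `standby` before the trigger file is created and `primary` once promotion has completed.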
6. Copy recovery.done from the new primary (hostname postgreshot) to the old primary
[postgres@postgreshot pg11]$ scp recovery.done postgres:/home/postgres/pg11/
recovery.done 100% 5923 5.8KB/s 00:00
[postgres@postgreshot pg11]$
7. Rename recovery.done to recovery.conf and edit it as follows
[postgres@postgres pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica' # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$
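The only value that has to change in the copied file is the host in primary_conninfo, which must now point at the new primary. A minimal sketch of that edit follows; `retarget_primary_conninfo` is a hypothetical name, and it assumes host= appears on an uncommented primary_conninfo line and that the new host contains no `/` or `&` characters.

```shell
#!/bin/sh
# Rewrites the first host= value on the primary_conninfo line of file $1
# so that it points at $2. Other lines and trailing comments are left alone.
retarget_primary_conninfo() {
    conf=$1; new_host=$2
    tmp=$(mktemp) || return 1
    if sed "/^[[:space:]]*primary_conninfo/ s/host=[^ ']*/host=$new_host/" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"   # overwrite in place, keeping the file's owner and mode
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}
```

On the old primary this would be run as `mv recovery.done recovery.conf && retarget_primary_conninfo recovery.conf 192.168.40.131`.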
8. Start the standby (the old primary)
[postgres@postgres pg11]$
[postgres@postgres pg11]$ hostname
postgres
[postgres@postgres pg11]$ pg_ctl start
waiting for server to start....2019-01-24 21:00:59.533 EST [77641] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-01-24 21:00:59.533 EST [77641] LOG: listening on IPv6 address "::", port 5442
2019-01-24 21:00:59.541 EST [77641] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-01-24 21:00:59.555 EST [77642] LOG: database system was shut down at 2019-01-24 20:51:19 EST
2019-01-24 21:00:59.555 EST [77642] LOG: entering standby mode
2019-01-24 21:00:59.581 EST [77642] LOG: consistent recovery state reached at 0/30337D0
2019-01-24 21:00:59.581 EST [77642] LOG: invalid record length at 0/30337D0: wanted 24, got 0
2019-01-24 21:00:59.582 EST [77641] LOG: database system is ready to accept read only connections
2019-01-24 21:00:59.659 EST [77646] LOG: fetching timeline history file for timeline 2 from primary server
done
server started
[postgres@postgres pg11]$ 2019-01-24 21:00:59.690 EST [77646] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2019-01-24 21:00:59.693 EST [77646] LOG: replication terminated by primary server
2019-01-24 21:00:59.693 EST [77646] DETAIL: End of WAL reached on timeline 1 at 0/30337D0.
2019-01-24 21:00:59.704 EST [77642] LOG: new target timeline is 2
2019-01-24 21:00:59.706 EST [77646] LOG: restarted WAL streaming at 0/3000000 on timeline 2
2019-01-24 21:00:59.973 EST [77642] LOG: redo starts at 0/30337D0
[postgres@postgres pg11]$ ps -ef |grep postgres
root 64511 63718 0 19:05 pts/4 00:00:00 su - postgres
postgres 64512 64511 0 19:05 pts/4 00:00:00 -bash
postgres 77641 1 0 21:00 pts/4 00:00:00 /opt/pgsql11/bin/postgres
postgres 77642 77641 0 21:00 ? 00:00:00 postgres: startup recovering 000000020000000000000003
postgres 77643 77641 0 21:00 ? 00:00:00 postgres: checkpointer
postgres 77644 77641 0 21:00 ? 00:00:00 postgres: background writer
postgres 77645 77641 0 21:00 ? 00:00:00 postgres: stats collector
postgres 77646 77641 0 21:00 ? 00:00:00 postgres: walreceiver streaming 0/3033918
postgres 77683 64512 0 21:01 pts/4 00:00:00 ps -ef
postgres 77684 64512 0 21:01 pts/4 00:00:00 grep postgres
[postgres@postgres pg11]$
9. Check the cluster state
[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgres pg11]$
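The switchover succeeded.
II. Second switchover method: pg_ctl promote
The command syntax is:
pg_ctl promote [-D datadir]
Once promote is issued, the running standby leaves recovery mode and becomes a read-write primary.
Switchover steps:
1) Shut down the primary; -m fast mode is recommended.
2) On the standby, run pg_ctl promote to promote it to primary. If recovery.conf has been renamed to recovery.done, the standby has become the primary.
3) Create a recovery.conf file on the old primary.
4) Start the old primary.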
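The check in step 2 (recovery.conf renamed to recovery.done) is easy to script. The sketch below is only an illustration of that file-based check as this article describes it for a recovery.conf-based setup; the function name `promotion_finished` is hypothetical.

```shell
#!/bin/sh
# Succeeds once recovery.done exists in data directory $1 and
# recovery.conf is gone, i.e. the rename described in step 2 has happened.
promotion_finished() {
    datadir=$1
    [ -f "$datadir/recovery.done" ] && [ ! -e "$datadir/recovery.conf" ]
}
```

After `pg_ctl promote -D /home/postgres/pg11`, a script could run `promotion_finished /home/postgres/pg11 && echo promoted` before going on to reconfigure the old primary.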
Demonstration
The previous section already turned the primary postgres into a standby and the standby postgreshot into the primary.
Current primary
[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ pg_controldata | grep 'cluster'
Database cluster state: in production
[postgres@postgreshot pg11]$
Current standby
[postgres@postgres pg11]$ hostname
postgres
[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgres pg11]$
1. Shut down the primary
[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgreshot pg11]$
2. Run the promote command on the standby to promote it to primary
[postgres@postgres pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica' # e.g. 'host=localhost port=5432'
[postgres@postgres pg11]$
[postgres@postgres pg11]$ pg_ctl promote -D /home/postgres/pg11
waiting for server to promote.... done
server promoted
[postgres@postgres pg11]$
[postgres@postgres pg11]$
[postgres@postgres pg11]$ ls -ltr recovery*
-rwx------ 1 postgres postgres 5923 Jan 24 20:59 recovery.done
[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in production
[postgres@postgres pg11]$
The standby has now been promoted to primary.
3. Rename recovery.done on the old primary to recovery.conf
[postgres@postgreshot pg11]$ mv recovery.done recovery.conf
[postgres@postgreshot pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica' # e.g. 'host=localhost port=5432'
[postgres@postgreshot pg11]$
4. Start the old primary
[postgres@postgreshot pg11]$ pg_ctl start
done
server started
[postgres@postgreshot pg11]$
[postgres@postgreshot pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgreshot pg11]$
The old primary is now a standby; the switchover succeeded.