os: ubuntu 16.04
db: postgresql 9.6.8
pacemaker: Pacemaker 1.1.14 Written by Andrew Beekhof
corosync: Corosync Cluster Engine, version '2.3.5'
root@node2:~# crm_mon -Afr -1
Last updated: Mon Feb 18 16:41:52 2019 Last change: Mon Feb 18 16:37:09 2019 by root via crm_attribute on node2
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node2 ]
Slaves: [ node1 node3 ]
Resource Group: master-group
vip-mas (ocf::heartbeat:IPaddr2): Started node2
vip-sla (ocf::heartbeat:IPaddr2): Started node2
Node Attributes:
* Node node1:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
* Node node2:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000140
+ pgsql-status : PRI
* Node node3:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
+ pgsql-xlog-loc : 00000000060001E8
Migration Summary:
* Node node1:
* Node node3:
* Node node2:
The output shows that node2 currently holds the master role. Simulate a crash by killing PostgreSQL on node2:
# killall -9 postgres
After waiting a moment, check the cluster state again:
root@node2:~# crm_mon -Afr -1
Last updated: Mon Feb 18 16:43:44 2019 Last change: Mon Feb 18 16:43:41 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node3 ]
Stopped: [ node2 ]
Resource Group: master-group
vip-mas (ocf::heartbeat:IPaddr2): Started node1
vip-sla (ocf::heartbeat:IPaddr2): Started node1
Node Attributes:
* Node node1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000060001E8
+ pgsql-status : PRI
* Node node2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
* Node node3:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
Migration Summary:
* Node node1:
* Node node3:
* Node node2:
pgsql: migration-threshold=1 fail-count=1 last-failure='Mon Feb 18 16:43:19 2019'
Failed Actions:
* pgsql_monitor_3000 on node2 'not running' (7): call=111, status=complete, exitreason='none',
last-rc-change='Mon Feb 18 16:43:19 2019', queued=0ms, exec=0ms
Excellent: Stopped: [ node2 ] shows that pgsql on node2 has been stopped, and pgsql on node1 is now the master.
Check the IP addresses on node1:
root@node1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f6:4a:79 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.101/24 brd 10.0.2.255 scope global enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fef6:4a79/64 scope link
valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:c4:82:0d brd ff:ff:ff:ff:ff:ff
inet 192.168.56.92/24 brd 192.168.56.255 scope global eno1
valid_lft forever preferred_lft forever
inet 192.168.56.119/24 brd 192.168.56.255 scope global secondary eno1
valid_lft forever preferred_lft forever
inet 192.168.56.120/24 brd 192.168.56.255 scope global secondary eno1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fec4:820d/64 scope link
valid_lft forever preferred_lft forever
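The two secondary addresses on eno1 are the cluster VIPs (vip-mas and vip-sla), which have followed the master to node1. Rather than reading `ip a` by eye, the VIPs can be extracted with a one-liner; it is demonstrated below on the captured output above (on a live node, replace the printf with `ip -o -4 addr show dev eno1`):

```shell
# Print only the secondary (VIP) addresses; in `ip -o` output the address
# is the 4th whitespace-separated field.
printf '%s\n' \
  '3: eno1    inet 192.168.56.92/24 brd 192.168.56.255 scope global eno1' \
  '3: eno1    inet 192.168.56.119/24 brd 192.168.56.255 scope global secondary eno1' \
  '3: eno1    inet 192.168.56.120/24 brd 192.168.56.255 scope global secondary eno1' |
awk '/secondary/ {print $4}'
```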
Method 1: run pg_basebackup on node2; this can take a long time when the database is large.
# killall -9 postgres
# su - postgres
$ rm /var/lib/pgsql/tmp/PGSQL.lock
$ rm -rf /data/pg9.6/main/*
$ pg_basebackup -h 192.168.56.120 -U repl -D /data/pg9.6/main -X stream -P
$ exit
Method 2: run pg_rewind on node2. PostgreSQL is started and then stopped first so that the old data directory is left in a cleanly-shut-down state, which pg_rewind requires after the killall -9 crash.
# /etc/init.d/postgresql start
# /etc/init.d/postgresql stop
# rm /var/lib/pgsql/tmp/PGSQL.lock
# su - postgres
$ pg_rewind --target-pgdata=/data/pg9.6/main --source-server='host=192.168.56.120 port=5432 user=postgres password=xxoo dbname=postgres' -P
$ exit
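One caveat not stated above: pg_rewind only works if the source cluster was initialized with data checksums or runs with wal_log_hints = on; otherwise it aborts. A quick check (a sketch; the data directory path is the one used in this setup, and the sample line is a stand-in for real pg_controldata output):

```shell
# On node2, check the control file:
#   su - postgres -c 'pg_controldata /data/pg9.6/main | grep checksum'
# Demonstrated here on a captured pg_controldata line:
line='Data page checksum version:           0'
ver=${line##* }          # last field: the checksum version number
if [ "$ver" = 0 ]; then
  echo 'checksums disabled: wal_log_hints = on is required for pg_rewind'
fi
```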
Pick one of the two methods; afterwards, have Pacemaker restart postgresql on node2:
# rm /var/lib/pgsql/tmp/PGSQL.lock
# pcs resource cleanup msPostgresql
Check the result:
root@node1:~# crm_mon -Afr -1
Last updated: Mon Feb 18 17:03:20 2019 Last change: Mon Feb 18 17:03:06 2019 by hacluster via crmd on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 node3 ]
Resource Group: master-group
vip-mas (ocf::heartbeat:IPaddr2): Started node1
vip-sla (ocf::heartbeat:IPaddr2): Started node1
Node Attributes:
* Node node1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000060001E8
+ pgsql-status : PRI
* Node node2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
+ pgsql-xlog-loc : 0000000008000060
* Node node3:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
+ pgsql-xlog-loc : 0000000008000060
Migration Summary:
* Node node1:
* Node node3:
* Node node2:
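The "has node2 rejoined as a slave?" check above can be scripted instead of eyeballed. The sketch below runs against a fragment of the captured output; on a live cluster, replace the variable with `crm_out=$(crm_mon -Afr -1)`:

```shell
# Look for the pgsql-status attribute reported for the recovered node.
crm_out='* Node node2:
    + pgsql-data-status : STREAMING|ASYNC
    + pgsql-status : HS:async'
if printf '%s\n' "$crm_out" | grep -q 'pgsql-status.*HS:async'; then
  echo 'node2 is back as an async hot standby'
fi
```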
If you want a crashed PostgreSQL instance to rejoin automatically as a slave of the new master, two settings are needed.
In postgresql.conf:
archive_mode = on
archive_command = 'cp %p /data/backup/pgwalarchive/%f'
In cluster.pcs, set archive_command and restore_command:
restore_command="cp /data/backup/pgwalarchive/%f %p"
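What these two commands do can be simulated locally: PostgreSQL substitutes %p (path of the WAL segment) and %f (its file name) before running them. A sketch, with temp directories standing in for pg_xlog and /data/backup/pgwalarchive:

```shell
archive=$(mktemp -d) && pgwal=$(mktemp -d)
seg=000000010000000000000001
echo 'WAL segment payload' > "$pgwal/$seg"
# archive_command = 'cp %p /data/backup/pgwalarchive/%f'  (run on the primary)
cp "$pgwal/$seg" "$archive/$seg"
# ... the primary later recycles the segment ...
rm "$pgwal/$seg"
# restore_command = 'cp /data/backup/pgwalarchive/%f %p'  (run on a standby)
cp "$archive/$seg" "$pgwal/$seg"
cat "$pgwal/$seg"                # prints: WAL segment payload
rm -r "$archive" "$pgwal"
```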
References:
https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster