PostgreSQL high availability with Pacemaker + Corosync, part 4: master failover

os: ubuntu 16.04
db: postgresql 9.6.8
pacemaker: Pacemaker 1.1.14 Written by Andrew Beekhof
corosync: Corosync Cluster Engine, version '2.3.5'

Check the cluster status

root@node2:~# crm_mon -Afr -1

Last updated: Mon Feb 18 16:41:52 2019		Last change: Mon Feb 18 16:37:09 2019 by root via crm_attribute on node2
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Slaves: [ node1 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node2
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node2

Node Attributes:
* Node node1:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
* Node node2:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 0000000006000140
    + pgsql-status                    	: PRI       
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 00000000060001E8

Migration Summary:
* Node node1:
* Node node3:
* Node node2:


The output shows that node2 currently holds the master role.
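When this check is scripted, the master can be read from the "Masters:" line of the crm_mon output. The following is a minimal sketch; current_master is a hypothetical helper name, and it assumes the output layout shown above.

```shell
# Print the node name(s) listed on the "Masters:" line of crm_mon output
# read from stdin. current_master is a hypothetical helper, not a
# Pacemaker command.
current_master() {
    # The line looks like: "Masters: [ node2 ]", so the node names are
    # fields 3 .. NF-1 (fields 2 and NF are the brackets).
    awk '$1 == "Masters:" { for (i = 3; i < NF; i++) print $i }'
}

# Usage on a cluster node (needs root and a running cluster):
#   crm_mon -Afr -1 | current_master
```

Reading from stdin keeps the helper independent of the cluster, so it can be tried against saved output first.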

Kill PostgreSQL on node2

# killall -9 postgres

After a short wait, check the cluster status again

root@node2:~# crm_mon -Afr -1

Last updated: Mon Feb 18 16:43:44 2019		Last change: Mon Feb 18 16:43:41 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node3 ]
     Stopped: [ node2 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node1

Node Attributes:
* Node node1:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 00000000060001E8
    + pgsql-status                    	: PRI       
* Node node2:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: DISCONNECT
    + pgsql-status                    	: STOP      
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  

Migration Summary:
* Node node1:
* Node node3:
* Node node2:
   pgsql: migration-threshold=1 fail-count=1 last-failure='Mon Feb 18 16:43:19 2019'

Failed Actions:
* pgsql_monitor_3000 on node2 'not running' (7): call=111, status=complete, exitreason='none',
    last-rc-change='Mon Feb 18 16:43:19 2019', queued=0ms, exec=0ms

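Good: Stopped: [ node2 ] shows that pgsql on node2 has been stopped, and pgsql on node1 is now the master. Both VIPs have moved to node1 as well. Because migration-threshold=1 and the fail-count on node2 is now 1, Pacemaker does not start pgsql on node2 again by itself.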
Check the IP addresses on node1

root@node1:~# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:f6:4a:79 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.101/24 brd 10.0.2.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fef6:4a79/64 scope link 
       valid_lft forever preferred_lft forever
3: eno1:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c4:82:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.92/24 brd 192.168.56.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet 192.168.56.119/24 brd 192.168.56.255 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet 192.168.56.120/24 brd 192.168.56.255 scope global secondary eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec4:820d/64 scope link 
       valid_lft forever preferred_lft forever
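The two secondary addresses on eno1 (192.168.56.119 and 192.168.56.120) are presumably the two VIPs that moved over. A rough way to test for one address from a script is sketched below; has_ipv4 is a hypothetical helper, and it assumes the one-line format printed by `ip -o -4 addr show`.

```shell
# Succeed if the given IPv4 address appears in `ip -o -4 addr show`
# output read from stdin. has_ipv4 is a hypothetical helper name.
has_ipv4() {
    awk -v want="$1" '
        # Field 4 is "address/prefix", e.g. 192.168.56.119/24.
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on node1:
#   ip -o -4 addr show | has_ipv4 192.168.56.120 && echo "VIP is here"
```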

Bring PostgreSQL on node2 back as a slave

Method 1: run pg_basebackup on node2. This takes a long time when the database is large.

# killall -9 postgres
# su - postgres
$ rm /var/lib/pgsql/tmp/PGSQL.lock
$ rm -rf /data/pg9.6/main/*
$ pg_basebackup -h 192.168.56.120 -U repl -D /data/pg9.6/main -X stream -P
$ exit
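Since the rm -rf step is destructive, it may be worth guarding it. The sketch below is only an illustration: wipe_pgdata is a hypothetical helper, and the only check it makes is that the directory contains a PG_VERSION file, which a PostgreSQL data directory normally does.

```shell
# Empty a PostgreSQL data directory, but only if it looks like one.
# wipe_pgdata is a hypothetical helper, not part of PostgreSQL.
wipe_pgdata() {
    dir=${1:?usage: wipe_pgdata DATA_DIRECTORY}
    if [ ! -f "$dir/PG_VERSION" ]; then
        echo "refusing: $dir has no PG_VERSION file" >&2
        return 1
    fi
    # Remove everything inside the directory (dotfiles included),
    # but keep the directory itself and its ownership.
    find "$dir" -mindepth 1 -delete
}

# Usage as the postgres user, with PostgreSQL stopped:
#   wipe_pgdata /data/pg9.6/main
```

Unlike `rm -rf dir/*`, the find call also removes hidden files, so pg_basebackup starts from a truly empty directory.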

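Method 2: run pg_rewind on node2. pg_rewind requires the target cluster to have been shut down cleanly, which is why PostgreSQL is started and stopped once first. It also requires wal_log_hints = on or data checksums on the target.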

# /etc/init.d/postgresql start
# /etc/init.d/postgresql stop
# rm /var/lib/pgsql/tmp/PGSQL.lock
# su - postgres
$ pg_rewind --target-pgdata=/data/pg9.6/main --source-server='host=192.168.56.120 port=5432 user=postgres password=xxoo dbname=postgres' -P
$ exit
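Before running pg_rewind, the target's control file can be inspected to see whether it is likely to be accepted. The sketch below reads `pg_controldata` output; rewind_ready is a hypothetical helper, and it assumes the English labels that pg_controldata prints in 9.6 (run it with LANG=C if the output is localized).

```shell
# Read `pg_controldata` output on stdin and report whether the target
# looks acceptable to pg_rewind. rewind_ready is a hypothetical helper.
rewind_ready() {
    awk -F': *' '
        /^Database cluster state:/      { state = $2 }
        /wal_log_hints setting:/        { hints = $2 }
        /^Data page checksum version:/  { cksum = $2 }
        END {
            if (state != "shut down") {
                print "target not cleanly shut down (state: " state ")"
                exit 1
            }
            if (hints != "on" && cksum + 0 == 0) {
                print "neither wal_log_hints nor data checksums enabled"
                exit 1
            }
            print "ok"
        }
    '
}

# Usage on node2 as the postgres user (binary path may differ):
#   pg_controldata /data/pg9.6/main | rewind_ready
```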

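Pick one of the two methods. Afterwards, restart PostgreSQL on node2 by clearing its failure record so that Pacemaker starts it again.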

# rm -f /var/lib/pgsql/tmp/PGSQL.lock
# pcs resource cleanup msPostgresql

Check the result

root@node1:~# crm_mon -Afr -1

Last updated: Mon Feb 18 17:03:20 2019		Last change: Mon Feb 18 17:03:06 2019 by hacluster via crmd on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node1

Node Attributes:
* Node node1:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 00000000060001E8
    + pgsql-status                    	: PRI       
* Node node2:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 0000000008000060
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 0000000008000060

Migration Summary:
* Node node1:
* Node node3:
* Node node2:
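To confirm from a script that node2 has rejoined as a hot standby, the pgsql-status attribute can be pulled out of the Node Attributes section. This is a sketch under the assumption that the output keeps the layout shown above; node_pgsql_status is a hypothetical helper name.

```shell
# Print the pgsql-status attribute of one node, given `crm_mon -Afr -1`
# output on stdin. node_pgsql_status is a hypothetical helper.
node_pgsql_status() {
    awk -v node="$1" '
        # "* Node node2:" starts a per-node block; remember the node name.
        $1 == "*" && $2 == "Node" { current = $3; sub(/:$/, "", current) }
        # "+ pgsql-status : HS:async" carries the value in field 4.
        $1 == "+" && $2 == "pgsql-status" && current == node { print $4 }
    '
}

# Usage on a cluster node:
#   crm_mon -Afr -1 | node_pgsql_status node2
```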

If the failed PostgreSQL instance should rejoin automatically as a slave of the new master, the following is needed:

  1. In postgresql.conf, set
    archive_mode = on
    archive_command = 'cp %p /data/backup/pgwalarchive/%f'

  2. In cluster.pcs, set archive_command and restore_command.
    restore_command="cp /data/backup/pgwalarchive/%f %p"
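In both commands PostgreSQL replaces %p with the path of the WAL file (the source when archiving, the destination when restoring) and %f with the file name alone. The sketch below imitates that substitution in a scratch directory to check that the two templates form a round trip; run_wal_cmd and all paths in the usage example are hypothetical, and the sed-based expansion is a simplification that assumes paths without "|" or "&".

```shell
# Expand %p and %f in a command template and run the result.
# run_wal_cmd is a hypothetical helper that only imitates what
# PostgreSQL does with archive_command and restore_command.
run_wal_cmd() {
    template=$1 path=$2 file=$3
    cmd=$(printf '%s' "$template" | sed -e "s|%p|$path|g" -e "s|%f|$file|g")
    sh -c "$cmd"
}

# Usage with scratch directories (not the real data directory):
#   run_wal_cmd 'cp %p /tmp/archive/%f' /tmp/wal/000000010000000000000006 000000010000000000000006
#   run_wal_cmd 'cp /tmp/archive/%f %p' /tmp/restore/RECOVERYXLOG 000000010000000000000006
```

Note that a plain cp overwrites an existing archive file; the PostgreSQL documentation suggests testing for the file first, for example with `test ! -f`.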

References:
https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster
