os: ubuntu 16.04
db: postgresql 9.6.8
pacemaker: Pacemaker 1.1.14 Written by Andrew Beekhof
corosync: Corosync Cluster Engine, version '2.3.5'
pacemaker: cluster resource manager (Cluster Resource Management)
corosync: cluster messaging layer (Messaging Layer)
pcs: command-line management interface for the CRM
IP plan
vip-mas 192.168.56.119
vip-sla 192.168.56.120
node1 192.168.56.92
node2 192.168.56.90
node3 192.168.56.88
vip-mas is bound to a NIC on the master node and serves as the write IP.
vip-sla is bound to a NIC on the slave node and serves as the read IP.
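Once the cluster is running, a quick way to confirm that each VIP fronts the intended role is pg_is_in_recovery(): it should return f behind the write VIP and t behind the read VIP (a sketch; assumes pg_hba.conf allows the connection for the postgres user):
# psql -h 192.168.56.119 -U postgres -c "select pg_is_in_recovery();"
# psql -h 192.168.56.120 -U postgres -c "select pg_is_in_recovery();"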
# iptables -F
# systemctl stop ufw;
systemctl disable ufw;
Disable SELinux if it is installed; if it is not, nothing needs to change (it depends on policycoreutils).
# vi /etc/selinux/config
SELINUX=disabled
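On Ubuntu, SELinux is usually not installed at all; if the SELinux userland tools are present, the current mode can be checked first (a quick sketch):
# getenforce 2>/dev/null || echo "SELinux tools not installed"
# sestatus 2>/dev/null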
# vi /etc/hosts
192.168.56.92 node1
192.168.56.90 node2
192.168.56.88 node3
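A quick sanity check that every host name resolves and is reachable:
# for h in node1 node2 node3; do ping -c1 -W1 $h >/dev/null && echo "$h ok" || echo "$h unreachable"; done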
Configure SSH trust between the nodes
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1;
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2;
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3;
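Verify that passwordless SSH now works from the current node to all three nodes:
# for h in node1 node2 node3; do ssh -o BatchMode=yes root@$h hostname; done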
Install and configure 1 master + 2 slave asynchronous streaming replication.
The detailed steps are covered in a separate blog post. Note that PostgreSQL must be prevented from starting automatically at boot, because Pacemaker + Corosync will manage it.
# systemctl disable postgresql
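Before handing PostgreSQL over to the cluster it is worth confirming that streaming replication is healthy. A minimal check on the current master (assuming peer authentication for the postgres OS user):
# sudo -u postgres psql -c "select client_addr, state, sync_state from pg_stat_replication;"
# sudo -u postgres psql -c "select pg_is_in_recovery();"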
Check whether port 2224 is already in use
# netstat -lntp |grep -i 2224
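Port 2224 is used by the pcsd daemon, so nothing will be listening until pcs is installed and started. If netstat is not available, ss gives the same information:
# ss -lntp | grep 2224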
All nodes need the corresponding packages installed
# apt list |grep -Ei "pacemaker|corosync|corosync-dev|pcs|psmisc|fence-agents"
# apt install -y pacemaker corosync corosync-dev pcs psmisc fence-agents crmsh
# dpkg -l |grep -Ei "pacemaker|corosync|pcs|psmisc|fence-agents|crmsh"
ii corosync 2.3.5-3ubuntu2.1 amd64 cluster engine daemon and utilities
ii corosync-dev 2.3.5-3ubuntu2.1 all cluster engine generic development (transitional package)
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii fence-agents 4.0.22-2 amd64 Fence Agents for Red Hat Cluster
ii libcorosync-common-dev:amd64 2.3.5-3ubuntu2.1 amd64 cluster engine common development
ii libcorosync-common4:amd64 2.3.5-3ubuntu2.1 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.4 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.4 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.4 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.4 all cluster resource manager general resource agents
ii pcs 0.9.149-1ubuntu1.1 amd64 Pacemaker Configuration System
ii psmisc 22.21-2.1build1 amd64 utilities that use the proc file system
The corresponding command for a complete uninstall
# apt-get -y remove --purge corosync corosync-dev libcorosync-common-dev libcorosync-common4 pacemaker pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pcs psmisc fence-agents crmsh
Change the password of the hacluster user
# passwd hacluster
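The hacluster password must be identical on all nodes; rootroot is the password used in the pcs cluster auth step below. A non-interactive way to set it everywhere (a sketch, relying on the SSH trust configured above):
# for h in node1 node2 node3; do ssh root@$h "echo 'hacluster:rootroot' | chpasswd"; done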
On each node, make sure the three services are running and enabled at boot; normally they are already running right after the initial installation.
# systemctl status pacemaker corosync pcsd
# systemctl enable pacemaker corosync pcsd
# systemctl restart pacemaker corosync pcsd
# ls -l /lib/systemd/system/corosync.service;
ls -l /lib/systemd/system/pacemaker.service;
ls -l /lib/systemd/system/pcsd.service;
All nodes need this configuration. Adjust the addresses to your environment: bindnetaddr under totem => interface, and ring0_addr for each node under nodelist.
# cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf.bak
# cat /dev/null > /etc/corosync/corosync.conf
# vi /etc/corosync/corosync.conf
# totem: the protocol used for heartbeat traffic between the nodes; ring 0 means nodes on this ring reach each other directly, without any relay
totem {
    version: 2
    cluster_name: pgcluster
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    secauth: off
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.56.0
        broadcast: yes
        #mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
    node {
        ring0_addr: 192.168.56.92
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.56.90
        name: node2
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.56.88
        name: node3
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
Restart corosync on all nodes
# systemctl restart corosync
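After the restart, verify that the totem ring is healthy and that all three members have joined:
# corosync-cfgtool -s
# corosync-cmapctl | grep -i members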
Run on all nodes (this clears any stale CIB)
# rm -f /var/lib/pacemaker/cib/cib*
Restart pacemaker on all nodes
# systemctl restart pacemaker pcsd
Check the status on node1
# crm_mon -Afr -1
Last updated: Fri Feb 15 16:58:45 2019 Last change: Fri Feb 15 16:57:23 2019 by hacluster via crmd on node2
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 0 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Node Attributes:
* Node node1:
* Node node2:
* Node node3:
Migration Summary:
* Node node2:
* Node node1:
* Node node3:
Run on node1
# pcs cluster auth 192.168.56.92 192.168.56.90 192.168.56.88
192.168.56.88: Authorized
192.168.56.90: Authorized
192.168.56.92: Authorized
Enter hacluster for Username and rootroot for Password, or pass them directly with -u and -p:
# pcs cluster auth -u hacluster -p rootroot 192.168.56.92 192.168.56.90 192.168.56.88
Check the tokens
# cat /var/lib/pcsd/tokens
{
"format_version": 2,
"data_version": 1,
"tokens": {
"192.168.56.88": "17148b7b-634c-4ffc-a013-89cf0727fa1d",
"192.168.56.90": "8869aa0f-56a8-4e72-9746-474df8bef48e",
"192.168.56.92": "a03d15fe-c9d5-4845-b16e-23b418f4c0b3"
}
}
# cat /var/lib/pcsd/pcs_users.conf
[
{
"username": "hacluster",
"token": "a03d15fe-c9d5-4845-b16e-23b418f4c0b3",
"creation_date": "2019-02-15 17:06:17 +0800"
}
]
On node1, create the file /root/cluster.pcs with the content below; adjust it to your specific requirements.
The pcs -f commands edit a local copy of the CIB (pgsql_cfg), and cib-push applies everything in a single step.
pcs cluster cib pgsql_cfg
pcs -f pgsql_cfg property set no-quorum-policy="ignore"
pcs -f pgsql_cfg property set stonith-enabled="false"
pcs -f pgsql_cfg resource defaults resource-stickiness="INFINITY"
pcs -f pgsql_cfg resource defaults migration-threshold="1"
pcs -f pgsql_cfg resource create vip-mas IPaddr2 \
ip="192.168.56.119" \
nic="eno1" \
cidr_netmask="24" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op stop timeout="60s" interval="0s" on-fail="block"
pcs -f pgsql_cfg resource create vip-sla IPaddr2 \
ip="192.168.56.120" \
nic="eno1" \
cidr_netmask="24" \
meta migration-threshold="0" \
op start timeout="60s" interval="0s" on-fail="stop" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op stop timeout="60s" interval="0s" on-fail="ignore"
pcs -f pgsql_cfg resource create pgsql pgsql \
pgctl="/usr/lib/postgresql/9.6/bin/pg_ctl" \
psql="/usr/lib/postgresql/9.6/bin/psql" \
pgdata="/data/pg9.6/main/" \
config="/etc/postgresql/9.6/main/postgresql.conf" \
socketdir="/var/run/postgresql" \
rep_mode="async" \
node_list="node1 node2 node3" \
master_ip="192.168.56.119" \
repuser="repl" \
primary_conninfo_opt="password=pass0rd!@123 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
restart_on_promote='true' \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="4s" on-fail="restart" \
op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="stop" \
op stop timeout="60s" interval="0s" on-fail="block" \
op notify timeout="60s" interval="0s"
pcs -f pgsql_cfg resource master msPostgresql pgsql \
master-max=1 master-node-max=1 clone-max=5 clone-node-max=1 notify=true
pcs -f pgsql_cfg resource group add master-group vip-mas
pcs -f pgsql_cfg resource group add slave-group vip-sla
pcs -f pgsql_cfg constraint colocation add master-group with master msPostgresql INFINITY
pcs -f pgsql_cfg constraint order promote msPostgresql then start master-group symmetrical=false score=INFINITY
pcs -f pgsql_cfg constraint order demote msPostgresql then stop master-group symmetrical=false score=0
pcs -f pgsql_cfg constraint colocation add slave-group with slave msPostgresql INFINITY
pcs -f pgsql_cfg constraint order promote msPostgresql then start slave-group symmetrical=false score=INFINITY
pcs -f pgsql_cfg constraint order demote msPostgresql then stop slave-group symmetrical=false score=0
pcs cluster cib-push pgsql_cfg
Apply the configuration file
# sh /root/cluster.pcs
Adding msPostgresql master-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql master-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
Adding msPostgresql slave-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql slave-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
CIB updated
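Before restarting anything, the pushed configuration can be reviewed with the standard pcs commands:
# pcs config
# pcs constraint show --full
# pcs resource show --full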
Restart the related services on node1 first, then on node2 and node3 in turn
# systemctl restart corosync pacemaker pcsd
Check that the resources were created successfully
# crm_mon -Afr -1
Last updated: Fri Feb 15 17:28:23 2019 Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 node3 ]
Resource Group: master-group
vip-mas (ocf::heartbeat:IPaddr2): Started node1
Resource Group: slave-group
vip-sla (ocf::heartbeat:IPaddr2): Started node2
Node Attributes:
* Node node1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000007000098
+ pgsql-status : PRI
* Node node2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
* Node node3:
+ master-pgsql : -INFINITY
+ pgsql-data-status : STREAMING|ASYNC
+ pgsql-status : HS:async
Migration Summary:
* Node node2:
* Node node1:
* Node node3:
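The VIPs should now be bound to the configured NIC: vip-mas on node1 and vip-sla on node2 in the output above. With the nic="eno1" setting from cluster.pcs this can be checked directly on each node (adjust the interface name to your host):
# ip addr show eno1 | grep -E "192.168.56.119|192.168.56.120"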
On node1, start and enable the cluster
# pcs cluster start --all
192.168.56.88: Starting Cluster...
192.168.56.90: Starting Cluster...
192.168.56.92: Starting Cluster...
# pcs cluster enable --all
192.168.56.92: Cluster Enabled
192.168.56.90: Cluster Enabled
192.168.56.88: Cluster Enabled
Check the cluster status
# pcs cluster status
Cluster Status:
Last updated: Fri Feb 15 17:29:18 2019 Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
PCSD Status:
node1 (192.168.56.92): Online
node2 (192.168.56.90): Online
node3 (192.168.56.88): Online
Check corosync
# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
3 1 192.168.56.88
2 1 192.168.56.90
1 1 192.168.56.92 (local)
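corosync-quorumtool gives a more detailed view of quorum than pcs status corosync:
# corosync-quorumtool -s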
Check pacemaker
# pcs status
Cluster name: pgcluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Fri Feb 15 17:29:46 2019 Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured
Online: [ node1 node2 node3 ]
Full list of resources:
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 node3 ]
Resource Group: master-group
vip-mas (ocf::heartbeat:IPaddr2): Started node1
Resource Group: slave-group
vip-sla (ocf::heartbeat:IPaddr2): Started node2
PCSD Status:
node1 (192.168.56.92): Online
node2 (192.168.56.90): Online
node3 (192.168.56.88): Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
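A simple failover smoke test is to put the current master node into standby and watch the master role and vip-mas move to another node (do this in a test environment first):
# pcs cluster standby node1
# crm_mon -Afr -1
# pcs cluster unstandby node1
Note that with the pgsql resource agent the old master usually has to be re-synced from the new master (e.g. with pg_basebackup) and the PGSQL.lock file created by the agent removed (its location depends on the tmpdir parameter) before it can rejoin as a slave.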
References:
https://www.clusterlabs.org/
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/index.html
https://wiki.clusterlabs.org/wiki/Pacemaker
https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster
https://ubuntuforums.org/showthread.php?t=2329725
https://my.oschina.net/aven92/blog/518928