PostgreSQL HA with Pacemaker + Corosync, part 1: setup, binding vip-mas to the master and vip-sla to the slave

OS: Ubuntu 16.04
DB: PostgreSQL 9.6.8
pacemaker: Pacemaker 1.1.14, written by Andrew Beekhof
corosync: Corosync Cluster Engine, version '2.3.5'

pacemaker: the cluster resource manager (Cluster Resource Management)
corosync: the cluster messaging layer (Messaging Layer)
pcs: the command-line management tool for the CRM

IP plan

vip-mas 192.168.56.119 
vip-sla 192.168.56.120

node1 192.168.56.92
node2 192.168.56.90
node3 192.168.56.88

vip-mas is bound to the NIC of the master node and serves as the write IP.
vip-sla is bound to the NIC of the slave node and serves as the read IP.
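Under the hood, the VIPs are ordinary secondary addresses managed by the ocf:heartbeat:IPaddr2 resource agent. A rough sketch of what the agent does when it starts a VIP (illustrative only; the real agent also sends gratuitous ARP and handles monitoring and cleanup). The commands are echoed rather than executed here, since binding an address needs root and the real NIC:

```shell
# Illustrative sketch of IPaddr2's start/stop behaviour for vip-mas.
vip=192.168.56.119
nic=eno1
echo "ip addr add ${vip}/24 dev ${nic}    # bind vip-mas on the master"
echo "ip addr del ${vip}/24 dev ${nic}    # released again on demotion"
```

Because the address moves with the resource, clients always write through 192.168.56.119 and read through 192.168.56.120 regardless of which physical node currently holds each role.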

OS setup

# iptables -F

# systemctl stop ufw;
systemctl disable ufw;

Disable SELinux if it is present (it depends on policycoreutils); if the config file below does not exist, there is nothing to change.

# vi /etc/selinux/config 

SELINUX=disabled

# vi /etc/hosts

192.168.56.92 node1
192.168.56.90 node2
192.168.56.88 node3
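As a quick sanity check, every node name should now resolve through /etc/hosts. A hypothetical helper, demonstrated here on an inline copy of the entries above (on a real node you would read /etc/hosts, or use `getent hosts node1`):

```shell
# Inline copy of the /etc/hosts entries above for demonstration purposes.
hosts='192.168.56.92 node1
192.168.56.90 node2
192.168.56.88 node3'
for n in node1 node2 node3; do
  echo "$hosts" | grep -qw "$n" && echo "$n resolves"
done
```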

Set up SSH trust between the nodes

# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1;
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2;
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3;

PostgreSQL streaming replication

Install and configure asynchronous streaming replication with 1 master and 2 slaves.
The detailed steps are covered in a separate blog post. Note that PostgreSQL must not start automatically at boot; pacemaker + corosync will manage it instead.

# systemctl disable postgresql
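For reference, a sketch of the replication-related postgresql.conf settings this setup assumes on every node (the values are examples, not taken from the original post; adapt limits and subnets to your environment). Note that the pgsql resource agent generates recovery.conf on the slaves itself, from the master_ip, repuser and primary_conninfo_opt parameters defined later:

```conf
# postgresql.conf (example values) on all three nodes:
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64
hot_standby = on

# pg_hba.conf must allow the replication user from the cluster subnet:
# host  replication  repl  192.168.56.0/24  md5
```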

Install pacemaker, corosync and pcs

Check whether port 2224 (the port pcsd uses) is already taken

# netstat -lntp |grep -i 2224
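The same check can be scripted, for example with ss (which replaces netstat on newer systems); this is a small helper of my own, not from the original post:

```shell
# Report whether anything is already listening on TCP 2224, the port pcsd
# will need. Falls back to "free" if ss is unavailable.
if ss -lnt 2>/dev/null | awk '{print $4}' | grep -q ':2224$'; then
  status="in use"
else
  status="free"
fi
echo "port 2224 is ${status}"
```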

Install the packages on all nodes

# apt list |grep -Ei "pacemaker|corosync|corosync-dev|pcs|psmisc|fence-agents"

# apt install -y pacemaker corosync corosync-dev pcs psmisc fence-agents crmsh

# dpkg -l |grep -Ei "pacemaker|corosync|pcs|psmisc|fence-agents|crmsh"

ii  corosync                              2.3.5-3ubuntu2.1                           amd64        cluster engine daemon and utilities
ii  corosync-dev                          2.3.5-3ubuntu2.1                           all          cluster engine generic development (transitional package)
ii  crmsh                                 2.2.0-1                                    amd64        CRM shell for the pacemaker cluster manager
ii  fence-agents                          4.0.22-2                                   amd64        Fence Agents for Red Hat Cluster
ii  libcorosync-common-dev:amd64          2.3.5-3ubuntu2.1                           amd64        cluster engine common development
ii  libcorosync-common4:amd64             2.3.5-3ubuntu2.1                           amd64        cluster engine common library
ii  pacemaker                             1.1.14-2ubuntu1.4                          amd64        cluster resource manager
ii  pacemaker-cli-utils                   1.1.14-2ubuntu1.4                          amd64        cluster resource manager command line utilities
ii  pacemaker-common                      1.1.14-2ubuntu1.4                          all          cluster resource manager common files
ii  pacemaker-resource-agents             1.1.14-2ubuntu1.4                          all          cluster resource manager general resource agents
ii  pcs                                   0.9.149-1ubuntu1.1                         amd64        Pacemaker Configuration System
ii  psmisc                                22.21-2.1build1                            amd64        utilities that use the proc file system


The matching command for a complete uninstall:

# apt-get -y remove --purge corosync corosync-dev libcorosync-common-dev libcorosync-common4 pacemaker pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pcs psmisc fence-agents crmsh

Set a password for the hacluster user

# passwd hacluster

Start pacemaker, corosync and pcsd

Make sure all three services are running and enabled at boot on every node; right after a fresh install they are usually already running.

# systemctl status pacemaker corosync pcsd
# systemctl enable pacemaker corosync pcsd

# systemctl restart pacemaker corosync pcsd

# ls -l /lib/systemd/system/corosync.service;
ls -l /lib/systemd/system/pacemaker.service;
ls -l /lib/systemd/system/pcsd.service;

Configure corosync

This must be done on every node; be sure to adjust the addresses under nodelist => node => ring0_addr (and totem => interface => bindnetaddr) for your network.

# cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf.bak
# cat /dev/null > /etc/corosync/corosync.conf
# vi /etc/corosync/corosync.conf

# totem defines how heartbeat messages are exchanged between the nodes;
# ring 0 is the primary ring, reachable directly without any routing
totem {
    version: 2
    cluster_name: pgcluster
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    secauth: off
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.56.0
        broadcast: yes
        #mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
  node {
    ring0_addr: 192.168.56.92
    name: node1
    nodeid: 1
  }
  node {
    ring0_addr: 192.168.56.90
    name: node2
    nodeid: 2
  }
  node {
    ring0_addr: 192.168.56.88
    name: node3
    nodeid: 3
  }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
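The quorum section uses corosync_votequorum: a partition keeps quorum only while it holds a strict majority of the votes (and with a nodelist present, corosync derives the total vote count from it, so expected_votes is mostly a safeguard). A quick back-of-the-envelope check, with each node above carrying one vote:

```shell
# Majority for N single-vote nodes is floor(N/2) + 1. With the three-node
# nodelist above, two nodes must stay up, so the cluster tolerates the
# loss of exactly one node.
nodes=3
quorum=$(( nodes / 2 + 1 ))
echo "votes=${nodes} quorum=${quorum}"
```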

Restart corosync on all nodes

# systemctl restart corosync

Configure pacemaker

On every node, clear out any stale CIB data

# rm -f /var/lib/pacemaker/cib/cib*

Then restart pacemaker and pcsd on all nodes

# systemctl restart pacemaker pcsd

Check the status on node1

# crm_mon -Afr -1

Last updated: Fri Feb 15 16:58:45 2019		Last change: Fri Feb 15 16:57:23 2019 by hacluster via crmd on node2
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 0 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:


Node Attributes:
* Node node1:
* Node node2:
* Node node3:

Migration Summary:
* Node node2:
* Node node1:
* Node node3:

Cluster authentication

Run on node1

# pcs cluster auth 192.168.56.92 192.168.56.90 192.168.56.88

192.168.56.88: Authorized
192.168.56.90: Authorized
192.168.56.92: Authorized

Enter hacluster for the username and rootroot (the password set earlier) for the password.

Alternatively, pass them directly with -u and -p:

# pcs cluster auth -u hacluster -p rootroot 192.168.56.92 192.168.56.90 192.168.56.88

Inspect the generated tokens

# cat /var/lib/pcsd/tokens
{
  "format_version": 2,
  "data_version": 1,
  "tokens": {
    "192.168.56.88": "17148b7b-634c-4ffc-a013-89cf0727fa1d",
    "192.168.56.90": "8869aa0f-56a8-4e72-9746-474df8bef48e",
    "192.168.56.92": "a03d15fe-c9d5-4845-b16e-23b418f4c0b3"
  }
}

# cat /var/lib/pcsd/pcs_users.conf 
[
  {
    "username": "hacluster",
    "token": "a03d15fe-c9d5-4845-b16e-23b418f4c0b3",
    "creation_date": "2019-02-15 17:06:17 +0800"
  }
]

Create the resource configuration file

On node1, create the file /root/cluster.pcs with the content below.
Adjust the details to your own requirements.

pcs cluster cib pgsql_cfg
 
pcs -f pgsql_cfg property set no-quorum-policy="ignore"
pcs -f pgsql_cfg property set stonith-enabled="false"
pcs -f pgsql_cfg resource defaults resource-stickiness="INFINITY"
pcs -f pgsql_cfg resource defaults migration-threshold="1"
 
pcs -f pgsql_cfg resource create vip-mas IPaddr2 \
   ip="192.168.56.119" \
   nic="eno1" \
   cidr_netmask="24" \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop    timeout="60s" interval="0s"  on-fail="block"
 
pcs -f pgsql_cfg resource create vip-sla IPaddr2 \
   ip="192.168.56.120" \
   nic="eno1" \
   cidr_netmask="24" \
   meta migration-threshold="0" \
   op start   timeout="60s" interval="0s"  on-fail="stop" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop    timeout="60s" interval="0s"  on-fail="ignore"
 
pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/lib/postgresql/9.6/bin/pg_ctl" \
   psql="/usr/lib/postgresql/9.6/bin/psql" \
   pgdata="/data/pg9.6/main/" \
   config="/etc/postgresql/9.6/main/postgresql.conf" \
   socketdir="/var/run/postgresql" \
   rep_mode="async" \
   node_list="node1 node2 node3" \
   master_ip="192.168.56.119" \
   repuser="repl" \
   primary_conninfo_opt="password=pass0rd!@123 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   restart_on_promote='true' \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
   op monitor timeout="60s" interval="4s"  on-fail="restart" \
   op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
   op promote timeout="60s" interval="0s"  on-fail="restart" \
   op demote  timeout="60s" interval="0s"  on-fail="stop" \
   op stop    timeout="60s" interval="0s"  on-fail="block" \
   op notify  timeout="60s" interval="0s"
 
pcs -f pgsql_cfg resource master msPostgresql pgsql \
   master-max=1 master-node-max=1 clone-max=5 clone-node-max=1 notify=true
 
pcs -f pgsql_cfg resource group add master-group vip-mas
pcs -f pgsql_cfg resource group add slave-group vip-sla
 
pcs -f pgsql_cfg constraint colocation add master-group with master msPostgresql INFINITY
pcs -f pgsql_cfg constraint order promote msPostgresql then start master-group symmetrical=false score=INFINITY
pcs -f pgsql_cfg constraint order demote  msPostgresql then stop  master-group symmetrical=false score=0
 
pcs -f pgsql_cfg constraint colocation add slave-group with slave msPostgresql INFINITY
pcs -f pgsql_cfg constraint order promote msPostgresql then start slave-group symmetrical=false score=INFINITY         
pcs -f pgsql_cfg constraint order demote  msPostgresql then stop  slave-group symmetrical=false score=0 
 
pcs cluster cib-push pgsql_cfg

Load the configuration file

# sh /root/cluster.pcs

Adding msPostgresql master-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql master-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
Adding msPostgresql slave-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql slave-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
CIB updated
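Every command in the script worked against an offline CIB copy via -f; only the final cib-push touched the live cluster, so it never ran a half-built configuration. The shape of the pattern, echoed here rather than executed (the real commands need a live cluster):

```shell
# The offline-CIB editing pattern used by /root/cluster.pcs.
cib=pgsql_cfg
echo "pcs cluster cib ${cib}        # snapshot the live CIB into a file"
echo "pcs -f ${cib} <subcommand>    # edit only the offline file"
echo "pcs cluster cib-push ${cib}   # apply the whole file in one step"
```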

Restart the related services on node1 first, then on node2 and node3 in turn

# systemctl restart corosync pacemaker pcsd

Verify that the resources were created successfully

# crm_mon -Afr -1

Last updated: Fri Feb 15 17:28:23 2019		Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
 Resource Group: slave-group
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node2

Node Attributes:
* Node node1:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 0000000007000098
    + pgsql-status                    	: PRI       
* Node node2:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  

Migration Summary:
* Node node2:
* Node node1:
* Node node3:

Startup

Start the cluster from node1

# pcs cluster start --all

192.168.56.88: Starting Cluster...
192.168.56.90: Starting Cluster...
192.168.56.92: Starting Cluster...

# pcs cluster enable --all

192.168.56.92: Cluster Enabled
192.168.56.90: Cluster Enabled
192.168.56.88: Cluster Enabled

Check the cluster status

# pcs cluster status

Cluster Status:
 Last updated: Fri Feb 15 17:29:18 2019		Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
 Stack: corosync
 Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
 3 nodes and 7 resources configured
 Online: [ node1 node2 node3 ]

PCSD Status:
  node1 (192.168.56.92): Online
  node2 (192.168.56.90): Online
  node3 (192.168.56.88): Online


Check corosync

# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         3          1 192.168.56.88
         2          1 192.168.56.90
         1          1 192.168.56.92 (local)

Check pacemaker

# pcs status

Cluster name: pgcluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Fri Feb 15 17:29:46 2019		Last change: Fri Feb 15 17:00:04 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
 Resource Group: slave-group
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node2

PCSD Status:
  node1 (192.168.56.92): Online
  node2 (192.168.56.90): Online
  node3 (192.168.56.88): Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

  

References:
https://www.clusterlabs.org/
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/index.html

https://wiki.clusterlabs.org/wiki/Pacemaker
https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster

https://ubuntuforums.org/showthread.php?t=2329725

https://my.oschina.net/aven92/blog/518928
