This article uses PostgreSQL 10 as an example.
Unless otherwise noted, run the commands below on all nodes.
Node | Hostname | CPU/Memory | NIC | Disk | IP address | OS | Roles |
---|---|---|---|---|---|---|---|
PGSQL1 | pgsql1 | 2C/4G | ens33 | 128G | 192.168.0.11 | CentOS7 | PostgreSQL, Pacemaker, Corosync |
PGSQL2 | pgsql2 | 2C/4G | ens33 | 128G | 192.168.0.12 | CentOS7 | PostgreSQL, Pacemaker, Corosync |
PGSQL3 | pgsql3 | 2C/4G | ens33 | 128G | 192.168.0.13 | CentOS7 | PostgreSQL, Pacemaker, Corosync |
yum -y install vim lrzsz bash-completion
echo 192.168.0.11 pgsql1 >> /etc/hosts
echo 192.168.0.12 pgsql2 >> /etc/hosts
echo 192.168.0.13 pgsql3 >> /etc/hosts
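Running the three echo commands above a second time appends duplicate lines to /etc/hosts. A minimal sketch of an idempotent variant follows; the helper name `add_host_entry` is hypothetical, and the file is passed as a parameter so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE only if that exact line is not already present.
# (Hypothetical helper for illustration, not part of any package.)
add_host_entry() {
    file=$1 ip=$2 name=$3
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage on the real file (as root, on every node):
#   add_host_entry /etc/hosts 192.168.0.11 pgsql1
#   add_host_entry /etc/hosts 192.168.0.12 pgsql2
#   add_host_entry /etc/hosts 192.168.0.13 pgsql3
```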
yum -y install chrony
systemctl start chronyd
systemctl enable chronyd
systemctl status chronyd
chronyc sources
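In the `chronyc sources` output, the source that chronyd is currently synchronized to is marked with `^*` at the start of its line. As a small sketch, the check can be scripted so that the later cluster steps only run once the clocks are in sync; the function name `has_synced_source` is hypothetical.

```shell
#!/bin/sh
# has_synced_source
# Reads `chronyc sources` output on stdin and succeeds if at least one
# line starts with "^*", i.e. chronyd has selected a source to sync to.
# (Hypothetical helper for illustration.)
has_synced_source() {
    grep -q '^\^\*'
}

# Usage:
#   chronyc sources | has_synced_source && echo "time is synchronized"
```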
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
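The sed command above only rewrites the line when it currently reads `SELINUX=enforcing`; a host set to `permissive` would be left unchanged. A sketch that rewrites the setting whatever its current value is shown below. The function name `set_selinux_disabled` is hypothetical, and GNU sed (as shipped with CentOS 7) is assumed for `-i`.

```shell
#!/bin/sh
# set_selinux_disabled FILE
# Rewrites the SELINUX= line to "SELINUX=disabled" regardless of its
# current value. SELINUXTYPE= and comment lines are left untouched
# because the pattern is anchored on "SELINUX=".
# (Hypothetical helper; assumes GNU sed for the -i option.)
set_selinux_disabled() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Usage (as root):
#   set_selinux_disabled /etc/selinux/config
```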
Install Pacemaker and Corosync:
yum -y install pacemaker corosync pcs ipvsadm
Start pcsd and enable it at boot:
systemctl start pcsd
systemctl enable pcsd
systemctl status pcsd
Set the password for the hacluster user:
echo hacluster | passwd hacluster --stdin
On any one node, authenticate the cluster nodes:
pcs cluster auth -u hacluster -p hacluster pgsql1 pgsql2 pgsql3
On any one node, create the cluster and sync its configuration to all nodes:
pcs cluster setup --last_man_standing=1 --name pgcluster pgsql1 pgsql2 pgsql3
On any one node, start the cluster:
pcs cluster start --all
Enable Pacemaker and Corosync at boot:
systemctl enable pacemaker
systemctl enable corosync
Check which PostgreSQL versions the Pacemaker pgsql resource agent supports:
cat /usr/lib/ocf/resource.d/heartbeat/pgsql | grep ocf_version_cmp
Configure the YUM repository:
Reference: https://www.postgresql.org/download
yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
Install PostgreSQL:
yum -y install postgresql10-server
Configure environment variables:
su - postgres
Edit .bash_profile and add the following line:
export PATH=$PATH:/usr/pgsql-10/bin
On PGSQL1, initialize PostgreSQL:
/usr/pgsql-10/bin/postgresql-10-setup initdb
On PGSQL1, configure remote login and replication access:
Edit /var/lib/pgsql/10/data/postgresql.conf:
listen_addresses = '*'
Edit /var/lib/pgsql/10/data/pg_hba.conf and add the following lines:
# IPv4 local connections:
host all all 0.0.0.0/0 md5
# replication privilege.
host replication repluser 192.168.0.0/24 md5
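Before taking base backups from the other nodes, it can help to confirm that the replication entry really is active in pg_hba.conf and not commented out. The following is a sketch only; the function name `has_replication_rule` is hypothetical, and it checks just the first three fields of each line.

```shell
#!/bin/sh
# has_replication_rule FILE USER
# Succeeds if FILE contains an uncommented entry whose first three
# fields are "host replication USER". Commented lines do not match
# because their first field starts with "#".
# (Hypothetical helper for illustration.)
has_replication_rule() {
    awk -v user="$2" '
        $1 == "host" && $2 == "replication" && $3 == user { found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage:
#   has_replication_rule /var/lib/pgsql/10/data/pg_hba.conf repluser \
#       && echo "replication rule present"
```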
On PGSQL1, start PostgreSQL:
su - postgres
pg_ctl start
pg_ctl status
On PGSQL1, change the database password:
su - postgres
psql -U postgres
ALTER USER postgres WITH ENCRYPTED PASSWORD '111111';
\du
\q
On PGSQL1, create the replication user:
su - postgres
psql
CREATE USER repluser WITH REPLICATION PASSWORD '111111';
\du
\q
On PGSQL2 and PGSQL3, take a base backup of the data on PGSQL1:
su - postgres
pg_basebackup -h pgsql1 -U repluser -D /var/lib/pgsql/10/data -P -v
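pg_basebackup fails immediately if PostgreSQL on PGSQL1 is not yet accepting connections. A small retry wrapper can bridge that gap; the sketch below is generic, the name `retry` is hypothetical, and the usage line assumes `pg_isready` from the same /usr/pgsql-10/bin directory and the default port 5432.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY seconds between tries. Returns 0 on success, 1 if every
# attempt failed. (Hypothetical helper for illustration.)
retry() {
    attempts=$1 delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Usage (as the postgres user on PGSQL2 and PGSQL3):
#   retry 10 3 /usr/pgsql-10/bin/pg_isready -h pgsql1 -p 5432 \
#       && pg_basebackup -h pgsql1 -U repluser -D /var/lib/pgsql/10/data -P -v
```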
On PGSQL1, stop PostgreSQL:
su - postgres
pg_ctl stop
pg_ctl status
On PGSQL1, configure the PCS resources:
Create the CIB configuration file:
pcs cluster cib pgsql_cfg
Ignore loss of quorum at the Pacemaker level:
pcs -f pgsql_cfg property set no-quorum-policy=ignore
Disable STONITH:
pcs -f pgsql_cfg property set stonith-enabled=false
Set resource stickiness so that resources do not move back after a failed node recovers:
pcs -f pgsql_cfg resource defaults resource-stickiness=INFINITY
Move a resource to another node after 3 failures:
pcs -f pgsql_cfg resource defaults migration-threshold=3
Create the virtual IP for the Master node:
pcs -f pgsql_cfg resource create vip-master IPaddr2 ip=192.168.0.10 cidr_netmask=24 op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=10s on-fail=restart op stop timeout=60s interval=0s on-fail=block
Create the virtual IP for the Slave nodes:
pcs -f pgsql_cfg resource create vip-slave IPaddr2 ip=192.168.0.20 cidr_netmask=24 op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=10s on-fail=restart op stop timeout=60s interval=0s on-fail=block
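The two virtual IP commands differ only in the resource name and the address; the operation settings are identical. As a sketch, the shared part can be kept in one place and the command printed for review before it is run. The names `VIP_OPS` and `vip_create_cmd` are hypothetical, and the function only prints the command, it does not call pcs.

```shell
#!/bin/sh
# Operation settings shared by both virtual IP resources, copied
# verbatim from the two commands above.
VIP_OPS='op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=10s on-fail=restart op stop timeout=60s interval=0s on-fail=block'

# vip_create_cmd NAME IP
# Prints (does not execute) the pcs command that creates an IPaddr2
# resource NAME for address IP in the pgsql_cfg file.
# (Hypothetical helper for illustration.)
vip_create_cmd() {
    echo "pcs -f pgsql_cfg resource create $1 IPaddr2 ip=$2 cidr_netmask=24 $VIP_OPS"
}

# Usage: review the printed commands, then run them by hand.
#   vip_create_cmd vip-master 192.168.0.10
#   vip_create_cmd vip-slave  192.168.0.20
```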
Create the pgsql cluster resource:
pcs -f pgsql_cfg resource create pgsql pgsql pgctl=/usr/pgsql-10/bin/pg_ctl psql=/usr/pgsql-10/bin/psql pgdata=/var/lib/pgsql/10/data config=/var/lib/pgsql/10/data/postgresql.conf rep_mode=sync node_list="pgsql1 pgsql2 pgsql3" master_ip=192.168.0.10 repuser=repluser primary_conninfo_opt="password=111111 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" restart_on_promote=true op start timeout=60s interval=0s on-fail=restart op monitor timeout=60s interval=4s on-fail=restart op monitor timeout=60s interval=3s on-fail=restart role=Master op promote timeout=60s interval=0s on-fail=restart op demote timeout=60s interval=0s on-fail=stop op stop timeout=60s interval=0s on-fail=block
Set up Master/Slave mode:
pcs -f pgsql_cfg resource master pgsql-cluster pgsql master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
Create the Master IP group:
pcs -f pgsql_cfg resource group add master-group vip-master
Create the Slave IP group:
pcs -f pgsql_cfg resource group add slave-group vip-slave
Colocate the Master IP group with the Master node:
pcs -f pgsql_cfg constraint colocation add master-group with master pgsql-cluster INFINITY
Start the Master IP group after the Master node is promoted:
pcs -f pgsql_cfg constraint order promote pgsql-cluster then start master-group symmetrical=false score=INFINITY
Stop the Master IP group after the Master node is demoted:
pcs -f pgsql_cfg constraint order demote pgsql-cluster then stop master-group symmetrical=false score=0
Colocate the Slave IP group with a Slave node:
pcs -f pgsql_cfg constraint colocation add slave-group with slave pgsql-cluster INFINITY
Start the Slave IP group after the Master node is promoted:
pcs -f pgsql_cfg constraint order promote pgsql-cluster then start slave-group symmetrical=false score=INFINITY
Stop the Slave IP group after the Master node is demoted:
pcs -f pgsql_cfg constraint order demote pgsql-cluster then stop slave-group symmetrical=false score=0
Push the configuration file to the CIB:
pcs cluster cib-push pgsql_cfg
To change the cluster configuration later, run the following commands:
cibadmin --query > tmp.xml
vim tmp.xml
cibadmin --replace --xml-file tmp.xml
Check the cluster status:
pcs status corosync
pcs status
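For scripted checks, the Master can be read out of the `pcs status` text instead of by eye. The sketch below assumes the layout printed by the pcs version on CentOS 7, where a Master/Slave set is followed by lines such as `Masters: [ pgsql1 ]` and `Slaves: [ pgsql2 pgsql3 ]`; other versions may label these lines differently. The function name `nodes_in_role` is hypothetical.

```shell
#!/bin/sh
# nodes_in_role ROLE
# Reads `pcs status` output on stdin and prints, one per line, the node
# names listed on the "ROLE: [ ... ]" line (ROLE is Masters or Slaves).
# Assumes the bracketed list format described above.
# (Hypothetical helper for illustration.)
nodes_in_role() {
    awk -v role="$1:" '
        # Fields look like: Masters: [ pgsql1 ]  -> names are $3 .. $(NF-1)
        $1 == role { for (i = 3; i < NF; i++) print $i }
    '
}

# Usage:
#   pcs status | nodes_in_role Masters
#   pcs status | nodes_in_role Slaves
```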
PGSQL1 is the Master node.
Connect to the database through the virtual IP:
psql -U postgres -h 192.168.0.10
Create a database and a table:
CREATE DATABASE db;
\c db
CREATE TABLE tb (
id int NOT NULL,
name varchar(255) NULL,
PRIMARY KEY (id)
);
Insert data:
INSERT INTO tb (id,name) VALUES (1,'MySQL');
Query the data:
SELECT * FROM tb;
\q
Shut down PGSQL1 to simulate a node failure.
On any healthy node, check the cluster status:
pcs status corosync
pcs status
PGSQL3 is now the Master node.
Connect to the database through the virtual IP:
psql -U postgres -h 192.168.0.10
Insert data:
\c db
INSERT INTO tb (id,name) VALUES (2,'Redis');
Query the data:
SELECT * FROM tb;
\q
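Database reads and writes work normally.
After the original Master node recovers, its pgsql resource shows a failure. To clear it, delete the lock file and then reset the resource state and failure counts: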
rm -rf /var/lib/pgsql/tmp/PGSQL.lock
pcs resource cleanup
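Since the lock is a single file, a narrower removal than `rm -rf` is enough. The following sketch removes only that file and reports what it did; the function name `clear_pgsql_lock` is hypothetical, and the default path is the one used above.

```shell
#!/bin/sh
# clear_pgsql_lock [LOCKFILE]
# Removes the pgsql lock file if it exists and prints what was done.
# LOCKFILE defaults to the path used in this article.
# (Hypothetical helper for illustration.)
clear_pgsql_lock() {
    lock=${1:-/var/lib/pgsql/tmp/PGSQL.lock}
    if [ -e "$lock" ]; then
        rm -f -- "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}

# Usage (as root on the recovered node):
#   clear_pgsql_lock && pcs resource cleanup
```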