Hostname | IP | Port | Notes
rac1 | 10.45.53.30 | 5432 | PG primary (version 10.7)
rac2 | 10.45.53.31 | 5432 | PG standby (version 10.7)
VIP | 10.45.53.33 | NULL | Floating IP, managed by pgpool
rac1 | 10.45.53.30 | 9998 | pgpool (version 3.7)
rac2 | 10.45.53.31 | 9998 | pgpool (version 3.7)
1. Download the rpm packages to both the primary and the standby host
postgresql10-10.7-1PGDG.rhel7.x86_64.rpm
postgresql10-contrib-10.7-1PGDG.rhel7.x86_64.rpm
postgresql10-libs-10.7-1PGDG.rhel7.x86_64.rpm
postgresql10-server-10.7-1PGDG.rhel7.x86_64.rpm
Download location: https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7.4-x86_64/
2. Install PG
rpm -ivh postgresql10-libs-10.7-1PGDG.rhel7.x86_64.rpm
rpm -ivh postgresql10-10.7-1PGDG.rhel7.x86_64.rpm
rpm -ivh postgresql10-contrib-10.7-1PGDG.rhel7.x86_64.rpm
rpm -ivh postgresql10-server-10.7-1PGDG.rhel7.x86_64.rpm
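The PG binaries and scripts are installed automatically under /usr/pgsql-10.
The installation also creates the postgres user automatically, with /var/lib/pgsql as its home directory.
usermod -d /postgresql postgres
passwd postgres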
su - postgres
mkdir data
mkdir pg_archive
chmod 700 data
chmod 700 pg_archive
postgres@rac1[/postgresql]$cat .bash_profile
export PG_HOME=/usr/pgsql-10
export PGDATA=/postgresql/data
export PATH=$PATH:$HOME/bin:$PG_HOME/bin
initdb -D /postgresql/data
cd /postgresql/data
cat postgresql.conf
data_directory = '/postgresql/data'
port = 5432
listen_addresses = '*'
max_connections = 999
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
archive_mode = on
archive_command = 'cp %p /postgresql/pg_archive/%f'
wal_level = replica   # 'hot_standby' is a deprecated alias of 'replica' since 9.6
hot_standby = on
wal_sender_timeout = 60s
restart_after_crash = off
wal_log_hints = on   # important: needed by pg_rewind
max_wal_senders=5
Add the following to pg_hba.conf
host all all 10.45.53.30/32 trust
host all all 10.45.53.31/32 trust
host replication replica 0.0.0.0/0 trust
cp /usr/pgsql-10/share/recovery.conf.sample /postgresql/data/recovery.done
postgres@rac1[/postgresql/data]$vi recovery.done
standby_mode = on
primary_conninfo = 'host=10.45.53.31 port=5432 user=replica password=replica'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/trigger_file0'
pg_ctl start
psql
CREATE ROLE replica login replication encrypted password 'replica';
ALTER USER postgres WITH PASSWORD 'postgres';
3. Initialize the standby
su - postgres
pg_basebackup -h 10.45.53.30 -U replica -D /postgresql/data -X stream -P
cd /postgresql/data
mv recovery.done recovery.conf
vi recovery.conf
standby_mode = on
primary_conninfo = 'host=10.45.53.30 port=5432 user=replica password=replica'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/trigger_file0'
pg_ctl start
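The recovery.conf shown above differs between the two hosts only in the host= value of primary_conninfo. A minimal sketch of a helper that prints the file for a given primary address is shown here; the function name make_recovery_conf and its single parameter are hypothetical, and the port, user, password and trigger file are simply the values used in this walkthrough.

```shell
#!/bin/bash
# Print a PostgreSQL 10 recovery.conf for a streaming-replication standby.
# $1 = address of the current primary (hypothetical parameter for this sketch).
make_recovery_conf() {
    local primary_host="$1"
    cat <<EOF
standby_mode = on
primary_conninfo = 'host=${primary_host} port=5432 user=replica password=replica'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/trigger_file0'
EOF
}

# Example, on rac2 with rac1 as the primary:
# make_recovery_conf 10.45.53.30 > /postgresql/data/recovery.conf
```

Generating the file rather than editing it by hand makes it harder to leave the standby pointing at itself after a role change.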
4. Verify primary and standby
postgres@rac1[/postgresql/data]$psql -h 10.45.53.30 -p 5432
psql (10.7)
Type "help" for help.
postgres=# select client_addr,sync_state from pg_stat_replication;
client_addr | sync_state
-------------+------------
10.45.53.31 | async
(1 row)
postgres=# select pg_is_in_recovery from pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
postgres@rac1[/postgresql/data]$psql -h 10.45.53.31 -p 5432
psql (10.7)
Type "help" for help.
postgres=# select pg_is_in_recovery from pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
A pg_is_in_recovery value of f (false) means the node is the primary; t (true) means it is a standby.
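The t/f check above can be scripted so that both nodes are queried in one pass. The following is only a sketch: the function name role_from_recovery_flag is hypothetical, and the commented loop assumes the postgres user can connect to both hosts as configured in pg_hba.conf.

```shell
#!/bin/bash
# Map the t/f value returned by pg_is_in_recovery() to a role name.
role_from_recovery_flag() {
    case "$1" in
        t) echo standby ;;
        f) echo primary ;;
        *) echo unknown; return 1 ;;
    esac
}

# Example usage against the two nodes of this setup
# (-A: unaligned output, -t: tuples only, -c: run one command):
# for h in 10.45.53.30 10.45.53.31; do
#     flag="$(psql -h "$h" -p 5432 -U postgres -Atc 'select pg_is_in_recovery();')"
#     echo "$h: $(role_from_recovery_flag "$flag")"
# done
```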
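5. Test replication
The standby is read-only. On the primary, run:
create database test;
Run \l on the standby to list all databases; the new database is there, so replication works.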
pg_ctl promote -D /postgresql/data
mv recovery.done recovery.conf
pg_ctl start
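1. What pgpool does
pgpool-II is middleware that sits between PostgreSQL servers and PostgreSQL clients. It can load-balance a PG primary/standby cluster: SELECT statements are distributed across the primary and the standby, while INSERT, CREATE and other write operations are sent to the primary.
2. Download the rpm package and install it on both the primary and the standby
Download location: http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/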
rpm -ivh pgpool-II-pg10-3.7.0-1pgdg.rhel7.x86_64.rpm
By default the configuration files are installed in /etc/pgpool-II.
chown -R postgres:postgres /etc/pgpool-II
3. Configure the parameter files
Primary:
su - postgres
mkdir pgpool
cd pgpool
mkdir log
root@rac1[/etc/pgpool-II]#cat pgpool.conf
# CONNECTIONS
listen_addresses = '*'
port = 9998
pcp_listen_addresses = '*'
pcp_port = 9898
# - Backend Connection Settings -
backend_hostname0 = 'rac1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/postgresql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'rac2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/postgresql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'
# FILE LOCATIONS
pid_file_name = '/postgresql/pgpool/pgpool.pid'
logdir = '/postgresql/pgpool/log'
replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'
#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 10 # Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'repuser'
# Health check user
health_check_password = 'repuser' # database password
# Password for health check user
health_check_database = 'postgres'
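# Health checking must be configured. Otherwise pgpool does not notice when the primary database goes down and cannot fail over in time, while streaming replication on the standby keeps trying to connect and reports connection failures.
# Without it, the failure is only discovered the next time a client logs in through pgpool and gets a connection error; only then does pgpool fail over.
# Command configuration for primary/standby failover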
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = '/postgresql/pgpool/failover_stream.sh %d %H %P'
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -
wd_hostname = 'rac1'
# Host name or IP address of this watchdog
# (change requires restart)
wd_port = 9000
# port number for watchdog service
# (change requires restart)
# - Virtual IP control Setting -
delegate_IP = '10.45.53.33'
# delegate IP address
# If this is empty, virtual IP never bring up.
# (change requires restart)
if_cmd_path = '/sbin'
# path to the directory where if_up/down_cmd exists
# (change requires restart)
if_up_cmd = 'ifconfig ens192:5 $_IP_$ netmask 255.255.255.0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = 'ifconfig ens192:5 down'
# shutdown delegate IP command
# (change requires restart)
# -- heartbeat mode --
wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = 'rac2'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = 'ens192'
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'rac2' # peer node
# Host name or IP address to connect to for
# (change requires restart)
other_pgpool_port0 = 9998
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
Add the following to pool_hba.conf
host all all 0.0.0.0/0 trust
host replication replica 0/0 trust
Use pg_md5 to get the MD5 hash of the postgres password
postgres@rac1[/postgresql]$pg_md5 postgres
e8a48653851e28c69d0506508fb27fc5
Set up pcp.conf
# USERID:MD5PASSWD
postgres:e8a48653851e28c69d0506508fb27fc5
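A pcp.conf line has the form USERID:MD5PASSWD. As a rough cross-check of the pg_md5 output, the line can also be built with standard tools. This sketch assumes that pg_md5 without options produces a plain MD5 of the password (which is what the cmd5.com lookup in the troubleshooting notes suggests) and that md5sum is installed; the function name pcp_entry is hypothetical.

```shell
#!/bin/bash
# Build one pcp.conf line ("USERID:MD5PASSWD") from a user name and a
# plain-text password, using md5sum instead of pg_md5 (assumption: both
# produce the same plain MD5 hex digest).
pcp_entry() {
    local user="$1" password="$2"
    local hash
    # printf '%s' avoids hashing a trailing newline along with the password
    hash="$(printf '%s' "$password" | md5sum | cut -d' ' -f1)"
    printf '%s:%s\n' "$user" "$hash"
}

# Example:
# pcp_entry postgres postgres >> /etc/pgpool-II/pcp.conf
```

If the line produced this way differs from the one built with pg_md5, the password was most likely hashed with a stray newline or with the wrong argument.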
Standby:
postgres@rac2[/etc/pgpool-II]$cat pgpool.conf
# CONNECTIONS
listen_addresses = '*'
port = 9998
pcp_listen_addresses = '*'
pcp_port = 9898
# - Backend Connection Settings -
backend_hostname0 = 'rac1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/postgresql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'rac2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/postgresql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'
# FILE LOCATIONS
pid_file_name = '/postgresql/pgpool/pgpool.pid'
logdir = '/postgresql/pgpool/log'
replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'
#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 10 # Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'repuser'
# Health check user
health_check_password = 'repuser' # database password
# Password for health check user
health_check_database = 'postgres'
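# Health checking must be configured. Otherwise pgpool does not notice when the primary database goes down and cannot fail over in time, while streaming replication on the standby keeps trying to connect and reports connection failures.
# Without it, the failure is only discovered the next time a client logs in through pgpool and gets a connection error; only then does pgpool fail over.
# Command configuration for primary/standby failover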
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
failover_command = '/postgresql/pgpool/failover_stream.sh %d %H %P'
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -
wd_hostname = 'rac2' # this node
# Host name or IP address of this watchdog
# (change requires restart)
wd_port = 9000
# port number for watchdog service
# (change requires restart)
# - Virtual IP control Setting -
delegate_IP = '10.45.53.33'
# delegate IP address
# If this is empty, virtual IP never bring up.
# (change requires restart)
if_cmd_path = '/sbin'
# path to the directory where if_up/down_cmd exists
# (change requires restart)
if_up_cmd = 'ifconfig ens192:5 inet $_IP_$ netmask 255.255.255.0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = 'ifconfig ens192:5 down'
# shutdown delegate IP command
# (change requires restart)
# -- heartbeat mode --
wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = 'rac1' # peer node
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = 'ens192'
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'rac1' # peer node
# Host name or IP address to connect to for
# (change requires restart)
other_pgpool_port0 = 9998
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
postgres@rac2[/etc/pgpool-II]$cat pcp.conf
# USERID:MD5PASSWD
postgres:e8a48653851e28c69d0506508fb27fc5
postgres@rac2[/etc/pgpool-II]$cat pool_hba.conf
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host all all 0.0.0.0/0 trust
host replication replica 0/0 trust
4. failover_stream.sh script (needed on both the primary and the standby)
chmod 700 failover_stream.sh
postgres@rac2[/postgresql/pgpool]$cat failover_stream.sh
#!/bin/bash
# Special values:
# 1. %d = node id of the node concerned (in my tests, 0 for the primary and 1 for the standby)
# 2. %h = host name
# 3. %p = port number
# 4. %D = database cluster path
# 5. %m = new master node id
# 6. %H = hostname of the new master node
# 7. %M = old master node id
# 8. %P = old primary node id (the node that was primary before the failover)
# 9. %r = new master port number
# 10.%R = new master database cluster path
# 11.%% = '%' character
failed_node_id=$1          # %d
new_master_host_name=$2    # %H
old_primary_node_id=$3     # %P
promote_command="/usr/pgsql-10/bin/pg_ctl promote -D /postgresql/data"
# Log the arguments of every invocation
echo "$old_primary_node_id,$new_master_host_name,$failed_node_id" >> /postgresql/pgpool/log/pool.log
# Promote PG on the new master host over ssh
/usr/bin/ssh -T "$new_master_host_name" "$promote_command"
exit 0
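pgpool runs failover_command whenever a backend node is detached, which includes the case where only the standby goes down. The script above attempts the promote in both cases. One possible refinement, given here as a sketch rather than as part of the tested setup, is to promote only when the failed node was the primary; the helper name should_promote is hypothetical.

```shell
#!/bin/bash
# Succeed only when the node that failed was the primary, i.e. when a
# promote of the surviving node is actually needed.
should_promote() {
    local failed_node_id="$1" old_primary_node_id="$2"
    [ "$failed_node_id" = "$old_primary_node_id" ]
}

# Inside failover_stream.sh the promote step could then be wrapped as:
# if should_promote "$failed_node_id" "$old_primary_node_id"; then
#     /usr/bin/ssh -T "$new_master_host_name" "$promote_command"
# fi
```

With this guard, a standby failure is only logged and the running primary is left alone.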
5. Other configuration
/etc/hosts on both hosts:
10.45.53.30 rac1
10.45.53.31 rac2
10.45.53.33 vip
ssh-keygen -t rsa
ssh-copy-id -i /postgresql/.ssh/id_rsa 10.45.53.31
ssh-copy-id -i /postgresql/.ssh/id_rsa 10.45.53.30
create user repuser with password 'repuser';
6. Start pgpool
Start it as the postgres user.
Primary:
pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &
Once the virtual IP is up on the primary, start pgpool on the standby:
pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &
7. Test pgpool
psql -h 10.45.53.33 -p 9998
8. Test pgpool load balancing
for i in `seq 1 10`; do psql -h 10.45.53.33 -p 9998 -c "select pg_is_in_recovery from pg_is_in_recovery();"; done
The mix of t and f results shows that SELECT statements are load-balanced between the primary and the standby.
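To make the distribution easier to read than ten separate result tables, the answers can be tallied. This is a sketch under the assumption that each query prints a line whose first field is t or f; the function name tally_recovery_flags is hypothetical.

```shell
#!/bin/bash
# Read psql output on stdin and count how many answers came from a
# standby (t) and how many from the primary (f). Other lines are ignored.
tally_recovery_flags() {
    awk '$1 == "t" { standby++ }
         $1 == "f" { primary++ }
         END { printf "primary=%d standby=%d\n", primary, standby }'
}

# Example usage through the VIP:
# for i in $(seq 1 10); do
#     psql -h 10.45.53.33 -p 9998 -Atc "select pg_is_in_recovery();"
# done | tally_recovery_flags
```

With backend_weight0 and backend_weight1 both set to 1, the two counters should come out roughly balanced over a larger number of runs.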
To stop pgpool, run: pgpool -m fast stop
1. PG on the standby goes down
postgres@rac2[/postgresql/pgpool]$pg_ctl stop
waiting for server to shut down.... done
server stopped
As can be seen, after the standby goes down the primary is still the original primary.
Re-attach the standby to pgpool.
On the standby, run:
pg_ctl start
pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 1
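When prompted, enter the PCP password of the postgres user (the one whose MD5 hash is stored in pcp.conf).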
2. PG on the primary goes down
postgres@rac1[/postgresql/data/log]$pg_ctl stop
waiting for server to shut down.... done
server stopped
As can be seen, the failover_stream.sh script automatically promotes the standby to primary, and it becomes read-write.
Re-attach the original primary to pgpool:
pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 0
3. The pgpool process on the primary host is interrupted
The primary is now rac2. Stop its pgpool process:
pgpool -m fast stop
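Because pgpool on rac1 is still running and takes over the resources, clients are not affected, which is the high availability we want.
Simply restart pgpool on rac2. If it is not restarted and pgpool on rac1 then stops, or the rac1 server goes down, 10.45.53.33 becomes unreachable.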
pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &
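4. The primary server goes down
The primary is now rac2. Reboot rac2. If the VIP was up on rac2, connections through the VIP are interrupted and can be re-established after about 2 seconds, because the VIP is brought up automatically on rac1.
The primary automatically switches to rac1.
Once rac2 is back up, it has to be reinitialized: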
su - postgres
mv data data_bk
pg_basebackup -h 10.45.53.30 -U replica -D /postgresql/data -X stream -P
cd /postgresql/data
mv recovery.done recovery.conf
Edit recovery.conf:
cat recovery.conf
standby_mode = on
primary_conninfo = 'host=10.45.53.30 port=5432 user=replica password=replica'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/trigger_file0'
pg_ctl start
pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &
pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 1
1. "Address already in use"
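pgpool fails to start and the log shows "Address already in use". There are two possible causes:
(1) The port is occupied
Port 9999 turned out to be in use by another program on the server; changing to an unused port such as 9998 fixes it.
(2) The /tmp/.s.PGSQL.9999 file was not cleaned up
Simply delete /tmp/.s.PGSQL.9999.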
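Both causes can be checked before starting pgpool. The sketch below assumes the ss utility from iproute2 is available and that its "ss -ltn" output has the local address:port in the fourth column; the function name port_is_listening is hypothetical.

```shell
#!/bin/bash
# Read `ss -ltn` output on stdin and succeed if the given TCP port shows
# up as a local listening port. The first line (header) is skipped.
port_is_listening() {
    local port="$1"
    awk -v p="$port" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END    { exit (found ? 0 : 1) }'
}

# Example usage:
# ss -ltn | port_is_listening 9999 && echo "port 9999 is already in use"
# ls -l /tmp/.s.PGSQL.9999    # a leftover socket file from an earlier run
```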
2. failover_stream.sh script issues
There are many failover scripts circulating online. Testing showed that the script used here handles most failover situations.
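3. Incomplete SSH trust setup
During testing, after failing over from the primary to the standby, it was not possible to fail back from the standby to the primary.
Investigation showed that when the VIP is on rac1, the pgpool process in use is the one on rac1. If the primary fails over and rac2 is promoted to primary, pgpool on rac1 is still the one in use, so the failover script that runs is still the one on rac1. In that situation rac1 must be able to SSH to rac1 itself.
The same applies to rac2.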
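A quick way to confirm that the trust setup is complete is to try a non-interactive SSH login from each host to every host, including the host itself. This is a sketch only; the function name check_ssh_trust is hypothetical, and it relies on the standard OpenSSH options BatchMode and ConnectTimeout.

```shell
#!/bin/bash
# Try a passwordless SSH login to every host given as an argument and
# report the result. BatchMode=yes makes ssh fail instead of prompting
# for a password, so a missing key shows up as FAIL.
check_ssh_trust() {
    local host failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Run as the postgres user on rac1, then again on rac2:
# check_ssh_trust rac1 rac2
```

Every line should read ok on both hosts before testing failover in both directions.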
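4. pg_md5 issue
At first I did not understand the tool and ran pg_md5 without any arguments, so the password was wrong and pcp_attach_node kept reporting that the user name and password did not match.
A lookup on https://www.cmd5.com/ showed that the plaintext behind the hash I had at the time was pwd, not postgres.
Running pg_md5 postgres again produced e8a48653851e28c69d0506508fb27fc5, which solved the problem.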
5. recovery.done issue
After PG on the primary host (rac1) was stopped, rac2 took over as primary. While bringing rac1 back up and re-attaching it to pgpool, I forgot to run
mv recovery.done recovery.conf
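so PG on rac1 came back up still acting as a primary.
After stopping PG on rac1, running mv recovery.done recovery.conf and starting PG again, data could no longer be replicated from rac2 to rac1. The primary/standby structure was broken, and the only option was to reinitialize rac1 and rebuild it as a standby: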
pg_basebackup -h 10.45.53.31 -U replica -D /postgresql/data -X stream -P
mv recovery.done recovery.conf
Then edit recovery.conf, and so on.
6. pg_rewind
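When an error like the following is reported, pg_rewind can be used to fix it. In testing, however, once the standby had been rewound with pg_rewind, data no longer replicated between the primary and the standby, so the usefulness of pg_rewind remains open to question.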
pg_rewind --target-pgdata=/postgresql/data --source-server='host=rac1 port=5432 user=postgres'
pgAdmin can be used to manage the PG databases. Download location: https://www.postgresql.org/ftp/pgadmin/pgadmin4/v4.10/windows/
Once installed, it is a web-based management interface.