PostgreSQL Primary/Standby Replication, Switchover, and Load Balancing

1. Host planning

Hostname  IP           Port  Notes
rac1      10.45.53.30  5432  PG primary (version 10.7)
rac2      10.45.53.31  5432  PG standby (version 10.7)
VIP       10.45.53.33  NULL  Floating IP, managed by pgpool
rac1      10.45.53.30  9998  pgpool (version 3.7)
rac2      10.45.53.31  9998  pgpool (version 3.7)

2. Setting up primary/standby replication

1. Download the RPM packages to both the primary and the standby host

postgresql10-10.7-1PGDG.rhel7.x86_64.rpm

postgresql10-contrib-10.7-1PGDG.rhel7.x86_64.rpm

postgresql10-libs-10.7-1PGDG.rhel7.x86_64.rpm

postgresql10-server-10.7-1PGDG.rhel7.x86_64.rpm

Download URL: https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7.4-x86_64/

2. Install PG

  • Install the RPMs on both the primary and the standby

rpm -ivh postgresql10-libs-10.7-1PGDG.rhel7.x86_64.rpm

rpm -ivh postgresql10-10.7-1PGDG.rhel7.x86_64.rpm

rpm -ivh postgresql10-contrib-10.7-1PGDG.rhel7.x86_64.rpm

rpm -ivh postgresql10-server-10.7-1PGDG.rhel7.x86_64.rpm

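The PG binaries and scripts are installed automatically under /usr/pgsql-10.

The installation also creates the postgres user automatically, with home directory /var/lib/pgsql.

  • On both hosts, change the postgres home directory to the business directory

usermod -d /postgresql postgres

  • On both hosts, set the password of the postgres user

passwd postgres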

su - postgres

mkdir data

mkdir pg_archive

chmod 700 data

chmod 700 pg_archive

  • Configure .bash_profile on both hosts

postgres@rac1[/postgresql]$cat .bash_profile

export PG_HOME=/usr/pgsql-10

export PGDATA=/postgresql/data

export PATH=$PATH:$HOME/bin:$PG_HOME/bin

  • Initialize PG on the primary

initdb -D /postgresql/data

cd /postgresql/data

  • Configure postgresql.conf on the primary

cat postgresql.conf

data_directory = '/postgresql/data'

port = 5432

listen_addresses = '*'

max_connections = 999

max_standby_streaming_delay = 30s

wal_receiver_status_interval = 10s

hot_standby_feedback = on

archive_mode = on

archive_command = 'cp %p /postgresql/pg_archive/%f'

wal_level = hot_standby

hot_standby = on

wal_sender_timeout = 60s

restart_after_crash = off

wal_log_hints = on   ## important: required by pg_rewind

max_wal_senders=5

  • On the primary, add the following to pg_hba.conf

host   all             all              10.45.53.30/32      trust

host   all             all              10.45.53.31/32      trust

host   replication      replica         0.0.0.0/0      trust

  • On the primary, prepare recovery.done for a later switchover

cp /usr/pgsql-10/share/recovery.conf.sample /postgresql/data/recovery.done

postgres@rac1[/postgresql/data]$vi recovery.done
standby_mode = on

primary_conninfo = 'host=10.45.53.31 port=5432 user=replica password=replica'

recovery_target_timeline = 'latest'

trigger_file = '/tmp/trigger_file0'

  • On the primary, start PG, create the replication user, and set the postgres password

pg_ctl start

psql

CREATE ROLE replica login replication encrypted password 'replica';

ALTER USER postgres WITH PASSWORD 'postgres';

3. Initialize the standby

su - postgres

pg_basebackup -h 10.45.53.30 -U replica -D /postgresql/data -X stream -P

cd /postgresql/data

mv recovery.done recovery.conf

vi recovery.conf

standby_mode = on

primary_conninfo = 'host=10.45.53.30 port=5432 user=replica password=replica'

recovery_target_timeline = 'latest'

trigger_file = '/tmp/trigger_file0'

 

pg_ctl start

4. Verify the primary and the standby

postgres@rac1[/postgresql/data]$psql -h 10.45.53.30 -p 5432

psql (10.7)

Type "help" for help.

postgres=# select client_addr,sync_state from pg_stat_replication;

client_addr | sync_state

-------------+------------

10.45.53.31 | async

(1 row)

postgres=# select pg_is_in_recovery  from pg_is_in_recovery();

pg_is_in_recovery

-------------------

f

(1 row)

postgres@rac1[/postgresql/data]$psql -h 10.45.53.31 -p 5432

psql (10.7)

Type "help" for help.

postgres=# select pg_is_in_recovery  from pg_is_in_recovery();

pg_is_in_recovery

-------------------

t

(1 row)

pg_is_in_recovery returns f (false) on the primary and t (true) on the standby.
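The t/f check above is easy to script. The following is a minimal sketch, not part of the original setup: role_from_recovery_flag is a hypothetical helper name, and it assumes psql is called with -At so that the query prints only t or f.

```shell
# Map the output of "select pg_is_in_recovery()" to a role name.
# role_from_recovery_flag is a hypothetical helper for illustration.
role_from_recovery_flag() {
  case "$1" in
    f) echo primary ;;   # not in recovery: accepts writes
    t) echo standby ;;   # in recovery: read-only replica
    *) echo unknown ;;   # empty or unexpected output, e.g. connection failure
  esac
}

# Example (needs the hosts from this article):
#   flag=$(psql -h 10.45.53.31 -p 5432 -Atc "select pg_is_in_recovery();")
#   role_from_recovery_flag "$flag"
```

Treating anything other than t or f as unknown keeps a failed connection from being mistaken for either role.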

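5. Test replication

The standby is read-only and rejects writes. On the primary, run:

create database test;

Then run \l on the standby to list all databases; the new test database appears, which confirms that replication works.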

 

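3. Primary/standby switchover

  • If the primary host goes down or the PG process dies, promote the standby to become the new primary:

pg_ctl promote -D /postgresql/data

  • Once the old primary has been repaired, it can be configured as a standby of the new primary: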

mv recovery.done recovery.conf

pg_ctl start
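The rename step relies on recovery.done already pointing at the new primary. As a minimal sketch, the file can also be generated for whichever host is currently the primary; make_recovery_conf is a hypothetical helper, and the port, user, password, and trigger file are simply the values used elsewhere in this article.

```shell
# Print a PG 10 recovery.conf that follows the given primary host.
# make_recovery_conf is a hypothetical helper; the settings mirror
# the recovery.conf shown earlier in this article.
make_recovery_conf() {
  primary_host=$1
  cat <<EOF
standby_mode = on
primary_conninfo = 'host=${primary_host} port=5432 user=replica password=replica'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/trigger_file0'
EOF
}

# Example, run on the old primary before "pg_ctl start":
#   make_recovery_conf 10.45.53.31 > /postgresql/data/recovery.conf
```

Note that this only covers the configuration file. If the old primary has already been started as a primary after the failover, the recovery.done pitfall in section 6 applies and the node may need to be rebuilt with pg_basebackup.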

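4. Load balancing with pgpool

1. What pgpool does

pgpool-II is middleware that sits between PostgreSQL servers and PostgreSQL database clients. It provides load balancing for a PG primary/standby cluster: SELECT statements are distributed across the primary and the standby, while INSERT, CREATE, and other write operations are sent to the primary.

2. Download the RPM and install it on both hosts

Download URL: http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/

rpm -ivh pgpool-II-pg10-3.7.0-1pgdg.rhel7.x86_64.rpm

The configuration files are installed under /etc/pgpool-II by default.

chown -R postgres:postgres /etc/pgpool-II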

 

3. Configure the parameter files

On the primary:

su - postgres

mkdir pgpool

cd pgpool

mkdir log

root@rac1[/etc/pgpool-II]#cat pgpool.conf

# CONNECTIONS

listen_addresses = '*'

port = 9998

pcp_listen_addresses = '*'

pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'rac1'

backend_port0 = 5432

backend_weight0 = 1

backend_data_directory0 = '/postgresql/data'

backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'rac2'

backend_port1 = 5432

backend_weight1 = 1

backend_data_directory1 = '/postgresql/data'

backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -

enable_pool_hba = on

pool_passwd = 'pool_passwd'

# FILE LOCATIONS

pid_file_name = '/postgresql/pgpool/pgpool.pid'

logdir = '/postgresql/pgpool/log'

replication_mode = off

load_balance_mode = on

master_slave_mode = on

master_slave_sub_mode = 'stream'

sr_check_period = 5

sr_check_user = 'repuser'

sr_check_password = 'repuser'

sr_check_database = 'postgres'

#------------------------------------------------------------------------------

# HEALTH CHECK

#------------------------------------------------------------------------------

health_check_period = 10 # Health check period

                                 # Disabled (0) by default

health_check_timeout = 20

                                 # Health check timeout

                                 # 0 means no timeout

health_check_user = 'repuser'

                                # Health check user

health_check_password = 'repuser'    # database password

                                 # Password for health check user

health_check_database = 'postgres'

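# The health-check settings above are required. Without them pgpool does not notice when the primary goes down and cannot fail over promptly; the standby's streaming replication keeps trying to connect and reports connection failures.

# pgpool would only discover the failure, and fail over, the next time a client logs in through it and the connection attempt errors out.

# Failover command configuration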

#------------------------------------------------------------------------------

# FAILOVER AND FAILBACK

#------------------------------------------------------------------------------

failover_command = '/postgresql/pgpool/failover_stream.sh %d %H %P'

#------------------------------------------------------------------------------

# WATCHDOG

#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on

# - Watchdog communication Settings -

wd_hostname = 'rac1'

                                  # Host name or IP address of this watchdog

                                  # (change requires restart)

wd_port = 9000

                                  # port number for watchdog service

                                  # (change requires restart)

# - Virtual IP control Setting -

delegate_IP = '10.45.53.33'

                                  # delegate IP address

                                  # If this is empty, virtual IP never bring up.

                                  # (change requires restart)

if_cmd_path = '/sbin'

                                  # path to the directory where if_up/down_cmd exists

                                  # (change requires restart)

if_up_cmd = 'ifconfig ens192:5 $_IP_$ netmask 255.255.255.0'

                                  # startup delegate IP command

                                  # (change requires restart)

if_down_cmd = 'ifconfig ens192:5 down'

                                  # shutdown delegate IP command

                                  # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694

                                  # Port number for receiving heartbeat signal

                                  # (change requires restart)

wd_heartbeat_keepalive = 2

                                  # Interval time of sending heartbeat signal (sec)

                                  # (change requires restart)

wd_heartbeat_deadtime = 30

                                  # Deadtime interval for heartbeat signal (sec)

                                  # (change requires restart)

heartbeat_destination0 = 'rac2'

                                  # Host name or IP address of destination 0

                                  # for sending heartbeat signal.

                                  # (change requires restart)

heartbeat_destination_port0 = 9694

                                  # Port number of destination 0 for sending

                                  # heartbeat signal. Usually this is the

                                  # same as wd_heartbeat_port.

                                  # (change requires restart)

heartbeat_device0 = 'ens192'

                                  # Name of NIC device (such like 'eth0')

                                  # used for sending/receiving heartbeat

                                  # signal to/from destination 0.

                                  # This works only when this is not empty

                                  # and pgpool has root privilege.

                                  # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'rac2'        # peer node

                                  # Host name or IP address to connect to for

                                  # (change requires restart)

other_pgpool_port0 = 9998

                                  # Port number for other pgpool 0

                                  # (change requires restart)

other_wd_port0 = 9000

                                  # Port number for other watchdog 0

                                  # (change requires restart)

Add the following to pool_hba.conf:

host   all             all              0.0.0.0/0      trust

host   replication      replica          0/0      trust

Use pg_md5 to generate the MD5 hash of the postgres password:

postgres@rac1[/postgresql]$pg_md5 postgres

e8a48653851e28c69d0506508fb27fc5

Put the hash into pcp.conf:

# USERID:MD5PASSWD

postgres:e8a48653851e28c69d0506508fb27fc5

 

On the standby:

postgres@rac2[/etc/pgpool-II]$cat pgpool.conf

# CONNECTIONS

listen_addresses = '*'

port = 9998

pcp_listen_addresses = '*'

pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'rac1'

backend_port0 = 5432

backend_weight0 = 1

backend_data_directory0 = '/postgresql/data'

backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'rac2'

backend_port1 = 5432

backend_weight1 = 1

backend_data_directory1 = '/postgresql/data'

backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -

enable_pool_hba = on

pool_passwd = 'pool_passwd'

# FILE LOCATIONS

pid_file_name = '/postgresql/pgpool/pgpool.pid'

logdir = '/postgresql/pgpool/log'

replication_mode = off

load_balance_mode = on

master_slave_mode = on

master_slave_sub_mode = 'stream'

sr_check_period = 5

sr_check_user = 'repuser'

sr_check_password = 'repuser'

sr_check_database = 'postgres'

#------------------------------------------------------------------------------

# HEALTH CHECK

#------------------------------------------------------------------------------

health_check_period = 10 # Health check period

                                 # Disabled (0) by default

health_check_timeout = 20

                                 # Health check timeout

                                 # 0 means no timeout

health_check_user = 'repuser'

                                 # Health check user

health_check_password = 'repuser' # database password

                                 # Password for health check user

health_check_database = 'postgres'

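# The health-check settings above are required. Without them pgpool does not notice when the primary goes down and cannot fail over promptly; the standby's streaming replication keeps trying to connect and reports connection failures.

# pgpool would only discover the failure, and fail over, the next time a client logs in through it and the connection attempt errors out.

# Failover command configuration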

#------------------------------------------------------------------------------

# FAILOVER AND FAILBACK

#------------------------------------------------------------------------------

failover_command = '/postgresql/pgpool/failover_stream.sh %d %H %P'

#------------------------------------------------------------------------------

# WATCHDOG

#------------------------------------------------------------------------------

# - Enabling -

use_watchdog = on

# - Watchdog communication Settings -

wd_hostname = 'rac2'  # this node

                                  # Host name or IP address of this watchdog

                                  # (change requires restart)

wd_port = 9000

                                  # port number for watchdog service

                                  # (change requires restart)

# - Virtual IP control Setting -

delegate_IP = '10.45.53.33'

                                  # delegate IP address

                                  # If this is empty, virtual IP never bring up.

                                  # (change requires restart)

if_cmd_path = '/sbin'

                                  # path to the directory where if_up/down_cmd exists

                                  # (change requires restart)

if_up_cmd = 'ifconfig ens192:5 inet $_IP_$ netmask 255.255.255.0'

                                  # startup delegate IP command

                                  # (change requires restart)

if_down_cmd = 'ifconfig ens192:5 down'

                                  # shutdown delegate IP command

                                  # (change requires restart)

# -- heartbeat mode --

wd_heartbeat_port = 9694

                                  # Port number for receiving heartbeat signal

                                  # (change requires restart)

wd_heartbeat_keepalive = 2

                                  # Interval time of sending heartbeat signal (sec)

                                  # (change requires restart)

wd_heartbeat_deadtime = 30

                                  # Deadtime interval for heartbeat signal (sec)

                                  # (change requires restart)

heartbeat_destination0 = 'rac1'      # peer node

                                  # Host name or IP address of destination 0

                                  # for sending heartbeat signal.

                                  # (change requires restart)

heartbeat_destination_port0 = 9694

                                  # Port number of destination 0 for sending

                                  # heartbeat signal. Usually this is the

                                  # same as wd_heartbeat_port.

                                  # (change requires restart)

heartbeat_device0 = 'ens192'

                                  # Name of NIC device (such like 'eth0')

                                  # used for sending/receiving heartbeat

                                  # signal to/from destination 0.

                                  # This works only when this is not empty

                                  # and pgpool has root privilege.

                                  # (change requires restart)

# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'rac1'      # peer node

                                  # Host name or IP address to connect to for

                                  # (change requires restart)

other_pgpool_port0 = 9998

                                  # Port number for other pgpool 0

                                  # (change requires restart)

other_wd_port0 = 9000

                                  # Port number for other watchdog 0

                                  # (change requires restart)

 

postgres@rac2[/etc/pgpool-II]$cat pcp.conf

# USERID:MD5PASSWD

postgres:e8a48653851e28c69d0506508fb27fc5

 

postgres@rac2[/etc/pgpool-II]$cat pool_hba.conf

# "local" is for Unix domain socket connections only

local   all         all                               trust

# IPv4 local connections:

host    all         all         127.0.0.1/32          trust

host    all         all         ::1/128               trust

host   all             all              0.0.0.0/0      trust

host   replication      replica          0/0      trust

4. The failover_stream.sh script (required on both hosts)

chmod 700 failover_stream.sh
postgres@rac2[/postgresql/pgpool]$cat failover_stream.sh

#!/bin/bash

# Special values:

#  1. %d = node id of the failed node (in my tests, 0 for the primary and 1 for the standby)

#  2. %h = host name

#  3. %p = port number

#  4. %D = database cluster path

#  5. %m = new master node id

#  6. %H = hostname of the new master node

#  7. %M = old master node id

#  8. %P = old primary node id (the node id of the primary at the time of failover)

#  9. %r = new master port number

#  10.%R = new master database cluster path

#  11.%% = '%' character

failed_node_id=$1

new_master_host_name=$2

old_primary_node_id=$3

promote_command="/usr/pgsql-10/bin/pg_ctl promote -D /postgresql/data"

echo $old_primary_node_id,$new_master_host_name,$failed_node_id >> /postgresql/pgpool/log/pool.log

/usr/bin/ssh -T $new_master_host_name $promote_command

exit 0;
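The script above runs the promote command on every failover event, including the case where the failed node is the standby; promoting a server that is already the primary is likely harmless but adds noise to the logs. A common refinement is to promote only when the failed node was the primary. The sketch below shows that guard under the same argument order (%d %H %P); should_promote is a hypothetical helper, not something pgpool provides.

```shell
# Return success only when the node that failed was the primary,
# i.e. when a promotion is actually needed.
# should_promote is a hypothetical helper for illustration.
should_promote() {
  failed_node_id=$1
  old_primary_node_id=$2
  [ "$failed_node_id" = "$old_primary_node_id" ]
}

# Inside failover_stream.sh the guard would wrap the ssh call, e.g.:
#   if should_promote "$1" "$3"; then
#     /usr/bin/ssh -T "$2" "/usr/pgsql-10/bin/pg_ctl promote -D /postgresql/data"
#   fi
```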

5. Other configuration

  1. vi /etc/hosts

10.45.53.30 rac1

10.45.53.31 rac2

10.45.53.33 vip

  2. Configure passwordless SSH trust on both hosts: between rac1 and rac2 in both directions, and also from rac1 to rac1 and from rac2 to rac2

ssh-keygen -t rsa

ssh-copy-id -i /postgresql/.ssh/id_rsa 10.45.53.31

ssh-copy-id -i /postgresql/.ssh/id_rsa 10.45.53.30

  3. Create the health-check user on the primary

create user repuser with password 'repuser';

  4. Make sure ports 5432, 9998, 9898, 9000, and 9694 are not already in use and are open in the firewall

6. Start pgpool

Start it as the postgres user.

On the primary:

pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &

 

Once the virtual IP is up on the primary, start pgpool on the standby:

pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &

7. Test pgpool

psql -h 10.45.53.33 -p 9998

8. Test pgpool load balancing

for i in `seq 1 10`; do psql -h 10.45.53.33 -p 9998  -c "select pg_is_in_recovery  from pg_is_in_recovery();"; done

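The results come back from both the primary and the standby, which shows that SELECT statements are load-balanced across the two.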
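To tally the results instead of reading them by eye, the loop's output can be piped through a small counter. This is a sketch only: count_roles is a hypothetical helper, and it assumes psql is run with -At so that each query prints just t or f.

```shell
# Read one t/f value per line on stdin and report how many queries were
# answered by the primary (f) and how many by the standby (t).
# count_roles is a hypothetical helper for illustration.
count_roles() {
  awk '$1 == "f" { p++ } $1 == "t" { s++ }
       END { printf "primary=%d standby=%d\n", p, s }'
}

# Example (needs the cluster from this article):
#   for i in $(seq 1 10); do
#     psql -h 10.45.53.33 -p 9998 -Atc "select pg_is_in_recovery();"
#   done | count_roles
```

With backend_weight0 and backend_weight1 both set to 1, the two counts should come out roughly even over a large enough number of queries.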

Use pgpool -m fast stop to stop pgpool.

5. Testing failure scenarios

1. PG on the standby stops

postgres@rac2[/postgresql/pgpool]$pg_ctl stop

waiting for server to shut down.... done

server stopped

After the standby goes down, the primary is still the original primary.

Re-attach the standby to pgpool.

On the standby, run:

pg_ctl start

pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 1

When prompted, enter the PCP password configured in pcp.conf (postgres in this setup).

2. PG on the primary stops

postgres@rac1[/postgresql/data/log]$pg_ctl stop

waiting for server to shut down.... done

server stopped

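The failover_stream.sh script automatically promotes the standby to primary, and it becomes read-write.

Re-attach the old primary to pgpool, after restarting it as a standby (see the recovery.done pitfall in section 6):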

pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 0

3. The pgpool process on the primary stops

The primary is now rac2. Stop its pgpool process:

pgpool -m fast stop

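Because pgpool on rac1 is still running and takes over the resources, clients are unaffected; this is the high availability at work.

Simply restart pgpool on rac2. If it is not restarted and pgpool on rac1 then stops, or the rac1 server goes down, 10.45.53.33 becomes unreachable.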

pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &

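4. The primary server goes down

The primary is now rac2. Reboot rac2. If the VIP was on rac2, connections to the VIP are interrupted briefly and succeed again after about 2 seconds, because the VIP is brought up automatically on rac1.

The primary automatically switches to rac1.

Once rac2 is back up, it has to be re-initialized: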

su - postgres

mv data data_bk

pg_basebackup -h 10.45.53.30 -U replica -D /postgresql/data -X stream -P

cd /postgresql/data

mv recovery.done recovery.conf

Edit recovery.conf as follows:

cat recovery.conf

standby_mode = on

primary_conninfo = 'host=10.45.53.30 port=5432 user=replica password=replica'

recovery_target_timeline = 'latest'

trigger_file = '/tmp/trigger_file0'

pg_ctl start

pgpool -n -d -D > /postgresql/pgpool/log/pgpool.log 2>&1 &

pcp_attach_node -d -U postgres -h 10.45.53.33 -p 9898 -n 1

 

 

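6. pgpool pitfalls

1. "Address already in use"

pgpool fails to start and the log shows "Address already in use". There are two possible causes:

(1) The port is in use

Port 9999 turned out to be in use by another program on the server; switch to a free port such as 9998.

(2) The /tmp/.s.PGSQL.9999 file was not cleaned up

Simply delete /tmp/.s.PGSQL.9999.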

2. failover_stream.sh script issues

There are many failover scripts circulating online; in testing, the script shown above handled most failover cases.

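3. Incomplete SSH trust

During testing, after failing over from the primary to the standby, failing back to the original primary did not work.

Investigation showed that when the VIP is on rac1, the active pgpool process is the one on rac1. If the primary host fails and rac2 is promoted to primary, pgpool on rac1 is still the active one, so the failover script that runs is the copy on rac1. When that script later has to promote rac1, rac1 must be able to ssh to itself.

The same applies to rac2.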

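4. pg_md5 issue

At first I did not understand the tool and ran pg_md5 without any arguments, so the stored hash was wrong and pcp_attach_node kept reporting that the username and password did not match.

A lookup on https://www.cmd5.com/ showed that the plaintext behind that hash was pwd, not postgres.

Re-running pg_md5 postgres produced e8a48653851e28c69d0506508fb27fc5, which fixed the problem.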
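Since pcp.conf stores a plain MD5 of the password, the entry can be cross-checked without pg_md5. The sketch below is one way to do that with md5sum from coreutils; pcp_conf_line is a hypothetical helper, and its output for the postgres password should match the hash that pg_md5 printed above.

```shell
# Print a "USERID:MD5PASSWD" entry in the format used by pcp.conf.
# pcp_conf_line is a hypothetical helper; it uses md5sum, not pg_md5.
pcp_conf_line() {
  user=$1
  password=$2
  # printf '%s' avoids hashing a trailing newline along with the password
  hash=$(printf '%s' "$password" | md5sum | awk '{ print $1 }')
  printf '%s:%s\n' "$user" "$hash"
}

# Example: compare against the existing entry before attaching nodes
#   pcp_conf_line postgres postgres
#   grep '^postgres:' /etc/pgpool-II/pcp.conf
```

Hashing with echo instead of printf '%s' would include a newline and produce a different hash, which leads to the same kind of mismatch described above.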

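5. recovery.done issue

After PG on the primary host (rac1) stopped and rac2 took over as primary, I restarted rac1 and re-attached it to pgpool but forgot to run

mv recovery.done recovery.conf

so PG on rac1 came back up still acting as a primary.

Stopping PG on rac1, running mv recovery.done recovery.conf, and starting PG again did not help: data would no longer replicate from rac2 to rac1, the primary/standby structure was broken, and the only option was to re-initialize rac1 and rebuild it as a standby: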

pg_basebackup -h 10.45.53.31 -U replica -D /postgresql/data -X stream -P

mv recovery.done recovery.conf

Then edit recovery.conf, and so on.

6. pg_rewind

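pg_rewind can be used when the following error is reported. In my tests, however, once the standby had been rewound with pg_rewind, data no longer synchronized between the primary and the standby, so the usefulness of pg_rewind is still open to question.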

image.png

pg_rewind  --target-pgdata=/postgresql/data --source-server='host=rac1 port=5432 user=postgres'

7. PG client

PG databases can be managed with pgAdmin. Download URL: https://www.postgresql.org/ftp/pgadmin/pgadmin4/v4.10/windows/

Once installed, it is used through a web interface.
