Architecture:
Server A: 192.168.173.145, PostgreSQL primary, keepalived primary
Server B: 192.168.173.122, PostgreSQL standby, keepalived backup
VIP:192.168.173.188
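Note: unless stated otherwise, every step must be performed on both machines.
Configure kernel parameters and resource limits
cat /usr/lib/sysctl.d/00-system.conf
Add the following kernel parameters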
kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 1280
fs.file-max = 7672460
net.core.rmem_default = 1048576
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.core.netdev_max_backlog = 10000
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
# tcp_tw_recycle was removed in Linux 4.12; omit the next line on newer kernels
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_keepalive_time=72
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 7
vm.overcommit_memory = 0
vm.swappiness=0
vm.dirty_background_bytes=102400000
vm.dirty_bytes=102400000
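After loading the file (for example with `sysctl -p /usr/lib/sysctl.d/00-system.conf`), it can be worth confirming that the running kernel actually accepted every value. The following is a minimal sketch, not part of the original procedure: the helper names `sysctl_key_to_path`, `normalize` and `check_sysctl_file` are made up for illustration, and it relies only on the fact that a sysctl key maps to a path under /proc/sys with dots replaced by slashes.

```shell
#!/bin/bash
# Map a sysctl key such as net.core.rmem_max to its /proc/sys path.
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Collapse whitespace runs and trim both ends, so that "9000 65500" in the
# file compares equal to the tab-separated value the kernel reports.
normalize() {
    printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Print one line for every key whose live value differs from the file,
# or which the running kernel does not know at all.
check_sysctl_file() {
    local file=$1 key want have path
    while IFS='=' read -r key want; do
        key=$(normalize "$key")
        case $key in ''|'#'*|';'*) continue ;; esac   # skip blanks and comments
        path=$(sysctl_key_to_path "$key")
        if [ ! -r "$path" ]; then
            echo "$key: not present on this kernel"
            continue
        fi
        have=$(normalize "$(cat "$path")")
        want=$(normalize "$want")
        [ "$have" = "$want" ] || echo "$key: file=$want live=$have"
    done < "$file"
}

# Usage (prints nothing when everything matches):
#   check_sysctl_file /usr/lib/sysctl.d/00-system.conf
```

A key reported as "not present on this kernel" usually means the parameter does not exist in that kernel version and the line can be dropped.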
Add the following resource limit entries to
/etc/security/limits.conf
* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072
* soft core unlimited
* hard core unlimited
* soft memlock 50000000
* hard memlock 50000000
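The limits only take effect for new login sessions, so a quick check as the postgres user after logging in again can save some confusion later. The sketch below is an illustration under that assumption; `limit_ok` and `report_limits` are hypothetical helper names, and only the nofile and nproc values from the list above are checked.

```shell
#!/bin/bash
# Succeed when the current limit satisfies the required one.
# "unlimited" as the current value satisfies any requirement.
limit_ok() {
    local current=$1 required=$2
    [ "$current" = unlimited ] && return 0
    [ "$required" = unlimited ] && return 1
    [ "$current" -ge "$required" ]
}

# Report the open-file and process limits of the current shell.
report_limits() {
    local nofile nproc
    nofile=$(ulimit -n)
    nproc=$(ulimit -u)
    if limit_ok "$nofile" 131072; then echo "nofile ok ($nofile)"; else echo "nofile too low ($nofile)"; fi
    if limit_ok "$nproc" 131072; then echo "nproc ok ($nproc)"; else echo "nproc too low ($nproc)"; fi
}

# Usage, in a fresh session:
#   su - postgres
#   report_limits
```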
Configure environment variables
-- Create the system user
useradd postgres
Add the following environment variables for the postgres user
export PS1="$USER@`/bin/hostname -s`-> "
export PGUSER=postgres
export PGPORT=1921
export PGDATA=/opt/pgdata/pg_root
export LANG=en_US.utf8
export PGHOME=/opt/pgsql11.5
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
alias rm='rm -i'
alias ll='ls -lh'
Download the source package
wget https://ftp.postgresql.org/pub/source/v11.5/postgresql-11.5.tar.gz
Install the dependencies, then compile and install the database
-- Install the dependencies
yum -y install coreutils glib2 lrzsz mpstat dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex openjade bzip2 nc mutt
-- Stop the firewall
systemctl stop firewalld.service
-- Extract the archive, create the directories, and give ownership to the postgres user
tar -zxvf postgresql-11.5.tar.gz
mkdir -p /opt/pgsql11.5
mkdir -p /opt/pgdata/pg_root
chown -R postgres:postgres /opt/pgsql11.5 /opt/pgdata/pg_root
-- Compile and install
cd postgresql-11.5/
./configure --prefix=/opt/pgsql11.5 --with-pgport=1921 --with-segsize=8 --with-perl --with-python --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety
gmake world && gmake install-world
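Log in to the database and create the streaming replication user (on the primary only); the .pgpass file, however, must exist on both machines
As the postgres user, initialize and start the database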
initdb -D /opt/pgdata/pg_root
pg_ctl start
-- Create the streaming replication user
CREATE USER repuser
REPLICATION
LOGIN
CONNECTION LIMIT 5
ENCRYPTED PASSWORD 'LKJKisdh767GHGHJshhsdh';
-- Add .pgpass for passwordless login
cat .pgpass
192.168.173.122:1921:replication:repuser:LKJKisdh767GHGHJshhsdh
192.168.173.145:1921:replication:repuser:LKJKisdh767GHGHJshhsdh
192.168.173.188:1921:replication:repuser:LKJKisdh767GHGHJshhsdh
chmod 0600 ~/.pgpass
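Each .pgpass line has the form hostname:port:database:username:password, and the database name `replication` matches streaming replication connections. Since the three lines above differ only in the host, they can be generated. The following is a rough sketch; `pgpass_lines` is a hypothetical helper and the password in the usage comment is a placeholder. It also escapes `:` and `\` in the password with a backslash, which the file format requires.

```shell
#!/bin/bash
# Emit one .pgpass line per host.
# Arguments: port database user password host [host ...]
pgpass_lines() {
    local port=$1 db=$2 user=$3 password=$4 host
    shift 4
    # A literal ':' or '\' inside a field must be preceded by a backslash.
    password=$(printf '%s' "$password" | sed 's/[\\:]/\\&/g')
    for host in "$@"; do
        printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$password"
    done
}

# Usage (placeholder password, run as postgres on both machines):
#   pgpass_lines 1921 replication repuser 'CHANGE_ME' \
#       192.168.173.122 192.168.173.145 192.168.173.188 > ~/.pgpass
#   chmod 0600 ~/.pgpass
```

Including the VIP as a host matters here because the standby connects to the primary through 192.168.173.188.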
-- sky_pg_cluster database setup
-- Deploy the initial data
create role sky_pg_cluster superuser nocreatedb nocreaterole noinherit login encrypted password 'shakjhsaduy2uieyuJKHKJsd3';
create database sky_pg_cluster with template template1 encoding 'UTF8' owner sky_pg_cluster;
\c sky_pg_cluster sky_pg_cluster
create schema sky_pg_cluster ;
create table cluster_status (id int unique default 1, last_alive timestamp(0) without time zone);
-- Restrict the cluster_status table to exactly one row:
CREATE FUNCTION cannt_delete ()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
RAISE EXCEPTION 'You can not delete!';
END; $$;
-- Create the triggers
CREATE TRIGGER cannt_delete BEFORE DELETE ON cluster_status FOR EACH ROW EXECUTE PROCEDURE cannt_delete();
CREATE TRIGGER cannt_truncate BEFORE TRUNCATE ON cluster_status FOR STATEMENT EXECUTE PROCEDURE cannt_delete();
-- Insert the initial row
insert into cluster_status values (1, now());
The database configuration parameters are as follows (adjust them to your own needs):
listen_addresses = '*' # what IP address(es) to listen on;
port = 1921 # (change requires restart)
max_connections = 1000 # (change requires restart)
superuser_reserved_connections = 13 # (change requires restart)
tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds;
tcp_keepalives_interval = 20 # TCP_KEEPINTVL, in seconds;
shared_buffers = 16384MB # min 128kB
work_mem = 4MB # min 64kB
maintenance_work_mem = 1024MB # min 1MB
dynamic_shared_memory_type = posix # the default is the first option
wal_level = replica # minimal, replica, or logical
synchronous_commit = off # synchronization level;
full_page_writes = on # recover from partial page writes
wal_log_hints = on # also do full page writes of non-critical updates
wal_writer_delay = 20ms # 1-10000 milliseconds
max_wal_size = 8GB
min_wal_size = 128MB
archive_mode = on # enables archiving; off, on, or always
archive_command = '/usr/bin/date' # command to use to archive a logfile segment
wal_keep_segments = 256 # in logfile segments; 0 disables
hot_standby = on # "off" disallows queries during recovery
effective_cache_size = 60GB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
log_file_mode = 0600 # creation mode for log files,
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 10MB # Automatic rotation of logfiles will
log_checkpoints = on
log_connections = on
log_statement = 'ddl' # none, ddl, mod, all
log_timezone = 'Asia/Shanghai'
track_activities = on
track_counts = on
track_functions = none # none, pl, all
autovacuum = on # Enable autovacuum subprocess? 'on'
autovacuum_max_workers = 4 # max number of autovacuum subprocesses
autovacuum_naptime = 1min # time between autovacuum runs
autovacuum_vacuum_threshold = 50 # min number of row updates before
autovacuum_analyze_threshold = 50 # min number of row updates before
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
autovacuum_freeze_max_age = 1500000000 # maximum XID age before forced vacuum
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.utf8' # locale for system error message
lc_monetary = 'en_US.utf8' # locale for monetary formatting
lc_numeric = 'en_US.utf8' # locale for number formatting
lc_time = 'en_US.utf8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
shared_preload_libraries = '' # (change requires restart)
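One setting above has a direct disk-space consequence: wal_keep_segments = 256 keeps that many past WAL segments in pg_wal for the standby. A small back-of-the-envelope sketch follows, assuming the default WAL segment size of 16 MB (the `--with-segsize=8` configure option used earlier sets the relation file segment size in GB, not the WAL segment size). `wal_keep_mb` is a hypothetical helper name.

```shell
#!/bin/bash
# Disk space, in MB, retained by wal_keep_segments.
# Arguments: segment count, optional WAL segment size in MB (default 16).
wal_keep_mb() {
    local segments=$1 segment_mb=${2:-16}
    echo $(( segments * segment_mb ))
}

# Usage:
#   wal_keep_mb 256      # 256 segments of 16 MB each
```

With the values in this article that comes to 4096 MB, on top of whatever max_wal_size allows, so the pg_wal volume should be sized accordingly.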
Build the standby (on 192.168.173.122 only; $PGDATA must be empty)
pg_basebackup -D $PGDATA -Fp -Xs -v -P -h 192.168.173.145 -p 1921 -U repuser
cd $PGDATA
mv recovery.done recovery.conf
-- Contents of recovery.conf
cat $PGDATA/recovery.conf
standby_mode = 'on'
recovery_target_timeline='latest'
primary_conninfo = 'host=192.168.173.188 port=1921 user=repuser keepalives_idle=60'
#restore_command = 'cp /path/to/archive/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
Download keepalived, then compile and install it
wget https://www.keepalived.org/software/keepalived-2.0.18.tar.gz
tar -zxvf keepalived-2.0.18.tar.gz
cd keepalived-2.0.18
./configure --prefix=/usr/local/keepalived --sysconf=/etc
make && make install
The configuration file is as follows:
cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
[email protected]
}
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id DB1_PG_HA
}
vrrp_script check_pg_alived {
script "/usr/local/bin/pg_moniter.sh"
interval 10 # run the check script every 10 seconds
fall 5 # mark the node as failed after 5 consecutive failures
}
vrrp_instance VI_1 {
state BACKUP # Note: both nodes are BACKUP; with MASTER/BACKUP the MASTER would take the VIP back
nopreempt # do not preempt the VIP
interface eth0
virtual_router_id 12
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass t9rveMP0Z9S1
}
track_script {
check_pg_alived
}
virtual_ipaddress {
192.168.173.188
}
smtp_alert
notify_master /usr/local/bin/active_standby.sh
}
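The `interval` and `fall` values in vrrp_script determine how long a dead primary goes unnoticed. As a rough estimate that ignores the run time of the script itself, the failure is declared somewhere between (fall - 1) and fall intervals after the database stops, depending on where in the check cycle it stops. The sketch below only does that arithmetic; `detection_window` is a hypothetical name.

```shell
#!/bin/bash
# Print the approximate minimum and maximum number of seconds between a
# failure and keepalived marking the script as failed.
# Arguments: interval (seconds), fall (consecutive failures required).
detection_window() {
    local interval=$1 fall=$2
    echo "$(( interval * (fall - 1) )) $(( interval * fall ))"
}

# Usage with the values from the configuration above:
#   detection_window 10 5
```

For interval 10 and fall 5 this gives a window of 40 to 50 seconds before the VIP moves. Lowering either value shortens the outage but makes a false failover on a brief hiccup more likely.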
Contents of /usr/local/bin/pg_moniter.sh:
#!/bin/bash
# Load Env
source /home/postgres/.bash_profile
export PGPORT=1921
export PGUSER=sky_pg_cluster
export PGDBNAME=sky_pg_cluster
export PGDATA=/opt/pgdata/pg_root
export LANG=en_US.utf8
export PGHOME=/opt/pgsql11.5
export PATH=$PGHOME/bin:$PATH:.
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
MONITOR_LOG="/tmp/pg_monitor.log"
SQL1="update cluster_status set last_alive = now();"
SQL2='select 1;'
# Exit if this node is a standby; this script does not check standby liveness
standby_flg=`psql -p $PGPORT -U postgres -At -c "select pg_is_in_recovery();"`
if [ "${standby_flg}" = 't' ]; then
echo -e "`date +%F\ %T`: This is a standby database, exit!\n" >> $MONITOR_LOG
exit 0
fi
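# On the primary, update the cluster_status table
echo $SQL1 | psql -At -h 127.0.0.1 -p $PGPORT -U $PGUSER -d $PGDBNAME >> $MONITOR_LOG
# Check that the local port is reachable (127.0.0.1, so the same script works on either node)
CMD=`nc -v -z 127.0.0.1 $PGPORT`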
if [ $? -eq 0 ];then
ret="yes"
else
ret="no"
fi
COUNT=`ps -ef|grep postgres|wc -l`
if [ $COUNT -ge 5 ];then
process="yes"
else
process="no"
fi
echo $SQL2 | psql -At -h 127.0.0.1 -p $PGPORT -U $PGUSER -d $PGDBNAME
if [[ $? -eq 0 ]] && [[ $ret = "yes" ]] && [[ $process = "yes" ]]; then
echo -e "`date +%F\ %T`: Primary db is health." >> $MONITOR_LOG
exit 0
else
echo -e "`date +%F\ %T`: Attention: Primary db is not health!" >> $MONITOR_LOG
exit 1
fi
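One thing the script above does not guard against is a check that hangs, for example a psql call blocked on an unresponsive server. A possible mitigation, shown here only as a sketch, is to wrap each probe in the coreutils `timeout` command, which exits with status 124 when the time limit is hit. `run_check` is a hypothetical wrapper name and the 5-second limit in the usage comment is an arbitrary choice.

```shell
#!/bin/bash
# Run a command with a time limit, in seconds.
# Returns the command's own exit status, or 124 if it was killed on timeout.
run_check() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "check timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Usage inside the monitor script:
#   echo "$SQL2" | run_check 5 psql -At -h 127.0.0.1 -p "$PGPORT" -U "$PGUSER" -d "$PGDBNAME"
```

A timed-out probe then counts as an ordinary failure toward the `fall` threshold instead of stalling the check cycle.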
Contents of /usr/local/bin/active_standby.sh:
#!/bin/bash
# Environment variables
source /home/postgres/.bash_profile
export PGPORT=1921
export PGUSER=sky_pg_cluster
export PG_OS_USER=postgres
export PGDBNAME=sky_pg_cluster
export PGDATA=/opt/pgdata/pg_root
export LANG=en_US.utf8
export PGHOME=/opt/pgsql11.5
export PATH=$PGHOME/bin:$PATH:.
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
# Settings: LAG_MINUTES is the maximum replication lag at which failover is still allowed
LAG_MINUTES=3
HOST_IP=`hostname -i`
NOTICE_EMAIL="[email protected]"
FAILOVE_LOG='/tmp/failover.log'
SQL1="select 'this_is_standby' as cluster_role from ( select pg_is_in_recovery() as std ) t where t.std is true;"
SQL2="select 'standby_in_allowed_lag' as cluster_lag from cluster_status where now()-last_alive < interval '$LAG_MINUTES min';"
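# The VIP has moved; record it in the log file
echo -e "`date +%F\ %T`: keepalived VIP switchover!" >> $FAILOVE_LOG
# The VIP has moved; send an email notification
echo -e "`date +%F\ %T`: ${HOST_IP}/${PGPORT} VIP has moved, please investigate!" | mutt -s "Error: database VIP has moved" ${NOTICE_EMAIL}
# pg_failover function: promotes the standby when the primary fails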
pg_failover()
{
# PROMOTE_STATUS flags whether the promotion succeeded: 1 means failure, 0 means success
PROMOTE_STATUS=1
# Promote the standby
su - $PG_OS_USER -c "pg_ctl promote"
if [ $? -eq 0 ]; then
echo -e "`date +%F\ %T`: `hostname` promote standby success. "
PROMOTE_STATUS=0
fi
if [ $PROMOTE_STATUS -ne 0 ]; then
echo -e "`date +%F\ %T`: promote standby failed."
return $PROMOTE_STATUS
fi
echo -e "`date +%F\ %T`: pg_failover() function call success."
return 0
}
# Failover procedure
# Flag for whether the standby is healthy (is in recovery); CNT=1 means OK.
CNT=`echo $SQL1 | psql -At -h 127.0.0.1 -p $PGPORT -U $PGUSER -d $PGDBNAME -f - | grep -c this_is_standby`
#echo -e "CNT: $CNT"
# Flag for whether the standby lag is within the accepted range; LAG=1 means OK.
LAG=`echo $SQL2 | psql -At -h 127.0.0.1 -p $PGPORT -U $PGUSER -d $PGDBNAME | grep -c standby_in_allowed_lag`
if [ $CNT -eq 1 ] && [ $LAG -eq 1 ] ; then
pg_failover >> $FAILOVE_LOG
if [ $? -ne 0 ]; then
echo -e "`date +%F\ %T`: pg_failover failed." >> $FAILOVE_LOG
exit 1
fi
else
echo -e "`date +%F\ %T`: `hostname` standby is not ok or lagged more than $LAG_MINUTES minutes behind primary, failover not allowed! " >> $FAILOVE_LOG
exit 1
fi
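The gate in this script is worth spelling out: promotion happens only if the node is still a standby and the last_alive heartbeat it replicated from the old primary is less than LAG_MINUTES old. The sketch below restates that decision with plain epoch seconds so it can be tried without a database; `lag_allowed` and `should_promote` are hypothetical names, not functions from the script.

```shell
#!/bin/bash
# Succeed when the heartbeat is recent enough.
# Arguments: last_alive (epoch seconds), now (epoch seconds), max lag in minutes.
# Uses a strict "less than", like the interval comparison in SQL2.
lag_allowed() {
    local last_alive=$1 now=$2 max_minutes=$3
    [ $(( now - last_alive )) -lt $(( max_minutes * 60 )) ]
}

# Succeed when failover should go ahead.
# Arguments: CNT and LAG as computed in the script (1 means OK).
should_promote() {
    local cnt=$1 lag=$2
    [ "$cnt" -eq 1 ] && [ "$lag" -eq 1 ]
}

# Usage:
#   lag_allowed "$(date -d '2019-09-20 14:39:11' +%s)" "$(date +%s)" 3 && echo "lag ok"
```

Because the heartbeat is written every 10 seconds by the monitor script and detection takes under a minute, a 3-minute limit leaves room for a normal failover while refusing to promote a standby that had already fallen behind.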
-- Check eth0: the VIP is on 192.168.173.145
ip addr show eth0
2: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether b8:ac:6f:12:fd:b4 brd ff:ff:ff:ff:ff:ff
inet 192.168.173.145/24 brd 192.168.173.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.173.188/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::baac:6fff:fe12:fdb4/64 scope link
valid_lft forever preferred_lft forever
-- Simulate a failure by stopping the database
pg_ctl stop -m fast
-- Watch the log
tailf /tmp/pg_monitor.log
UPDATE 1
2019-09-20 14:39:11: Primary db is health.
2019-09-20 14:39:21: Attention: Primary db is not health!
2019-09-20 14:39:31: Attention: Primary db is not health!
2019-09-20 14:39:41: Attention: Primary db is not health!
2019-09-20 14:39:51: Attention: Primary db is not health!
2019-09-20 14:40:01: Attention: Primary db is not health!
After five consecutive failures the node is marked as failed; on 192.168.173.122 the log shows it taking over as master
tailf /var/log/messages
Sep 20 14:40:01 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) Backup received priority 0 advertisement
Sep 20 14:40:01 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) Backup received priority 0 advertisement
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) Receive advertisement timeout
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) Entering MASTER STATE
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) setting VIPs.
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Sending gratuitous ARP on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Sending gratuitous ARP on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Sending gratuitous ARP on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Sending gratuitous ARP on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Sending gratuitous ARP on eth0 for 192.168.173.188
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: Remote SMTP server [127.0.0.1]:25 connected.
Sep 20 14:40:02 192-168-173-122 Keepalived_vrrp[29340]: SMTP alert successfully sent.
The output below shows that the database has been promoted to primary
su - postgres
pg_controldata | grep 'cluster state'
Database cluster state: in production
...
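If the old primary was shut down cleanly, as in this test, it can be switched over to a standby directly. If it was not shut down cleanly, it cannot rejoin as a standby as-is, and pg_rewind should be used. Note the prerequisites: either data checksums were enabled when the cluster was initialized, or wal_log_hints has been set to on; full_page_writes must also be on.
For example, run pg_rewind on 192.168.173.145 to rebuild the standby: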
pg_rewind --target-pgdata /opt/pgdata/pg_root --source-server='host=192.168.173.122 port=1921 user=postgres dbname=postgres' -P
cd $PGDATA
mv recovery.done recovery.conf
pg_controldata | grep 'cluster state'
Database cluster state: in archive recovery
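The pg_rewind prerequisites can be read from pg_controldata before attempting the rewind. The following is a minimal sketch: `rewind_prereq_ok` is a hypothetical helper that reads pg_controldata output on standard input and looks at the "wal_log_hints setting" and "Data page checksum version" lines. It does not check full_page_writes.

```shell
#!/bin/bash
# Succeed when pg_controldata output (on stdin) shows that either
# wal_log_hints is on or data page checksums are enabled (version > 0).
rewind_prereq_ok() {
    awk -F': *' '
        /^wal_log_hints setting/      { hints = $2 }
        /^Data page checksum version/ { sums = $2 }
        END { exit !(hints == "on" || sums + 0 > 0) }
    '
}

# Usage, as postgres on the node to be rewound:
#   pg_controldata "$PGDATA" | rewind_prereq_ok || echo "pg_rewind prerequisites not met"
```

If the check fails, the usual fallback is to rebuild the standby from scratch with pg_basebackup.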
-- On the new primary 192.168.173.122, check the streaming replication status and create a test table
psql
psql (11.5)
Type "help" for help.
-- Check the streaming replication status
\x
Expanded display is on.
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 17876
usesysid | 16384
usename | repuser
application_name | walreceiver
client_addr | 192.168.173.145
client_hostname |
client_port | 17519
backend_start | 2019-09-20 14:43:16.554519+08
backend_xmin |
state | streaming
sent_lsn | 0/1401D388
write_lsn | 0/1401D388
flush_lsn | 0/1401D388
replay_lsn | 0/1401D388
write_lag | 00:00:00.0006
flush_lag | 00:00:00.000757
replay_lag | 00:00:00.000985
sync_priority | 0
sync_state | async
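The LSN columns in this output can be turned into a byte lag. An LSN is a 64-bit position printed as two hexadecimal numbers separated by a slash, the high 32 bits first. The sketch below converts and subtracts them in the shell; `lsn_to_bytes` and `lsn_diff` are hypothetical helpers, and inside the database the built-in pg_wal_lsn_diff() function does the same job.

```shell
#!/bin/bash
# Convert an LSN such as 0/1401D388 to an absolute byte position.
lsn_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Byte distance between two LSNs (first minus second).
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Usage with the sent_lsn and replay_lsn values shown above:
#   lsn_diff 0/1401D388 0/1401D388
```

In the record above sent_lsn and replay_lsn are identical, so the standby has replayed everything the primary has sent.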
-- Create a test table
postgres=# create table tb1 (a int);
CREATE TABLE
-- Verify on the new standby 192.168.173.145: tb1 exists there as well
psql
psql (11.5)
Type "help" for help.
postgres=# \d tb1
Table "public.tb1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
References:
https://www.keepalived.org/manpage.html
https://www.postgresql.org/docs/11/high-availability.html
https://github.com/francs/PostgreSQL-Keepalived-HA/blob/master/install.txt