Contents
Purpose
Details
Purpose
Reference for configuring automatic failover with HG_REPMGR
Details
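To configure automatic failover for the cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.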
1. Configure the postgresql.replication.conf file (all nodes)
On top of the existing postgresql.replication.conf settings, add the following parameter:
shared_preload_libraries = 'repmgr'
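or, if other libraries are already preloaded (pg_pathman and timescaledb in this example), set the full list with repmgr appended:
alter system set shared_preload_libraries = pg_pathman, timescaledb, repmgr;
Restart the database:
pg_ctl restart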
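After the restart it can be useful to confirm that repmgr really is in the preload list. The following is a minimal sketch, not part of the original procedure: the helper name has_repmgr_preloaded is hypothetical, and the commented psql call assumes a local server that the current user can connect to.

```shell
#!/bin/sh
# Return 0 if "repmgr" is one entry of a comma-separated
# shared_preload_libraries value, 1 otherwise.
has_repmgr_preloaded() {
    # Strip spaces and quotes, then wrap the value in commas so that
    # only whole entries match (e.g. "repmgr_extra" does not count).
    value=$(printf '%s' "$1" | tr -d " '\"")
    case ",$value," in
        *,repmgr,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example against a live server (assumption: psql can connect locally):
#   has_repmgr_preloaded "$(psql -At -c 'SHOW shared_preload_libraries;')" \
#       && echo "repmgr is preloaded"
```

The check is done on the reported value rather than on the configuration file, so it reflects what the running server actually loaded.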
2. Configure hg_repmgr.conf (all nodes)
Add the following parameters to the existing hg_repmgr.conf file:
failover=automatic
promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'
follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'
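Because these three parameters must be present on every node, a quick check before starting repmgrd can catch a node that was missed. The sketch below is illustrative only; check_failover_conf is a hypothetical helper, and it tests only that each key is set, not that its value is valid.

```shell
#!/bin/sh
# Print each required failover parameter that is missing from the given
# configuration file; return 0 only when all three are present.
check_failover_conf() {
    conf=$1
    missing=0
    for key in failover promote_command follow_command; do
        # Match "key =" at the start of a line, allowing whitespace.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return $missing
}

# Example:
#   check_failover_conf /opt/highgo/5.6.1/conf/hg_repmgr.conf
```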
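To send the repmgr log to a fixed log file, add the log_file parameter, as follows:
log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'
To keep this log file from growing without bound, configure the system's logrotate. (Detailed steps omitted.)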
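The original omits the logrotate steps. As a rough sketch only, the function below writes one possible logrotate stanza for the log file named above. The rotation policy (weekly, four rotations, compression, copytruncate) and the function name write_logrotate_conf are assumptions chosen for illustration, not values from this document.

```shell
#!/bin/sh
# Write a logrotate stanza for the repmgr log to the file given as $1.
# The policy below is an illustrative assumption; adjust it as needed.
write_logrotate_conf() {
    cat > "$1" <<'EOF'
/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Example (writing under /etc/logrotate.d normally requires root):
#   write_logrotate_conf /etc/logrotate.d/hg_repmgr
```

copytruncate is used here so that the daemon does not need to reopen its log file after rotation; a small amount of log output can be lost at the moment of truncation.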
3. Start the repmgrd process (all nodes)
repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
Example of a failed start (the command omits -f, and the log shows an empty connection string):
[highgo@dbrs conf]$ repmgrd -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:02:42] [INFO] connecting to database ""
[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node
[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"
[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster
[highgo@dbrs conf]$
Create the repmgr extension in the database named in the error:
highgo=# \c
You are now connected to database "highgo" as user "highgo".
create extension repmgr;
Start repmgrd with -f on the primary node (dbrs):
[highgo@dbrs conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)
[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)
Start repmgrd on the standby node (dbrs2):
[highgo@dbrs2 conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs2 conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)
[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)
Confirm that the PID file exists on both nodes:
[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs conf]$
[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs2 conf]$
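Note: Does this background process have to be started manually every time the server is restarted?
Developer reply: At present, yes. A later release will start it automatically.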
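Since repmgrd currently has to be started by hand after a reboot, a small wrapper that starts it only when it is not already running may help. This is a sketch under assumptions: repmgrd_running and ensure_repmgrd are hypothetical helper names, and the liveness test relies on the PID file written by the -p option shown above.

```shell
#!/bin/sh
# Return 0 if the PID stored in the given pidfile belongs to a live process.
repmgrd_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content before probing the process.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only tests whether the process exists and can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Start repmgrd only when no live process is recorded in the pidfile.
ensure_repmgrd() {
    conf=$1
    pidfile=$2
    if repmgrd_running "$pidfile"; then
        echo "repmgrd already running"
    else
        repmgrd -f "$conf" -d -p "$pidfile"
    fi
}

# Example:
#   ensure_repmgrd /opt/highgo/5.6.1/conf/hg_repmgr.conf /tmp/hg_repmgrd.pid
```

The check cannot tell whether a stale PID has been reused by an unrelated process, so treat a positive result as a hint and not as proof. Running the wrapper at boot, for example from the highgo user's crontab, would be one way to avoid the manual step.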
Check the cluster status
[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
[highgo@dbrs conf]$
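The cluster show table can also be checked by a script, for example to confirm that exactly one node is a running primary. The sketch below parses the pipe-separated layout shown above. The helper name count_running_primaries is hypothetical, and the column positions (Role third, Status fourth) are taken from this output and may differ in other versions.

```shell
#!/bin/sh
# Read "cluster show" output on stdin and print how many nodes are
# listed with role "primary" and a status containing "running".
count_running_primaries() {
    awk -F'|' '
        # Data rows are the ones with a numeric ID in the first column.
        $1 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
            role = $3
            status = $4
            gsub(/[[:space:]]/, "", role)
            if (role == "primary" && status ~ /running/) n++
        }
        END { print n + 0 }
    '
}

# Example:
#   repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show \
#       | count_running_primaries
```

A result other than 1 is worth investigating: 0 means no primary is available, and a higher count may indicate a split-brain situation.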
Simulate a primary node failure
1) On node1, stop the database:
pg_ctl stop
2) On node2, check the cluster status:
[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "dbrs" (ID: 1)
[highgo@dbrs2 conf]$
At this point node2 (dbrs2) has been promoted to primary.
Log
[highgo@dbrs2 conf]$
[2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)
[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts
[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts
[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts
[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts
[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts
[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt
[highgo@dbrs2 conf]$
[2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts
[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts
[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself
[2019-05-06 14:25:04] [INFO] promote_command is: "repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"
DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary
[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode
[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
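The log shows six reconnection attempts ten seconds apart before the standby decides to promote itself. In upstream repmgr these values correspond to the reconnect_attempts and reconnect_interval parameters. Whether hg_repmgr.conf accepts the same names is an assumption here. The sketch below estimates the delay between the first failed check and the promotion decision; failover_delay is a hypothetical helper.

```shell
#!/bin/sh
# Seconds between the first failed check and the promotion decision,
# assuming one check per attempt and one sleep between consecutive attempts.
failover_delay() {
    attempts=$1
    interval=$2
    echo $(( (attempts - 1) * interval ))
}

# Values from the log: 6 attempts, 10 seconds apart.
failover_delay 6 10   # prints 50
```

This matches the timestamps in the log, which run from 14:24:14 to 14:25:04. The promotion itself then took a few more seconds, with monitoring of the new primary starting at 14:25:10.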
For more details, log in to the HighGo Technical Support Platform (瀚高技术支持平台): https://support.highgo.com/#/index/docContent/fcf01ba201b12041