HG_REPMGR Automatic Failover (autofailover)

Original link: https://support.highgo.com/#/index/docContent/fcf01ba201b12041

Contents

Purpose

Details

Purpose

Reference for configuring automatic failover in HG_REPMGR.

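Details

To configure automatic failover for a cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.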

 

1. Configure postgresql.replication.conf (all nodes)

Add the following parameter to the existing postgresql.replication.conf:

 

shared_preload_libraries = 'repmgr'

 

Alternatively, set it with ALTER SYSTEM (this example also preloads pg_pathman and timescaledb):

 

alter system set shared_preload_libraries =pg_pathman,timescaledb,repmgr;

  

 

Restart the database:

 

pg_ctl restart
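After the restart it can be worth confirming that repmgr really ended up in the preload list. The following is a minimal sketch, not part of HG_REPMGR: list_has_library is a hypothetical helper name, and reading the setting with psql is an assumption about how you connect to the database.

```shell
# Return 0 if a comma-separated library list (as printed by
# "SHOW shared_preload_libraries") contains the named library.
list_has_library() {
    # Split on commas, drop spaces, then require an exact whole-line match.
    printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | grep -Fxq "$2"
}

# Usage after the restart (psql connection options are omitted here):
# list_has_library "$(psql -Atc 'SHOW shared_preload_libraries')" repmgr \
#     && echo "repmgr is preloaded"
```

The exact match matters: a library whose name merely contains "repmgr" is not accepted.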

 

 

2. Configure hg_repmgr.conf (all nodes)

Add the following parameters to the existing hg_repmgr.conf file:

 

failover=automatic     

promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'

follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n' 
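Because these three parameters must be present on every node, a quick check before starting repmgrd can catch a node that was missed. This is a sketch under assumptions: missing_failover_params is a hypothetical helper, and it only tests that each key appears as a key=value line, not that the value is correct.

```shell
# Report which automatic-failover parameters are missing from a
# repmgr-style config file (key=value lines, '#' starts a comment).
missing_failover_params() {
    conf_file=$1
    for key in failover promote_command follow_command; do
        # Match the key at the start of a line, optional spaces, then "=".
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file"; then
            printf '%s\n' "$key"
        fi
    done
}

# Usage (the path is the one used in this article):
# missing_failover_params /opt/highgo/5.6.1/conf/hg_repmgr.conf
```

Empty output means all three keys were found; commented-out lines do not count as present.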

 

 

To send repmgr's log output to a fixed log file, add the log_file parameter, for example:

 

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

  

To keep this log file from growing without bound, configure the system's logrotate. (Detailed steps are omitted here.)
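Since the article omits the logrotate steps, here is one possible shape for them, offered as a sketch rather than a recommended policy. The function name repmgr_logrotate_stanza, the rotation values (weekly, 8 copies) and the use of copytruncate are all assumptions; copytruncate is chosen only because it avoids having to signal the daemon after rotation.

```shell
# Print a logrotate stanza for the given log file.
# The rotation policy below is illustrative only; adjust it to your site.
repmgr_logrotate_stanza() {
    cat <<EOF
$1 {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Example: review the output, then install it as root under
# /etc/logrotate.d/ (the destination file name is up to you):
# repmgr_logrotate_stanza /opt/highgo/5.6.1/conf/data/log/hg_repmgr.log
```

Note that copytruncate can lose a few log lines written during the copy, which is usually acceptable for a monitoring log.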

 

3. Start the repmgrd process (all nodes)

 

 

repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

 

[highgo@dbrs conf]$ repmgrd  -d  -p /tmp/hg_repmgrd.pid

[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:02:42] [INFO] connecting to database ""

[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node

[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"

[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster

[highgo@dbrs conf]$

 

highgo=# \c

You are now connected to database "highgo" as user "highgo".

 

 

create extension repmgr;

 

[highgo@dbrs conf]$ repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)

[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)

 

[highgo@dbrs2 conf]$ repmgrd  -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d  -p /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up

[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"

[highgo@dbrs2 conf]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid

[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)

[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)

 

[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs conf]$

[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid

-rw-rw-r--. 1 highgo highgo 5 May  6 14:21 /tmp/hg_repmgrd.pid

[highgo@dbrs2 conf]$

 

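Note: does this background process have to be started manually every time the server is rebooted?

Developer reply: for now, yes. A later release will start it automatically.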
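Until automatic startup is available, a small wrapper can make the manual start safe to repeat. The sketch below is an assumption-laden example, not part of HG_REPMGR: pidfile_alive and start_repmgrd_if_needed are hypothetical names, and the paths are the ones used in this article.

```shell
# Return 0 if the PID recorded in the given pidfile belongs to a live process.
# kill -0 sends no signal; it only tests that the process exists and that
# the current user is allowed to signal it.
pidfile_alive() {
    [ -s "$1" ] || return 1            # missing or empty pidfile
    pid=$(cat "$1")
    kill -0 "$pid" 2>/dev/null
}

# Start repmgrd only when it is not already running.
start_repmgrd_if_needed() {
    conf=/opt/highgo/5.6.1/conf/hg_repmgr.conf
    pidfile=/tmp/hg_repmgrd.pid
    if pidfile_alive "$pidfile"; then
        echo "repmgrd already running"
    else
        repmgrd -f "$conf" -d -p "$pidfile"
    fi
}
```

One option is to call start_repmgrd_if_needed from a boot-time hook for the highgo user, such as a cron @reboot entry, after the database itself has started.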

 

Check the cluster status

 

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Connection string                                        

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

 1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2

 2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

[highgo@dbrs conf]$

 

 

Simulate a primary node failure

1) Stop the database on node1

pg_ctl stop

2) Check the cluster status on node2

[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Connection string                                        

----+-------+---------+-----------+----------+----------+------------------------------------------------------------

 1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2

 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

 

WARNING: following issues were detected

  - unable to connect to node "dbrs" (ID: 1)

[highgo@dbrs2 conf]$

 

At this point node2 has been promoted to primary.
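The same check can be scripted by filtering the cluster show table for the row that is both a primary and marked "* running". This is a sketch that assumes the column layout shown above (ID, Name, Role, Status, ...); running_primaries is a hypothetical helper name.

```shell
# Print the names of nodes that "cluster show" reports as a running primary.
# Reads the table on stdin; columns are separated by "|".
running_primaries() {
    awk -F'|' '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 4 && trim($3) == "primary" && trim($4) == "* running" {
            print trim($2)
        }
    '
}

# Usage:
# repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show | running_primaries
```

For the table above this prints only dbrs2, because the old primary dbrs is listed as "- failed".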

 

 

repmgrd log on node2

[highgo@dbrs2 conf]$ [2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)

[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts

[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts

[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts

[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts

[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt

[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts

[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt

 

[highgo@dbrs2 conf]$ [2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts

[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts

[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself

[2019-05-06 14:25:04] [INFO] promote_command is:

  "repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"

NOTICE: promoting standby to primary

DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl  -w -D '/opt/highgo/5.6.1/data' promote"

DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary

[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode

[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
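The log above shows 6 reconnection attempts spaced 10 seconds apart before the standby promotes itself. As a rough sketch of how long that detection window is, the arithmetic can be written out as below. failover_wait_seconds is a hypothetical helper, and mapping the two numbers to repmgr's reconnect_attempts and reconnect_interval settings is an assumption, since this article does not set them explicitly.

```shell
# Seconds between the first failed check and the decision to fail over:
# one sleep separates consecutive attempts, and none follows the last one.
failover_wait_seconds() {
    attempts=$1
    interval=$2
    echo $(( (attempts - 1) * interval ))
}

failover_wait_seconds 6 10   # values seen in the log above; prints 50
```

This matches the timestamps in the log: the first check is at 14:24:14 and the sixth at 14:25:04, 50 seconds later, with the promotion itself completing by 14:25:10.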
