db_ha Cluster: Automatic Switchover After Installation and How to Verify the Cluster After a Switchover

HighGo Database
Contents
Environment
Purpose
Details

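Environment
Platform: Linux x86-64 Red Hat Enterprise Linux 7
Version: 4.5.7
Purpose
This document describes how a db_ha cluster switches over automatically after installation, and how to verify the cluster state after a switchover.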

Details
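I. A db_ha cluster performs a primary/standby switchover when the primary loses its network connection or the primary server goes down. The simulated tests are described in the attachment.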

II. After the former primary server has recovered, it is automatically demoted to a standby and rejoins the cluster.

III. Use the following methods to determine the cluster state after a switchover.

1. Check streaming replication

① On the primary, check for walsender processes:

ps -ef | grep -v grep | grep walsender

root     21466 19862  0 14:51 ?        00:00:00 postgres: walsender sysdba 192.168.80.230(36582) streaming 0/4053860
root     23017 19862  0 15:06 ?        00:00:00 postgres: walsender sysdba 192.168.80.228(47786) streaming 0/4053860

② On a standby, check for the walreceiver process:

ps -ef | grep -v grep | grep walreceive

root     13489 13482  0 15:45 ?        00:00:06 postgres: walreceiver   streaming 0/6004460

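③ If neither walsender nor walreceiver processes are found and only the other postgres processes are running, the database has left streaming replication and is running as a standalone instance: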

ps -ef | grep -v grep | grep postgres

root      4148     1  0 Jun13 ?        00:04:36 /opt/HighGo4.5.7-see/bin/postgres -D /opt/HighGo4.5.7-see/data
root      4149  4148  0 Jun13 ?        00:00:00 postgres: logger
root      4150  4148  1 Jun13 ?        00:11:30 postgres: auditwriter
root      4152  4148  0 Jun13 ?        00:00:00 postgres: checkpointer
root      4153  4148  0 Jun13 ?        00:00:04 postgres: background writer
root      4154  4148  0 Jun13 ?        00:00:54 postgres: stats collector
root      4156  4148  0 Jun13 ?        00:00:00 postgres: audit archiver or cleanup
root      4314  4148  0 Jun13 ?        00:00:04 postgres: walwriter
root      4315  4148  0 Jun13 ?        00:00:01 postgres: autovacuum launcher
root      4316  4148  0 Jun13 ?        00:00:00 postgres: archiver   last was 000000040000000000000008
root      4317  4148  0 Jun13 ?        00:00:00 postgres: logical replication launche

④ If no postgres processes are found at all, the database is not running.
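Checks ① to ④ can be combined into one small shell function. This is a minimal sketch, not a db_ha or HighGo tool: the function name `classify_replication_role` and its output labels are hypothetical, and it assumes the process titles look like the `ps -ef` samples above (`postgres: walsender ...`, `postgres: walreceiver ...`).

```shell
#!/bin/sh
# Sketch (hypothetical helper): classify a node's streaming-replication role
# from `ps -ef` output read on stdin, following rules 1-4 of step 1.
# Prints one of: PRIMARY, STANDBY, STANDALONE, NOT_RUNNING.
classify_replication_role() {
  local procs
  procs=$(cat)
  if printf '%s\n' "$procs" | grep -q 'postgres: walreceiver'; then
    # A walreceiver means the node streams WAL from an upstream server,
    # i.e. it is a standby (even if it also runs walsenders).
    echo STANDBY
  elif printf '%s\n' "$procs" | grep -q 'postgres: walsender'; then
    # walsender processes but no walreceiver: the node is the primary.
    echo PRIMARY
  elif printf '%s\n' "$procs" | grep -qE 'bin/postgres |postgres: '; then
    # Server processes run, but neither walsender nor walreceiver:
    # the node has left streaming replication and runs standalone.
    echo STANDALONE
  else
    # No postgres processes at all: the database is not running.
    echo NOT_RUNNING
  fi
}
```

Usage on each node: `ps -ef | classify_replication_role`. Following rule ③, a primary whose standbys are all disconnected is also reported as STANDALONE.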

2. Check with the db_ha cluster command

① Normal cluster state: every node reports healthy=t, and every node with nodetype=STANDBY reports streamingState=streaming.

/usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf

connect monitor success
cluster num = 3         secondary monitor is normal
nodeip=192.168.80.228,nodetype=PRIMARY,replicationName=ha228 streamingType=NONE streamingState=none healthy=t agentState=NORMAL
nodeip=192.168.80.229,nodetype=STANDBY,replicationName=ha229 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL
nodeip=192.168.80.230,nodetype=STANDBY,replicationName=ha230 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL

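② Abnormal cluster state: in the check below, 192.168.80.229 has taken over as PRIMARY, while the former primary 192.168.80.228 is abnormal (streamingState=none, healthy=f, agentState=UNUSUAL).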

/usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf
connect monitor success
cluster num = 3         secondary monitor is normal
nodeip=192.168.80.229,nodetype=PRIMARY,replicationName=ha229 streamingType=NONE streamingState=none healthy=t agentState=NORMAL
nodeip=192.168.80.228,nodetype=PRIMARY,replicationName=ha228 streamingType=NONE streamingState=none healthy=f agentState=UNUSUAL
nodeip=192.168.80.230,nodetype=STANDBY,replicationName=ha230 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL
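Reading node lines by eye gets error-prone on larger clusters. The function below is a minimal sketch that applies the rule from ① to the output of the `db_ha select` command shown above. `check_db_ha_status` is a hypothetical helper, not part of db_ha, and it assumes every node line starts with `nodeip=` and uses the `key=value` layout seen in the samples.

```shell
#!/bin/sh
# Sketch (hypothetical helper): check `db_ha select` output read on stdin
# against the rule in step 2: every node must report healthy=t, and every
# STANDBY node must report streamingState=streaming. Prints each problem and
# returns 1, or prints a summary line and returns 0 when all checks pass.
check_db_ha_status() {
  awk '
    /^nodeip=/ {
      split("", f)                       # clear the field map for this node
      n = split($0, parts, "[ ,]+")      # fields are separated by "," or spaces
      for (i = 1; i <= n; i++) {
        eq = index(parts[i], "=")
        if (eq > 0) f[substr(parts[i], 1, eq - 1)] = substr(parts[i], eq + 1)
      }
      nodes++
      if (f["healthy"] != "t") {
        printf "node %s (%s): healthy=%s\n", f["nodeip"], f["nodetype"], f["healthy"]
        bad++
      }
      if (f["nodetype"] == "STANDBY" && f["streamingState"] != "streaming") {
        printf "node %s (STANDBY): streamingState=%s\n", f["nodeip"], f["streamingState"]
        bad++
      }
    }
    END {
      if (nodes == 0) { print "no node lines found"; exit 1 }
      if (bad > 0) exit 1
      print "cluster OK (" nodes " nodes)"
    }
  '
}
```

Usage: `/usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf | check_db_ha_status`. For the abnormal sample in ②, it reports node 192.168.80.228 with healthy=f and returns 1.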

3. Check each database's timeline and state with pg_controldata

Note: if the timelines differ between nodes, the cluster has a problem.

① The following output indicates that the primary is running normally:

export LANG="en_US.UTF-8"

pg_controldata |grep -E "TimeLineID|state"

Database cluster state:               in production
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4

② The following output indicates that a standby is running normally:

export LANG="en_US.UTF-8"

pg_controldata |grep -E "TimeLineID|state"

Database cluster state:               in archive recovery
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
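To compare nodes, it helps to reduce each node's pg_controldata output to a role and a timeline. A minimal sketch, assuming the English labels shown above (hence the LANG setting); `controldata_summary` and `timelines_consistent` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch (hypothetical helper): summarise pg_controldata output read on stdin
# as "<role> <TimeLineID>". The role is derived from "Database cluster state":
#   in production       -> primary
#   in archive recovery -> standby
#   anything else       -> other
# Assumes English labels, i.e. LANG=en_US.UTF-8 as in step 3.
controldata_summary() {
  awk -F': *' '
    /^Database cluster state:/         { state = $2 }
    /^Latest checkpoint.s TimeLineID:/ { tli = $2 }
    END {
      role = "other"
      if (state == "in production") role = "primary"
      else if (state == "in archive recovery") role = "standby"
      print role, tli
    }
  '
}

# Sketch (hypothetical helper): return 0 only if every TimeLineID passed
# as an argument is identical, 1 otherwise.
timelines_consistent() {
  local first t
  first=${1-}
  for t in "$@"; do
    [ "$t" = "$first" ] || return 1
  done
  return 0
}
```

Usage: on each node run `export LANG="en_US.UTF-8"; pg_controldata | controldata_summary` (like step 3, this relies on PGDATA pointing at the data directory), then pass the collected TimeLineIDs to the comparison, for example `timelines_consistent 4 4 4`.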

4. Check for the standby marker file standby.signal in the data directory

ll /opt/HighGo4.5.7-see/data/standby.signal

-rw------- 1 root root 18 Jun 13 15:45 /opt/HighGo4.5.7-see/data/standby.signal
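The file check can also be scripted, which is handy when it is run alongside step 3: a node that has standby.signal would normally also report "in archive recovery" there, so a mismatch is worth a closer look. A minimal sketch; `standby_signal_status` is a hypothetical helper, and the default path is simply the data directory used in this document.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report whether a data directory carries the
# standby.signal marker from step 4. The data directory is the first
# argument; the default is the path used in this document and may differ
# on other installations. Returns 0 if the file exists, 1 otherwise.
standby_signal_status() {
  local datadir
  datadir=${1:-/opt/HighGo4.5.7-see/data}
  if [ -f "$datadir/standby.signal" ]; then
    echo "standby.signal present in $datadir: node is configured as a standby"
    return 0
  fi
  echo "standby.signal absent in $datadir: node is not configured as a standby"
  return 1
}
```

Usage: `standby_signal_status /opt/HighGo4.5.7-see/data`.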
