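As mentioned in the previous post, I had been wrestling with PDB cloning because of a DV issue. Before writing that post (the events happened the day before; the article was written today), I created yet another test PDB and ran more tests, which again ended in failure. I then dropped the PDB on the primary and started writing up the previous day's events. That is when the SMS alert arrived.
So what did the SMS alert say?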
event_type:metric_alert,severity:Critical,target_name:xxdbaas,message:The Data Guard status of xxdbdg is Error ORA-16810: multiple errors or warnings detected for the member.,occured:2023-09-13 10:49:37
Checking the standby's status in DGMGRL showed this error:
Database Error(s):
ORA-16766: Redo Apply is stopped
That was odd. I tried to restart redo apply on the standby with the command below, but the problem persisted:
edit database xxdbdg set state='APPLY-ON';
As usual, I checked the standby's alert log and found the following:
MRP0: Background Media Recovery terminated with error 1274
...
ORA-01274: cannot add datafile '/path/to/file' - file could not be created
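The datafile creation error came from the test PDB that had already been dropped.
Running show pdbs on the standby showed that the test PDB, already dropped on the primary, was still listed there, yet any attempt to operate on it reported that the PDB did not exist. Evidently the standby control file had not been updated, which left the test PDB in an inconsistent state on the standby and blocked normal synchronization. (It is also possible that I simply dropped the test PDB on the primary too quickly.)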
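The check described above can be sketched as two queries run in SQL*Plus on the standby. This is only an illustration, not the exact commands used at the time: the `UNNAMED` filter assumes the standby registered the file it could not create under a placeholder name, which is what commonly happens after ORA-01274, and that may not match every environment.

```sql
-- Run on the standby as SYSDBA (a sketch, not the author's exact commands).

-- List the PDBs the standby control file still knows about.
SELECT con_id, name, open_mode
  FROM v$pdbs
 ORDER BY con_id;

-- Assumption: a datafile that could not be created was recorded under a
-- placeholder name containing 'UNNAMED'; adjust the filter if it was not.
SELECT con_id, file#, name, status
  FROM v$datafile
 WHERE name LIKE '%UNNAMED%'
 ORDER BY con_id, file#;
```

If the second query returns rows whose con_id belongs to the dropped PDB, that supports the stale control file explanation.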
Fairly confident that a stale control file was the cause, I went ahead with the fix:
alter system set dg_broker_start=false;
alter database recover managed standby database cancel;
shutdown immediate
startup nomount;
rman target /
RMAN> RESTORE STANDBY CONTROLFILE FROM SERVICE xxdbaas;
RMAN> ALTER DATABASE MOUNT;
RMAN> catalog start with '+DATAC1/XXDBDG/';
... YES or NO? YES
RMAN> SWITCH DATABASE TO COPY;
alter database recover managed standby database disconnect from session;
alter system set dg_broker_start=true;
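After restarting managed recovery, apply can also be checked from SQL*Plus on the standby, in addition to DGMGRL. The queries below are a minimal sketch using standard dynamic views; they were not part of the original procedure, and on newer releases `v$dataguard_process` may be preferred over `v$managed_standby`.

```sql
-- Run on the standby (a sketch for verification only).

-- Is the managed recovery process (MRP) running, and on which sequence?
SELECT process, status, thread#, sequence#
  FROM v$managed_standby
 WHERE process LIKE 'MRP%';

-- Transport and apply lag as computed by the standby itself.
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```

An MRP row with a status such as APPLYING_LOG or WAIT_FOR_LOG, together with a small apply lag, suggests recovery has resumed.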
Checking apply on the standby through DGMGRL showed that synchronization was running normally again, but two warnings remained:
Database Warning(s):
ORA-16826: apply service state is inconsistent with the DelayMins property
ORA-16789: standby redo logs configured incorrectly
Checking the log status on the standby revealed two problems:
alter database add logfile thread 3 group 10 size 10g;
alter database add logfile thread 3 group 11 size 10g;
alter database add logfile thread 3 group 12 size 10g;
alter database add logfile thread 4 group 13 size 10g;
alter database add logfile thread 4 group 14 size 10g;
alter database add logfile thread 4 group 15 size 10g;
alter database add logfile thread 4 group 16 size 10g;
+DATAC1/MUST_RENAME_THIS_LOGFILE_31.4294967295.4294967295
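One way to surface log problems like these is to compare the online and standby redo logs per thread and to look for placeholder member names. The queries below are a sketch under that assumption, not the exact checks run here. Oracle's usual recommendation is that standby redo logs match the online redo log size, with one more group per thread than the online logs.

```sql
-- Run on the standby (illustrative only).

-- Online redo logs: group count and size per thread.
SELECT thread#, COUNT(*) AS groups, MAX(bytes) / 1024 / 1024 AS size_mb
  FROM v$log
 GROUP BY thread#
 ORDER BY thread#;

-- Standby redo logs: group count and size per thread.
SELECT thread#, COUNT(*) AS groups, MAX(bytes) / 1024 / 1024 AS size_mb
  FROM v$standby_log
 GROUP BY thread#
 ORDER BY thread#;

-- Log members still carrying a placeholder name.
SELECT group#, type, member
  FROM v$logfile
 WHERE member LIKE '%MUST_RENAME_THIS_LOGFILE%';
```

Threads that appear in the first result but are missing or undersized in the second are the ones most likely behind ORA-16789.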
alter system set standby_file_management=manual;
alter database recover managed standby database cancel;
alter database drop standby logfile group XX;
...
alter database add standby logfile thread x group XX size 10g;
...
-- While dropping and adding log groups, you may need to restart the standby and toggle redo apply several times
After the steps above, I started the standby and enabled redo apply, and DGMGRL showed everything back to normal:
show database xxdbdg
Database - xxdbdg
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 17.91 MByte/s
Real Time Query: ON
Instance(s):
xxdbdg1 (apply instance)
xxdbdg2
xxdbdg3
xxdbdg4
Database Status:
SUCCESS
This post covered handling an anomaly on the DG standby side.
As usual, the point is just that you know what was covered.