Resolving a DG case where archived logs could not be applied because of the password file and a gap

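Problem description:
Data Guard on Linux, version 11.2.0.4.0, from a RAC primary to a RAC standby.
On one node of the primary, the SYS user was modified with an ALTER USER SYS command, after which redo could no longer be shipped to the standby because of the password mismatch.
After the problem was found, the password file was resynchronized, but the standby still could not fetch node 2's archived logs through the FAL configuration; the gap mechanism did not take effect.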


Solution:

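Password file error:
I had already run into this on 11.2.0.3: the password file could only be synchronized by copying it, because creating a password file on each node with the same command did not produce matching files.
This time the password file was again synchronized by copying it.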

After reconfiguring FAL, the archived logs still could not be fetched:

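At this point the options were: manually scp the archived logs over from the primary node, then register them and recover; on the primary, disable and re-enable the corresponding log destination; or restart the standby so that the relevant processes are restarted.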

Many archived logs were involved, so I tried restarting the DG standby first, and fortunately the restart resolved the problem.

This database also switches logs frequently (about 30 times per hour during business hours, up to 50 at peak), but I won't go into that here.
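To check switch rates like these yourself, one option is to count switches per clock hour from the switch timestamps, for example FIRST_TIME values exported from V$LOG_HISTORY (filter on THREAD# for per-thread figures). The sketch below assumes that export has already been done and the timestamps are Python datetime objects; the function names are hypothetical. For reference, 30 switches per hour is one every 120 seconds on average, and 50 per hour is one every 72 seconds.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(first_times):
    """Count log switches per clock hour.

    first_times: iterable of datetime objects, assumed to be FIRST_TIME
    values exported from V$LOG_HISTORY (export step not shown).
    """
    buckets = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in first_times
    )
    return dict(sorted(buckets.items()))


def average_interval_seconds(switch_count_per_hour):
    """Average number of seconds between switches for an hourly count."""
    return 3600 / switch_count_per_hour


# Hypothetical sample: the thread 2 switch times from the alert log excerpt below.
sample = [
    datetime(2015, 12, 22, 19, 47),
    datetime(2015, 12, 22, 19, 48),
    datetime(2015, 12, 22, 19, 49),
]
print(switches_per_hour(sample))
print(average_interval_seconds(30), average_interval_seconds(50))
```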


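About the gap mechanism:
Data Guard's gap handling through the fal_server and fal_client parameters dates from 9i.
Oracle provides two mechanisms for detecting and resolving log gaps, and in some situations the fal_* parameters are not required for gap handling at all.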
   1.Automatic Gap Resolution
   2.FAL Gap Resolution
1.Automatic Gap Resolution
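    Since 9i, Data Guard has included automatic detection of missing logs. It needs no fal_* parameters; Data Guard runs under this mechanism by default.
When the LGWR and ARCH processes send redo or archived logs to the standby, the current log sequence is compared with the last sequence received by the standby's RFS process. If there is a break between them, RFS sends a request to the primary asking it to ship the missing logs. Starting with 9iR2, automatic gap resolution was enhanced: the ARCH process on the primary checks the standby for log gaps every minute and handles any it finds.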
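The sequence comparison described above can be illustrated with a toy model. This is not Oracle's implementation and detect_gap is a hypothetical name; it only shows the arithmetic: if the incoming sequence is more than one past the last sequence received, everything in between is missing.

```python
def detect_gap(last_received, incoming):
    """Toy model of the RFS sequence comparison (not Oracle's code).

    Returns the missing range as (first, last), or None when the
    incoming sequence directly follows (or repeats) the last one.
    """
    if incoming <= last_received + 1:
        return None
    return (last_received + 1, incoming - 1)


# Hypothetical example: last received 12297, next arriving 12398.
print(detect_gap(12297, 12398))  # (12298, 12397)
```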
2.FAL Gap Resolution
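    FAL stands for Fetch Archive Log; it is a gap detection mechanism enabled by configuring FAL_SERVER and FAL_CLIENT. When the RFS process on the standby receives an archived log, it updates the standby control file to record it; once MRP sees that the control file has been updated, it recovers/applies the log. If MRP finds that a log it needs is missing or unusable (corrupted, physically removed, and so on), it sends a request through FAL. MRP is the recovery process on the standby; unlike RFS, it has no direct connection to the primary, so it relies on the FAL parameters to actively ask the primary to resolve the gap.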
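MRP's check can be pictured in a similar way: given the sequences recorded for a thread and the range recovery needs, find the contiguous runs that are missing (roughly the kind of information V$ARCHIVE_GAP reports on a physical standby). A minimal sketch with a hypothetical function name, not Oracle's code:

```python
def missing_ranges(available, needed_from, needed_to):
    """Contiguous (first, last) runs in [needed_from, needed_to] absent from available."""
    ranges = []
    start = None
    for seq in range(needed_from, needed_to + 1):
        if seq not in available:
            if start is None:
                start = seq  # a new missing run begins here
        elif start is not None:
            ranges.append((start, seq - 1))  # the run ended just before seq
            start = None
    if start is not None:
        ranges.append((start, needed_to))  # run extends to the end of the range
    return ranges


# Hypothetical example: thread 2 has 12296, 12297 and 12398, nothing in between.
print(missing_ranges({12296, 12297, 12398}, 12296, 12398))  # [(12298, 12397)]
```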
 FAL_SERVER and FAL_CLIENT are standby-side parameters; to make switchover smoother, consider setting them on the primary in advance as well.
FAL_SERVER: points to the primary's Oracle Net service
FAL_CLIENT: points to the standby's Oracle Net service
In 9iR2 and later, Oracle first tries FAL gap resolution; if it finds that FAL is not configured, it falls back to automatic gap resolution.
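   For some cascaded Data Guard architectures, FAL gap resolution is the better way to handle gaps. In addition, automatic gap resolution has bugs in some versions (for example bug 5929647), in which case the FAL parameters have to be configured.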
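The resolution order described above can be summarized as a conceptual model. Everything here is hypothetical (the function name, and the callables standing in for the actual network requests); it only encodes the rule "try the configured FAL servers; if none are configured, fall back to automatic gap resolution", and returns the alert log message that appears later in this post when every FAL server has failed.

```python
def resolve_gap(fal_servers, fetch, automatic):
    """Conceptual model of the gap-resolution order (not Oracle's code).

    fal_servers: list of Oracle Net service names (hypothetical values)
    fetch: callable(service) -> bool, True if that FAL server supplied the logs
    automatic: callable() -> bool, stands in for automatic gap resolution
    """
    if not fal_servers:
        # FAL not configured: fall back to automatic gap resolution.
        return "automatic" if automatic() else "unresolved"
    for service in fal_servers:
        if fetch(service):
            return "FAL:" + service
    # Same wording as the alert log when every FAL server fails.
    return "All defined FAL servers have been attempted."


# Hypothetical usage: only the second FAL server answers.
print(resolve_gap(["primary", "primary2"], lambda s: s == "primary2", lambda: True))
```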

-----------------------------------------------------------------------------------

Detailed problem information:
1. Errors reported during the password problem:
Primary:
Wed Dec 23 20:06:43 2015
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
      returning error ORA-16191
----------------------------------------------------------

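The password file was synchronized (copied from one node of the RAC primary to the other nodes).
--On 11.2.0.3 I had seen that the password file could only be synchronized by copying it; creating it on each node with the same command did not keep the files in sync.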
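After copying the password file, a quick way to confirm that every node really has the same file is to compare checksums. A minimal sketch, assuming the files (typically $ORACLE_HOME/dbs/orapw<SID> on each node) have first been gathered into one place, for example with scp; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def content_digest(data):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def password_files_match(paths):
    """True only if there is at least one file and all files are byte-identical."""
    digests = {content_digest(Path(p).read_bytes()) for p in paths}
    return len(digests) == 1


# Hypothetical usage with local copies fetched from each node:
# password_files_match(["/tmp/orapw_node1", "/tmp/orapw_node2"])
```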

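2. The FAL mechanism did not work properly
After the password file was synchronized, a new problem appeared, with the following errors:
--The fal_server parameter had already been set correctly before this: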
Wed Dec 23 19:56:05 2015
ALTER SYSTEM SET fal_server='primary','primary2' SCOPE=BOTH;
Standby alert log after starting log apply:
Wed Dec 23 19:11:19 2015
Media Recovery Log +DATA/hnplusdb/arch/1_15423_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/2_12297_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/1_15424_879093457.dbf
Media Recovery Log +DATA/hnplusdb/arch/1_15425_879093457.dbf
Media Recovery Waiting for thread 2 sequence 12298
Fetching gap sequence in thread 2, gap sequence 12298-12397
Wed Dec 23 19:13:20 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 2 sequence 12298-12397
 DBID 1714301265 branch 879093457
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Wed Dec 23 20:09:19 2015
alter database recover managed standby database using current logfile disconnect from session
Attempt to start background Managed Standby Recovery process (hnplusdb1)
Wed Dec 23 20:09:19 2015
MRP0 started with pid=43, OS id=28250
MRP0: Background Managed Standby Recovery process started (hnplusdb1)
 started logmerger process
Wed Dec 23 20:09:24 2015
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 64 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Wed Dec 23 20:09:25 2015
Media Recovery Waiting for thread 2 sequence 12298
Fetching gap sequence in thread 2, gap sequence 12298-12397
Completed: alter database recover managed standby database using current logfile disconnect from session
Wed Dec 23 20:11:28 2015
FAL[client]: Failed to request gap sequence
 GAP - thread 2 sequence 12298-12397
 DBID 1714301265 branch 879093457
FAL[client]: All defined FAL servers have been attempted.
----------------------------------------------------------
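The gap range in these FAL messages also shows the size of the problem: 12298-12397 is 100 archived logs from thread 2, which is why restarting the standby was tried before copying them by hand. A small parser for that message line, with a hypothetical function name:

```python
import re

# Matches e.g. "Fetching gap sequence in thread 2, gap sequence 12298-12397"
GAP_RE = re.compile(r"gap sequence in thread (\d+), gap sequence (\d+)-(\d+)")


def parse_gap(line):
    """Return (thread, low, high, count) from a gap message line, or None."""
    m = GAP_RE.search(line)
    if m is None:
        return None
    thread, low, high = map(int, m.groups())
    return thread, low, high, high - low + 1


print(parse_gap("Fetching gap sequence in thread 2, gap sequence 12298-12397"))
# (2, 12298, 12397, 100)
```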

Messages from when the errors first appeared:
Tue Dec 22 19:47:00 2015
RFS[3]: No standby redo logfiles available for thread 2
RFS[3]: Opened log for thread 2 sequence 12296 dbid 1714301265 branch 879093457
Archived Log entry 13831 added for thread 2 sequence 12296 rlc 879093457 ID 0x662d8951 dest 2:
Tue Dec 22 19:48:00 2015
RFS[3]: No standby redo logfiles available for thread 2
RFS[3]: Opened log for thread 2 sequence 12297 dbid 1714301265 branch 879093457
Archived Log entry 13832 added for thread 2 sequence 12297 rlc 879093457 ID 0x662d8951 dest 2:
Tue Dec 22 19:49:00 2015
RFS[3]: No standby redo logfiles available for thread 2
Creating archive destination file : +DATA/hnplusdb/arch/2_12298_879093457.dbf (63502 blocks)
Tue Dec 22 19:49:51 2015
Unable to create archive log file '+DATA/hnplusdb/arch/2_12290_879093457.dbf'
ARC1: Error 19504 Creating archive log file to '+DATA/hnplusdb/arch/2_12290_879093457.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance hnplusdb1 - Archival Error
ORA-16038: log 12 sequence# 12290 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 12 thread 2: '+DATA/hnplusdb/onlinelog/slog6.log'
Tue Dec 22 19:49:51 2015
ARC3: Archiving not possible: error count exceeded
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance hnplusdb1 - Archival Error


A query on the primary showed that none of the corresponding logs had been deleted, fortunately, so what remained was to resolve the gap.
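Because the logs still existed on the primary, the manual alternative mentioned earlier (copy the archived logs over and register them) was also available. The sketch below only generates the SQL text; it assumes the files were copied into a hypothetical directory /u01/arch_gap on the standby and keep the thread_sequence_branch.dbf naming seen in the alert log. ALTER DATABASE REGISTER PHYSICAL LOGFILE is standard Oracle syntax; the function name is hypothetical.

```python
def register_statements(thread, low, high, branch, directory):
    """Build one REGISTER statement per sequence in [low, high]."""
    stmts = []
    for seq in range(low, high + 1):
        path = f"{directory}/{thread}_{seq}_{branch}.dbf"
        stmts.append(f"ALTER DATABASE REGISTER PHYSICAL LOGFILE '{path}';")
    return stmts


# Thread 2, gap 12298-12397, branch 879093457 (values from the alert log);
# /u01/arch_gap is a hypothetical copy target.
stmts = register_statements(2, 12298, 12397, 879093457, "/u01/arch_gap")
print(len(stmts))
print(stmts[0])
```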
------
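Repeatedly restarting log apply with the commands below did not solve it (the crux was that the primary was not shipping the logs, or equivalently that the standby's gap mechanism could not connect to the primary to fetch them):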
alter database recover managed standby database using current logfile disconnect from session;
alter database recover managed standby database cancel;
Resolved after restarting the standby:
Physical Standby Database mounted.
Lost write protection disabled
ARC2: Becoming the active heartbeat ARCH
ARC2: Becoming the active heartbeat ARCH
Completed: ALTER DATABASE   MOUNT
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Wed Dec 23 20:21:32 2015
Using STANDBY_ARCHIVE_DEST parameter default value as +DATA/hnplusdb/arch
Wed Dec 23 20:21:33 2015
RFS[1]: Assigned to RFS process 34331
RFS[1]: Opened log for thread 1 sequence 15578 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[2]: Assigned to RFS process 34329
RFS[2]: Opened log for thread 1 sequence 15577 dbid 1714301265 branch 879093457
RFS[3]: Assigned to RFS process 34318
RFS[3]: Opened log for thread 1 sequence 15579 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[4]: Assigned to RFS process 34320
RFS[4]: Opened log for thread 2 sequence 12300 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015
RFS[5]: Assigned to RFS process 34337
RFS[5]: Opened log for thread 2 sequence 12298 dbid 1714301265 branch 879093457
Wed Dec 23 20:21:33 2015

At this point node 2 of the primary was already shipping logs over.
Start log apply:
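To confirm from the alert log that the gap is being filled, one can collect the thread/sequence pairs from the RFS "Opened log" lines. A minimal sketch with a hypothetical function name, applied to lines from the excerpt above; it shows thread 2 sequences 12298 and 12300 arriving after the restart.

```python
import re

RFS_RE = re.compile(r"Opened log for thread (\d+) sequence (\d+)")


def opened_sequences(lines):
    """Map thread -> sorted list of sequences opened by RFS processes."""
    result = {}
    for line in lines:
        m = RFS_RE.search(line)
        if m:
            thread, seq = int(m.group(1)), int(m.group(2))
            result.setdefault(thread, []).append(seq)
    return {t: sorted(s) for t, s in result.items()}


excerpt = [
    "RFS[1]: Opened log for thread 1 sequence 15578 dbid 1714301265 branch 879093457",
    "RFS[2]: Opened log for thread 1 sequence 15577 dbid 1714301265 branch 879093457",
    "RFS[3]: Opened log for thread 1 sequence 15579 dbid 1714301265 branch 879093457",
    "RFS[4]: Opened log for thread 2 sequence 12300 dbid 1714301265 branch 879093457",
    "RFS[5]: Opened log for thread 2 sequence 12298 dbid 1714301265 branch 879093457",
]
print(opened_sequences(excerpt))
```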
Wed Dec 23 20:25:32 2015
alter database recover managed standby database using current logfile disconnect from session

Then simply wait for the primary and standby logs to catch up and become consistent.



