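Today we hit a Data Guard problem in which archived logs were no longer being synchronized. On the primary the LNS process was present, which indicates that the primary was shipping redo to the standby normally: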
SQL> SELECT PROCESS,PID,GROUP#,RESETLOG_ID,THREAD#,SEQUENCE#,BLOCK#,BLOCKS,DELAY_MINS,KNOWN_AGENTS FROM V$MANAGED_STANDBY;

PROCESS          PID     GROUP# RESETLOG_ID    THREAD#  SEQUENCE#     BLOCK#     BLOCKS DELAY_MINS KNOWN_AGENTS
--------- ---------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ------------
ARCH            8067          1   845994484          1      23362     106496        685          0            0
ARCH            8069          3   845994484          1      23360     149504       1659          0            0
ARCH            8071          4   845994484          1      21741          1          2          0            0
ARCH            8073          4   845994484          1      23361     133120       1223          0            0
LNS             8075          2   845994484          1      23363     117792          1          0            0
On the standby, however, the MRP process was not running, so I checked whether there was an archive gap:
SQL> SELECT THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;

no rows selected
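An empty V$ARCHIVE_GAP only tells us that no log is reported missing; it does not show how far redo apply has progressed. As a rough sketch, assuming it is run on the standby and that only one resetlogs incarnation is present in the view, the highest received and highest applied sequence per thread can be compared from V$ARCHIVED_LOG:

```sql
-- Run on the standby.
-- LAST_RECEIVED: highest archived log sequence registered for the thread.
-- LAST_APPLIED:  highest sequence that redo apply has marked as applied.
SELECT THREAD#,
       MAX(SEQUENCE#) AS LAST_RECEIVED,
       MAX(CASE WHEN APPLIED = 'YES' THEN SEQUENCE# END) AS LAST_APPLIED
  FROM V$ARCHIVED_LOG
 GROUP BY THREAD#;
```

A large difference between the two columns suggests that logs are arriving but not being applied, which matches a stopped MRP process.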
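There was no gap. I tried to start the MRP process manually, but it still would not start. Checking the alert log, I found the following key entries: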
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Attempt to start background Managed Standby Recovery process (sss)
Thu Oct 09 11:38:58 2014
MRP0 started with pid=30, OS id=102010
MRP0: Background Managed Standby Recovery process started (sss)
 started logmerger process
Thu Oct 09 11:39:03 2014
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery terminated with error 1111
Errors in file /dba/oracle/diag/rdbms/sss/sss/trace/sss_pr00_102070.trc:
ORA-01111: name for data file 12 is unknown - rename to correct file
ORA-01110: data file 12: '/u01/oracle/product/11.2.0.3.0/dbs/UNNAMED00012'
ORA-01157: cannot identify/lock data file 12 - see DBWR trace file
ORA-01111: name for data file 12 is unknown - rename to correct file
ORA-01110: data file 12: '/u01/oracle/product/11.2.0.3.0/dbs/UNNAMED00012'
Slave exiting with ORA-1111 exception
Errors in file /dba/oracle/diag/rdbms/sss/sss/trace/sss_pr00_102070.trc:
ORA-01111: name for data file 12 is unknown - rename to correct file
ORA-01110: data file 12: '/u01/oracle/product/11.2.0.3.0/dbs/UNNAMED00012'
ORA-01157: cannot identify/lock data file 12 - see DBWR trace file
ORA-01111: name for data file 12 is unknown - rename to correct file
ORA-01110: data file 12: '/u01/oracle/product/11.2.0.3.0/dbs/UNNAMED00012'
Recovery Slave PR00 previously exited with exception 1111
MRP0: Background Media Recovery process shutdown (sss)
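The log shows that the database tried to start the MRP process, but background media recovery was terminated with error 1111 (ORA-01111), which reports that the name of data file 12 is unknown. The ORA-01110 that follows gives the location recorded for that data file: $ORACLE_HOME/dbs/UNNAMED00012.
In practice, no such file can be found under that path: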
find /u01/oracle/product/11.2.0.3.0/dbs/ -name 'UNNAMED00012'
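The same name can also be checked from inside the database. The following is a minimal sketch, assuming the first query is run on the standby and the second on the primary; file number 12 is taken from the error messages above:

```sql
-- On the standby: list data files that the control file knows only
-- under an UNNAMED placeholder name.
SELECT FILE#, NAME, STATUS
  FROM V$DATAFILE
 WHERE NAME LIKE '%UNNAMED%';

-- On the primary: look up the real name of the same file number,
-- which is the file the standby needs a counterpart for.
SELECT FILE#, NAME
  FROM V$DATAFILE
 WHERE FILE# = 12;
```

The name returned on the primary is a useful guide when choosing the target path for the data file on the standby.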
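So the question is: why does the database need this file in order to recover?
As we all know, a Data Guard standby recovers by applying archived logs. As long as the archived logs are received correctly, every change made to data on the primary is shipped to the standby and applied there. In other words, the error above means that the primary shipped a change record that the standby could not resolve.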
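Searching online, I found the following explanation. After the primary crashes and is restarted, the database performs automatic recovery, i.e. instance recovery. The missing data is recorded in the redo, and the archived logs shipped to the standby before the abnormal shutdown do not contain it. The primary recovers through a temporary data file (named in the UNNAMED style), and the recovered records are shipped to the standby in later archived logs. When the standby's recovery reaches such a log, it cannot create the UNNAMED temporary data file by itself. Note that the commonly documented trigger for an UNNAMED file on a standby is a data file being added on the primary while the standby cannot create it automatically (for example, STANDBY_FILE_MANAGEMENT is set to MANUAL, or the target path is not available), so that is worth ruling out as well.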
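Whatever the root cause, it may help to check how the standby is configured to handle files created on the primary. A small sketch, assuming it is run on the standby:

```sql
-- On the standby: show whether files added on the primary are created
-- automatically (AUTO) or must be handled by hand (MANUAL), and whether
-- a path conversion is configured for data file names.
SELECT NAME, VALUE
  FROM V$PARAMETER
 WHERE NAME IN ('standby_file_management', 'db_file_name_convert');
```

If STANDBY_FILE_MANAGEMENT is already AUTO, the conversion setting and the free space and permissions of the target directory are the next things to look at.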
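Solution:
Stop redo apply on the standby (it had in fact already stopped, so this step is not strictly necessary), switch the standby's file management from automatic to manual, create the data file by hand, switch file management back to automatic, and then start redo apply again.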
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=MANUAL SCOPE=MEMORY;
SQL> ALTER DATABASE CREATE DATAFILE '/u01/oracle/product/11.2.0.3.0/dbs/UNNAMED00012' AS '/u01/oradata/sss/sys07.dbf';
SQL> ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO SCOPE=MEMORY;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
At this point, check the alert log again:
Attempt to start background Managed Standby Recovery process (sss)
Thu Oct 09 11:41:08 2014
MRP0 started with pid=30, OS id=102513
MRP0: Background Managed Standby Recovery process started (sss)
 started logmerger process
Thu Oct 09 11:41:14 2014
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 24 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Thu Oct 09 11:41:14 2014
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Media Recovery Log /u01/sss/SSS/archivelog/2014_10_03/o1_mf_1_13523_b2wzmf1b_.arc
Thu Oct 09 11:41:36 2014
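The alert log shows that archived logs are being applied normally again. To be on the safe side, you can also query the V$MANAGED_STANDBY view to confirm that the MRP process is running.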
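That check could look roughly like the sketch below, assuming it is run on the standby; the second query is an extra step not in the original procedure and simply confirms that file 12 is now registered under the new name:

```sql
-- On the standby: confirm that MRP is running and see which log
-- sequence it is currently working on.
SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#
  FROM V$MANAGED_STANDBY
 WHERE PROCESS LIKE 'MRP%';

-- On the standby: confirm that data file 12 no longer carries the
-- UNNAMED placeholder name.
SELECT FILE#, NAME, STATUS
  FROM V$DATAFILE
 WHERE FILE# = 12;
```

A row for MRP0 with a status such as APPLYING_LOG or WAIT_FOR_LOG indicates that managed recovery is active; no row at all means MRP is still down.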
One question remains for me: can the data file I created by hand be deleted once the database has finished recovering?