DG目前正常运行在最大可用模式
注:Failover 是失败切换。 这种情况下切换对redo 的处理,就显的很重要。如果处理好,就不会有数据丢失。 否则就会有数据丢失。
在Oracle 11g里,Data Guard 切换多了一个新的功能:flush redo。---我这里未使用。
Flush 能把没有发送的redo 从主库传送到standby库。 只要主库能启动到mount 状态,那么Flush 就可以把没有发送的归档和current online redo 发送到备库。
Flush语法:SQL> ALTER SYSTEM FLUSH REDO TO target_db_name;
这里的target_db_name 是我们在主库的db_unique_name 名称。 也就是在tnsnames.ora 文件配置的。 Flush 会将未发送的redo 从主库传到备库,并且等待redo 在standby 库上apply 之后返回成功。 所以只要Flush成功,那么Failover 就没有数据丢失。
如果说我们的Primary 已经不能启动到mount 状态,那么就只能按照下面的方法来手动复制相应归档及REDO文件。
环境:
DG1-主库
[oracle@dg1 ~]$ sqlplus / as sysdba
SYS@dg1>select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
SYS@dg1>select protection_mode,database_role from v$database;
PROTECTION_MODE DATABASE_ROLE
-------------------- ----------------
MAXIMUM AVAILABILITY PRIMARY
DG2-备库
SYS@dg2>select protection_mode,database_role from v$database;
PROTECTION_MODE DATABASE_ROLE
-------------------- ----------------
MAXIMUM AVAILABILITY PHYSICAL STANDBY
SYS@dg2>select open_mode from v$database;
OPEN_MODE
--------------------
MOUNTED
1.在主库切换日志,备库可以正常应用归档。
---模拟failover的话这一步可以不做的,只是证明下DG工作正常。
DG1
SYS@dg1>alter system switch logfile;
System altered.
SYS@dg1>col name for a40
SYS@dg1>select * from (select name,recid from v$archived_log order by recid desc) where rownum<5;
NAME RECID
---------------------------------------- ----------
/u01/archivelog/arc_1_159_821829622.arc 310
dg2 309
308
dg2 307
SYS@dg1>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
159
SYS@dg2>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
159
SYS@dg2>col name for a40
SYS@dg2>select * from (select name,recid from v$archived_log order by recid desc) where rownum<5;
NAME RECID
---------------------------------------- ----------
/u01/archivelog/arc_1_159_821829622.arc 166
165
164
163
日志:
[oracle@dg1 ~]$ cat alert_dg.log
Tue Aug 06 09:12:50 2013
LGWR: Standby redo logfile selected to archive thread 1 sequence 160
LGWR: Standby redo logfile selected for thread 1 sequence 160 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 160 (LGWR switch)
Current log# 1 seq# 160 mem# 0: /u01/oradata/dg/redo01.log
Tue Aug 06 09:12:51 2013
Archived Log entry 310 added for thread 1 sequence 159 ID 0x6776473b dest 1:
[oracle@dg2 ~]$ cat alert_dg.log
Tue Aug 06 09:12:50 2013
Standby controlfile consistent with primary
Tue Aug 06 09:12:50 2013
Media Recovery Waiting for thread 1 sequence 160 (in transit)
RFS[1]: Selected log 5 for thread 1 sequence 160 dbid 1735160627 branch 821829622
Tue Aug 06 09:12:51 2013
Archived Log entry 166 added for thread 1 sequence 159 ID 0x6776473b dest 1:
Recovery of Online Redo Log: Thread 1 Group 5 Seq 160 Reading mem 0
Mem# 0: /u01/oradata/dg/standbyredo05.log
2.断开备库DG2的网络,同时在DG1上切换日志来触发归档,然后shutdown abort。
DG1:
SYS@dg1>alter system switch logfile;
System altered.
SYS@dg1>alter system switch logfile;
System altered.
SYS@dg1>select * from (select name,recid from v$archived_log order by recid desc) where rownum<5;
NAME RECID
---------------------------------------- ----------
/u01/archivelog/arc_1_161_821829622.arc 312
/u01/archivelog/arc_1_160_821829622.arc 311
/u01/archivelog/arc_1_159_821829622.arc 310
dg2 309
在表里插入一条数据,测试REDO能否恢复。
SYS@dg1>conn bys/bys
Connected.
BYS@dg1>select * from test;
A
----------
999
BYS@dg1>insert into test values(888);
1 row created.
BYS@dg1>commit;
Commit complete.
BYS@dg1>select * from test;
A
----------
999
888
BYS@dg1>conn / as sysdba
Connected.
SYS@dg1>shutdown abort;
ORACLE instance shut down.
日志:
[oracle@dg1 ~]$ cat alert_dg.log
Tue Aug 06 09:12:51 2013
Archived Log entry 310 added for thread 1 sequence 159 ID 0x6776473b dest 1:
Tue Aug 06 09:19:40 2013
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned----从这一句也能判断出是DG2网络有问题
Errors in file /u01/diag/rdbms/dg1/dg/trace/dg_lgwr_6182.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
Error 16198 for archive log file 1 to 'dg2'
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 1 thread 1 sequence 160 (16198)
Thread 1 advanced to log sequence 161 (LGWR switch)
Current log# 2 seq# 161 mem# 0: /u01/oradata/dg/redo02.log
Tue Aug 06 09:19:45 2013
Archived Log entry 311 added for thread 1 sequence 160 ID 0x6776473b dest 1:
Tue Aug 06 09:22:27 2013------------------------从这次日志可以判断,161号归档文件已经归档完成。当前使用的是redo03.log,对应的归档日志162号。
Thread 1 advanced to log sequence 162 (LGWR switch)
Current log# 3 seq# 162 mem# 0: /u01/oradata/dg/redo03.log
Tue Aug 06 09:22:27 2013
Archived Log entry 312 added for thread 1 sequence 161 ID 0x6776473b dest 1:
Tue Aug 06 09:23:43 2013
Shutting down instance (abort)
License high water mark = 11
USER (ospid: 7015): terminating the instance
Instance terminated by USER, pid = 7015
Tue Aug 06 09:23:44 2013
Instance shutdown complete
3.打开备库dg2的网络并查询相关状态
[oracle@dg2 ~]$ sqlplus / as sysdba
SYS@dg2>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
159
SYS@dg2>col name for a40
SYS@dg2>select * from (select name,recid from v$archived_log order by recid desc) where rownum<5;
NAME RECID
---------------------------------------- ----------
/u01/archivelog/arc_1_159_821829622.arc 166
/u01/archivelog/arc_1_158_821829622.arc 165
/u01/archivelog/arc_1_157_821829622.arc 164
/u01/archivelog/arc_1_156_821829622.arc 163
4.从主库将未传到备库的归档及REDO发送过去
[oracle@dg1 ~]$ cd /u01/archivelog/
[oracle@dg1 archivelog]$ ll -t
total 3472
-rw-r----- 1 oracle oinstall 91648 Aug 6 09:22 arc_1_161_821829622.arc
-rw-r----- 1 oracle oinstall 2673152 Aug 6 09:19 arc_1_160_821829622.arc
-rw-r----- 1 oracle oinstall 772608 Aug 6 09:12 arc_1_159_821829622.arc
[oracle@dg1 archivelog]$ cd /u01/oradata/dg
[oracle@dg1 dg]$ ls
control01.ctl standbyredo04.log sysaux01.dbf
example01.dbf standbyredo05.log system01.dbf
redo01.log standbyredo06.log temp01.dbf
redo02.log standbyredo07.log undotbs01.dbf
redo03.log standctl01.ctl users01.dbf
[oracle@dg1 dg]$
scp redo0* dg2:/u01/archivelog/
oracle@dg2's password:
redo01.log 100% 50MB 10.0MB/s 00:05
redo02.log 100% 50MB 7.1MB/s 00:07
redo03.log 100% 50MB 6.3MB/s 00:08
[oracle@dg1 dg]$ cd /u01/archivelog/
[oracle@dg1 archivelog]$
scp arc_1_160_821829622.arc arc_1_161_821829622.arc dg2:/u01/archivelog
oracle@dg2's password:
arc_1_160_821829622.a 100% 2611KB 2.6MB/s 00:00
arc_1_161_821829622.a 100% 90KB 89.5KB/s 00:00
5.备库上注册复制来的归档日志--解决GAP问题。
注意注意注意:
此时备库打开应用方式如果是alter database recover managed standby database disconnect from session;的话,会自动应用归档日志。
如果是alter database recover managed standby database using current logfile disconnect from session;不会自动应用归档日志。
SYS@dg2>alter database register logfile '/u01/archivelog/arc_1_160_821829622.arc';
Database altered.
SYS@dg2>alter database recover managed standby database cancel;
Database altered.
SYS@dg2>alter database recover managed standby database disconnect from session;
Database altered.
SYS@dg2>alter database register logfile '/u01/archivelog/arc_1_161_821829622.arc';
Database altered.
SYS@dg2>alter database recover managed standby database cancel;
Database altered.
SYS@dg2>select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
161 -------------------------已经 恢复到162号。
对应的日志:
Tue Aug 06 09:19:43 2013
RFS[1]: Possible network disconnect with primary database
Tue Aug 06 09:31:22 2013-------------------- 这里就是成功注册了归档日志,但是没有自动应用。
alter database register logfile '/u01/archivelog/arc_1_160_821829622.arc'
There are 1 logfiles specified.
ALTER DATABASE REGISTER [PHYSICAL] LOGFILE
Completed: alter database register logfile '/u01/archivelog/arc_1_160_821829622.arc'
Tue Aug 06 09:32:19 2013
alter database recover managed standby database cancel
Tue Aug 06 09:32:19 2013
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/diag/rdbms/dg2/dg/trace/dg_mrp0_6370.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply-----------------------------------
没有自动应用。
Recovery interrupted!
Recovered data files to a consistent state at change 1639340
Errors in file /u01/diag/rdbms/dg2/dg/trace/dg_mrp0_6370.trc:
ORA-16037: user requested cancel of managed recovery operation
MRP0: Background Media Recovery process shutdown (dg)
Waiting for MRP0 pid 6370 to terminate
Managed Standby Recovery Canceled (dg)
Completed: alter database recover managed standby database cancel
T
ue Aug 06 09:32:41 2013
alter database recover managed standby database disconnect from session -----使用这种方式开始恢复
Attempt to start background Managed Standby Recovery process (dg)
Tue Aug 06 09:32:41 2013
MRP0 started with pid=26, OS id=7093
MRP0: Background Managed Standby Recovery process started (dg)
Serial Media Recovery started
Managed Standby Recovery not using Real Time Apply
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /u01/archivelog/arc_1_160_821829622.arc
Media Recovery Waiting for thread 1 sequence 161
Completed: alter database recover managed standby database disconnect from session
Tue Aug 06 09:33:14 2013- -----------可以看到新注册的归档日志自动进行了恢复应用。
alter database register logfile '/u01/archivelog/arc_1_161_821829622.arc'
There are 1 logfiles specified.
ALTER DATABASE REGISTER [PHYSICAL] LOGFILE
Resynchronizing thread 1 from sequence 160 to 161
Completed: alter database register logfile '/u01/archivelog/arc_1_161_821829622.arc'
Tue Aug 06 09:33:18 2013
Media Recovery Log /u01/archivelog/arc_1_161_821829622.arc
Media Recovery Waiting for thread 1 sequence 162
6.使用主库的当前REDO恢复,failover并打开数据库。验证数据是否完整。
SYS@dg2>recover standby database until cancel;
ORA-00279: change 1639530 generated at 08/06/2013 09:22:27 needed for thread 1
ORA-00289: suggestion : /u01/archivelog/arc_1_162_821829622.arc
ORA-00280: change 1639530 for thread 1 is in sequence #162
Specify log: {
=suggested | filename | AUTO | CANCEL}
/u01/archivelog/redo03.log --------------输入日志名(从主库传送来的主库当前REDO),前面DG1 的日志中有提到。
Log applied.
Media recovery complete.
SYS@dg2>alter database
activate standby database;
Database altered.
SYS@dg2>alter database open;
Database altered.
SYS@dg2>select protection_mode,database_role,open_mode from v$database;
PROTECTION_MODE DATABASE_ROLE OPEN_MODE
-------------------- ---------------- --------------------
MAXIMUM AVAILABILITY PRIMARY READ WRITE
查看DG1崩溃前的DML操作,数据无丢失。
SYS@dg2>conn bys/bys
Connected.
BYS@dg2>select * from test;
A
----------
999
888
7.相关日志:
Tue Aug 06 09:36:28 2013
ALTER DATABASE RECOVER standby database until cancel
Media Recovery Start
Serial Media Recovery started
Managed Standby Recovery not using Real Time Apply
ORA-279 signalled during: ALTER DATABASE RECOVER standby database until cancel ...
Tue Aug 06 09:36:55 2013
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
Tue Aug 06 09:37:31 2013
ALTER DATABASE RECOVER standby database until cancel
Media Recovery Start
Serial Media Recovery started
Managed Standby Recovery not using Real Time Apply
ORA-279 signalled during: ALTER DATABASE RECOVER standby database until cancel ...
Tue Aug 06 09:37:51 2013
ALTER DATABASE RECOVER LOGFILE '/u01/archivelog/redo03.log'
Media Recovery Log /u01/archivelog/redo03.log
Incomplete recovery applied all redo ever generated.
Recovery completed through change 1639636 time 08/06/2013 09:23:42
Media Recovery Complete (dg)
Completed: ALTER DATABASE RECOVER LOGFILE '/u01/archivelog/redo03.log'
Tue Aug 06 09:43:40 2013
alter database activate standby database
ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE (dg)
krsv_proc_kill: Killing 2 processes (all RFS)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
RESETLOGS after complete recovery through change 1639636
Resetting resetlogs activation ID 1735804731 (0x6776473b)
Online log /u01/oradata/dg/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oradata/dg/redo02.log: Thread 1 Group 2 was previously cleared
Online log /u01/oradata/dg/redo03.log: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 1639634
Tue Aug 06 09:43:40 2013
Setting recovery target incarnation to 4
ACTIVATE STANDBY: Complete - Database mounted as primary
Completed: alter database activate standby database