Oracle RMAN Recover: Using BBED to Skip a Missing Archived Log (Recovery Test)

I. Test Scenario

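Restoring from an Oracle RMAN backup takes two steps: RESTORE and RECOVER.
The RECOVER step depends on the archived logs.
Suppose a full backup is taken on Monday and the archived logs are kept. On Thursday a problem is found, and while preparing to restore we discover that one archived log from Tuesday is missing.
Normally, the database could then only be recovered up to the point just before the missing Tuesday log.
This test shows how to skip the missing archived log so that RECOVER can continue.
The result: recovery can indeed continue, but the outcome is of limited value because data is still lost.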
II. Test Steps
1. Test environment:
OS: CentOS 6.8
Database: 11.2.0.4
2. Take a full RMAN backup of the database

cebpm:/home/oracle@cebpm>rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Jan 10 09:10:31 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CEBPM (DBID=3677012495)

RMAN> backup database format '/data/backup/cebpm/fullback/bak_U%';

Starting backup at 2018/01/10 09:11:00
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=48 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf
input datafile file number=00002 name=/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf
input datafile file number=00003 name=/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf
input datafile file number=00005 name=/data/CEBPM/datafile/test01.dbf
input datafile file number=00004 name=/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf
channel ORA_DISK_1: starting piece 1 at 2018/01/10 09:11:02
channel ORA_DISK_1: finished piece 1 at 2018/01/10 09:12:18
piece handle=/data/backup/cebpm/fullback/bak_U% tag=TAG20180110T091102 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:16
Finished backup at 2018/01/10 09:12:18

Starting Control File and SPFILE Autobackup at 2018/01/10 09:12:18
piece handle=/data/backup/cebpm/ctlbackup/control_c-3677012495-20180110-01 comment=NONE
Finished Control File and SPFILE Autobackup at 2018/01/10 09:12:20
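Note: in the backup command above, the format string is written 'bak_U%', so RMAN created a piece literally named bak_U% (see the piece handle). The substitution variable is %U, as in format '/data/backup/cebpm/fullback/bak_%U'. The restore in step 5 reads from bak_1msoagpe_1_1 with tag TAG20180110T091317, which suggests the backup was rerun at 09:13:17 with a corrected format.
3. Create the test table test and switch logs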

09:16:01 SQL>  create table test (id int,times date);

Table created.

09:16:15 SQL> insert into test values(1,sysdate);

1 row created.

09:16:36 SQL> commit;

Commit complete.

09:16:38 SQL> select sequence# from v$log;

 SEQUENCE#
----------
	 4
	 5
	 3
09:17:46 SQL> alter system archive log current;

System altered.

09:18:29 SQL> select sequence# from v$log;

 SEQUENCE#
----------
	 4
	 5
	 6

09:18:43 SQL> insert into test values(2,sysdate);

1 row created.

09:19:08 SQL> commit;

Commit complete.

09:19:11 SQL> alter system archive log current;

System altered.

09:19:22 SQL> select sequence# from v$log;

 SEQUENCE#
----------
	 7
	 5
	 6
4. Delete archived log sequence 6

cebpm:/data/CEBPM/archivelog@cebpm>ll
total 35132
-rw-r-----. 1 oracle dba   100864 Dec 13 11:05 1_125_945593423.arc
-rw-r-----. 1 oracle dba     1024 Dec 13 11:05 1_126_945593423.arc
-rw-r-----. 1 oracle dba   284672 Dec 13 11:06 1_127_945593423.arc
-rw-r-----. 1 oracle dba   196096 Dec 13 11:06 1_129_945593423.arc
-rw-r-----. 1 oracle dba  9365504 Jan  5 07:35 1_1_962622394.arc
-rw-r-----. 1 oracle dba 17507328 Jan 10 08:17 1_2_962622394.arc
-rw-r-----. 1 oracle dba  6851072 Jan 10 08:53 1_3_962622394.arc
-rw-r-----. 1 oracle dba  1241600 Jan 10 09:03 1_4_962622394.arc
-rw-r-----. 1 oracle dba   399872 Jan 10 09:18 1_5_962622394.arc
-rw-r-----. 1 oracle dba     2048 Jan 10 09:19 1_6_962622394.arc
-rw-r-----. 1 oracle dba     2048 Jan 10 09:19 1_7_962622394.arc
cebpm:/data/CEBPM/archivelog@cebpm>rm -rf 1_6_962622394.arc 
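In a real incident, a gap like the one created here is easier to catch before the restore starts. The minimal Python sketch below reports missing sequence numbers from the archive directory's file names. The function `missing_sequences` is hypothetical, not an Oracle tool. The sketch assumes the names follow the pattern seen in the listing above, `<thread>_<sequence>_<resetlogs_id>.arc`, as produced by a log_archive_format such as `%t_%s_%r.arc`. The `resetlogs_id` filter keeps files of the older incarnation (945593423) out of the check.

```python
import re
from typing import Iterable, List, Optional

# Assumption: archive names look like <thread>_<sequence>_<resetlogs_id>.arc,
# as in the listing above. Adjust the pattern for another log_archive_format.
ARC_NAME = re.compile(r"^(\d+)_(\d+)_(\d+)\.arc$")


def missing_sequences(filenames: Iterable[str], thread: int = 1,
                      resetlogs_id: Optional[int] = None) -> List[int]:
    """Sequence numbers missing between the lowest and highest archived log found.

    Only files of the given thread (and, if given, the given resetlogs id)
    are considered; names that do not match the pattern are ignored.
    """
    seqs = set()
    for name in filenames:
        m = ARC_NAME.match(name)
        if m is None:
            continue
        t, seq, rid = (int(g) for g in m.groups())
        if t != thread or (resetlogs_id is not None and rid != resetlogs_id):
            continue
        seqs.add(seq)
    if not seqs:
        return []
    return [s for s in range(min(seqs), max(seqs) + 1) if s not in seqs]


# File names from the listing above, after 1_6_962622394.arc was removed
names = ["1_125_945593423.arc", "1_126_945593423.arc", "1_127_945593423.arc",
         "1_129_945593423.arc"] + [f"1_{s}_962622394.arc" for s in (1, 2, 3, 4, 5, 7)]
print(missing_sequences(names, resetlogs_id=962622394))  # [6]
```

On the server, the same call could take `os.listdir("/data/CEBPM/archivelog")` as input. It only sees files that are still on disk, whereas v$archived_log (queried later) also lists logs whose files have been removed.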

5. Then restore and recover

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  688959488 bytes
Fixed Size		    2256432 bytes
Variable Size		  566231504 bytes
Database Buffers	  117440512 bytes
Redo Buffers		    3031040 bytes
Database mounted.
cebpm:/home/oracle@cebpm>rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Jan 10 09:21:39 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CEBPM (DBID=3677012495, not open)

RMAN> restore database;

Starting restore at 2018/01/10 09:21:51
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf
channel ORA_DISK_1: restoring datafile 00002 to /data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf
channel ORA_DISK_1: restoring datafile 00003 to /data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf
channel ORA_DISK_1: restoring datafile 00004 to /data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf
channel ORA_DISK_1: restoring datafile 00005 to /data/CEBPM/datafile/test01.dbf
channel ORA_DISK_1: reading from backup piece /data/backup/cebpm/fullback/bak_1msoagpe_1_1
channel ORA_DISK_1: piece handle=/data/backup/cebpm/fullback/bak_1msoagpe_1_1 tag=TAG20180110T091317
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:15
Finished restore at 2018/01/10 09:23:07

RMAN> recover database;

Starting recover at 2018/01/10 09:23:49
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 5 is already on disk as file /data/CEBPM/archivelog/1_5_962622394.arc
archived log for thread 1 with sequence 7 is already on disk as file /data/CEBPM/archivelog/1_7_962622394.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 01/10/2018 09:23:50
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 6 and starting SCN of 1070624 found to restore
Sequence 6 is exactly the archived log we just deleted by hand. Unless this gap is dealt with, recovery cannot go any further.


6. Use BBED to modify the SCN

(1) How the change works

System Checkpoint SCN:
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
------------------
	   1070928
Datafile Checkpoint SCN:
SQL> select name,checkpoint_change# from v$datafile;

NAME							     CHECKPOINT_CHANGE#
------------------------------------------------------------ ------------------
/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf 			1070928
/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf 			1070928
/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf			1070928
/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf				1070928
/data/CEBPM/datafile/test01.dbf 					1070928
START SCN:
SQL> select name,checkpoint_change# from v$datafile_header;

NAME							     CHECKPOINT_CHANGE#
------------------------------------------------------------ ------------------
/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf 			1070427
/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf 			1070427
/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf			1070427
/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf				1070427
/data/CEBPM/datafile/test01.dbf 					1070427


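During startup, if the System Checkpoint SCN, the Datafile Checkpoint SCN and the Start SCN are all equal, the database can open normally without media recovery. If any one of the three differs, media recovery is required. Here the datafile headers (1070427) lag behind the control file (1070928) because the datafiles were restored from the backup while the control file is the current one.
If the End SCN is NULL at startup, instance recovery is required. Oracle first checks whether media recovery is needed, and then whether instance recovery is needed.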
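The two startup checks described above can be modeled roughly as follows. This is a simplified Python sketch, not Oracle's actual startup code, which checks more than this (for example checkpoint counters and resetlogs information). The class `DatafileScns` and the function `startup_check` are hypothetical names. Each SCN is mapped to the view column queried above; for the End SCN the assumed mapping is v$datafile.last_change#.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatafileScns:
    ckpt_in_controlfile: int   # Datafile Checkpoint SCN (v$datafile.checkpoint_change#)
    ckpt_in_header: int        # Start SCN (v$datafile_header.checkpoint_change#)
    stop_scn: Optional[int]    # End SCN (assumed: v$datafile.last_change#); None = NULL


def startup_check(system_ckpt: int, files: List[DatafileScns]) -> str:
    """Simplified order of checks: media recovery first, then instance recovery."""
    for f in files:
        # All three SCNs must agree, otherwise the file needs media recovery
        if not (system_ckpt == f.ckpt_in_controlfile == f.ckpt_in_header):
            return "media recovery"
    # A NULL End SCN means the files were not closed cleanly
    if any(f.stop_scn is None for f in files):
        return "instance recovery"
    return "no recovery"


# State after the restore in this test: control file at 1070928, headers at 1070427.
# The stop SCN value here is illustrative only.
restored = [DatafileScns(1070928, 1070427, 1070928) for _ in range(5)]
print(startup_check(1070928, restored))  # media recovery
```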

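During recovery, the archived logs are applied to advance the Start SCN. With an archived log missing, the Start SCN cannot be advanced and the database cannot be opened.
Here the missing log is sequence 6, so we only need to edit the datafile headers by hand so that the database believes this log has already been applied. This is a trick: recovery can continue afterwards, but problems will still appear.
The SCN range of the missing log can be found as follows, and BBED can then be used to skip those SCNs.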

select sequence#,first_change#,next_change# from v$archived_log;

 SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
	94	  886923       887112
	95	  887112       887701
	96	  887701       887781
	97	  887781       887917
	98	  887917       888336
	99	  888336       888445
	85	  882971       883687
	86	  883687       884020
       102	  889538       889754
       102	  889538       889754
       103	  889754       889770
       103	  889754       889770
       104	  889770       889779
       104	  889770       889779
       105	  889779       889784
       105	  889779       889784
       106	  889784       889806
       106	  889784       889806
       107	  889806       890028
       107	  889806       890028
       108	  890028       894208
       108	  890028       894208
       109	  894208       894296
       109	  894208       894296
       110	  894296       910788
       110	  894296       910788
       111	  910788       910887
       111	  910788       910887
       112	  910887       934551
       113	  934551       934580
       112	  910887       934551
       113	  934551       934580
       114	  934580       934864
       114	  934580       934864
       115	  934864       936779
       116	  936779       936788
       117	  936788       939244
       119	  977459       977469
       118	  939244       977459
       120	  977469       986086
       121	  986086       986095
       122	  986095       987949
       123	  987949       987957
       124	  987957      1008998
       125	 1008998      1009546
       126	 1009546      1009554
       127	 1009554      1010197
       128	 1010197      1010205
       125	 1008998      1009546
       126	 1009546      1009554
       127	 1009554      1010197
       129	 1010205      1010738
	 1	 1010058      1041170
	 2	 1041170      1066942
	 3	 1066942      1069309
	 4	 1069309      1069686
	 5	 1069686      1070624
	 6	 1070624      1070646
	 7	 1070646      1070658

103 rows selected.
This matches the earlier RMAN error (RMAN-06025: no backup of archived log for thread 1 with sequence 6 and starting SCN of 1070624 found to restore): sequence 6 covers SCN 1070624 to 1070646.
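Finding the missing log in this listing amounts to walking the redo chain forward from the datafile header SCN (1070427 in v$datafile_header above) until a needed log is not available. Below is a minimal Python sketch of that reasoning. `find_recovery_gap` is a hypothetical helper. It assumes all rows belong to thread 1 of the current incarnation; in practice you would filter v$archived_log on THREAD# and RESETLOGS_CHANGE#, since sequence 1 here restarts after an earlier incarnation.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE#) as returned by v$archived_log
Log = Tuple[int, int, int]


def find_recovery_gap(logs: Iterable[Log], available: Set[int],
                      header_scn: int) -> Optional[Log]:
    """Walk the redo chain from header_scn; return the first needed log that is unavailable.

    Duplicate rows (e.g. one per archive destination) are collapsed by sequence.
    Returns None when every needed log is available.
    """
    by_seq: Dict[int, Log] = {}
    for seq, first, nxt in logs:
        by_seq[seq] = (seq, first, nxt)

    scn = header_scn  # recovery starts at the datafile header checkpoint SCN
    for seq, first, nxt in sorted(by_seq.values(), key=lambda row: row[1]):
        if nxt <= scn:
            continue  # this log ends before the point recovery needs
        if seq not in available:
            return (seq, first, nxt)  # recovery would stop here
        scn = nxt  # log applied: recovery moves on to its NEXT_CHANGE#
    return None


# Sequences 1 to 7 of the current incarnation, from the listing above
rows = [(1, 1010058, 1041170), (2, 1041170, 1066942), (3, 1066942, 1069309),
        (4, 1069309, 1069686), (5, 1069686, 1070624), (6, 1070624, 1070646),
        (7, 1070646, 1070658)]
on_disk = {1, 2, 3, 4, 5, 7}  # 1_6_962622394.arc was deleted
print(find_recovery_gap(rows, on_disk, header_scn=1070427))  # (6, 1070624, 1070646)
```

The NEXT_CHANGE# of the returned row (1070646) is the SCN to which the headers are pushed in the next step.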
(2) Use BBED to advance the SCN in all datafile headers

For BBED installation and configuration, see:

http://blog.csdn.net/shiyu1157758655/article/details/56279479

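Here we change only the value of kscnbas:
kscnbas (at offset 484 of block 1): the base (low 32 bits) of the checkpoint SCN in the datafile header (kcvfhckp.kcvcpscn.kscnbas). It is the value that v$datafile_header reports as CHECKPOINT_CHANGE#.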

cebpm:/home/oracle@cebpm>bbed parfile=/home/oracle/bbed_parameter.txt 
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Wed Jan 10 09:44:16 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> info 
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf                  89600
     2  /data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf                  76800
     3  /data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf                37120
     4  /data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf                     640
     5  /data/CEBPM/datafile/test01.dbf                                   1280
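We need to advance the SCN in every datafile header (currently 1070427) to 1070646, the NEXT_CHANGE# of sequence 6 and FIRST_CHANGE# of sequence 7. This skips the range 1070624 to 1070646 covered by the missing log.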

SQL> select to_char(1070646,'xxxxxxxxx') from dual;

TO_CHAR(10
----------
    105636
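So the new value of kscnbas is 0x00105636.
Note, however, that on a little-endian platform the low-order byte is stored first, so the bytes actually stored in the block are 36561000.
We use BBED to change the value at offset 484 of every datafile header to 36561000. The current bytes there are 5b551000 (0x0010555b = 1070427, matching v$datafile_header). Only the first two bytes differ, which is why the modify command below writes just 3656.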
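The hex conversion and byte order above can be checked with a few lines of Python. This is a sketch with hypothetical helper names. It assumes a little-endian platform (x86_64 CentOS here) and an SCN wrap of 0, so that the whole SCN fits in the 4-byte kscnbas field. It also shows why only the two bytes 3656 need to be written over the existing 5b551000.

```python
def scn_to_kscnbas_hex(scn: int) -> str:
    """kscnbas (low 32 bits of the SCN) as the bytes appear in a little-endian block."""
    return (scn & 0xFFFFFFFF).to_bytes(4, "little").hex()


def kscnbas_hex_to_scn(hex_bytes: str) -> int:
    """Inverse: decode 4 little-endian bytes from a BBED dump (SCN wrap assumed 0)."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


def bytes_to_modify(old_hex: str, new_hex: str) -> str:
    """Shortest prefix of new_hex, written from the same offset, covering every changed byte."""
    old_b, new_b = bytes.fromhex(old_hex), bytes.fromhex(new_hex)
    changed = [i for i, (a, b) in enumerate(zip(old_b, new_b)) if a != b]
    return new_b[: changed[-1] + 1].hex() if changed else ""


print(hex(1070646))                             # 0x105636
print(scn_to_kscnbas_hex(1070646))              # 36561000
print(kscnbas_hex_to_scn("5b551000"))           # 1070427, the header SCN before the change
print(bytes_to_modify("5b551000", "36561000"))  # 3656
```

On a big-endian platform the bytes would be stored in natural order (00105636), so the little-endian step must not be applied there.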

BBED> d /v dba 1,1 offset 484
 File: /data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf (1)
 Block: 1       Offsets:  484 to  995  Dba:0x00400001
-------------------------------------------------------
 5b551000 00000000 2e438539 01000000 l [U.......C.9....
 05000000 b9020000 10000000 02000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 0d000d00 0d000100 00000000 00000000 l ................
 00000000 02004000 504d0e00 00000000 l ......@.PM......
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 0217f375 l ...............u
 5cbea959 0e00be60 e9a26afe 9b000000 l \..Y...`..j.....
 00000000 00000000 00000000 00000000 l ................
 00000000 00300000 00000000 003a1817 l .....0.......:..
 f6ca8655 2395b099 11bffb0e e9010600 l ...U#...........
 a3f10f00 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................

 <16 bytes per line>

BBED> modify /x 3656 dba 1,1 offset 484
 File: /data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf (1)
 Block: 1                Offsets:  484 to  995           Dba:0x00400001
------------------------------------------------------------------------
 36561000 00000000 2e438539 01000000 05000000 b9020000 10000000 02000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 0d000d00 0d000100 00000000 00000000 00000000 02004000 504d0e00 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 0217f375 5cbea959 0e00be60 e9a26afe 9b000000 
 00000000 00000000 00000000 00000000 00000000 00300000 00000000 003a1817 
 f6ca8655 2395b099 11bffb0e e9010600 a3f10f00 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> sum dba 1,1 apply
Check value for File 1, Block 1:
current = 0xf385, required = 0xf385
Repeat the same steps for the remaining four datafiles (dba 2,1 through 5,1).
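Since only the file number changes, the d / modify / sum sequence used above for file 1 can be generated for files 2 to 5. This is a small Python sketch; `bbed_header_edit_script` is a hypothetical helper that only prints command text and does not run BBED. It assumes every header currently holds the same bytes at offset 484 (all five files showed 1070427 in v$datafile_header), so the same two bytes 3656 apply everywhere. Check each dump before running the matching modify.

```python
from typing import Iterable, List


def bbed_header_edit_script(file_numbers: Iterable[int], new_hex: str,
                            offset: int = 484, block: int = 1) -> List[str]:
    """BBED commands to dump, patch and re-checksum the header block of each datafile.

    Mirrors the d / modify / sum commands shown above for file 1.
    """
    cmds = []
    for f in file_numbers:
        dba = f"{f},{block}"
        cmds.append(f"d /v dba {dba} offset {offset}")                 # inspect current bytes
        cmds.append(f"modify /x {new_hex} dba {dba} offset {offset}")  # write the new SCN bytes
        cmds.append(f"sum dba {dba} apply")                            # recompute the block checksum
    return cmds


for line in bbed_header_edit_script(range(2, 6), "3656"):
    print(line)
```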

The final result:

SQL>  select file#,checkpoint_change#,status from v$datafile_header;

     FILE# CHECKPOINT_CHANGE# STATUS
---------- ------------------ -------
	 1	      1070646 ONLINE
	 2	      1070646 ONLINE
	 3	      1070646 ONLINE
	 4	      1070646 ONLINE
	 5	      1070646 ONLINE
The SCN in every datafile header now lies past the missing archived log, so we can continue with RECOVER.
(3) Recover again

RMAN> recover database;

Starting recover at 2018/01/10 13:09:33
using channel ORA_DISK_1

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 01/10/2018 13:09:57
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 start
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [3], [34370], [12617282], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 34370, file offset is 281559040 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: '/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf'
ORA-10560: block type 'KTU UNDO BLOCK'
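ORA-600 [3020] together with ORA-10567 means the redo being applied is inconsistent with a block in the UNDO tablespace, so recovery of that tablespace cannot continue. For details, see:

Resolving ORA-600[3020] Raised During Recovery (Doc ID 361172.1)
Next, try skipping the corrupt blocks (ALLOW 50 CORRUPTION lets recovery tolerate up to 50 corrupt blocks):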

RMAN>  recover database allow 50  corruption;

Starting recover at 2018/01/10 13:10:33
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:02

Finished recover at 2018/01/10 13:10:36
Recovery itself now completes, but opening the database fails:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-01578: ORACLE data block corrupted (file # 3, block # 128)
ORA-01110: data file 3: '/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf'
Process ID: 7040
Session ID: 21 Serial number: 35
(4) Rebuild the UNDO tablespace
File 3 in these errors is our undo tablespace, so we create a new UNDO tablespace and then bring the database up.

Edit the pfile:

SQL> create pfile from spfile;
cebpm:/data/CEBPM/archivelog@cebpm>vi /u01/app/oracle/product/11.2.0/db_1/dbs/initcebpm.ora
Change the following:
#*.undo_tablespace='UNDOTBS1'
*.undo_management='MANUAL'
*.rollback_segments='SYSTEM'
Restart the database with the modified pfile:

SQL> startup pfile='/u01/app/oracle/product/11.2.0/db_1/dbs/initcebpm.ora';
ORACLE instance started.

Total System Global Area  688959488 bytes
Fixed Size		    2256432 bytes
Variable Size		  545259984 bytes
Database Buffers	  138412032 bytes
Redo Buffers		    3031040 bytes
Database mounted.
Database opened.
Drop the original tablespace and create a new UNDO tablespace:

SQL> select tablespace_name from dba_tablespaces;

TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
TEST

6 rows selected.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf
/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf
/data/CEBPM/datafile/o1_mf_undotbs1_dm1foow9_.dbf
/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf
/data/CEBPM/datafile/test01.dbf

SQL> drop tablespace UNDOTBS1;

Tablespace dropped.

SQL> create undo tablespace UNDOTBS1 datafile '/data/CEBPM/datafile/undotbs01.dbf' size 100M autoextend on next 32M maxsize  unlimited;

Tablespace created.
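Shut down the database, change the pfile parameters back as shown below, create an spfile from the new pfile, and then start the database normally.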
*.undo_tablespace='UNDOTBS1'
#*.undo_management='MANUAL'
#*.rollback_segments='SYSTEM'
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> create spfile from pfile;

File created.

SQL> startup
ORACLE instance started.

Total System Global Area  688959488 bytes
Fixed Size		    2256432 bytes
Variable Size		  566231504 bytes
Database Buffers	  117440512 bytes
Redo Buffers		    3031040 bytes
Database mounted.
Database opened.
(5) Verification

SQL> col name for a60
SQL> select name,checkpoint_change# from v$datafile;

NAME							     CHECKPOINT_CHANGE#
------------------------------------------------------------ ------------------
/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf 			1131748
/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf 			1131748
/data/CEBPM/datafile/undotbs01.dbf					1131748
/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf				1131748
/data/CEBPM/datafile/test01.dbf 					1131748

SQL> select name,checkpoint_change# from v$datafile_header;

NAME							     CHECKPOINT_CHANGE#
------------------------------------------------------------ ------------------
/data/CEBPM/datafile/o1_mf_system_dm1flxkw_.dbf 			1131748
/data/CEBPM/datafile/o1_mf_sysaux_dm1fnw5v_.dbf 			1131748
/data/CEBPM/datafile/undotbs01.dbf					1131748
/data/CEBPM/datafile/o1_mf_users_dm1fqcrp_.dbf				1131748
/data/CEBPM/datafile/test01.dbf 					1131748

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
------------------
	   1131748

SQL> conn test/test
Connected.
SQL> select * from test;

	ID TIMES
---------- ---------
	 1 10-JAN-18
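As shown above, the database now opens normally. However, the test table previously held 2 rows; because the missing archived log was skipped, it now holds only 1.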
