故障描述
(1)存储故障导致ASM磁盘丢失。
(2)CRS因为OCR和VOTEDISK的丢失,除了OHAS还联机外,CLUSTERWARE服务都已经停止
操作步骤
一、恢复OCR和VOTEDISK
(1) 在所有RAC节点上停止CRS服务
[root@node1 ~]crsctl stop has -f
[root@node2 ~]crsctl stop has -f
(2) 在一个节点上以NOCRS方式启动CRS,此操作会启动ASM实例。
root@node1 ~]crsctl start crs -excl -nocrs
(3)之前已创建asm磁盘,查看磁盘状态。
[grid@node1 ~]sqlplus /as sysasm
SQL>select group_number group#, disk_number disk#, OS_MB, state, path, header_status from v$asm_disk order by 1,2;
(4) 创建三个磁盘组,OCR_VOTE给CRS使用,用于存放OCR,VOTEDISK和ASM实例的SPFILE。其余两个给ORACLE使用,ASM_DATA用于存放datafile,
controlfile,redolog,spfile;ASM_FRA存放archivelog。
SQL> create diskgroup OCR_VOTE external redundancy
2 disk 'OCRL:OCR_VOTE1' //OCRL:OCR_VOTE1这是在前一步骤中查看到的磁盘路径path
3 disk 'OCRL:OCR_VOTE2'
4 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
SQL> create diskgroup ASM_DATA external redundancy
2 disk 'OCRL:ASM_DATA1'
3 disk 'OCRL:ASM_DATA2'
4 disk 'OCRL:ASM_DATA3'
5 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
SQL> create diskgroup ASM_FRA external redundancy
2 disk 'OCRL:ASM_FRA1'
2 disk 'OCRL:ASM_FRA2'
3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
(5) 准备恢复OCR和VOTEDISK,/etc/oracle/ocr.loc中记录了OCR路径,修改ocrconfig_loc的值,以便将OCR恢复到新的磁盘组中。
[root@rac1 ~]# more /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
[root@rac1 ~]# vi /etc/oracle/ocr.loc
ocrconfig_loc=+SYSTEMDG
local_only=FALSE
(6) 恢复OCR
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]#
[root@rac1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2840
Available space (kbytes) : 259280
ID : 59415097
Device/File Name : +SYSTEMDG
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
(7) 创建VOTEDISK
[root@rac1 init]# crsctl replace votedisk +OCR_VOTE
Successful addition of voting disk 8ebb7a63accb4fa8bfa7ab65df7a8c8a.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
(8) OCR和VOTEDISK都恢复完成后,重启CRS到正常模式。
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl check crs (如以下资源没在线,请稍等或reboot重启下系统)
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
二、修改CRS注册表中相关配置信息
(1) 挂载新的ASM磁盘组
[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 00:16:05 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL>
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ -----------
SYSTEMDG MOUNTED
ARCLOGDG DISMOUNTED
DATADG DISMOUNTED
SQL> alter diskgroup ARCLOGDG,DATADG mount;
Diskgroup altered.
(2) 更改CRS配置文件中数据库的磁盘组为DATADG和ARCLOGDG
[root@rac1 ~]# srvctl modify database -d csdb -a "DATADG,ARCLOGDG"
(3) 禁用并删除原来的磁盘组DATA
[root@rac1 ~]# srvctl disable diskgroup -g DATA
[root@rac1 ~]# srvctl remove diskgroup -g DATA
[root@rac1 rac-cluster]# crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora....OGDG.dg ora....up.type 0/5 0/ ONLINE ONLINE rac1
ora.DATADG.dg ora....up.type 0/5 0/ ONLINE ONLINE rac1
ora....ER.lsnr ora....er.type 0/5 0/ ONLINE ONLINE rac1
ora....N1.lsnr ora....er.type 0/5 0/0 ONLINE ONLINE rac1
ora.asm ora.asm.type 0/5 0/ ONLINE ONLINE rac1
ora.csdb.db ora....se.type 0/2 0/1 ONLINE OFFLINE
ora.cvu ora.cvu.type 0/5 0/0 ONLINE ONLINE rac1
ora.gsd ora.gsd.type 0/5 0/ OFFLINE OFFLINE
ora....network ora....rk.type 0/5 0/ ONLINE ONLINE rac1
ora.oc4j ora.oc4j.type 0/1 0/2 ONLINE ONLINE rac1
ora.ons ora.ons.type 0/3 0/ ONLINE ONLINE rac1
ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1
ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1
ora.rac1.gsd application 0/5 0/0 OFFLINE OFFLINE
ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1
ora.rac1.vip ora....t1.type 0/0 0/0 ONLINE ONLINE rac1
ora.rac2.vip ora....t1.type 0/0 0/0 ONLINE ONLINE rac1
ora.scan1.vip ora....ip.type 0/0 0/0 ONLINE ONLINE rac1
(4) 在OCR注册表中修改Oracle数据库参数文件的位置
[root@rac1 ~]# srvctl modify database -d csdb -p +DATADG/csdb/spfilecsdb.ora
三、恢复数据库
(1) 查看备份文件路径和名称
[root@rac1 ~]# su - oracle
[oracle@rac1 ~]$
[oracle@rac1 ~]$ cd /u01/app/oracle/backup
[oracle@rac1 backup]$ ll
total 221796
-rw-r----- 1 oracle asmadmin 5357568 Jul 5 15:19 arc_819991156_9.bk
-rw-r----- 1 oracle asmadmin 2560 Jul 5 15:19 arc_819991158_11.bk
-rw-r----- 1 oracle asmadmin 203104256 Jul 5 15:18 CSDB_819991120_5.bk
-rw-r----- 1 oracle asmadmin 18546688 Jul 5 15:19 ctl_file_0coe04jq_1_1_20130705.ctl
-rw-r----- 1 oracle asmadmin 98304 Jul 5 15:19 spfile_0doe04js_1_1_20130705
[oracle@rac1 backup]$
(2) 创建一个基本的启动参数文件,以便启动数据库到nomout状态恢复spfile
[oracle@rac1 ~]$ touch /u01/app/oracle/backup/init.ora
[oracle@rac1 ~]$ vi /u01/app/oracle/backup/init.ora
*.db_name='csdb'
*.remote_login_passwordfile='exclusive'
(3) 使用刚创建的参数文件将数据库启动到nomount状态
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 13:56:06 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount pfile='/u01/app/oracle/backup/init.ora';
ORACLE instance started.
Total System Global Area 238034944 bytes
Fixed Size 2227136 bytes
Variable Size 180356160 bytes
Database Buffers 50331648 bytes
Redo Buffers 5120000 bytes
SQL>
(4) 使用RMAN恢复SPFILE到ASM磁盘组DATADG
[oracle@rac1 ~]$ rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jul 6 13:59:26 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: CSDB (not mounted)
RMAN> restore spfile to '+DATADG/csdb/spfilecsdb.ora' from '/u01/app/oracle/backup/spfile_0doe04js_1_1_20130705';
Starting restore at 06-JUL-13
using channel ORA_DISK_1
channel ORA_DISK_1: restoring spfile from AUTOBACKUP /u01/app/oracle/backup/spfile_0doe04js_1_1_20130705
channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete
Finished restore at 06-JUL-13
(5) 使用恢复后spfile启动数据库,并修改control_files,db_recovery_file_dest,log_archive_dest等存在旧路径的参数值。
[oracle@rac1 ~]$ vi $ORACLE_HOME/dbs/initcsdb1.ora
SPFILE='+DATADG/csdb/spfilecsdb.ora'
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 14:11:58 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> startup nomount force
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size 2228904 bytes
Variable Size 1073745240 bytes
Database Buffers 570425344 bytes
Redo Buffers 7118848 bytes
SQL>
SQL> show parameter control_files
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string +DATA/csdb/control01.ctl, +DAT
A/csdb/control02.ctl
SQL>
SQL> alter system set control_files='+DATADG/csdb/control01.ctl','+DATADG/csdb/control02.ctl' scope=spfile
System altered.
SQL> alter system set db_recovery_file_dest='+DATADG' scope=spfile;
System altered.
SQL> alter system set log_archive_dest_1='LOCATION=+ARCLOGDG' scope=spfile;
System altered.
SQL> startup force nomount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size 2228904 bytes
Variable Size 1073745240 bytes
Database Buffers 570425344 bytes
Redo Buffers 7118848 bytes
(6) 查看数据库的DBID
[oracle@rac1 ~]$ strings /u01/app/oracle/backup/CSDB_819991120_5.bk | grep MAXVALUE,
返回的值类似下面的例子,其中那一窜数字即为DBID。
...
MAXVALUE, MAXVALUE!
3042905279, MAXVALUE,
3042905279, MAXVALUE,
...
(7) 恢复控制文件到新的ASM磁盘组DATADG
[oracle@rac1 ~]$ rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jul 6 14:28:28 2013
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: CSDB (not mounted)
RMAN> set dbid=3042905279
executing command: SET DBID
RMAN> restore controlfile from '/u01/app/oracle/backup/ctl_file_0coe04jq_1_1_20130705.ctl';
Starting restore at 06-JUL-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=18 instance=csdb1 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+DATADG/csdb/control01.ctl
output file name=+DATADG/csdb/control02.ctl
Finished restore at 06-JUL-13
(8) 进入SQLPLUS,查看旧数据文件信息
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 14:41:12 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> alter database mount;
Database altered.
SQL> col name format a50
SQL> select file#,name from v$datafile;
FILE# NAME
---------- --------------------------------------------------
1 +DATA/csdb/datafile/system.260.819979847
2 +DATA/csdb/datafile/sysaux.261.819979871
3 +DATA/csdb/datafile/undotbs1.262.819979889
4 +DATA/csdb/datafile/undotbs2.264.819979905
5 +DATA/csdb/datafile/users.265.819979913
(9)使用RMAN恢复数据库
RMAN> run{
2> set newname for datafile 1 to '+DATADG/csdb/datafile/system.260.819979847';
3> set newname for datafile 2 to '+DATADG/csdb/datafile/sysaux.261.819979871';
4> set newname for datafile 3 to '+DATADG/csdb/datafile/undotbs1.262.819979889';
5> set newname for datafile 4 to '+DATADG/csdb/datafile/undotbs2.264.819979905';
6> set newname for datafile 5 to '+DATADG/csdb/datafile/users.265.819979913';
7> restore database;
8> switch datafile all;
9> recover database;
10> }
executing command: SET NEWNAME
released channel: ORA_DISK_1
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
Starting restore at 07-JUL-13
Starting implicit crosscheck backup at 07-JUL-13
allocated channel: ORA_DISK_1
Crosschecked 10 objects
Finished implicit crosscheck backup at 07-JUL-13
Starting implicit crosscheck copy at 07-JUL-13
using channel ORA_DISK_1
Finished implicit crosscheck copy at 07-JUL-13
searching for all files in the recovery area
cataloging files...
no files cataloged
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00002 to +DATADG/csdb/datafile/sysaux.261.819979871
channel ORA_DISK_1: restoring datafile 00003 to +DATADG/csdb/datafile/undotbs1.262.819979889
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/CSDB_819991120_6.bk
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/CSDB_819991120_6.bk tag=ORCL_HOT_DB_BK
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:46
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to +DATADG/csdb/datafile/system.260.819979847
channel ORA_DISK_1: restoring datafile 00004 to +DATADG/csdb/datafile/undotbs2.264.819979905
channel ORA_DISK_1: restoring datafile 00005 to +DATADG/csdb/datafile/users.265.819979913
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/CSDB_819991120_5.bk
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/CSDB_819991120_5.bk tag=ORCL_HOT_DB_BK
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 07-JUL-13
datafile 1 switched to datafile copy
input datafile copy RECID=6 STAMP=820112831 file name=+DATADG/csdb/datafile/system.284.820112797
datafile 2 switched to datafile copy
input datafile copy RECID=7 STAMP=820112831 file name=+DATADG/csdb/datafile/sysaux.282.820112751
datafile 3 switched to datafile copy
input datafile copy RECID=8 STAMP=820112831 file name=+DATADG/csdb/datafile/undotbs1.283.820112751
datafile 4 switched to datafile copy
input datafile copy RECID=9 STAMP=820112831 file name=+DATADG/csdb/datafile/undotbs2.285.820112797
datafile 5 switched to datafile copy
input datafile copy RECID=10 STAMP=820112831 file name=+DATADG/csdb/datafile/users.286.820112797
Starting recover at 07-JUL-13
using channel ORA_DISK_1
starting media recovery
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=15
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991156_9.bk
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991156_9.bk tag=TAG20130705T151916
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_15.256.820112833 thread=1 sequence=15
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=2
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=16
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991156_10.bk
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991156_10.bk tag=TAG20130705T151916
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_2.257.820112835 thread=2 sequence=2
channel default: deleting archived log(s)
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_15.256.820112833 RECID=5 STAMP=820112832
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_16.258.820112835 thread=1 sequence=16
channel default: deleting archived log(s)
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_2.257.820112835 RECID=7 STAMP=820112834
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=3
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991158_11.bk
channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991158_11.bk tag=TAG20130705T151916
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_3.257.820112837 thread=2 sequence=3
channel default: deleting archived log(s)
archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_3.257.820112837 RECID=8 STAMP=820112835
unable to find archived log
archived log thread=2 sequence=4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/07/2013 01:07:16
RMAN-06054: media recovery requesting unknown archived log for thread 2 with sequence 4 and starting SCN of 323980
(10) 更改REDO LOG位置信息
SQL> select member from v$logfile;
MEMBER
--------------------------------------------------------------------------------
+DATA/csdb/redo01.log
+DATA/csdb/redo02.log
+DATA/csdb/redo03.log
+DATA/csdb/redo04.log
SQL> alter database rename file '+DATA/csdb/redo01.log' to '+DATADG/csdb/redo01.log';
Database altered.
SQL> alter database rename file '+DATA/csdb/redo02.log' to '+DATADG/csdb/redo02.log';
Database altered.
SQL> alter database rename file '+DATA/csdb/redo03.log' to '+DATADG/csdb/redo03.log';
Database altered.
SQL> alter database rename file '+DATA/csdb/redo04.log' to '+DATADG/csdb/redo04.log';
Database altered.
(11) 打开数据库
SQL> alter database open resetlogs;
Database altered.
(12) 更改TEMP表空间文件位置
SQL> select name from v$tempfile;
NAME
--------------------------------------------------------------------------------
+DATA/csdb/tempfile/temp.263.819979895
SQL> alter tablespace temp add tempfile '+DATADG';
Tablespace altered.
SQL> alter tablespace temp drop tempfile '+DATA/csdb/tempfile/temp.263.819979895';
Tablespace altered
四、完成恢复操作
(1) 在其他RAC节点上更改OCR路径
[root@rac2 ~]# vi /etc/oracle/ocr.loc
ocrconfig_loc=+SYSTEMDG
local_only=FALSE
(2) 在恢复节点上重启CRS
[root@rac1 ~]# crsctl stop crs
[root@rac1 ~]# crsctl start has
(3) 在其他节点上启动CRS
[root@rac2 ~]# crsctl start crs