Oracle ASM storage failure: recovering Oracle 11g RAC after all ASM disks are lost

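Fault description

(1) A storage failure caused the ASM disks to be lost.

(2) Because the OCR and voting disk were lost, all Clusterware services have stopped; only OHAS is still online.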

Procedure

I. Restore the OCR and voting disk

(1) Stop the CRS services on all RAC nodes

[root@node1 ~]# crsctl stop has -f

[root@node2 ~]# crsctl stop has -f
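
With more than two nodes, the same stop command could be wrapped in a small loop. The following is only a sketch: the function name stop_has_on_nodes is hypothetical, and it assumes passwordless ssh as root to every node and that crsctl is on root's PATH there.

```shell
# stop_has_on_nodes NODE...
# Runs "crsctl stop has -f" on each listed node over ssh.
# Returns non-zero if the command failed on at least one node.
# Assumption: passwordless root ssh and crsctl on the remote PATH.
stop_has_on_nodes() {
    rc=0
    for node in "$@"; do
        echo "Stopping Oracle High Availability Services on $node"
        ssh "root@$node" "crsctl stop has -f" || rc=1
    done
    return "$rc"
}

# Example, using the host names from the prompts above:
#   stop_has_on_nodes node1 node2
```

The loop keeps going after a failure so that every node is attempted; the return code only tells you whether a second look is needed.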

(2) On one node, start the Clusterware stack in exclusive mode without CRSD (-excl -nocrs). This also starts the ASM instance.

[root@node1 ~]# crsctl start crs -excl -nocrs

(3) The ASM disks have already been created; check their status.

[grid@node1 ~]$ sqlplus / as sysasm

SQL> select group_number group#, disk_number disk#, OS_MB, state, path, header_status from v$asm_disk order by 1,2;

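(4) Create three disk groups. OCR_VOTE is used by CRS and holds the OCR, the voting disk, and the ASM instance SPFILE. The other two are used by the Oracle database: ASM_DATA holds the datafiles, control files, redo logs, and SPFILE; ASM_FRA holds the archived logs. Note that most of the commands and output from step (5) onward refer to disk groups named SYSTEMDG, DATADG, and ARCLOGDG (and to hosts rac1/rac2); substitute the names used in your own environment.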

SQL> create diskgroup OCR_VOTE external redundancy

2  disk 'ORCL:OCR_VOTE1'  -- ORCL:OCR_VOTE1 is the disk path (PATH column) returned by the query in the previous step

3  disk 'ORCL:OCR_VOTE2'

4  ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';

Diskgroup created.

SQL> create diskgroup ASM_DATA external redundancy

2  disk 'ORCL:ASM_DATA1'

3  disk 'ORCL:ASM_DATA2'

4  disk 'ORCL:ASM_DATA3'

5  ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';

Diskgroup created.

SQL> create diskgroup ASM_FRA external redundancy

2  disk 'ORCL:ASM_FRA1'

3  disk 'ORCL:ASM_FRA2'

4  ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';

Diskgroup created.
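
Before going further it may be worth confirming that the new disk groups exist and are mounted. A minimal sketch, assuming it is run as the grid user with the ASM environment (ORACLE_HOME, ORACLE_SID) already set; the function name show_asm_diskgroups is hypothetical:

```shell
# show_asm_diskgroups
# Prints name, state, redundancy type and space figures of every ASM
# disk group known to the local ASM instance.
# Assumption: run as the grid user with the ASM environment set.
show_asm_diskgroups() {
    # The quoted 'EOF' keeps the shell from expanding $asm_diskgroup.
    sqlplus -S / as sysasm <<'EOF'
set pagesize 100 linesize 120
select name, state, type, total_mb, free_mb from v$asm_diskgroup order by name;
exit
EOF
}

# Example:
#   show_asm_diskgroups
```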

(5) Prepare to restore the OCR and voting disk. /etc/oracle/ocr.loc records the OCR location; change the value of ocrconfig_loc so that the OCR is restored into the new disk group.

[root@rac1 ~]# more /etc/oracle/ocr.loc

ocrconfig_loc=+DATA

local_only=FALSE

[root@rac1 ~]# vi /etc/oracle/ocr.loc

ocrconfig_loc=+SYSTEMDG

local_only=FALSE
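
The same edit can be made without an interactive editor, which is convenient when it has to be repeated on several nodes. This is a sketch only; set_ocr_location is a hypothetical helper name, and the file path and disk group are passed in rather than hard-coded.

```shell
# set_ocr_location FILE DISKGROUP
# Saves the original as FILE.bak, then rewrites only the ocrconfig_loc
# line so that it points at +DISKGROUP. All other lines are copied as-is.
set_ocr_location() {
    file=$1
    dg=$2
    cp -p "$file" "$file.bak" || return 1
    sed "s|^ocrconfig_loc=.*|ocrconfig_loc=+$dg|" "$file.bak" > "$file"
}

# Example (as root):
#   set_ocr_location /etc/oracle/ocr.loc SYSTEMDG
```

Writing back through a redirect keeps the original file's owner and permissions, and the .bak copy gives a quick way back if the wrong disk group name was supplied.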

(6) Restore the OCR

[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr

[root@rac1 ~]#

[root@rac1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :

Version                  :          3

Total space (kbytes)     :     262120

Used space (kbytes)      :       2840

Available space (kbytes) :     259280

ID                       :   59415097

Device/File Name         :  +SYSTEMDG

Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded
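
When the restore is scripted, the ocrcheck result can be tested instead of read by eye. The sketch below looks only for the two integrity messages that appear in the output above; ocr_check_ok is a hypothetical helper name.

```shell
# ocr_check_ok
# Reads ocrcheck output on stdin and succeeds only if both the device
# integrity message and the registry integrity message are present.
ocr_check_ok() {
    out=$(cat)
    case $out in
        *"Device/File integrity check succeeded"*) ;;
        *) return 1 ;;
    esac
    case $out in
        *"Cluster registry integrity check succeeded"*) ;;
        *) return 1 ;;
    esac
}

# Example (as root):
#   ocrcheck | ocr_check_ok && echo "OCR restored and readable"
```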

(7) Recreate the voting disk

[root@rac1 init]# crsctl replace votedisk +OCR_VOTE

Successful addition of voting disk 8ebb7a63accb4fa8bfa7ab65df7a8c8a.

Successfully replaced voting disk group with +OCR_VOTE.

CRS-4266: Voting file(s) successfully replaced
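
The result can be cross-checked with crsctl query css votedisk. The sketch below counts the voting files reported as ONLINE in a given disk group. It assumes the 11.2 output layout, in which each voting file line shows its state and ends with the disk group name in square brackets; votedisk_online_count is a hypothetical helper name.

```shell
# votedisk_online_count DISKGROUP
# Reads "crsctl query css votedisk" output on stdin and prints how many
# lines are marked ONLINE and carry the tag [DISKGROUP].
# Assumption: the 11.2 output layout described in the text above.
votedisk_online_count() {
    dg=$1
    grep 'ONLINE' | grep -cF "[$dg]"
}

# Example (as root):
#   crsctl query css votedisk | votedisk_online_count OCR_VOTE
```

With external redundancy a single online voting file is expected; a count of 0 means the replace did not take effect in that disk group.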

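(8) Once both the OCR and the voting disk have been restored, restart CRS in normal mode. If the stack is still running in exclusive mode from step (2), stop it first (for example with crsctl stop crs -f).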

[root@rac1 ~]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@rac1 ~]# crsctl check crs    # if the services below are not online yet, wait a while or reboot the node

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online
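
Since the stack can take a while to come up, the check can be repeated automatically rather than by hand. The following sketch polls crsctl check crs until the four "online" message codes shown above are all present. The function name wait_for_crs and its default retry count and delay are arbitrary choices, not Oracle recommendations.

```shell
# wait_for_crs [TRIES] [DELAY_SECONDS]
# Polls "crsctl check crs" until OHAS, CRS, CSS and EVM all report
# online (message codes CRS-4638, CRS-4537, CRS-4529, CRS-4533).
# Returns 0 on success, 1 if the services are not online after TRIES polls.
wait_for_crs() {
    tries=${1:-30}
    delay=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        out=$(crsctl check crs 2>&1)
        ok=1
        for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
            case $out in
                *"$code"*) ;;
                *) ok=0 ;;
            esac
        done
        [ "$ok" -eq 1 ] && return 0
        i=$((i + 1))
        # Sleep only if another poll will follow.
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

# Example (as root): poll every 10 seconds for up to 5 minutes
#   wait_for_crs 30 10 && echo "Clusterware stack is online"
```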

II. Update the related configuration in the CRS registry

(1) Mount the new ASM disk groups

[grid@rac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 00:16:05 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Real Application Clusters and Automatic Storage Management options

SQL>

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE

------------------------------ -----------

SYSTEMDG                       MOUNTED

ARCLOGDG                       DISMOUNTED

DATADG                         DISMOUNTED

SQL> alter diskgroup ARCLOGDG,DATADG mount;

Diskgroup altered.

(2) In the CRS configuration, change the database's disk groups to DATADG and ARCLOGDG

[root@rac1 ~]# srvctl modify database -d csdb -a "DATADG,ARCLOGDG"

(3) Disable and remove the old disk group DATA

[root@rac1 ~]# srvctl disable diskgroup -g DATA

[root@rac1 ~]# srvctl remove diskgroup -g DATA

[root@rac1 rac-cluster]# crs_stat -t -v

Name           Type           R/RA   F/FT   Target    State     Host

----------------------------------------------------------------------

ora....OGDG.dg ora....up.type 0/5    0/     ONLINE    ONLINE    rac1

ora.DATADG.dg  ora....up.type 0/5    0/     ONLINE    ONLINE    rac1

ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1

ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1

ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1

ora.csdb.db    ora....se.type 0/2    0/1    ONLINE    OFFLINE

ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1

ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE

ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1

ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1

ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1

ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1

ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1

ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE

ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1

ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1

ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1

ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1

(4) Update the location of the Oracle database parameter file (SPFILE) in the OCR

[root@rac1 ~]# srvctl modify database -d csdb -p +DATADG/csdb/spfilecsdb.ora

III. Restore the database

(1) Check the paths and names of the backup files

[root@rac1 ~]# su - oracle

[oracle@rac1 ~]$

[oracle@rac1 ~]$ cd /u01/app/oracle/backup

[oracle@rac1 backup]$ ll

total 221796

-rw-r----- 1 oracle asmadmin   5357568 Jul  5 15:19 arc_819991156_9.bk

-rw-r----- 1 oracle asmadmin      2560 Jul  5 15:19 arc_819991158_11.bk

-rw-r----- 1 oracle asmadmin 203104256 Jul  5 15:18 CSDB_819991120_5.bk

-rw-r----- 1 oracle asmadmin  18546688 Jul  5 15:19 ctl_file_0coe04jq_1_1_20130705.ctl

-rw-r----- 1 oracle asmadmin     98304 Jul  5 15:19 spfile_0doe04js_1_1_20130705

[oracle@rac1 backup]$

(2) Create a minimal initialization parameter file so that the database can be started in NOMOUNT state to restore the SPFILE

[oracle@rac1 ~]$ touch /u01/app/oracle/backup/init.ora

[oracle@rac1 ~]$ vi /u01/app/oracle/backup/init.ora

*.db_name='csdb'

*.remote_login_passwordfile='exclusive'
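
The two-line parameter file can also be written non-interactively. A small sketch, with write_min_pfile as a hypothetical helper name; it writes exactly the two parameters shown above and nothing else.

```shell
# write_min_pfile FILE DBNAME
# Writes a minimal pfile that is just enough to start an instance in
# NOMOUNT state so that the real SPFILE can be restored afterwards.
write_min_pfile() {
    pfile=$1
    dbname=$2
    cat > "$pfile" <<EOF
*.db_name='$dbname'
*.remote_login_passwordfile='exclusive'
EOF
}

# Example (as the oracle user):
#   write_min_pfile /u01/app/oracle/backup/init.ora csdb
```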

(3) Start the database in NOMOUNT state using the parameter file just created

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 13:56:06 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/u01/app/oracle/backup/init.ora';

ORACLE instance started.

Total System Global Area  238034944 bytes

Fixed Size                  2227136 bytes

Variable Size             180356160 bytes

Database Buffers           50331648 bytes

Redo Buffers                5120000 bytes

SQL>

(4) Use RMAN to restore the SPFILE into the ASM disk group DATADG

[oracle@rac1 ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jul 6 13:59:26 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CSDB (not mounted)

RMAN> restore spfile to '+DATADG/csdb/spfilecsdb.ora' from '/u01/app/oracle/backup/spfile_0doe04js_1_1_20130705';

Starting restore at 06-JUL-13

using channel ORA_DISK_1

channel ORA_DISK_1: restoring spfile from AUTOBACKUP /u01/app/oracle/backup/spfile_0doe04js_1_1_20130705

channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete

Finished restore at 06-JUL-13

(5) Start the database with the restored SPFILE, then change control_files, db_recovery_file_dest, log_archive_dest, and any other parameters that still contain old paths.

[oracle@rac1 ~]$ vi $ORACLE_HOME/dbs/initcsdb1.ora

SPFILE='+DATADG/csdb/spfilecsdb.ora'

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 14:11:58 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

SQL> startup nomount force

ORACLE instance started.

Total System Global Area 1653518336 bytes

Fixed Size                  2228904 bytes

Variable Size            1073745240 bytes

Database Buffers          570425344 bytes

Redo Buffers                7118848 bytes

SQL>

SQL> show parameter control_files

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

control_files                        string      +DATA/csdb/control01.ctl, +DAT

A/csdb/control02.ctl

SQL>

SQL> alter system set control_files='+DATADG/csdb/control01.ctl','+DATADG/csdb/control02.ctl' scope=spfile;

System altered.

SQL> alter system set db_recovery_file_dest='+DATADG' scope=spfile;

System altered.

SQL> alter system set log_archive_dest_1='LOCATION=+ARCLOGDG' scope=spfile;

System altered.

SQL> startup force nomount;

ORACLE instance started.

Total System Global Area 1653518336 bytes

Fixed Size                  2228904 bytes

Variable Size            1073745240 bytes

Database Buffers          570425344 bytes

Redo Buffers                7118848 bytes

(6) Find the database's DBID

[oracle@rac1 ~]$ strings /u01/app/oracle/backup/CSDB_819991120_5.bk | grep MAXVALUE,

The output resembles the following example; the string of digits is the DBID.

...

MAXVALUE, MAXVALUE!

3042905279, MAXVALUE,

3042905279, MAXVALUE,

...
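
To pull just the number out of that output, the pipeline can be extended with a small filter. This is a sketch that relies on the line pattern shown above (digits followed by ", MAXVALUE,"); extract_dbid is a hypothetical helper name.

```shell
# extract_dbid
# Reads "strings" output on stdin and prints each distinct number that
# appears at the start of a line in the form "<digits>, MAXVALUE,".
extract_dbid() {
    sed -n 's/^ *\([0-9][0-9]*\), MAXVALUE,.*$/\1/p' | sort -u
}

# Example (as the oracle user):
#   strings /u01/app/oracle/backup/CSDB_819991120_5.bk | extract_dbid
```

If more than one value is printed, the pattern matched something other than the DBID and the raw output should be inspected manually.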

(7) Restore the control file into the new ASM disk group DATADG

[oracle@rac1 ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jul 6 14:28:28 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CSDB (not mounted)

RMAN> set dbid=3042905279

executing command: SET DBID

RMAN> restore controlfile from '/u01/app/oracle/backup/ctl_file_0coe04jq_1_1_20130705.ctl';

Starting restore at 06-JUL-13

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=18 instance=csdb1 device type=DISK

channel ORA_DISK_1: restoring control file

channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

output file name=+DATADG/csdb/control01.ctl

output file name=+DATADG/csdb/control02.ctl

Finished restore at 06-JUL-13

(8) In SQL*Plus, mount the database and check the old datafile information

[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jul 6 14:41:12 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,

Data Mining and Real Application Testing options

SQL> alter database mount;

Database altered.

SQL> col name format a50

SQL> select file#,name from v$datafile;

FILE# NAME

---------- --------------------------------------------------

1 +DATA/csdb/datafile/system.260.819979847

2 +DATA/csdb/datafile/sysaux.261.819979871

3 +DATA/csdb/datafile/undotbs1.262.819979889

4 +DATA/csdb/datafile/undotbs2.264.819979905

5 +DATA/csdb/datafile/users.265.819979913
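
With only five datafiles the SET NEWNAME commands for the RMAN restore are easy to type by hand, but for a larger database they could be generated from this listing. A sketch, assuming the listing is fed in as "file# name" lines; gen_newname is a hypothetical helper name.

```shell
# gen_newname OLD_DG NEW_DG
# Reads "file# name" lines on stdin (header and separator lines are
# skipped) and prints one RMAN SET NEWNAME command per datafile, with
# the leading +OLD_DG/ replaced by +NEW_DG/.
gen_newname() {
    old=$1
    new=$2
    awk -v old="+$old/" -v new="+$new/" '
        $1 ~ /^[0-9]+$/ && index($2, old) == 1 {
            # \047 is a single quote
            printf "set newname for datafile %s to \047%s%s\047;\n",
                   $1, new, substr($2, length(old) + 1)
        }'
}

# Example: save the query output to a file, then
#   gen_newname DATA DATADG < datafiles.txt
```

The generated lines can be pasted into an RMAN run block ahead of the restore command.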

(9) Use RMAN to restore and recover the database

RMAN> run{

2> set newname for datafile 1 to '+DATADG/csdb/datafile/system.260.819979847';

3> set newname for datafile 2 to '+DATADG/csdb/datafile/sysaux.261.819979871';

4> set newname for datafile 3 to '+DATADG/csdb/datafile/undotbs1.262.819979889';

5> set newname for datafile 4 to '+DATADG/csdb/datafile/undotbs2.264.819979905';

6> set newname for datafile 5 to '+DATADG/csdb/datafile/users.265.819979913';

7> restore database;

8> switch datafile all;

9> recover database;

10> }

executing command: SET NEWNAME

released channel: ORA_DISK_1

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 07-JUL-13

Starting implicit crosscheck backup at 07-JUL-13

allocated channel: ORA_DISK_1

Crosschecked 10 objects

Finished implicit crosscheck backup at 07-JUL-13

Starting implicit crosscheck copy at 07-JUL-13

using channel ORA_DISK_1

Finished implicit crosscheck copy at 07-JUL-13

searching for all files in the recovery area

cataloging files...

no files cataloged

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00002 to +DATADG/csdb/datafile/sysaux.261.819979871

channel ORA_DISK_1: restoring datafile 00003 to +DATADG/csdb/datafile/undotbs1.262.819979889

channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/CSDB_819991120_6.bk

channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/CSDB_819991120_6.bk tag=ORCL_HOT_DB_BK

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:46

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00001 to +DATADG/csdb/datafile/system.260.819979847

channel ORA_DISK_1: restoring datafile 00004 to +DATADG/csdb/datafile/undotbs2.264.819979905

channel ORA_DISK_1: restoring datafile 00005 to +DATADG/csdb/datafile/users.265.819979913

channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/CSDB_819991120_5.bk

channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/CSDB_819991120_5.bk tag=ORCL_HOT_DB_BK

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:35

Finished restore at 07-JUL-13

datafile 1 switched to datafile copy

input datafile copy RECID=6 STAMP=820112831 file name=+DATADG/csdb/datafile/system.284.820112797

datafile 2 switched to datafile copy

input datafile copy RECID=7 STAMP=820112831 file name=+DATADG/csdb/datafile/sysaux.282.820112751

datafile 3 switched to datafile copy

input datafile copy RECID=8 STAMP=820112831 file name=+DATADG/csdb/datafile/undotbs1.283.820112751

datafile 4 switched to datafile copy

input datafile copy RECID=9 STAMP=820112831 file name=+DATADG/csdb/datafile/undotbs2.285.820112797

datafile 5 switched to datafile copy

input datafile copy RECID=10 STAMP=820112831 file name=+DATADG/csdb/datafile/users.286.820112797

Starting recover at 07-JUL-13

using channel ORA_DISK_1

starting media recovery

channel ORA_DISK_1: starting archived log restore to default destination

channel ORA_DISK_1: restoring archived log

archived log thread=1 sequence=15

channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991156_9.bk

channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991156_9.bk tag=TAG20130705T151916

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_15.256.820112833 thread=1 sequence=15

channel ORA_DISK_1: starting archived log restore to default destination

channel ORA_DISK_1: restoring archived log

archived log thread=2 sequence=2

channel ORA_DISK_1: restoring archived log

archived log thread=1 sequence=16

channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991156_10.bk

channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991156_10.bk tag=TAG20130705T151916

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_2.257.820112835 thread=2 sequence=2

channel default: deleting archived log(s)

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_15.256.820112833 RECID=5 STAMP=820112832

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_1_seq_16.258.820112835 thread=1 sequence=16

channel default: deleting archived log(s)

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_2.257.820112835 RECID=7 STAMP=820112834

channel ORA_DISK_1: starting archived log restore to default destination

channel ORA_DISK_1: restoring archived log

archived log thread=2 sequence=3

channel ORA_DISK_1: reading from backup piece /u01/app/oracle/backup/arc_819991158_11.bk

channel ORA_DISK_1: piece handle=/u01/app/oracle/backup/arc_819991158_11.bk tag=TAG20130705T151916

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_3.257.820112837 thread=2 sequence=3

channel default: deleting archived log(s)

archived log file name=+ARCLOGDG/csdb/archivelog/2013_07_07/thread_2_seq_3.257.820112837 RECID=8 STAMP=820112835

unable to find archived log

archived log thread=2 sequence=4

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of recover command at 07/07/2013 01:07:16

RMAN-06054: media recovery requesting unknown archived log for thread 2 with sequence 4 and starting SCN of 323980

(10) Update the redo log file locations

SQL> select member from v$logfile;

MEMBER

--------------------------------------------------------------------------------

+DATA/csdb/redo01.log

+DATA/csdb/redo02.log

+DATA/csdb/redo03.log

+DATA/csdb/redo04.log

SQL> alter database rename file '+DATA/csdb/redo01.log' to '+DATADG/csdb/redo01.log';

Database altered.

SQL> alter database rename file '+DATA/csdb/redo02.log' to '+DATADG/csdb/redo02.log';

Database altered.

SQL> alter database rename file '+DATA/csdb/redo03.log' to '+DATADG/csdb/redo03.log';

Database altered.

SQL> alter database rename file '+DATA/csdb/redo04.log' to '+DATADG/csdb/redo04.log';

Database altered.
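
The four rename statements above follow one pattern, so with many log members they could be generated from the v$logfile output. A sketch under the assumption that the member list was captured before any rename; gen_redo_rename is a hypothetical helper name.

```shell
# gen_redo_rename OLD_DG NEW_DG
# Reads redo log member paths on stdin (one per line; header and
# separator lines are skipped) and prints an ALTER DATABASE RENAME FILE
# statement that moves each member from +OLD_DG/ to +NEW_DG/.
gen_redo_rename() {
    old=$1
    new=$2
    awk -v old="+$old/" -v new="+$new/" '
        index($1, old) == 1 {
            # \047 is a single quote
            printf "alter database rename file \047%s\047 to \047%s%s\047;\n",
                   $1, new, substr($1, length(old) + 1)
        }'
}

# Example: save the member list to a file, then
#   gen_redo_rename DATA DATADG < members.txt
```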

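(11) Open the database. Recovery in step (9) stopped at the last archived log available in the backup (the RMAN-06054 error), so this is an incomplete recovery and the database has to be opened with RESETLOGS.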

SQL> alter database open resetlogs;

Database altered.

(12) Move the TEMP tablespace tempfile to the new disk group

SQL> select name from v$tempfile;

NAME

--------------------------------------------------------------------------------

+DATA/csdb/tempfile/temp.263.819979895

SQL> alter tablespace temp add tempfile '+DATADG';

Tablespace altered.

SQL> alter tablespace temp drop tempfile '+DATA/csdb/tempfile/temp.263.819979895';

Tablespace altered.

IV. Finish the recovery

(1) Change the OCR location on the other RAC nodes

[root@rac2 ~]# vi /etc/oracle/ocr.loc

ocrconfig_loc=+SYSTEMDG

local_only=FALSE

(2) Restart CRS on the node where the recovery was performed

[root@rac1 ~]# crsctl stop crs

[root@rac1 ~]# crsctl start has

(3) Start CRS on the other nodes

[root@rac2 ~]# crsctl start crs
