一、环境描述
(1) 11.2.0.3 RAC ON Oracle Linux 6 x86_64,只有一个ASM外部冗余磁盘组——DATA;
(2)OCR,VOTEDISK,DATAFILE,CONTROLFILE,SPFILE全部位于这个磁盘组上;
二、故障描述
(1)存储故障导致ASM磁盘丢失。
(2)CRS因为OCR和VOTEDISK的丢失,除了OHAS还联机外,CLUSTERWARE服务都已经停止。
三、备份情况
(1)RMAN备份:包括controlfile,database,spfile,archivelog,
(2)OCR备份:没有进行过人工备份,在$CRS_HOME/cdata目录下有CRS自动备份文件。
四、操作步骤
说明:准使用CRS自动备份的文件恢复OCR,使用RMAN备份来恢复数据库;准备恢复数据的同时,调整ASM磁盘组,将OCR,VOTEDISK同数据库文件分开存放。
推荐阅读:
Oracle 11g从入门到精通 PDF+光盘源代码
12.04(amd64)安装完Oracle 11gR2后各种问题解决方法
4.1 恢复OCR和VOTEDISK
(1) 在所有RAC节点上停止CRS服务
[root@rac1 ~]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd'on'rac1'
CRS-2673: Attempting to stop 'ora.crf'on'rac1'
CRS-2677: Stop of'ora.mdnsd'on'rac1' succeeded
CRS-2677: Stop of'ora.crf'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd'on'rac1'
CRS-2677: Stop of'ora.gipcd'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd'on'rac1'
CRS-2677: Stop of'ora.gpnpd'on'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac2 ~]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd'on'rac2'
CRS-2673: Attempting to stop 'ora.crf'on'rac2'
CRS-2677: Stop of'ora.mdnsd'on'rac2' succeeded
CRS-2677: Stop of'ora.crf'on'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd'on'rac2'
CRS-2677: Stop of'ora.gipcd'on'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd'on'rac2'
CRS-2677: Stop of'ora.gpnpd'on'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
(2) 在一个节点上以NOCRS方式启动CRS,此操作会启动ASM实例。
[root@rac1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd'on'rac1'
CRS-2676: Start of'ora.mdnsd'on'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd'on'rac1'
CRS-2676: Start of'ora.gpnpd'on'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor'on'rac1'
CRS-2672: Attempting to start 'ora.gipcd'on'rac1'
CRS-2676: Start of'ora.cssdmonitor'on'rac1' succeeded
CRS-2676: Start of'ora.gipcd'on'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd'on'rac1'
CRS-2672: Attempting to start 'ora.diskmon'on'rac1'
CRS-2676: Start of'ora.diskmon'on'rac1' succeeded
CRS-2676: Start of'ora.cssd'on'rac1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip'on'rac1'
CRS-2672: Attempting to start 'ora.ctssd'on'rac1'
CRS-2681: Clean of'ora.cluster_interconnect.haip'on'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip'on'rac1'
CRS-2676: Start of'ora.ctssd'on'rac1' succeeded
CRS-2676: Start of'ora.cluster_interconnect.haip'on'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm'on'rac1'
CRS-2676: Start of'ora.asm'on'rac1' succeeded
(3) 新添加了三块磁盘,已经使用UDEV进行了绑定,查看磁盘状态。
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 5 17:41:49 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select group_number group#, disk_number disk#, OS_MB, state, path, header_status from v$asm_disk orderby 1,2;
GROUP# DISK# OS_MB STATE PATH HEADER_STATUS
---------- ---------- ---------- ---------- -------------------- ----------------------
0 0 1024 NORMAL /dev/asm-diskc CANDIDATE
0 1 5120 NORMAL /dev/asm-diskd CANDIDATE
0 2 20480 NORMAL /dev/asm-diskb CANDIDATE
(4) 创建三个磁盘组,SYSTEMDG给CRS使用,用于存放OCR,VOTEDISK和ASM实例的SPFILE。其余两个给ORACLE使用,DATADG用于存放datafile,controlfile,redolog,spfile;ARCLOGDG存放archivelog。
SQL> create diskgroup SYSTEMDG external redundancy
2 disk '/dev/asm-diskc'
3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
SQL> create diskgroup DATADG external redundancy
2 disk '/dev/asm-diskb'
3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
SQL> create diskgroup ARCLOGDG external redundancy
2 disk '/dev/asm-diskd'
3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
Diskgroup created.
(5) 准备恢复OCR和VOTEDISK,/etc/oracle/ocr.loc中记录了OCR路径,修改ocrconfig_loc的值,以便将OCR恢复到新的磁盘组中。
[root@rac1 ~]# more /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
[root@rac1 ~]# vi /etc/oracle/ocr.loc
ocrconfig_loc=+SYSTEMDG
local_only=FALSE
(6) 恢复OCR
[root@rac1 ~]# ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
rac1 2013/07/05 12:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
rac1 2013/07/05 08:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
rac1 2013/07/05 04:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup02.ocr
rac1 2013/07/05 00:29:59 /u01/app/11.2.0/grid/cdata/rac-cluster/day.ocr
rac1 2013/07/05 00:29:59 /u01/app/11.2.0/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]#
[root@rac1 ~]# ocrcheck
Status of Oracle Cluster Registry isas follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2840
Available space (kbytes) : 259280
ID : 59415097
Device/File Name : +SYSTEMDG
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
(7) 创建VOTEDISK
[root@rac1 ~]# crsctl replace votedisk +SYSTEMDG
CRS-4602: Failed 27 toadd voting file afb0ca0f35684f1abfd43d5ec2dc1123.
Failed toreplace voting disk groupwith +SYSTEMDG.
CRS-4000: Command Replace failed, or completed with errors.
以上报错是因为使用UDEV绑定ASM磁盘时需要更改默认磁盘搜索路径为/dev/asm*,修改ASM磁盘搜索路径
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 5 19:03:25 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter asm_diskstring
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string
SQL>
SQL>
SQL> alter system set asm_diskstring = '/dev/asm*';
System altered.
SQL> create spfile from memory;
create spfile from memory
*
ERROR at line 1:
ORA-00349: failure obtaining block sizefor
'+DATA/rac-cluster/asmparameterfile/registry.253.819922365'
ORA-15001: diskgroup "DATA" does not exist orisnot mounted
SQL> create spfile='+SYSTEMDG'from memory;
File created.
SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
在次创建VOTEDISK,成功。
[root@rac1 init]# crsctl replace votedisk +SYSTEMDG
Successful addition of voting disk 8ebb7a63accb4fa8bfa7ab65df7a8c8a.
Successfully replaced voting disk groupwith +SYSTEMDG.
CRS-4266: Voting file(s) successfully replaced
(8) OCR和VOTEDISK都恢复完成后,重启CRS到正常模式。
[root@rac1 ~]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd'on'rac1'
CRS-2673: Attempting to stop 'ora.ctssd'on'rac1'
CRS-2673: Attempting to stop 'ora.asm'on'rac1'
CRS-2677: Stop of'ora.mdnsd'on'rac1' succeeded
CRS-2677: Stop of'ora.asm'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip'on'rac1'
CRS-2677: Stop of'ora.ctssd'on'rac1' succeeded
CRS-2677: Stop of'ora.cluster_interconnect.haip'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd'on'rac1'
CRS-2677: Stop of'ora.cssd'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd'on'rac1'
CRS-2677: Stop of'ora.gipcd'on'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd'on'rac1'
CRS-2677: Stop of'ora.gpnpd'on'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac1 ~]#