情景介绍:
做OCR备份恢复实验,OCR有4份自动备份。将OCR磁盘从+DATA替换为+OCR2(/dev/raw/raw4) 完成之后使用ocrconfig -manualbackup手动备份OCR,完成之后对/dev/raw/raw4执行dd操作。关闭集群,启动集群,发现集群不能启动。

问题分析(假设不知道问题出在哪里,先分析):
1、检查集群服务,发现CRS和CSS服务未能正常启动
crsctl check crs
2、检查CRS和CSS日志,发现OCR磁盘异常
3、恢复OCR(其实就是使用root.sh重建OCR的过程,重建之后需要重新注册相关的资源如listener/database等)
清空所有节点的cluster配置信息:root用户执行 $GRID_HOME/crs/install/rootcrs.pl
节点1
[root@node1 install]# ./rootcrs.pl
Using configuration parameter file: ./crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

节点2
[root@node2 install]# ./rootcrs.pl
Using configuration parameter file: ./crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

清除所有节点的cluster信息
节点1
[root@node1 install]# ./rootcrs.pl -deconfig -force
Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd

CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.crf' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
CRS-2673: Attempting to stop 'ora.asm' on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

节点2
[root@node2 install]# ./rootcrs.pl -deconfig -force -lastnode
Using configuration parameter file: ./crsconfig_params
CRS-5702: Resource 'ora.cssd' is already running on 'node2'
CRS-4000: Command Start failed, or completed with errors.
CSS startup failed with return code 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd

CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.evmd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2677: Stop of 'ora.evmd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node2'
CRS-2676: Start of 'ora.cssdmonitor' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'node2'
CRS-2676: Start of 'ora.diskmon' on 'node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'node2' succeeded
CRS-4611: Successful deletion of voting disk +DATA.
ASM de-configuration trace file location: /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log
ASM Clean Configuration START
ASM Clean Configuration END

ASM with SID +ASM1 deleted successfully. Check /tmp/asmcadc_clean2016-10-31_02-02-22-PM.log for details.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node2'
CRS-2673: Attempting to stop 'ora.asm' on 'node2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node2'
CRS-2677: Stop of 'ora.mdnsd' on 'node2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node2' succeeded
CRS-2677: Stop of 'ora.asm' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node2'
CRS-2677: Stop of 'ora.cssd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'node2'
CRS-2677: Stop of 'ora.crf' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node2'
CRS-2677: Stop of 'ora.gipcd' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node2'
CRS-2677: Stop of 'ora.gpnpd' on 'node2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

重建OCR和OLR,使用root.sh脚本完成重建,其实就是安装RAC中执行的脚本,默认位置为$GRID_HOME

节点1
[root@node1 grid]# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded

ASM created and started successfully.

Disk Group DATA created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 4331dad495c14f71bfdb6d4f1a82d2f9.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced

STATE File Universal Id File Name Disk group


  1. ONLINE 4331dad495c14f71bfdb6d4f1a82d2f9 (/dev/raw/raw1) [DATA]
    Located 1 voting disk(s).
    CRS-2672: Attempting to start 'ora.asm' on 'node1'
    CRS-2676: Start of 'ora.asm' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.DATA.dg' on 'node1'
    CRS-2676: Start of 'ora.DATA.dg' on 'node1' succeeded
    Preparing packages for installation...
    cvuqdisk-1.0.9-1
    Configure Oracle Grid Infrastructure for a Cluster ... succeeded

节点2
[root@node2 grid]# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

检查资源信息

节点1
[root@node1 grid]# crs_stat -t
Name Type Target State Host

ora.DATA.dg ora....up.type ONLINE ONLINE node1
ora....N1.lsnr ora....er.type ONLINE ONLINE node1
ora.asm ora.asm.type ONLINE ONLINE node1
ora.cvu ora.cvu.type ONLINE ONLINE node1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE node1
ora....SM1.asm application ONLINE ONLINE node1
ora.node1.gsd application OFFLINE OFFLINE
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip ora....t1.type ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora.node2.gsd application OFFLINE OFFLINE
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip ora....t1.type ONLINE ONLINE node2
ora.oc4j ora.oc4j.type ONLINE ONLINE node1
ora.ons ora.ons.type ONLINE ONLINE node1
ora....ry.acfs ora....fs.type ONLINE ONLINE node1
ora.scan1.vip ora....ip.type ONLINE ONLINE node1
[root@node1 grid]# crsctl stat res -t

NAME TARGET STATE SERVER STATE_DETAILS

Local Resources

ora.DATA.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.asm
ONLINE ONLINE node1 Started
ONLINE ONLINE node2 Started
ora.gsd
OFFLINE OFFLINE node1
OFFLINE OFFLINE node2
ora.net1.network
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.ons
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.registry.acfs
ONLINE ONLINE node1
ONLINE ONLINE node2

Cluster Resources

ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE node1
ora.cvu
1 ONLINE ONLINE node1
ora.node1.vip
1 ONLINE ONLINE node1
ora.node2.vip
1 ONLINE ONLINE node2
ora.oc4j
1 ONLINE ONLINE node1
ora.scan1.vip
1 ONLINE ONLINE node1

节点2
[root@node2 grid]# crs_stat -t
Name Type Target State Host

ora.DATA.dg ora....up.type ONLINE ONLINE node1
ora....N1.lsnr ora....er.type ONLINE ONLINE node1
ora.asm ora.asm.type ONLINE ONLINE node1
ora.cvu ora.cvu.type ONLINE ONLINE node1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE node1
ora....SM1.asm application ONLINE ONLINE node1
ora.node1.gsd application OFFLINE OFFLINE
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip ora....t1.type ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora.node2.gsd application OFFLINE OFFLINE
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip ora....t1.type ONLINE ONLINE node2
ora.oc4j ora.oc4j.type ONLINE ONLINE node1
ora.ons ora.ons.type ONLINE ONLINE node1
ora....ry.acfs ora....fs.type ONLINE ONLINE node1
ora.scan1.vip ora....ip.type ONLINE ONLINE node1
[root@node2 grid]# crsctl stat res -t

NAME TARGET STATE SERVER STATE_DETAILS

Local Resources

ora.DATA.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.asm
ONLINE ONLINE node1 Started
ONLINE ONLINE node2 Started
ora.gsd
OFFLINE OFFLINE node1
OFFLINE OFFLINE node2
ora.net1.network
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.ons
ONLINE ONLINE node1
ONLINE ONLINE node2
ora.registry.acfs
ONLINE ONLINE node1
ONLINE ONLINE node2

Cluster Resources

ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE node1
ora.cvu
1 ONLINE ONLINE node1
ora.node1.vip
1 ONLINE ONLINE node1
ora.node2.vip
1 ONLINE ONLINE node2
ora.oc4j
1 ONLINE ONLINE node1
ora.scan1.vip
1 ONLINE ONLINE node1

查看磁盘组信息,如果没有挂载则手动挂载:
SQL> select name,state from v$asm_diskgroup;

4、添加资源(监听、数据库、实例等)

添加监听
[grid@node1 ~]$ srvctl add listener -l listener
查看监听
[grid@node1 ~]$ srvctl config listener

添加db和instance
[oracle@node1 ~]$ srvctl add database -h
[oracle@node1 ~]$ srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/db_1 -c RAC
[oracle@node1 ~]$ srvctl add instance -h
[oracle@node1 ~]$ srvctl add instance -d orcl -i orcl1 -n node1
[oracle@node1 ~]$ srvctl add instance -d orcl -i orcl2 -n node2
[oracle@node1 ~]$ srvctl config database -d orcl

5、资源添加完毕,重新启动集群
[root@node1 grid]# crsctl stop cluster -all
[root@node1 grid]# crsctl start cluster -all

添加完成后,可能出现数据库不能自动启动的问题。尝试执行以下语句:
[oracle@node1 ~]$ srvctl enable database -d orcl
[oracle@node1 ~]$ srvctl enable instance -d orcl -i orcl1
[oracle@node1 ~]$ srvctl enable instance -d orcl -i orcl2
[oracle@node1 ~]$ srvctl start database -d orcl