Environment: an 11.2.0.1 RAC with two nodes, rac1 and rac2.

I. When an OLR backup exists

1. Manually rename the OLR on rac1 to simulate its loss

mv rac1.olr rac1.olr.test

2. Restart CRS

./crsctl stop crs

This shuts down normally.

./crsctl start crs

Startup fails with the following errors:

2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCRRAW][3084149472]proprinit: Could not open raw device
2016-01-12 11:08:58.249: [ OCRAPI][3084149472]a_init:16!: Backend init unsuccessful : [26]
2016-01-12 11:08:58.249: [ CRSOCR][3084149472] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-01-12 11:08:58.249: [ default][3084149472] OLR initalization failured, rc=26
2016-01-12 11:08:58.250: [ default][3084149472]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2016-01-12 11:08:58.250: [ default][3084149472][PANIC] OHASD exiting; Could not init OLR
2016-01-12 11:08:58.250: [ default][3084149472] Done.

And finally:

[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

3. Restore the OLR from the backup

The OLR backup is located here:

[root@rac1 olr]# pwd
/olr
[root@rac1 olr]# ll
total 6412
-rw------- 1 root root 6553600 Jan 12 09:01 backup_20160112_090125.olr

As the root user:

[root@rac1 bin]# ./crsctl stop crs -f
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 bin]# touch /u01/app/11.2.0/grid/cdata/rac1.olr
[root@rac1 bin]# chown root:oinstall /u01/app/11.2.0/grid/cdata/rac1.olr
[root@rac1 bin]# ./ocrconfig -local -restore /olr/backup_20160112_090125.olr

Then check the result:

[root@rac1 olr]# cd /u01/app/11.2.0/grid/cdata/
[root@rac1 cdata]# ll
total 8928
drwxr-xr-x 2 grid oinstall      4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall      4096 Jan 12 08:52 rac1
-rw-r--r-- 1 root oinstall 272756736 Jan 12 11:33 rac1.olr
-rw------- 1 root oinstall 272756736 Jan 12 11:02 rac1.olr.test
drwxrwxr-x 2 grid oinstall      4096 Jan 12 05:29 rac-cluster

The OLR file is back!

Try starting CRS:

[root@rac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Check the database status:

SQL> select open_mode from gv$database;

OPEN_MODE
--------------------
READ WRITE
READ WRITE

II. When no OLR backup exists

Now for the main event: recovery without a backup.

1. Manually rename the OLR on rac1 to simulate its loss

mv rac1.olr rac1.olr.test

2. Restart CRS

./crsctl stop crs

This shuts down normally.

./crsctl start crs

Startup fails with the following errors:

2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCRRAW][3084149472]proprinit: Could not open raw device
2016-01-12 11:08:58.249: [ OCRAPI][3084149472]a_init:16!: Backend init unsuccessful : [26]
2016-01-12 11:08:58.249: [ CRSOCR][3084149472] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-01-12 11:08:58.249: [ default][3084149472] OLR initalization failured, rc=26
2016-01-12 11:08:58.250: [ default][3084149472]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2016-01-12 11:08:58.250: [ default][3084149472][PANIC] OHASD exiting; Could not init OLR
2016-01-12 11:08:58.250: [ default][3084149472] Done.
And finally:

[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

3. Without a backup, the only option is to deconfigure the node and then rerun root.sh to rebuild the OLR

[root@rac1 bin]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2016-01-12 14:12:34: Parsing the host name
2016-01-12 14:12:34: Checking for super user privileges
2016-01-12 14:12:34: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd
ACFS-9200: Supported
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node

Run root.sh:

[root@rac1 bin]# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
    Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
    Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
    Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2016-01-12 14:14:26: Parsing the host name
2016-01-12 14:14:26: Checking for super user privileges
2016-01-12 14:14:26: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac2, number 2, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
Timed out waiting for the CRS stack to start.

The OLR is back!

[root@rac1 cdata]# ll
total 8716
drwxr-xr-x 2 grid oinstall      4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall      4096 Jan 12 08:52 rac1
-rw------- 1 root oinstall 272756736 Jan 12 11:59 rac1.olr
-rwxr-xr-x 1 grid oinstall 272756736 Jan 12 11:43 rac1.olr.bak
drwxrwxr-x 2 grid oinstall      4096 Jan 12 05:29 rac-cluster
[root@rac1 cdata]#

But wait: root.sh reported an error!

Timed out waiting for the CRS stack to start.

Check crsd.log:

2016-01-11 09:35:41.780: [ CRSD][4165444320] ENV Logging level for Module: UiServer 0
2016-01-11 09:35:41.780: [ CRSMAIN][4165444320] Checking the OCR device
2016-01-11 09:35:41.781: [ CRSMAIN][4165444320] Connecting to the CSS Daemon
2016-01-11 09:35:41.783: [ CSSCLNT][1099733312]clssnsquerymode: not connected to CSSD
2016-01-11 09:35:41.987: [ CRSMAIN][4165444320] Initializing OCR

ocrcheck on node 1:

[grid@rac1 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2760
         Available space (kbytes) :     259360
         ID                       :  729466762
         Device/File Name         :       +crs
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user

ocrcheck on node 2:

[grid@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user

Node 1 reports +crs while node 2 reports +OCRNEW. This is because I had previously changed the disk group that stores the OCR and the voting disks.

First, check the disk group status:

GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           0 TMP                            DISMOUNTED                  0          0              0
           0 FRA                            DISMOUNTED                  0          0              0
           0 DATA                           DISMOUNTED                  0          0              0

Mount them!

GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           5 TMP                            MOUNTED     EXTERN       5120       4757           4757
           5 FRA                            MOUNTED     EXTERN       8192       6604           6604
           5 DATA                           MOUNTED     EXTERN       8192       5513           5513

I ran out of time and shut the machines down. After powering back on, node rac1 started, but node rac2 failed to start.

GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 DATA                           MOUNTED     EXTERN       8192       5513           5513
           5 FRA                            MOUNTED     EXTERN       8192       6581           6581
           5 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           5 TMP                            MOUNTED     EXTERN       5120       4757           4757

The disk groups are mounted on both nodes.

[grid@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :       +crs
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user

[grid@rac2 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user

The two nodes still report different OCR locations.

Change node 1's OCR location to match node 2:

[root@rac1 bin]# ./ocrconfig -add +OCRNEW
[root@rac1 bin]# ./ocrconfig -delete +crs
[root@rac1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

[root@rac1 bin]# ./crsctl query css votedisk
##  STATE    File Universal Id                File Name        Disk group
--  -----    -----------------                ---------        ---------
 1. ONLINE   610033cee0c34ff3bf2269f62bbf7340 (/dev/raw/raw2) [OCRNEW]
 2. ONLINE   afa5da0d2a8f4f75bf05f1b72d979c4c (/dev/raw/raw3) [OCRNEW]
 3. ONLINE   02d613656c1c4f99bf59a36d62b24c8b (/dev/raw/raw4) [OCRNEW]
Located 3 voting disk(s).

The voting disk configuration on node 1 is unchanged.

Start node 2. First stop it:

./crsctl stop crs -f

Then start it:

./crsctl start crs

I checked the cluster status immediately and it was still not up. I went digging through the logs, found nothing after a long search, and then checked again:

[grid@rac2 rac2]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.DATA.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.FRA.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.OCRNEW.dg
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.TMP.dg
               ONLINE  OFFLINE      rac1
               ONLINE  ONLINE       rac2
ora.asm
               ONLINE  ONLINE       rac1                     Started
               ONLINE  ONLINE       rac2                     Started
ora.eons
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.gsd
               OFFLINE OFFLINE      rac1
               OFFLINE OFFLINE      rac2
ora.net1.network
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.ons
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
ora.registry.acfs
               ONLINE  ONLINE       rac1
               ONLINE  ONLINE       rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1
ora.oc4j
      1        OFFLINE OFFLINE
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open
      2        ONLINE  ONLINE       rac2                     Open
ora.rac1.vip
      1        ONLINE  ONLINE       rac1
ora.rac2.vip
      1        ONLINE  ONLINE       rac2
ora.scan1.vip
      1        ONLINE  ONLINE       rac1

It turns out that people are not the only ones who need time; RAC does too!

Conclusions:

1. Comparing the recovery steps with and without a backup makes it plain how important backups are!
2. Whenever a disk group changes, back up the OLR right away!
3. Sometimes what we need is not only technical skill but also patience.
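
Conclusions 2 and 3 can be turned into two small helpers. What follows is only a sketch under stated assumptions, not a tested procedure: the function names backup_olr and wait_for_crs and the GRID_BIN variable are made up for illustration, the Grid home path is the one used in this post, and the readiness check assumes that `crsctl check crs` prints "Cluster Ready Services is online" (message CRS-4537 on 11.2) once crsd is up. Both helpers are meant to be run as root.

```shell
#!/bin/sh
# Sketch only: helper names and GRID_BIN are hypothetical, not part of Oracle's tooling.
# GRID_BIN defaults to the Grid home used in this post; adjust for your install.
GRID_BIN="${GRID_BIN:-/u01/app/11.2.0/grid/bin}"

# Take a manual OLR backup on the local node (run after any disk group change).
backup_olr() {
  "$GRID_BIN/ocrconfig" -local -manualbackup
}

# Poll `crsctl check crs` until crsd reports online, instead of checking once
# and concluding that the start failed.
# $1 = maximum number of checks (default 60), $2 = seconds between checks (default 10)
wait_for_crs() {
  max_tries="${1:-60}"
  interval="${2:-10}"
  i=1
  while [ "$i" -le "$max_tries" ]; do
    # Assumption: this message text matches CRS-4537 as printed on 11.2.
    if "$GRID_BIN/crsctl" check crs 2>&1 | grep -q "Cluster Ready Services is online"; then
      echo "CRS is online after $i check(s)"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "CRS still not online after $max_tries check(s)" >&2
  return 1
}
```

With these defined, running `./crsctl start crs` followed by `wait_for_crs 60 10` on the restarted node would wait up to about ten minutes before giving up, and `backup_olr` could be run on each node after moving the OCR or voting disks.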