Recovering from a Lost OLR File

Environment: an 11.2.0.1 RAC cluster with two nodes, rac1 and rac2.

I. When an OLR backup exists
1. Rename the OLR on rac1 by hand to simulate its loss
    mv rac1.olr rac1.olr.test
2. Restart CRS
     ./crsctl stop crs
     The stack shuts down normally.
     ./crsctl start crs
     Startup fails with:
2016-01-12 11:08:58.249: [  OCROSD][3084149472]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [  OCROSD][3084149472]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [  OCRRAW][3084149472]proprinit: Could not open raw device
2016-01-12 11:08:58.249: [  OCRAPI][3084149472]a_init:16!: Backend init unsuccessful : [26]
2016-01-12 11:08:58.249: [  CRSOCR][3084149472] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-01-12 11:08:58.249: [ default][3084149472] OLR initalization failured, rc=26
2016-01-12 11:08:58.250: [ default][3084149472]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry
2016-01-12 11:08:58.250: [ default][3084149472][PANIC] OHASD exiting; Could not init OLR
2016-01-12 11:08:58.250: [ default][3084149472] Done.

And finally, in the terminal:
[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
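
To confirm quickly that a failed start is caused by the OLR rather than something else, the two telltale strings from the log excerpt above can be searched for directly. The following is a minimal sketch; the function name is made up for illustration, and the log path in the usage comment is an assumption that depends on your Grid home and host name.

```shell
#!/bin/sh
# olr_init_failed LOGFILE
# Succeeds (exit 0) if the given OHASD log contains either of the two
# messages seen above when the OLR cannot be opened.
olr_init_failed() {
    grep -q -e 'PROCL-26' -e 'Could not init OLR' "$1"
}

# Example use (path is an assumption, adjust to your installation):
#   olr_init_failed /u01/app/11.2.0/grid/log/rac1/ohasd/ohasd.log \
#       && echo "OHASD could not open the OLR"
```

The check only looks at message text, so it says nothing about why the file is unreadable (missing, wrong owner, corrupt).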

3. Restore the OLR from the backup
     The OLR backup is located in:
     [root@rac1 olr]# pwd
          /olr
     [root@rac1 olr]# ll
          total 6412
          -rw------- 1 root root 6553600 Jan 12 09:01 backup_20160112_090125.olr
     As root:
     [root@rac1 bin]# ./crsctl stop crs -f
                 CRS-4133: Oracle High Availability Services has been stopped.
    [root@rac1 bin]# touch /u01/app/11.2.0/grid/cdata/rac1.olr
    [root@rac1 bin]# chown root:oinstall /u01/app/11.2.0/grid/cdata/rac1.olr
     [root@rac1 bin]# ./ocrconfig -local -restore /olr/backup_20160112_090125.olr
     Then check the cdata directory:
[root@rac1 olr]# cd /u01/app/11.2.0/grid/cdata/
[root@rac1 cdata]# ll
total 8928
drwxr-xr-x 2 grid oinstall      4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall      4096 Jan 12 08:52 rac1
-rw-r--r-- 1 root oinstall 272756736 Jan 12 11:33 rac1.olr
-rw------- 1 root oinstall 272756736 Jan 12 11:02 rac1.olr.test
drwxrwxr-x 2 grid oinstall      4096 Jan 12 05:29 rac-cluster
     The OLR file is back!
    Try starting the stack:
     [root@rac1 bin]# ./crsctl start crs
      CRS-4123: Oracle High Availability Services has been started.
     Check the database status:
SQL> select open_mode from gv$database;

OPEN_MODE
--------------------
READ WRITE
READ WRITE
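
The restore steps in this section can be collected into one function so they are run in the same order every time. This is only a sketch under assumptions: the function name, the RUN switch and the GRID_HOME default are made up for illustration, and the ownership root:oinstall is simply what was used above. By default it prints the commands instead of executing them.

```shell
# Default Grid home, taken from the paths shown above; override as needed.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}

# restore_olr BACKUP TARGET
# Prints the four restore commands in order. With RUN=1 it executes them
# instead and stops at the first command that fails. Paths must not
# contain spaces, because each command string is word-split when run.
restore_olr() {
    backup=$1
    target=$2
    for cmd in \
        "$GRID_HOME/bin/crsctl stop crs -f" \
        "touch $target" \
        "chown root:oinstall $target" \
        "$GRID_HOME/bin/ocrconfig -local -restore $backup"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}

# Dry run with the file names from this post:
#   restore_olr /olr/backup_20160112_090125.olr "$GRID_HOME/cdata/rac1.olr"
```

Reviewing the dry-run output before setting RUN=1 is a cheap safeguard, since the commands must be run as root.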
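II. When no OLR backup exists
Now for the main event: recovery without a backup.
1. Rename the OLR on rac1 by hand to simulate its loss
    mv rac1.olr rac1.olr.test
2. Restart CRS
     ./crsctl stop crs
     The stack shuts down normally.
     ./crsctl start crs
     Startup fails with: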
2016-01-12 11:08:58.249: [  OCROSD][3084149472]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [  OCROSD][3084149472]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [  OCRRAW][3084149472]proprinit: Could not open raw device
2016-01-12 11:08:58.249: [  OCRAPI][3084149472]a_init:16!: Backend init unsuccessful : [26]
2016-01-12 11:08:58.249: [  CRSOCR][3084149472] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-01-12 11:08:58.249: [ default][3084149472] OLR initalization failured, rc=26
2016-01-12 11:08:58.250: [ default][3084149472]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry
2016-01-12 11:08:58.250: [ default][3084149472][PANIC] OHASD exiting; Could not init OLR
2016-01-12 11:08:58.250: [ default][3084149472] Done.

And finally, in the terminal:
[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
        
3. Without a backup, the only option is to deconfigure the node and rerun root.sh to rebuild the OLR
[root@rac1 bin]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2016-01-12 14:12:34: Parsing the host name
2016-01-12 14:12:34: Checking for super user privileges
2016-01-12 14:12:34: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd

ACFS-9200: Supported
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node


Run root.sh:
[root@rac1 bin]# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: y
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2016-01-12 14:14:26: Parsing the host name
2016-01-12 14:14:26: Checking for super user privileges
2016-01-12 14:14:26: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac2, number 2, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
Timed out waiting for the CRS stack to start.

The OLR exists again!
[root@rac1 cdata]# ll
total 8716
drwxr-xr-x 2 grid oinstall      4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall      4096 Jan 12 08:52 rac1
-rw------- 1 root oinstall 272756736 Jan 12 11:59 rac1.olr
-rwxr-xr-x 1 grid oinstall 272756736 Jan 12 11:43 rac1.olr.bak
drwxrwxr-x 2 grid oinstall      4096 Jan 12 05:29 rac-cluster
[root@rac1 cdata]#

But wait, root.sh reported an error while it ran:
Timed out waiting for the CRS stack to start.  

Check crsd.log:
2016-01-11 09:35:41.780: [    CRSD][4165444320] ENV Logging level for Module: UiServer  0
2016-01-11 09:35:41.780: [ CRSMAIN][4165444320] Checking the OCR device
2016-01-11 09:35:41.781: [ CRSMAIN][4165444320] Connecting to the CSS Daemon
2016-01-11 09:35:41.783: [ CSSCLNT][1099733312]clssnsquerymode: not connected to CSSD
2016-01-11 09:35:41.987: [ CRSMAIN][4165444320] Initializing OCR

ocrcheck on node 1:
[grid@rac1 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2760
         Available space (kbytes) :     259360
         ID                       :  729466762
         Device/File Name         :       +crs
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user


ocrcheck on node 2:
[grid@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user

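The two nodes report different OCR locations (+crs versus +OCRNEW) because I had earlier changed the storage used for the OCR and the voting disks (VD).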
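
Spotting this kind of mismatch by eye is easy to miss in a long ocrcheck listing. A small sketch that extracts only the "Device/File Name" values and compares two saved outputs is shown below; both function names are made up for illustration, and the comparison is a plain string match, so it is case-sensitive.

```shell
# ocr_device: read `ocrcheck` output on stdin and print each configured
# OCR location (the value after "Device/File Name :"), one per line.
ocr_device() {
    awk -F: '/Device\/File Name/ { gsub(/[ \t]+/, "", $2); print $2 }'
}

# same_ocr FILE1 FILE2: succeed only if both saved ocrcheck outputs
# list exactly the same OCR locations.
same_ocr() {
    [ "$(ocr_device < "$1")" = "$(ocr_device < "$2")" ]
}

# Example: save `ocrcheck` output on each node, copy both files to one
# host, then:
#   same_ocr rac1.ocrcheck rac2.ocrcheck || echo "OCR location differs"
```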

First, check the disk group status:
GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           0 TMP                            DISMOUNTED                  0          0              0
           0 FRA                            DISMOUNTED                  0          0              0
           0 DATA                           DISMOUNTED                  0          0              0

Mount them!

GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           5 TMP                            MOUNTED     EXTERN       5120       4757           4757
           5 FRA                            MOUNTED     EXTERN       8192       6604           6604
           5 DATA                           MOUNTED     EXTERN       8192       5513           5513
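
The post does not show the statements used to mount the disk groups. One way to do it, assuming a SYSASM connection to the local ASM instance, is ALTER DISKGROUP ... MOUNT. The sketch below only generates those statements from a listing in the column layout shown above (name in the second column, state in the third); the function name is made up for illustration.

```shell
# mount_statements: read a disk group listing on stdin and print one
# ALTER DISKGROUP statement for every group whose STATE is DISMOUNTED.
mount_statements() {
    awk '$3 == "DISMOUNTED" { printf "ALTER DISKGROUP %s MOUNT;\n", $2 }'
}

# Example (assumption: the listing was saved to dg.txt, and the output is
# reviewed before being run in sqlplus as SYSASM):
#   mount_statements < dg.txt > mount_dg.sql
```

Generating the statements rather than executing them directly leaves room to drop any group that is meant to stay dismounted.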

I ran out of time and shut the machines down. After the next boot, rac1 came up but rac2 failed to start.
GROUP_NUMBER NAME                           STATE       TYPE     TOTAL_MB    FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
           0 CRS                            MOUNTED     EXTERN       5120       4756           4756
           0 DATA                           MOUNTED     EXTERN       8192       5513           5513
           5 FRA                            MOUNTED     EXTERN       8192       6581           6581
           5 OCRNEW                         MOUNTED     NORMAL      15360      14436           7063
           5 TMP                            MOUNTED     EXTERN       5120       4757           4757
The disk groups are mounted on both nodes.

[grid@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :       +crs
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user


[grid@rac2 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user


The OCR location differs between the two nodes.
Change node 1's OCR location to match node 2:
[root@rac1 bin]# ./ocrconfig -add +OCRNEW
[root@rac1 bin]# ./ocrconfig -delete +crs
[root@rac1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2728
         Available space (kbytes) :     259392
         ID                       :  729466762
         Device/File Name         :    +OCRNEW
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

[root@rac1 bin]# ./crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   610033cee0c34ff3bf2269f62bbf7340 (/dev/raw/raw2) [OCRNEW]
 2. ONLINE   afa5da0d2a8f4f75bf05f1b72d979c4c (/dev/raw/raw3) [OCRNEW]
 3. ONLINE   02d613656c1c4f99bf59a36d62b24c8b (/dev/raw/raw4) [OCRNEW]
Located 3 voting disk(s).

The voting disks on node 1 still have their original configuration.
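
For a scripted health check, the number of ONLINE voting disks can be counted from the listing above. This is a minimal sketch that relies only on the column layout shown in this post (state in the second column); the function name is made up for illustration.

```shell
# online_votedisks: read `crsctl query css votedisk` output on stdin and
# print how many voting disks are reported as ONLINE.
online_votedisks() {
    awk '$2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Example, run from the Grid home bin directory:
#   ./crsctl query css votedisk | online_votedisks
```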

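Start node 2:
Stop first:  ./crsctl stop crs -f
Then start:  ./crsctl start crs
I checked the cluster status right away and it was still down. I dug through the logs for a long time without finding anything, then checked again: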
[grid@rac2 rac2]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.FRA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCRNEW.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.TMP.dg
               ONLINE  OFFLINE      rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.eons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.registry.acfs
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.oc4j
      1        OFFLINE OFFLINE                                                   
ora.orcl.db
      1        ONLINE  ONLINE       rac1                     Open               
      2        ONLINE  ONLINE       rac2                     Open               
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.scan1.vip
      1        ONLINE  ONLINE       rac1

It turns out that RAC, just like people, needs time.
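
Instead of checking by hand and then hunting through logs, the waiting can be scripted. Below is a generic retry helper as a sketch; the function name and the numbers in the usage comment are made up for illustration, and it assumes the command being polled returns a non-zero exit status while the stack is still coming up.

```shell
# wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Runs CMD every INTERVAL seconds until it succeeds, or gives up once
# TIMEOUT seconds of waiting have accumulated. Returns 0 on success and
# 1 on timeout. Output of CMD is discarded.
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example (assumption: the exit status of `crsctl check crs` reflects
# whether the stack is up; verify this on your version first):
#   wait_for 600 15 ./crsctl check crs && ./crsctl status res -t
```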

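Conclusions: 1. Comparing the recovery steps with and without a backup shows plainly how important backups are.
          2. Whenever a disk group changes, back up the OLR right away.
          3. Sometimes what we need is not only technical skill but also patience.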
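
For the second point, a manual OLR backup can be taken as root with ocrconfig -local -manualbackup. As a sketch, the helper below then picks the newest backup file so it can be copied somewhere safe. The function name is made up for illustration, the directories in the usage comment are assumptions, and the helper relies on the backup_YYYYMMDD_HHMMSS.olr naming seen earlier in this post.

```shell
# latest_olr_backup DIR
# Prints the newest backup_YYYYMMDD_HHMMSS.olr file in DIR. The timestamp
# in the file name sorts lexicographically, so a plain sort is enough.
# Prints nothing if DIR holds no such file.
latest_olr_backup() {
    ls "$1"/backup_*.olr 2>/dev/null | sort | tail -n 1
}

# Example, run as root (both directories are assumptions):
#   /u01/app/11.2.0/grid/bin/ocrconfig -local -manualbackup
#   cp -p "$(latest_olr_backup /u01/app/11.2.0/grid/cdata/rac1)" /olr/
```

Keeping a copy outside the Grid home means the backup survives even if the cdata directory itself is damaged.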
