Recovering a Corrupted Voting Disk (votedisk) and OCR

Preface:

OCR: stores cluster configuration information and the mapping of instances to nodes. votedisk: resolves cluster membership when the cluster partitions.

An ASM diskgroup has one of three redundancy types: external, normal, and high. What is recovered here is a NORMAL diskgroup, simulating the OCR disks or votedisk becoming unusable.

The OCR diskgroup usually also contains the ASM server parameter file (ASMPARAMETERFILE); back it up to a pfile before doing anything, as sketched below.
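A minimal sketch of that backup, taken as the grid user with SYSASM (/home/grid/11 is the pfile path restored later in this walkthrough):

su - grid
sqlplus / as sysasm
SQL> create pfile='/home/grid/11' from spfile;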

Redundancy requirements:
Voting disks (never use an even number):
External redundancy needs at least 1 voting disk (i.e. 1 failure group)
Normal redundancy needs at least 3 voting disks (i.e. 3 failure groups)
High redundancy needs at least 5 voting disks (i.e. 5 failure groups)
Too few failure groups makes voting file creation fail, e.g. ORA-15274: Not enough failgroups (3) to create voting files
OCR:
10.2 and 11.1: at most 2 OCR devices, OCR and OCRMIRROR
11.2 and later: at most 5 OCR locations.
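For illustration only, additional OCR locations on 11.2 are added and removed with ocrconfig as root; the +DATA2 diskgroup name here is hypothetical:

/u01/app/11.2.0/grid/bin/ocrconfig -add +DATA2       # register an additional OCR location
/u01/app/11.2.0/grid/bin/ocrconfig -delete +DATA2    # deregister it again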




Steps:
Part 1: Current OCR runtime information

1. List the existing OCR backups:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig -showbackup


node1     2014/12/10 13:08:14     /u01/app/11.2.0/grid/cdata/lxxhscan/backup00.ocr


node1     2014/12/10 09:08:13     /u01/app/11.2.0/grid/cdata/lxxhscan/backup01.ocr


node1     2014/12/10 05:08:13     /u01/app/11.2.0/grid/cdata/lxxhscan/backup02.ocr


node1     2014/12/09 09:08:07     /u01/app/11.2.0/grid/cdata/lxxhscan/day.ocr


node2     2014/12/02 07:20:17     /u01/app/11.2.0/grid/cdata/lxxhscan/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[root@node1 ~]# 
2. Check the voting disks:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4c0a1622b1a04f48bf1425fadfbcd461 (/dev/mapper/ocr1p1) [OCR_VOTE]
 2. ONLINE   627259113bb24f88bf15ed81694c8f17 (/dev/mapper/ocr2p1) [OCR_VOTE]
 3. ONLINE   1086d74f5f5f4fb5bfe89bbb27054baf (/dev/mapper/ocr3p1) [OCR_VOTE]
Located 3 voting disk(s).
[root@node1 ~]#


3. Check the OCR:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck 
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


         Cluster registry integrity check succeeded


         Logical corruption check succeeded


[root@node1 ~]# 




Part 2: Present new storage to the hosts (3 LUNs, 2 GB each, for NORMAL redundancy)


Newly added disks.
1. Rescan the SCSI bus so the hosts see the new LUNs (run on every node):
for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node1 home]# for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node1 home]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000018 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000019 -> ../dm-14
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db00000000000000001a -> ../dm-13
lrwxrwxrwx 1 root root      7 Nov 12 15:42 archlog -> ../dm-2
lrwxrwxrwx 1 root root      7 Nov 12 15:42 archlogp1 -> ../dm-7
crw-rw---- 1 root root 10, 58 Nov 12 15:42 control
lrwxrwxrwx 1 root root      7 Nov 12 15:42 fra -> ../dm-9
lrwxrwxrwx 1 root root      8 Nov 12 15:42 frap1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr1 -> ../dm-1
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr1p1 -> ../dm-5
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr2 -> ../dm-6
lrwxrwxrwx 1 root root      8 Nov 12 15:42 ocr2p1 -> ../dm-10
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Nov 12 15:42 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Nov 12 15:42 racdata -> ../dm-3
lrwxrwxrwx 1 root root      7 Nov 12 15:42 racdatap1 -> ../dm-8
[root@node1 home]# 




[root@node2 ~]# for i in `ls /sys/class/scsi_host/`; do echo "- - -" >> /sys/class/scsi_host/$i/scan; done
[root@node2 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000018 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db000000000000000019 -> ../dm-14
lrwxrwxrwx 1 root root      8 Dec 10 17:09 36000d3100082db00000000000000001a -> ../dm-13
lrwxrwxrwx 1 root root      7 Nov 12 15:01 archlog -> ../dm-1
lrwxrwxrwx 1 root root      7 Nov 12 15:01 archlogp1 -> ../dm-5
crw-rw---- 1 root root 10, 58 Nov 12 15:01 control
lrwxrwxrwx 1 root root      7 Nov 12 15:01 fra -> ../dm-3
lrwxrwxrwx 1 root root      7 Nov 12 15:01 frap1 -> ../dm-9
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr1 -> ../dm-2
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr1p1 -> ../dm-7
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Nov 12 15:01 ocr2p1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Nov 12 15:01 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      7 Nov 12 15:01 racdata -> ../dm-6
lrwxrwxrwx 1 root root      8 Nov 12 15:01 racdatap1 -> ../dm-10
[root@node2 ~]# 


2. Edit /etc/multipath.conf on both nodes and add the following stanzas:


multipath {
               wwid                    36000d3100082db000000000000000018
               alias                   ocrnew1
               path_grouping_policy    multibus
#              path_checker            readsector0
               path_selector           "round-robin 0"
               failback                manual
               rr_weight               priorities
               no_path_retry           5
               rr_min_io               10
       }
       multipath {
               wwid                    36000d3100082db00000000000000001a
               alias                   ocrnew2
               path_grouping_policy    multibus
#              path_checker            readsector0
               path_selector           "round-robin 0"
               failback                manual
               rr_weight               priorities
               no_path_retry           5
               rr_min_io               10
       }
       multipath {
               wwid                    36000d3100082db000000000000000019
               alias                   ocrnew3
               path_grouping_policy    multibus
#              path_checker            readsector0
               path_selector           "round-robin 0"
               failback                manual
               rr_weight               priorities
               no_path_retry           5
               rr_min_io               10
       }


3. Restart the multipath service: service multipathd restart
4. Verify:
ls -l /dev/mapper/
multipath -ll


[root@node11 etc]# service multipathd restart     -- note: restarting multipathd resets device permissions (ownership set with chown/chmod is lost; permissions applied via udev rules survive)
ok
Stopping multipathd daemon:                                [  OK  ]
Starting multipathd daemon:                                [  OK  ]
[root@node11 etc]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlog -> ../dm-2
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlogp1 -> ../dm-7
crw-rw---- 1 root root 10, 58 Nov 12 15:42 control
lrwxrwxrwx 1 root root      7 Dec 10 17:19 fra -> ../dm-9
lrwxrwxrwx 1 root root      8 Dec 10 17:19 frap1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1 -> ../dm-1
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1p1 -> ../dm-5
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr2 -> ../dm-6
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocr2p1 -> ../dm-10
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew1 -> ../dm-12                  ########## block device
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew2 -> ../dm-13 ########## block device
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew3 -> ../dm-14 ########## block device
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdata -> ../dm-3
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdatap1 -> ../dm-8
[root@node1 etc]# 




[root@node2 ~]# service multipathd restart
ok
Stopping multipathd daemon:                                [  OK  ]
Starting multipathd daemon:                                [  OK  ]
[root@node2 ~]# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlog -> ../dm-1
lrwxrwxrwx 1 root root      7 Dec 10 17:19 archlogp1 -> ../dm-5
crw-rw---- 1 root root 10, 58 Nov 12 15:01 control
lrwxrwxrwx 1 root root      7 Dec 10 17:19 fra -> ../dm-3
lrwxrwxrwx 1 root root      7 Dec 10 17:19 frap1 -> ../dm-9
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1 -> ../dm-2
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr1p1 -> ../dm-7
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocr2p1 -> ../dm-11
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3 -> ../dm-0
lrwxrwxrwx 1 root root      7 Dec 10 17:19 ocr3p1 -> ../dm-4
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew1 -> ../dm-12
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew2 -> ../dm-13
lrwxrwxrwx 1 root root      8 Dec 10 17:19 ocrnew3 -> ../dm-14
lrwxrwxrwx 1 root root      7 Dec 10 17:19 racdata -> ../dm-6
lrwxrwxrwx 1 root root      8 Dec 10 17:19 racdatap1 -> ../dm-10
[root@node2 ~]# 




5. Partition the disks with fdisk (whether to partition at all is your choice). After fdisk, partprobe lets the kernel pick up the new partition tables without a reboot; see the sketch after the fdisk commands below.




fdisk /dev/mapper/ocrnew1
fdisk /dev/mapper/ocrnew2
fdisk /dev/mapper/ocrnew3
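A sketch of refreshing the kernel's view of the new partitions without a reboot; whether kpartx is also needed for the ocrnew1p1-style mappings to appear depends on the multipath setup, so treat that as an assumption:

partprobe /dev/mapper/ocrnew1    # re-read the partition table
kpartx -a /dev/mapper/ocrnew1    # create the ocrnew1p1 device-mapper partition mapping if it does not show up
(repeat for ocrnew2 and ocrnew3, on every node)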
6. Re-apply permissions (the multipathd restart cleared them; a more durable udev-based alternative is sketched after this list):
chown grid:asmadmin /dev/mapper/ocrnew1*
chown grid:asmadmin /dev/mapper/ocrnew2*
chown grid:asmadmin /dev/mapper/ocrnew3*
chmod 660  /dev/mapper/ocrnew1*
chmod 660  /dev/mapper/ocrnew2*
chmod 660  /dev/mapper/ocrnew3*
chown grid:asmadmin /dev/mapper/archlog*
chown grid:asmadmin /dev/mapper/fra*
chown grid:asmadmin /dev/mapper/ocr1*
chown grid:asmadmin /dev/mapper/ocr2*
chown grid:asmadmin /dev/mapper/ocr3*
chown grid:asmadmin /dev/mapper/racdata*


chmod 660  /dev/mapper/archlog*
chmod 660  /dev/mapper/fra*
chmod 660  /dev/mapper/ocr1*
chmod 660  /dev/mapper/ocr2*
chmod 660  /dev/mapper/ocr3*
chmod 660  /dev/mapper/racdata*
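Because ownership set with chown/chmod is lost whenever multipathd restarts (see the note in step 3 above), a udev rule is the durable alternative. A minimal sketch, assuming a udev version that exposes DM_NAME for device-mapper nodes; the rule file name is hypothetical:

# /etc/udev/rules.d/99-asm-permissions.rules
KERNEL=="dm-*", ENV{DM_NAME}=="ocrnew*", OWNER="grid", GROUP="asmadmin", MODE="0660"

Reload and apply the rules without a reboot:
udevadm control --reload-rules
udevadm trigger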


Part 3: The recovery
First, confirm that the OCR, votedisk and ASM spfile all live in one dedicated ASM diskgroup:
[grid@node1 ~]$  /u01/app/11.2.0/grid/bin/ocrcheck 
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


         Cluster registry integrity check succeeded


         Logical corruption check bypassed due to non-privileged user


Connect to the ASM instance as the grid user with SYSASM:

SQL> show parameter spfile;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +OCR_VOTE/lxxhscan/asmparameterfile/registry.253.862151267




SQL> create pfile='/home/grid/11' from spfile;


[grid@node1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   4c0a1622b1a04f48bf1425fadfbcd461 (/dev/mapper/ocr1p1) [OCR_VOTE]
 2. ONLINE   627259113bb24f88bf15ed81694c8f17 (/dev/mapper/ocr2p1) [OCR_VOTE]
 3. ONLINE   1086d74f5f5f4fb5bfe89bbb27054baf (/dev/mapper/ocr3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node1 ~]$ 

ASMCMD> lsdsk -t -G OCR_VOTE
Create_Date          Mount_Date           Repair_Timer  Path
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr1p1
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr2p1
2014-10-28 14:27:39  2014-11-12 15:03:21  0             /dev/mapper/ocr3p1
ASMCMD> 


Check the current RAC status:
[grid@node11 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHLOG.dg
               ONLINE  ONLINE       node11                                       
ora.DATA.dg
               ONLINE  ONLINE       node11                                       
ora.FRA.dg
               ONLINE  ONLINE       node11                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       node11                                       
ora.OCR_VOTE.dg
               ONLINE  ONLINE       node11                                       
ora.asm
               ONLINE  ONLINE       node11                   Started             
ora.gsd
               OFFLINE OFFLINE      node11                                       
ora.net1.network
               ONLINE  ONLINE       node11                                       
ora.ons
               ONLINE  ONLINE       node11                                       
ora.registry.acfs
               ONLINE  ONLINE       node11                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       node11                                       
ora.cvu
      1        ONLINE  ONLINE       node11                                       
ora.gnnt.db
      1        ONLINE  ONLINE       node11                   Open                
      2        ONLINE  OFFLINE                                                   
ora.node11.vip
      1        ONLINE  ONLINE       node11                                       
ora.node12.vip
      1        ONLINE  INTERMEDIATE node11                   FAILED OVER         
ora.oc4j
      1        ONLINE  ONLINE       node11                                       
ora.scan1.vip
      1        ONLINE  ONLINE       node11 




Back up the OCR (take a fresh manual backup before the destructive test):
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrconfig  -manualbackup
node1     2014/12/11 11:15:01     /u01/app/11.2.0/grid/cdata/lxxhscan/backup_20141211_111501.ocr
[root@node1~]# /u01/app/11.2.0/grid/bin/ocrconfig -showbackup
node1     2014/12/11 09:08:19     /u01/app/11.2.0/grid/cdata/lxxhscan/backup00.ocr
node1     2014/12/11 05:08:18     /u01/app/11.2.0/grid/cdata/lxxhscan/backup01.ocr
node1     2014/12/11 01:08:15     /u01/app/11.2.0/grid/cdata/lxxhscan/backup02.ocr
node1     2014/12/10 09:08:13     /u01/app/11.2.0/grid/cdata/lxxhscan/day.ocr
node2     2014/12/02 07:20:17     /u01/app/11.2.0/grid/cdata/lxxhscan/week.ocr
node1     2014/12/11 11:15:01     /u01/app/11.2.0/grid/cdata/lxxhscan/backup_20141211_111501.ocr
[root@node1 ~]# 
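It is prudent to keep a copy of the manual backup outside the Grid home as well; a trivial sketch (the destination path is arbitrary):

cp /u01/app/11.2.0/grid/cdata/lxxhscan/backup_20141211_111501.ocr /root/ocr_backup_20141211.ocr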






1. Corrupt the OCR disks (zero out the first megabyte of each):
dd if=/dev/zero of=/dev/mapper/ocr1p1 bs=1024K count=1
dd if=/dev/zero of=/dev/mapper/ocr2p1 bs=1024K count=1
dd if=/dev/zero of=/dev/mapper/ocr3p1 bs=1024K count=1


[root@node1 ~]# dd if=/dev/zero of=/dev/mapper/ocr2p1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00242945 s, 432 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/mapper/ocr3p1 bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0030372 s, 345 MB/s
[root@node1 ~]# 




2. Stop CRS:
crsctl stop crs
3. Start CRS again (crsctl start crs must run as root; run as grid it fails with CRS-4563, as below):
[grid@node1 ~]$ crsctl start crs
CRS-4563: Insufficient user privileges.


CRS-4000: Command Start failed, or completed with errors.


[grid@node1 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE node11                   OCR not started     
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node11                                       
ora.crf
      1        ONLINE  ONLINE       node11                                       
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  ONLINE       node11                                       
ora.cssdmonitor
      1        ONLINE  ONLINE       node11                                       
ora.ctssd
      1        ONLINE  ONLINE       node11                   ACTIVE:0            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.drivers.acfs
      1        ONLINE  ONLINE       node11                                       
ora.evmd
      1        ONLINE  INTERMEDIATE node11                                       
ora.gipcd
      1        ONLINE  ONLINE       node11                                       
ora.gpnpd
      1        ONLINE  ONLINE       node11                                       
ora.mdnsd
      1        ONLINE  ONLINE       node11                                       
[grid@node1 ~]$ 


[grid@node1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager


[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'node11'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node12'
CRS-2676: Start of 'ora.cssdmonitor' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node12'
CRS-2672: Attempting to start 'ora.diskmon' on 'node12'
CRS-2676: Start of 'ora.diskmon' on 'node12' succeeded
CRS-2676: Start of 'ora.cssd' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node12'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node12'
CRS-2676: Start of 'ora.ctssd' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node12'
CRS-2676: Start of 'ora.evmd' on 'node12' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node12' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node12'
CRS-2674: Start of 'ora.asm' on 'node12' failed
CRS-4705: Start of Clusterware failed on node node11.
CRS-4705: Start of Clusterware failed on node node12.
CRS-4000: Command Start failed, or completed with errors.


ocrcheck now reports errors:
[root@node1 ~]# ocrcheck
-bash: ocrcheck: command not found
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage


[root@node1 ~]# 




Relevant Grid Infrastructure logs
-- alert log
2014-12-11 14:51:33.092: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(7298)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-12-11 14:52:03.098: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(7298)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log"


-- agent and crsd logs
/u01/app/11.2.0/grid/log/node11/agent/ohasd/oraagent_grid/oraagent_grid.log
[root@node1 ~]# tail -f /u01/app/11.2.0/grid/log/node11/crsd/crsd.log                               
2014-12-11 13:21:51.881: [    CRSD][1824413440]{1:23871:1029} Done.


2014-12-11 13:21:51.979: [ CRSCOMM][1843324672] IpcL: connection to member 1 has been removed
2014-12-11 13:21:51.979: [CLSFRAME][1843324672] Removing IPC Member:{Relative|Node:0|Process:1|Type:3}
2014-12-11 13:21:51.979: [CLSFRAME][1843324672] Disconnected from AGENT process: {Relative|Node:0|Process:1|Type:3}
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} Agfw Proxy Server received process disconnected notification, count=1
2014-12-11 13:21:51.979: [   CRSPE][1826514688]{1:23871:1051} Disconnected from server: 
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} /u01/app/11.2.0/grid/bin/oraagent_grid disconnected.
2014-12-11 13:21:51.979: [    AGFW][1837020928]{1:23871:1052} Agent /u01/app/11.2.0/grid/bin/oraagent_grid[77864] stopped!
2014-12-11 13:21:51.979: [ CRSCOMM][1837020928]{1:23871:1052} IpcL: removeConnection: Member 1 does not exist in pending connections.


After the ASM disks holding the OCR were corrupted, starting CRS clearly shows it can no longer find the votedisk information.




################### The actual recovery starts here ###################
Start the stack with -excl -nocrs: this starts the ASM instance but not CRS.




1. Force-stop CRS on both nodes:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node11'
CRS-2673: Attempting to stop 'ora.crf' on 'node11'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node11'
CRS-2673: Attempting to stop 'ora.evmd' on 'node11'
CRS-2673: Attempting to stop 'ora.asm' on 'node11'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node11'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node11'
CRS-2677: Stop of 'ora.crf' on 'node11' succeeded
CRS-2677: Stop of 'ora.evmd' on 'node11' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node11' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node11' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2677: Stop of 'ora.asm' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node11'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node11'
CRS-2677: Stop of 'ora.cssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node11'
CRS-2677: Stop of 'ora.gipcd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node11'
CRS-2677: Stop of 'ora.gpnpd' on 'node11' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node11' has completed
CRS-4133: Oracle High Availability Services has been stopped.


[root@node2 ~]#  /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node12'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node12'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node12'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node12'
CRS-2673: Attempting to stop 'ora.evmd' on 'node12'
CRS-2673: Attempting to stop 'ora.asm' on 'node12'
CRS-2677: Stop of 'ora.evmd' on 'node12' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node12' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node12' succeeded
CRS-2677: Stop of 'ora.asm' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node12'
CRS-2677: Stop of 'ora.ctssd' on 'node12' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node12'
CRS-2677: Stop of 'ora.cssd' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'node12'
CRS-2677: Stop of 'ora.crf' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node12'
CRS-2677: Stop of 'ora.gipcd' on 'node12' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node12'
CRS-2677: Stop of 'ora.gpnpd' on 'node12' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node12' has completed
CRS-4133: Oracle High Availability Services has been stopped.




2. Start CRS in exclusive mode. crsctl start crs -excl -nocrs starts the stack in exclusive mode without ora.crsd, bringing ASM up but leaving CRS down:


crsctl start crs -excl -nocrs
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'node11'
CRS-2676: Start of 'ora.mdnsd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node11'
CRS-2676: Start of 'ora.gpnpd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node11'
CRS-2672: Attempting to start 'ora.gipcd' on 'node11'
CRS-2676: Start of 'ora.cssdmonitor' on 'node11' succeeded
CRS-2676: Start of 'ora.gipcd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node11'
CRS-2672: Attempting to start 'ora.diskmon' on 'node11'
CRS-2676: Start of 'ora.diskmon' on 'node11' succeeded
CRS-2676: Start of 'ora.cssd' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'node11'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node11'
CRS-2672: Attempting to start 'ora.ctssd' on 'node11'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node11'
CRS-2676: Start of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2676: Start of 'ora.ctssd' on 'node11' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node11'
CRS-2676: Start of 'ora.asm' on 'node11' succeeded


crsctl stat res -t -init
[grid@node1 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE node11                   OCR not started     
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       node11                                       
ora.crf
      1        OFFLINE OFFLINE                                                   
ora.crsd
      1        OFFLINE OFFLINE                                                   
ora.cssd
      1        ONLINE  ONLINE       node11                                       
ora.cssdmonitor
      1        ONLINE  ONLINE       node11                                       
ora.ctssd
      1        ONLINE  ONLINE       node11                   ACTIVE:0            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.drivers.acfs
      1        ONLINE  ONLINE       node11                                       
ora.evmd
      1        OFFLINE OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       node11                                       
ora.gpnpd
      1        ONLINE  ONLINE       node11                                       
ora.mdnsd
      1        ONLINE  ONLINE       node11          


3. Recreate the diskgroup that originally held the OCR and votedisk:
Note: this is done as the grid user.
Create a new diskgroup for the OCR and voting disks with the same name as the original (to change the location instead, /etc/oracle/ocr.loc must be edited first; its usual layout is sketched below).
For simplicity, keep the diskgroup name identical to the failed one.
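For reference, a sketch of what /etc/oracle/ocr.loc typically contains on 11.2 (the actual file on this cluster is not shown, so take the exact contents as an assumption):

# /etc/oracle/ocr.loc
ocrconfig_loc=+OCR_VOTE
local_only=FALSE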





SQL> create diskgroup OCR_VOTE normal redundancy
  2  disk '/dev/mapper/ocrnew1p1','/dev/mapper/ocrnew2p1','/dev/mapper/ocrnew3p1'
  3  attribute 'compatible.asm'='11.2.0.4.0', 'compatible.rdbms'='11.2.0.4.0';




Diskgroup created.
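A quick sanity check that the new group is mounted on this node, using the standard v$asm_diskgroup view (expect OCR_VOTE / MOUNTED / NORMAL):

SQL> select name, state, type from v$asm_diskgroup;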


4. Restore the OCR from the manual backup (as root):
[root@node1 ~]# cd  /u01/app/11.2.0/grid/cdata/lxxhscan/                 
[root@node1 lxxhscan]# ll
total 58112
-rw------- 1 root root 7438336 Dec 11 09:08 backup00.ocr
-rw------- 1 root root 7438336 Dec 11 05:08 backup01.ocr
-rw------- 1 root root 7438336 Dec 11 01:08 backup02.ocr
-rw------- 1 root root 7438336 Dec 11 11:15 backup_20141211_111501.ocr
-rw------- 1 root root 7438336 Dec 11 09:08 day_.ocr
-rw------- 1 root root 7438336 Dec 10 09:08 day.ocr
-rw------- 1 root root 7438336 Dec  9 09:08 week_.ocr
-rw------- 1 root root 7438336 Nov 11 03:39 week.ocr
[root@node1 lxxhscan]# 
/u01/app/11.2.0/grid/bin/ocrconfig -restore  backup_20141211_111501.ocr
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/ocrconfig -restore  backup_20141211_111501.ocr


5. Preparation for restoring the voting disks:
show parameter asm_diskstring


SQL> show parameter asm_diskstring;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string


If asm_diskstring is empty, ASM is using the default disk discovery path.
Change it to the actual ASM disk path, otherwise the votedisk replace fails, as shown here:

[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl replace votedisk  +OCR_VOTE
CRS-4602: Failed 27 to add voting file 0814f7a60de74f23bfb78c437c74110c.
CRS-4602: Failed 27 to add voting file 90a17219640c4f40bf25d416d99c58ce.
CRS-4602: Failed 27 to add voting file 7177431c8aaf4f9cbffeab86e069b130.
Failed to replace voting disk group with +OCR_VOTE.
CRS-4000: Command Replace failed, or completed with errors.



alter system set asm_diskstring='/dev/mapper/'
SQL> alter system set asm_diskstring='/dev/mapper/';



System altered.


SQL> show parameter asm_diskstring;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/mapper/
SQL>


6. Replace the voting disks and verify:


/u01/app/11.2.0/grid/bin/crsctl replace votedisk  +OCR_VOTE
[root@node11 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl replace votedisk  +OCR_VOTE
Successful addition of voting disk 248e98950ba54f37bff0c1a143e8adf3.
Successful addition of voting disk 4b2eb63628424fd3bff050ad53c6a3de.
Successful addition of voting disk 594b9b8769744f78bf74ce3ad737cbd8.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
[root@node11 lxxhscan]# 


crsctl query css votedisk
[grid@node11 ~]$  crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   248e98950ba54f37bff0c1a143e8adf3 (/dev/mapper/ocrnew1p1) [OCR_VOTE]
 2. ONLINE   4b2eb63628424fd3bff050ad53c6a3de (/dev/mapper/ocrnew2p1) [OCR_VOTE]
 3. ONLINE   594b9b8769744f78bf74ce3ad737cbd8 (/dev/mapper/ocrnew3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node11 ~]$ 


7. Recreate the ASM spfile (the original diskgroup contained one; the newly created diskgroup does not, which is why the spfile was backed up to a pfile at the start):

SQL> show parameter spfile;    -- before the failure

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +OCR_VOTE/lxxhscan/asmparameterfile/registry.253.862151267
SQL>

SQL> show parameter spfile;    -- now empty

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string

SQL>

SQL> create spfile='+OCR_VOTE' from pfile='/home/grid/11';
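To confirm the recreated spfile is registered in the GPnP profile, asmcmd spget can be checked; a one-line sketch (the exact registry file name will differ):

asmcmd spget    # should print an asmparameterfile/registry.* path under +OCR_VOTE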


8. Restart the clusterware and check whether everything is back to normal:
[root@node1 lxxhscan]# /u01/app/11.2.0/grid/bin/crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node11'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node11'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node11'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node11'
CRS-2673: Attempting to stop 'ora.asm' on 'node11'
CRS-2677: Stop of 'ora.mdnsd' on 'node11' succeeded
CRS-2677: Stop of 'ora.asm' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node11'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node11' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node11'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node11' succeeded
CRS-2677: Stop of 'ora.cssd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node11'
CRS-2677: Stop of 'ora.gipcd' on 'node11' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node11'
CRS-2677: Stop of 'ora.gpnpd' on 'node11' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node11' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@node1 lxxhscan]# 


/u01/app/11.2.0/grid/bin/crsctl start crs    -- this takes a while; be patient
[root@node1 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@node1 ~]# 
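While waiting, the stack can be polled until CRS, CSS and EVM all report online; a trivial sketch:

watch -n 10 /u01/app/11.2.0/grid/bin/crsctl check crs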


9. CRS is back to normal here; check the OCR, votedisk and ASM spfile in more detail:
[root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3084
         Available space (kbytes) :     259036
         ID                       :  995290881
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


                                    Device/File not configured


         Cluster registry integrity check succeeded


         Logical corruption check succeeded


[root@node1 ~]# 
Reconnect to the ASM instance and check the spfile parameter again:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter spfile;


[grid@node1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   248e98950ba54f37bff0c1a143e8adf3 (/dev/mapper/ocrnew1p1) [OCR_VOTE]
 2. ONLINE   4b2eb63628424fd3bff050ad53c6a3de (/dev/mapper/ocrnew2p1) [OCR_VOTE]
 3. ONLINE   594b9b8769744f78bf74ce3ad737cbd8 (/dev/mapper/ocrnew3p1) [OCR_VOTE]
Located 3 voting disk(s).
[grid@node1 ~]$ 

Verify OCR integrity on all RAC nodes with CVU:
$ cluvfy comp ocr -n all -verbose

[grid@node1 ~]$ cluvfy comp ocr -n all -verbose
Verifying OCR integrity 
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+OCR_VOTE" available on all the nodes
NOTE: 
This check does not verify the integrity of OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.
[grid@node1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   1048570  1011974                0         1011974              0             N  ARCHLOG/
MOUNTED  EXTERN  N         512   4096  1048576   1048570  1005645                0         1005645              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576    511993   511887                0          511887              0             N  FRA/
MOUNTED  NORMAL  N         512   4096  1048576      6141     5341             2047            1647              0             Y  OCR_VOTE/
[grid@node11 ~]$ 

[grid@node1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.ARCHLOG.dg ora....up.type ONLINE    ONLINE    node11      
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node11      
ora.FRA.dg     ora....up.type ONLINE    ONLINE    node11      
ora....ER.lsnr ora....er.type ONLINE    ONLINE    node11      
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node11      
ora....VOTE.dg ora....up.type ONLINE    ONLINE    node11      
ora.asm        ora.asm.type   ONLINE    ONLINE    node11      
ora.cvu        ora.cvu.type   ONLINE    ONLINE    node11      
ora.gnnt.db    ora....se.type ONLINE    ONLINE    node11      
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE               
ora....network ora....rk.type ONLINE    ONLINE    node11      
ora....SM1.asm application    ONLINE    ONLINE    node11      
ora....11.lsnr application    ONLINE    ONLINE    node11      
ora.node11.gsd application    OFFLINE   OFFLINE               
ora.node11.ons application    ONLINE    ONLINE    node11      
ora.node11.vip ora....t1.type ONLINE    ONLINE    node11      
ora.node12.vip ora....t1.type ONLINE    ONLINE    node11      
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node11      
ora.ons        ora.ons.type   ONLINE    ONLINE    node11      
ora....ry.acfs ora....fs.type ONLINE    ONLINE    node11      
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node11      
[grid@node1 ~]$ 








10. The recovery of the voting disks and OCR is now complete.


