Recovering Corrupted RAC OCR and Voting Disk Files

Recovering corrupted OCR and voting disk files on RAC + ASM
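1. OCR automatic backup and recovery
1) Automatic backup
Oracle backs up the OCR automatically every four hours (the timestamps in the listing below are four hours apart). The default location is $CRS_HOME/cdata/$CRS_NAME/. Use ocrconfig to view information about the automatic backups, for example: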
[oracle@rac2 ~]$ cd  $CRS_HOME/cdata/$CRS_NAME
[oracle@rac2 cdata]$ pwd
/oracle/ora10g/product/crs/cdata

[oracle@rac2 cdata]$ ocrconfig -showbackup

rac1     2011/12/05 15:04:50     /oracle/ora10g/product/crs/cdata/crs

rac1     2011/12/05 11:04:50     /oracle/ora10g/product/crs/cdata/crs

rac1     2011/12/05 07:04:49     /oracle/ora10g/product/crs/cdata/crs

rac1     2011/12/04 03:04:41     /oracle/ora10g/product/crs/cdata/crs

rac2     2011/11/25 22:17:13     /oracle/ora10g/product/crs/cdata/crs

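The automatic backup runs on only one node. If that node fails, Oracle automatically switches the backup to another node.
By default Oracle keeps the five most recent OCR backups: the three latest copies, one from the previous day, and one from the previous week.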
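Because the automatic backups live inside the CRS home on a single node, it can help to copy them somewhere else on a schedule. The following is only a minimal sketch: the function name archive_ocr_backups and the destination layout are assumptions made for illustration, not part of any Oracle tool.

```shell
#!/bin/bash
# Copy every *.ocr file from the automatic-backup directory into a dated
# folder outside the CRS home. Both paths are supplied by the caller.
archive_ocr_backups() {
    local src_dir="$1" dest_root="$2"
    local stamp dest
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$dest_root/ocr_$stamp"
    mkdir -p "$dest" || return 1
    # -p preserves timestamps, which are what tell the backups apart
    cp -p "$src_dir"/*.ocr "$dest"/ || return 1
    # Print the folder that now holds the copies
    echo "$dest"
}
```

It could be called as archive_ocr_backups "$CRS_HOME/cdata/crs" /backup/ocr, where /backup/ocr is a hypothetical location on separate storage.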

The OCR is on /dev/raw/raw1. We simulate corruption by overwriting that device, then restore it.
[oracle@rac2 ~]$ dd if=/dev/zero of=/dev/raw/raw1 bs=4M
[oracle@rac2 ~]$ crs_stop
Segmentation fault
[oracle@rac2 ~]$ crs_start -t
Segmentation fault
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORA-03113: end-of-file on communication channel
As you can see, the database no longer starts.
Here is another error:
[oracle@rac1 bdump]$ export ORACLE_SID=+ASM1
[oracle@rac1 bdump]$ sqlplus / as sysdba
SQL> shutdown immediate;
ASM diskgroups dismounted
ORA-03113: end-of-file on communication channel
SQL> startup;
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
The database appears to have crashed.
Now look at what CRS reports:
[oracle@rac1 crsd]$ tail -n 20 /oracle/ora10g/product/crs/log/rac1/crsd/crsd.log
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[249] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[250] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[251] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[252] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[253] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[254] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_gid[255] is NULL
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_num_cached_gid=3
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_num_gid=3
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_ocrctx=[0xef17f40]
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_os_min_block_size=512
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc_proprhandle: propb_ctx->proprhandle_islocal_only=0
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: end dumping backenctx->propb_ctx
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: backend_ctx->metactx=[0xee2c290]
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: backend_ctx->prop_sctx=[0xee247b0]
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: backend_ctx->prop_sltsmx=[0x0]
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: backend_ctx->prop_sclsctx=[0xee66d68]
2011-12-05 15:14:07.047: [  OCRRAW][1339816256]proprdc: backend_ctx->prop_ctx_ocrctx=[0xef17f40]
[  OCRAPI][1339816256]procr_ctx_set_invalid_no_abort: ctx set to invalid
[  OCRAPI][1339816256]procr_ctx_set_invalid: aborting...

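2) Recovery
The automatic backup is a physical backup, comparable to an RMAN backup of a database, and has to be brought back with a restore operation. The steps are:
a) Use ocrconfig -showbackup to find the automatic backup files (which node they are on and in which path).
This step comes from instructions found online. Once the cluster has crashed the command no longer works, so you need to know the OCR backup path beforehand.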
[oracle@rac2 crs]$ ocrconfig -showbackup
Segmentation fault
But we do know where the OCR backup files are:
[oracle@rac2 ~]$ cd $CRS_HOME/cdata/
b) Validate the OCR contents with ocrdump, for example:
Run as root:
[root@rac1 ~]# /oracle/ora10g/product/crs/bin/ocrdump -backupfile /oracle/ora10g/product/crs/cdata/crs/week_.ocr
[root@rac1 ~]#
This command creates a file named OCRDUMPFILE in the current directory; open it to inspect the contents.
[root@rac1 ~]# vim OCRDUMPFILE
12/06/2011 09:55:14
/oracle/ora10g/product/crs/cdata/crs/week_.ocr
/oracle/ora10g/product/crs/bin/ocrdump.bin -backupfile /oracle/ora10g/product/crs/cdata/crs/week_.ocr

[SYSTEM]
UNDEF :
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[oracle@rac2 crs]$ cd $CRS_HOME/cdata/crs
[oracle@rac2 crs]$ pwd
/oracle/ora10g/product/crs/cdata/crs
[oracle@rac2 crs]$ ll
total 23952
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 26 10:17 backup00.ocr
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 26 06:17 backup01.ocr
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 26 02:17 backup02.ocr
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 26 02:17 day_.ocr
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 25 22:17 day.ocr
-rwxr-xr-x 1 oracle oinstall 4083712 Nov 25 22:17 week.ocr
We choose to restore the most recent one, backup00.ocr.

c) Stop CRS on all nodes (run as root)
[oracle@rac2 crs]$ su -
Password:
[root@rac2 ~]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Dec 05 15:30:53.944 | INF | daemon shutting down
/etc/init.d/init.cssd: line 802: 28638 Segmentation fault      $CRSCTL stop crs
Shutdown has begun. The daemons should exit soon.
[root@rac1 ~]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Dec 05 15:30:42.340 | INF | daemon shutting down
/etc/init.d/init.cssd: line 802: 30025 Segmentation fault      $CRSCTL stop crs
Shutdown has begun. The daemons should exit soon.

d) Restore with ocrconfig
ocrconfig -restore file_name
--file_name is the path and name of the automatic OCR backup file
[root@rac2 ~]# cd /oracle/ora10g/product/crs/cdata/crs
[root@rac2 crs]# ocrconfig -restore backup00.ocr

e) Start CRS on all nodes
/etc/init.d/init.crs start

f) Optionally verify the OCR with cluvfy
cluvfy comp ocr -n all [-verbose]
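Steps c) through f) have to happen in a fixed order across nodes, so it can be useful to print the whole sequence for review before touching anything. The sketch below only echoes the commands already shown above and executes nothing. The function name print_ocr_restore_plan is hypothetical, and it assumes the first node listed is the one that holds the backup file.

```shell
#!/bin/bash
# Print the OCR restore sequence for review. Nothing is executed.
# $1 = backup file to restore; remaining arguments = node names.
print_ocr_restore_plan() {
    local backup="$1"; shift
    local node
    # Stop CRS everywhere before restoring
    for node in "$@"; do
        echo "[root@$node] /etc/init.d/init.crs stop"
    done
    # Assumption: the first node listed holds the backup file
    echo "[root@$1] ocrconfig -restore $backup"
    # Start CRS again on every node
    for node in "$@"; do
        echo "[root@$node] /etc/init.d/init.crs start"
    done
    echo "[oracle@$1] cluvfy comp ocr -n all"
}

print_ocr_restore_plan /oracle/ora10g/product/crs/cdata/crs/backup00.ocr rac1 rac2
```

Printing the plan first makes it easier to confirm that CRS is stopped on every node before ocrconfig -restore runs.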

2. Recovering the voting disk
1) The voting disk can be backed up with dd.
First find where the voting disk is stored with crsctl query css votedisk:
[oracle@rac1 ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw2

located 1 votedisk(s).

Then back it up with dd:
[oracle@rac1 crs]$ dd if=/dev/raw/raw2 of=voting_2011_12_06.bak
1955840+0 records in
1955840+0 records out
1001390080 bytes (1.0 GB) copied, 748.67 seconds, 1.3 MB/s
That is far too slow (dd defaults to 512-byte blocks). Set a larger block size with bs:
[oracle@rac1 crs]$ dd if=/dev/raw/raw2 of=voting_2011_12_06_2.bak bs=4M
238+1 records in
238+1 records out
1001390080 bytes (1.0 GB) copied, 12.3552 seconds, 81.1 MB/s
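A backup is only useful if it is still intact when you need it, so one option is to store a checksum next to the file at backup time. The sketch below is an illustration under stated assumptions: the function name backup_with_checksum and the .cksum suffix are made up for this example. It applies to the dd method used on the 10g release shown here; later releases handle voting disk backups differently, so check the documentation for your version.

```shell
#!/bin/bash
# Copy a device (or file) with dd and record a checksum of the copy.
# $1 = source, e.g. the voting disk raw device; $2 = backup file.
backup_with_checksum() {
    local src="$1" dst="$2"
    # bs=4M is the block size that made the copy fast in the run above
    dd if="$src" of="$dst" bs=4M 2>/dev/null || return 1
    # Store the checksum beside the backup so it can be re-checked later
    cksum < "$dst" > "$dst.cksum" || return 1
    echo "$dst"
}
```

For example: backup_with_checksum /dev/raw/raw2 voting_2011_12_06_2.bak. The checksum describes the backup file itself, not the live device, because the device may keep changing while the cluster is running.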

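When a restore is needed, run dd again in the opposite direction. What does the voting disk actually contain?
strings extracts the readable strings and gives a rough impression: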
[oracle@rac1 crs]$ strings voting_2011_12_06_2.bak|sort -u
etoV
fSLC
rac1
rac2
ssLckcoT
SslcLlik
sSlcrEp0
}|{z

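2) Now the restore

First corrupt it: [oracle@rac1 crs]$ dd if=/dev/zero of=/dev/raw/raw2 bs=4M

Be careful not to get the order wrong here (if is the source, of is the target).

After a long wait nothing seemed to happen; it turned out that all the database servers had rebooted.
Log in to each server and take a look: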
[oracle@rac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Dec 6 10:14:08 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup;
ORA-10997: another startup/shutdown operation of this instance inprogress
ORA-09968: unable to lock file
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 8529
SQL>
[root@rac2 ~]# su - oracle
[oracle@rac2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Dec 6 10:13:44 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc user_tables;
ERROR:
ORA-03135: connection lost contact

It fails. What now? Simple: fortunately we have a backup, so restore from it.
[oracle@rac1 ~]$ cd /oracle/ora10g/product/crs/cdata/crs/
[oracle@rac1 crs]$ ls
backup00.ocr  backup02.ocr  day.ocr                  voting_2011_12_06.bak
backup01.ocr  day_.ocr      voting_2011_12_06_2.bak  week_.ocr

[oracle@rac1 crs]$ dd if=voting_2011_12_06_2.bak of=/dev/raw/raw2 bs=4M
238+1 records in
238+1 records out
1001390080 bytes (1.0 GB) copied, 15.3953 seconds, 65.0 MB/s
Restart all services:
[root@rac1 ~]# crs_stop -all
[root@rac1 ~]# crs_start -all
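Since a dd with if and of swapped would overwrite the backup instead of the device, the restore can be wrapped in a small guard. This is a sketch, not an Oracle-provided procedure: the function name restore_from_backup is hypothetical, and the optional .cksum file next to the backup is an assumed convention.

```shell
#!/bin/bash
# Write a backup file back to a target device (or file) with dd.
# $1 = backup file, $2 = target, e.g. /dev/raw/raw2.
restore_from_backup() {
    local backup="$1" target="$2"
    # Guard against swapped arguments: the backup must be a non-empty
    # regular file, never a device node
    [ -f "$backup" ] && [ -s "$backup" ] || {
        echo "not a usable backup file: $backup" >&2
        return 1
    }
    # If a checksum was stored next to the backup, it must still match
    if [ -f "$backup.cksum" ] &&
       [ "$(cksum < "$backup")" != "$(cat "$backup.cksum")" ]; then
        echo "checksum mismatch: $backup" >&2
        return 1
    fi
    # notrunc only matters when the target is a regular file
    dd if="$backup" of="$target" bs=4M conv=notrunc 2>/dev/null || return 1
    echo "restored $backup -> $target"
}
```

For example: restore_from_backup voting_2011_12_06_2.bak /dev/raw/raw2. Passing the device as the first argument is rejected because a device node is not a regular file.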
Normally this is the end. In my setup the control files are stored in two different ASM disk groups, which sometimes leaves them inconsistent; the steps below fix that.
[root@rac1 ~]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.jscn.db    application    ONLINE    OFFLINE
ora....n1.inst application    ONLINE    OFFLINE
ora....n2.inst application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
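Some resources still have not started. Do they have to be started manually?
The control files are inconsistent again, which is annoying.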
[oracle@rac1 ~]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Tue Dec 6 10:23:22 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: jscn (not mounted)

RMAN> restore controlfile to '+RECOVERY/JSCN/CONTROLFILE/control02.ctl'  from  '+DATA/JSCN/controlfile/Current.263.768517111';

Starting restore at 06-DEC-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1620 instance=jscn1 devtype=DISK

channel ORA_DISK_1: copied control file copy
Finished restore at 06-DEC-11


RMAN> shutdown immediate;

Oracle instance shut down

RMAN> startup;

connected to target database (not started)
Oracle instance started
database mounted
database opened

Total System Global Area    1610612736 bytes

Fixed Size                     2084296 bytes
Variable Size                436208184 bytes
Database Buffers            1157627904 bytes
Redo Buffers                  14692352 bytes

RMAN> exit


Recovery Manager complete.
[oracle@rac1 ~]$ ssh rac2
Last login: Mon Dec  5 19:13:34 2011 from rac1
[oracle@rac2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Dec 6 10:27:06 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
Database opened.
This instance has started too, so the recovery is complete. Check once more:
[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.jscn.db    application    ONLINE    ONLINE    rac2
ora....n1.inst application    ONLINE    ONLINE    rac1
ora....n2.inst application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
[oracle@rac2 ~]$
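Reading the crs_stat -t table by eye works for a two-node cluster, but a small filter makes the check repeatable. The sketch below assumes the column layout shown in the listings above (Name, Type, Target, State, Host); the function name list_offline_resources is made up for this example.

```shell
#!/bin/bash
# Read `crs_stat -t` output on stdin and print every resource whose
# Target is ONLINE but whose State is something else.
list_offline_resources() {
    # Header and separator lines never have ONLINE in column 3,
    # so they are skipped without special handling
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}
```

Usage: crs_stat -t | list_offline_resources. Empty output means every resource that should be online is online.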

