Node fails to start because the OLR is missing

Environment: RHEL 6.5 + 11.2.0.4 RAC, two nodes

Problem: the OLR (Oracle Local Registry) was deliberately deleted; after a reboot, GI (Grid Infrastructure) would not start.

Analysis:

1. Determine how far GI startup got

[grid@rac1 ~]$ crsctl status resource -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.

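Explanation: not even OHASD is running. There are two possible causes: (1) the init.ohasd script was never invoked, or (2) the ohasd.bin daemon failed to start. To tell them apart: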
[grid@rac1 ~]$ ps -ef | grep ohas |grep -v grep
root       960     1  0 09:23 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

The script was invoked, but the ohasd.bin daemon did not start successfully.
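The check above can be wrapped in a small helper. This is only a sketch: the function name ohasd_state and its three labels are made up for illustration, and it simply looks for the two process names seen in this post.

```shell
# Classify OHASD startup state from `ps -ef` output read on stdin.
# ohasd_state and its labels are hypothetical names, not Oracle tooling.
ohasd_state() {
    local ps_out
    ps_out=$(cat)
    if printf '%s\n' "$ps_out" | grep -q 'ohasd\.bin'; then
        echo "daemon-running"      # ohasd.bin is up
    elif printf '%s\n' "$ps_out" | grep -q 'init\.ohasd'; then
        echo "wrapper-only"        # init script ran, daemon did not stay up
    else
        echo "not-started"         # init.ohasd was never invoked
    fi
}

# Usage on the affected node:
#   ps -ef | grep -v grep | ohasd_state
```

On this node it would report wrapper-only, which is the case where the ohasd log is the next place to look.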

2. Check the ohasd log

2016-04-18 12:26:25.918: [ default][1661986592] OHASD Daemon Starting. Command string :restart
2016-04-18 12:26:25.919: [ default][1661986592] Initializing OLR
2016-04-18 12:26:25.919: [  OCROSD][1661986592]utopen:6m': failed in stat OCR file/disk /u01/app/11.2.0.1/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-04-18 12:26:25.919: [  OCROSD][1661986592]utopen:7: failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-04-18 12:26:25.919: [  OCRRAW][1661986592]proprinit: Could not open raw device
2016-04-18 12:26:25.919: [  OCRAPI][1661986592]a_init:16!: Backend init unsuccessful : [26]
2016-04-18 12:26:25.920: [  CRSOCR][1661986592] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-04-18 12:26:25.920: [ default][1661986592] Created alert : (:OHAS00106:) :  OLR initialization failed, error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-04-18 12:26:25.920: [ default][1661986592][PANIC] OHASD exiting; Could not init OLR
2016-04-18 12:26:25.920: [ default][1661986592] Done.
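To pull just the failing file and errno out of a long ohasd.log, something like the following may help. It is a sketch that assumes the log lines have exactly the "failed in stat OCR file/disk" format shown above; the function name olr_stat_failures is hypothetical, and the log path in the usage comment is an assumption based on the usual 11.2 layout under the Grid home.

```shell
# Print each distinct file that ohasd failed to stat, with its errno.
# $1: path to ohasd.log
olr_stat_failures() {
    sed -n 's/.*failed in stat OCR file\/disk \([^,]*\), errno=\([0-9]*\).*/\1 errno=\2/p' "$1" | sort -u
}

# Usage (log path assumed for this environment):
#   olr_stat_failures /u01/app/11.2.0.1/grid/log/rac1/ohasd/ohasd.log
```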

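Explanation: the errors say the OLR cannot be opened (errno=2, No such file or directory), so go and check whether the file exists. It was deleted by hand, so of course it should not.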

[grid@rac1 cdata]$ ll
total 12
drwxrwxr-x 2 grid oinstall 4096 Apr 18 07:51 liming-cluster
drwxr-xr-x 2 grid oinstall 4096 Apr 18 07:49 localhost
drwxr-xr-x 2 grid oinstall 4096 Apr 18 08:11 rac1

The OLR file rac1.olr is indeed gone.
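Rather than guessing the path, the configured OLR location can be read from the olr.loc pointer file. The sketch below assumes the Linux default /etc/oracle/olr.loc and its olrconfig_loc= entry, neither of which appears in this post; olr_location and check_olr are hypothetical helper names.

```shell
# Print the configured OLR path from an olr.loc file.
# $1: olr.loc path (assumed default: /etc/oracle/olr.loc on Linux)
olr_location() {
    sed -n 's/^olrconfig_loc=//p' "${1:-/etc/oracle/olr.loc}" 2>/dev/null
}

# Report whether the configured OLR file is present on disk.
check_olr() {
    local loc
    loc=$(olr_location "$1")
    if [ -n "$loc" ] && [ -f "$loc" ]; then
        echo "present: $loc"
    else
        echo "missing: ${loc:-<no olrconfig_loc entry>}"
    fi
}

# Usage:
#   check_olr                 # uses /etc/oracle/olr.loc
#   check_olr /path/to/olr.loc
```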

3. Check whether an OLR backup exists

[grid@rac1 rac1]$ ll
total 6644
-rw------- 1 root root 6803456 Apr 18 08:11 backup_20160418_081108.olr

Good, a backup is there.
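When several backups have accumulated, the newest one can be picked by name. This sketch assumes every backup follows the backup_YYYYMMDD_HHMMSS.olr pattern seen above, so that sorted glob order is also chronological order; latest_olr_backup is a hypothetical helper.

```shell
# Print the newest OLR backup in a directory, judged by the timestamp in its name.
# Returns non-zero when no backup_*.olr file exists.
latest_olr_backup() {
    local dir=$1 f latest=""
    for f in "$dir"/backup_*.olr; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -e "$f" ]; then
            latest=$f
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}

# Usage:
#   latest_olr_backup /u01/app/11.2.0.1/grid/cdata/rac1
```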

4. Restore the OLR

[root@rac1 bin]# ./ocrconfig -local -restore /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
PROTL-35: The configured OLR location is not accessible.

Here is the step the book leaves out: create an empty file at the configured OLR location first, then run the restore again.
[grid@rac1 cdata]$ touch rac1.olr
[root@rac1 bin]# ./ocrconfig -local -restore /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
[root@rac1 bin]#
[grid@rac1 cdata]$ ll
total 6660
drwxrwxr-x 2 grid oinstall      4096 Apr 18 07:51 liming-cluster
drwxr-xr-x 2 grid oinstall      4096 Apr 18 07:49 localhost
drwxr-xr-x 2 grid oinstall      4096 Apr 18 08:11 rac1
-rw-r--r-- 1 grid oinstall 272756736 Apr 18 13:02 rac1.olr

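The two restore steps can be combined so the placeholder is never forgotten. This is a sketch, not an Oracle-supplied procedure: restore_olr and the OCRCONFIG override variable are hypothetical, and only the ocrconfig -local -restore call comes from the session above. Note that when run as root the placeholder is created by root rather than by grid as in the transcript, so ownership and mode of the restored file are worth comparing against a healthy node.

```shell
# Restore the OLR from a backup, creating an empty placeholder first if needed.
# $1: configured OLR path   $2: backup file
# OCRCONFIG (hypothetical override) defaults to ocrconfig on PATH.
restore_olr() {
    local olr=$1 backup=$2
    if [ ! -f "$backup" ]; then
        echo "backup not found: $backup" >&2
        return 1
    fi
    # ocrconfig reported PROTL-35 while the target file was absent,
    # so make sure something exists at the configured location.
    if [ ! -e "$olr" ]; then
        touch "$olr" || return 1
    fi
    "${OCRCONFIG:-ocrconfig}" -local -restore "$backup"
}

# Usage (as root, with $GRID_HOME/bin on PATH):
#   restore_olr /u01/app/11.2.0.1/grid/cdata/rac1.olr \
#               /u01/app/11.2.0.1/grid/cdata/rac1/backup_20160418_081108.olr
```

Running ocrcheck -local afterwards is a reasonable way to confirm that the restored registry is readable before starting the stack.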


5. Start GI; the node is back to normal

[root@rac1 bin]# ./crsctl start crs
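crsctl start crs returns before the stack is fully up, so it can be worth polling until OHASD answers. The sketch below assumes crsctl check crs prints a CRS-4638 line once Oracle High Availability Services is online (that message does not appear in this post); wait_for_ohas and the CRSCTL override variable are hypothetical.

```shell
# Poll `crsctl check crs` until OHASD reports online, or give up.
# $1: max attempts (default 30)   $2: seconds between attempts (default 10)
# CRSCTL (hypothetical override) defaults to crsctl on PATH.
wait_for_ohas() {
    local tries=${1:-30} delay=${2:-10} i=0
    while [ "$i" -lt "$tries" ]; do
        # CRS-4638 is assumed to be the "High Availability Services is online" message.
        if "${CRSCTL:-crsctl}" check crs 2>&1 | grep -q 'CRS-4638'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_ohas && crsctl status resource -t -init
```

Once it succeeds, the crsctl status resource -t -init command from step 1 should list the lower-stack resources instead of failing with CRS-4639.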







