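The OLR (Oracle Local Registry) is a cluster registry stored locally on each node.
Its main purpose is to provide the ohasd daemon with the cluster configuration and the definitions of the resources it initializes.
When the cluster starts, it reads the location of the OLR from /etc/oracle/olr.loc. By default the OLR is kept in the cdata directory under the Grid home, as the following output shows: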
[root@rac1 grid]# cat /etc/oracle/olr.loc
olrconfig_loc=/u01/11.2.0/grid/cdata/rac1.olr
crs_home=/u01/11.2.0/grid
[root@rac1 grid]# ls -l /u01/11.2.0/grid/cdata/rac1.olr
-rw------- 1 root oinstall 272756736 Jan 15 12:57 /u01/11.2.0/grid/cdata/rac1.olr
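The lookup above can be scripted. The following is a minimal sketch, assuming the olr.loc file uses the simple key=value layout shown in the output; the function name olr_location is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# Print the OLR path recorded in an olr.loc-style file.
# Usage: olr_location [path-to-olr.loc]   (defaults to /etc/oracle/olr.loc)
# Assumption: the file contains a line of the form olrconfig_loc=<path>.
olr_location() {
    loc_file="${1:-/etc/oracle/olr.loc}"
    if [ ! -r "$loc_file" ]; then
        echo "cannot read $loc_file" >&2
        return 1
    fi
    # Strip the key and print only the value of the first matching line.
    sed -n 's/^olrconfig_loc=//p' "$loc_file" | head -n 1
}
```

Calling olr_location with no argument reads /etc/oracle/olr.loc; the result can then be passed to ls -l to confirm that the file exists.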
The OLR is not backed up automatically. Whenever the cluster configuration changes, take a manual backup:
[root@rac1 grid]# ocrconfig -local -manualbackup
rac1 2019/01/15 13:07:02 /u01/11.2.0/grid/cdata/rac1/backup_20190115_130702.olr
rac1 2015/03/18 16:09:38 /u01/11.2.0/grid/cdata/rac1/backup_20150318_160938.olr
If the OLR is lost, it can be restored with ocrconfig -local -restore, as the following test shows.
# Manually delete the OLR
[root@rac1 ~]# rm -rf /u01/11.2.0/grid/cdata/rac1.olr
# Start node 1
[root@rac1 ~]# crsctl start crs
Check the resource status with crsctl stat res -t -init:
[root@rac1 ~]# crsctl stat res -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
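The output above shows that the ohasd layer is not running. Either the init.ohasd script that /etc/inittab uses to start the cluster was never invoked, or the ohasd.bin daemon failed to start. Check the processes to narrow this down: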
[root@rac1 ~]# ps -ef|grep has
root 2747 1 0 05:42 ? 00:00:01 /bin/sh /etc/init.d/init.ohasd run
root 24733 2747 0 13:32 ? 00:00:00 /u01/11.2.0/grid/perl/bin/perl -I/u01/11.2.0/grid/perl/lib /u01/11.2.0/grid/bin/crswrapexece.pl /u01/11.2.0/grid/crs/install/s_crsconfig_rac1_env.txt /u01/11.2.0/grid/bin/ohasd.bin restart
root 24738 2747 0 13:32 ? 00:00:00 /u01/11.2.0/grid/perl/bin/perl -I/u01/11.2.0/grid/perl/lib /u01/11.2.0/grid/bin/crswrapexece.pl /u01/11.2.0/grid/crs/install/s_crsconfig_rac1_env.txt /u01/11.2.0/grid/bin/ohasd.bin restart
root 24742 14550 0 13:32 pts/2 00:00:00 grep has
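The output shows that the init.ohasd script is running and that ohasd.bin is being launched (note the repeated restart invocations through crswrapexece.pl), so the problem is that ohasd itself does not start successfully. Examine the ohasd log file to find the cause: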
[grid@rac1 ohasd]$ tail -100f /u01/11.2.0/grid/log/rac1/ohasd/ohasd.log
...
2019-01-15 13:36:45.723: [ default][651043328] OHASD Daemon Starting. Command string :restart
2019-01-15 13:36:45.724: [ default][651043328] Initializing OLR
2019-01-15 13:36:45.725: [ OCROSD][651043328]utopen:6m':failed in stat OCR file/disk /u01/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2019-01-15 13:36:45.725: [ OCROSD][651043328]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2019-01-15 13:36:45.725: [ OCRRAW][651043328]proprinit: Could not open raw device
2019-01-15 13:36:45.725: [ OCRAPI][651043328]a_init:16!: Backend init unsuccessful : [26]
2019-01-15 13:36:45.726: [ CRSOCR][651043328] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2019-01-15 13:36:45.726: [ default][651043328] OLR initalization failured, rc=26
2019-01-15 13:36:45.726: [ default][651043328]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2019-01-15 13:36:45.726: [ default][651043328][PANIC] OHASD exiting; Could not init OLR
2019-01-15 13:36:45.726: [ default][651043328] Done.
...
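The relevant lines can be pulled out of a long ohasd.log automatically. The sketch below assumes the log uses exactly the message wording shown in the excerpt above; the function names missing_olr_from_log and olr_init_failed are hypothetical helpers, not Oracle utilities.

```shell
#!/bin/sh
# Print the OLR path that ohasd failed to stat, if the log contains such a line.
# Usage: missing_olr_from_log <path-to-ohasd.log>
# Assumption: the message format matches the excerpt, e.g.
#   ...failed in stat OCR file/disk /path/to/host.olr, errno=2, ...
missing_olr_from_log() {
    log_file="$1"
    # Capture everything between "file/disk " and the following comma;
    # keep only the most recent occurrence.
    sed -n 's|.*failed in stat OCR file/disk \([^,]*\), errno=2.*|\1|p' "$log_file" | tail -n 1
}

# Succeed if the log records that OLR initialization failed.
olr_init_failed() {
    grep -q 'Failed to initialize Oracle Local Registry' "$1"
}
```

Used together, olr_init_failed confirms the symptom and missing_olr_from_log names the file to check with ls -l.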
According to the log, /u01/11.2.0/grid/cdata/rac1.olr is missing, which prevents the cluster from starting. Confirm that the file is really gone:
[grid@rac1 ohasd]$ ls -l /u01/11.2.0/grid/cdata/rac1.olr
ls: /u01/11.2.0/grid/cdata/rac1.olr: No such file or directory
The OLR backup files are stored in the following directory:
[grid@rac1 ohasd]$ cd /u01/11.2.0/grid/cdata/rac1/
[grid@rac1 rac1]$ ls -l
total 38236
-rw------- 1 root root 6356992 Mar 18 2015 backup_20150318_160938.olr
-rw------- 1 root root 6385664 Jan 2 14:55 backup_20190102_145536.olr
-rw------- 1 root root 6553600 Jan 15 10:50 backup_20190115_105011.olr
-rw------- 1 root root 6574080 Jan 15 10:50 backup_20190115_105031.olr
-rw------- 1 root root 6594560 Jan 15 13:07 backup_20190115_130702.olr
-rw------- 1 root root 6615040 Jan 15 13:22 backup_20190115_132243.olr
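Choosing the newest backup from this listing can be automated. This is a minimal sketch that relies on the backup_YYYYMMDD_HHMMSS.olr naming seen above, where the name order equals the time order; the function name latest_olr_backup is hypothetical.

```shell
#!/bin/sh
# Print the most recent backup_YYYYMMDD_HHMMSS.olr file in a directory.
# Usage: latest_olr_backup <backup-directory>
# Assumption: all backups follow the naming pattern shown in the listing,
# so sorting by name also sorts by backup time.
latest_olr_backup() {
    backup_dir="$1"
    latest=""
    # Pathname expansion returns matches in sorted order, so the last
    # existing match is the newest backup.
    for f in "$backup_dir"/backup_*.olr; do
        [ -e "$f" ] || continue
        latest="$f"
    done
    if [ -z "$latest" ]; then
        echo "no OLR backups found in $backup_dir" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}
```

For the directory listed above, this would select backup_20190115_132243.olr, the same file that is restored in the next step.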
Restore the OLR:
[root@rac1 rac1]# ocrconfig -local -restore /u01/11.2.0/grid/cdata/rac1/backup_20190115_132243.olr
# Verify whether the OLR was restored
[root@rac1 rac1]# ls -l /u01/11.2.0/grid/cdata/rac1.olr
ls: /u01/11.2.0/grid/cdata/rac1.olr: No such file or directory
# The file was not restored directly. After some research: the OLR file must be created manually with touch before restoring
[root@rac1 rac1]# touch /u01/11.2.0/grid/cdata/rac1.olr
[root@rac1 rac1]# ocrconfig -local -restore /u01/11.2.0/grid/cdata/rac1/backup_20190115_132243.olr
PROTL-19: Cannot proceed while the Oracle High Availability Service is running
# Kill the process that starts ohasd (init.ohasd, PID 2747)
[root@rac1 rac1]# ps -ef|grep grid
root 12936 31479 0 12:55 pts/2 00:00:00 su - grid
grid 12939 12936 0 12:55 pts/2 00:00:00 -bash
root 14454 2747 4 13:47 ? 00:00:01 /u01/11.2.0/grid/bin/ohasd.bin restart
root 14517 1 0 13:47 ? 00:00:00 /u01/11.2.0/grid/bin/orarootagent.bin
root 14589 1 0 13:47 ? 00:00:00 /u01/11.2.0/grid/bin/cssdagent
root 14591 1 0 13:47 ? 00:00:00 /u01/11.2.0/grid/bin/cssdmonitor
root 14617 9726 0 13:47 pts/1 00:00:00 grep grid
root 14711 12483 0 13:10 pts/1 00:00:00 su - grid
grid 14712 14711 0 13:10 pts/1 00:00:00 -bash
root 31478 31442 0 09:01 pts/2 00:00:00 su - grid
grid 31479 31478 0 09:01 pts/2 00:00:00 -bash
[root@rac1 rac1]# kill -9 2747
[root@rac1 rac1]# ocrconfig -local -restore /u01/11.2.0/grid/cdata/rac1/backup_20190115_132243.olr
# The node can now start normally
[root@rac1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
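The whole procedure can be summarized as an ordered list of commands. The sketch below only prints the steps and executes nothing. It makes two substitutions that are assumptions rather than what the transcript did: it stops the stack with crsctl stop crs -f instead of kill -9, and it adds an ocrcheck -local verification step before the restart. The function name olr_restore_plan is hypothetical.

```shell
#!/bin/sh
# Print (do not execute) the steps to restore a missing OLR.
# Usage: olr_restore_plan <olr-path> <backup-file>
# Assumptions: "crsctl stop crs -f" replaces the kill -9 used in the
# transcript, and "ocrcheck -local" is added as a verification step.
olr_restore_plan() {
    olr_path="$1"
    backup_file="$2"
    # ohasd must be down, otherwise the restore fails with PROTL-19.
    echo "crsctl stop crs -f"
    # The restore needs an existing target file; create an empty one if missing.
    [ -e "$olr_path" ] || echo "touch $olr_path"
    echo "ocrconfig -local -restore $backup_file"
    echo "ocrcheck -local"
    echo "crsctl start crs"
}
```

Printing the plan first makes it easy to review the commands as root before running them one at a time.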
Reference: 《RAC核心技术详解》