Two cases of troubleshooting CRS failing to start on 10g RAC because of OCR problems

I recently ran into two cases where CRS would not start after the host of a RAC node was rebooted.

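Case 1: Linux + 10.2.0.5 RAC. After a reboot, the permissions on the raw device backing the OCR were wrong, because the script that sets the raw device permissions was misconfigured.

Case 2: The host runs HP-UX B.11.31 with HP-UX Service Guard as the clusterware. After the server crashed and rebooted, the VGs were not activated, so the disk holding the OCR could not be accessed.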

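The details are recorded below.

Case 1:

Linux + 10.2.0.5 RAC. After a reboot, the permissions on the raw device backing the OCR were wrong, because the script that sets the raw device permissions was misconfigured.

The symptoms were as follows: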

[root@rac02 ~]# ps -ef|grep css
root     16820     1  0 May25 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root     16872 16818  0 May25 ?        00:01:48 /bin/sh /etc/init.d/init.cssd startcheck
root     16924 16820  0 May25 ?        00:01:38 /bin/sh /etc/init.d/init.cssd startcheck
root     17062 16823  0 May25 ?        00:01:50 /bin/sh /etc/init.d/init.cssd startcheck
root     17866 17636  0 19:32 pts/1    00:00:00 grep css

[root@rac02 ~]# tail /var/log/messages
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
[root@rac02 log]# cat /tmp/crsctl.17062
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
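Both cases in this post were diagnosed from the OS error at the end of the PROC-26 line, so it can help to summarize all of the diagnostic files at once. The following is only a minimal sketch: the function name `summarize_crsctl_diag` is hypothetical, and it assumes the diagnostic files are named `crsctl.<pid>` and contain a line in the format shown above.

```shell
# summarize_crsctl_diag [DIR]
# Hypothetical helper: scans DIR (default /tmp) for crsctl.* diagnostic files
# and prints each distinct OS error with the number of files reporting it.
# Assumes the "Operating System error [text] [errno]" format shown in this post.
summarize_crsctl_diag() {
    dir="${1:-/tmp}"
    for f in "$dir"/crsctl.*; do
        # Skip the unexpanded pattern when no file matches.
        [ -f "$f" ] || continue
        sed -n 's/.*Operating System error \[\([^]]*\)\] \[\([0-9]*\)\].*/\1 (errno \2)/p' "$f"
    done | sort | uniq -c
}

# Example call (not executed here):
#   summarize_crsctl_diag /tmp
```

An errno of 13 (Permission denied) points at ownership or mode problems on the device, while an errno of 6 (No such device or address) points at the device itself being unavailable.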


[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl

/dev/raw:
total 0
drwxr-xr-x  2 root   root         140 May 25 01:46 .
drwxr-xr-x 14 root   root        5860 May 25 01:46 ..
crw-------  1 root   root     162, 10 May 25 01:46 raw10
crw-------  1 oracle oinstall 162,  3 May 25 01:46 raw3
crw-------  1 oracle oinstall 162,  4 May 25 01:46 raw4
crw-------  1 oracle oinstall 162,  5 May 25 01:46 raw5
crw-------  1 root   root     162,  9 May 25 01:46 raw9


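After the script was corrected so that the permissions look like the listing below, CRS started normally. Note: make sure the script itself is fixed, so that the permissions are still correct after the next reboot.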
[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl

/dev/raw:
total 0
drwxr-xr-x  2 root   root         140 May 25 01:46 .
drwxr-xr-x 14 root   root        5860 May 25 01:46 ..
crw-r-----  1 root   oinstall 162, 10 May 25 01:46 raw10
crw-r--r--  1 oracle oinstall 162,  3 May 25 01:46 raw3
crw-r--r--  1 oracle oinstall 162,  4 May 25 01:46 raw4
crw-r--r--  1 oracle oinstall 162,  5 May 25 01:46 raw5
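To catch this kind of problem before CRS tries to start, the expected ownership and mode can be verified after each reboot. The sketch below is an illustration, not the script used on this system: `check_dev` is a hypothetical name, it relies on GNU `stat -c` as found on Linux, and the expected values in the commented example calls are simply read off the corrected listing above.

```shell
# check_dev PATH OWNER GROUP MODE
# Hypothetical helper: returns 0 when PATH has the expected owner, group and
# octal mode, 1 on a mismatch, 2 when stat itself fails.
# Assumes GNU stat (the -c option), which is standard on Linux.
check_dev() {
    path=$1
    want="$2:$3 $4"
    have=$(stat -c '%U:%G %a' "$path") || return 2
    if [ "$have" = "$want" ]; then
        return 0
    fi
    echo "MISMATCH $path: have '$have', want '$want'" >&2
    return 1
}

# Example calls, using the values from the corrected listing (not executed here):
#   check_dev /dev/raw/raw10 root   oinstall 640
#   check_dev /dev/raw/raw3  oracle oinstall 644
```

Running such a check from the same place that sets the permissions makes a misconfigured script visible immediately, instead of only through the CRS startup failure.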




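Case 2:

The host runs HP-UX B.11.31 with HP-UX Service Guard as the clusterware. After the server crashed and rebooted, the VGs were not activated, so the disk holding the OCR could not be accessed.

Fault analysis: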

rac#[/etc]ps -ef|grep crs
    root  2249     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 29242 26214  0 16:12:54 pts/0     0:00 grep crs
rac#[/etc]ps -ef|grep init
    root     1     0  0  Nov  5  ?         0:01 init
    root    23     0  0  Nov  5  ?         0:00 pagetable_init_daemon
    root 29368 26214  0 16:15:29 pts/0     0:00 grep init
    root  2247     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root  2248     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root  2249     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root  2281  2248  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck
    root  2274  2249  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck
    root  2284  2247  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck

rac$[/tmp]ls -lrt crsctl*
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2274
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2281
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2284
rac$[/tmp]cat crsctl.2284
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat crsctl.2281
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat  crsctl.2274
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
Check the OCR information:
nbrbdb2$[/home/oracle]ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     130852
         Used space (kbytes)      :       3312
         Available space (kbytes) :     127540
         ID                       :  245644703
         Device/File Name         : /dev/vgora/rocr0
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

nbrbdb2$[/home/oracle]ls -al /dev/vgora/rocr0
crw-r-----   1 oracle     dba         64 0x020001 Jun 14  2013 /dev/vgora/rocr0


Check the same information on node 1:
rac$[/oracle/product/10.2.0/crs_1/log/rac/cssd]ls -al /dev/vgora/rocr0
crw-r-----   1 oracle     dba         64 0x020001 Sep 28  2012 /dev/vgora/rocr0

rac#[/]vgdisplay
--- Volume groups ---
VG Name                     /dev/vg00
VG Write Access             read/write     
VG Status                   available                 
Max LV                      255    
Cur LV                      10     
Open LV                     10     
Max PV                      16     
Cur PV                      1      
Act PV                      1      
Max PE per PV               4353         
VGDA                        2   
PE Size (Mbytes)            32              
Total PE                    4343    
Alloc PE                    4073    
Free PE                     270     
Total PVG                   0        
Total Spare PVs             0              
Total Spare PVs in use      0                     

vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglog".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglock".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vgora".

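Resolution:
The output above shows that the VGs were not activated, which is why the OCR could not be read or written.
After activating the VGs with the following commands, CRS returned to normal: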
#[/]vgchange -a s vgora
#[/]vgchange -a s vglog
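The check that led to this fix can be sketched as a small filter that pulls the names of non-activated VGs out of `vgdisplay` output. This is only an illustration under stated assumptions: `list_inactive_vgs` is a hypothetical name, and it assumes the message format `Cannot display volume group "<name>".` shown in the output above. It only lists the VGs and does not activate anything.

```shell
# list_inactive_vgs
# Hypothetical helper: reads vgdisplay output on stdin and prints the name of
# every volume group that vgdisplay reported it could not display.
# Assumes the 'Cannot display volume group "NAME".' message format seen above.
list_inactive_vgs() {
    sed -n 's/.*Cannot display volume group "\([^"]*\)".*/\1/p'
}

# Example call (not executed here); the messages may be written to stderr,
# so both streams are piped:
#   vgdisplay 2>&1 | list_inactive_vgs
```

Whether a listed VG should be activated, and in which mode, depends on the cluster configuration; the post uses shared mode (`-a s`) for the VGs it activates.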




