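Recently I ran into two cases where CRS failed to start after a RAC node's host was rebooted.
Case 1: Linux + 10.2.0.5 RAC platform. After the reboot, the raw devices backing the OCR had the wrong permissions, because the script that sets the raw-device permissions was misconfigured.
Case 2: The host runs HP-UX B.11.31 with HP-UX Serviceguard clusterware. After the server crashed and rebooted, the volume groups (VGs) were not activated, so the disk holding the OCR could not be accessed.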
Details of case 1 (Linux + 10.2.0.5 RAC, wrong permissions on the OCR raw devices after reboot).
The symptoms were as follows:
[root@rac02 ~]# ps -ef|grep css
root 16820 1 0 May25 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 16872 16818 0 May25 ? 00:01:48 /bin/sh /etc/init.d/init.cssd startcheck
root 16924 16820 0 May25 ? 00:01:38 /bin/sh /etc/init.d/init.cssd startcheck
root 17062 16823 0 May25 ? 00:01:50 /bin/sh /etc/init.d/init.cssd startcheck
root 17866 17636 0 19:32 pts/1 00:00:00 grep css
[root@rac02 ~]# tail /var/log/messages
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
[root@rac02 log]# cat /tmp/crsctl.17062
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl
/dev/raw:
total 0
drwxr-xr-x 2 root root 140 May 25 01:46 .
drwxr-xr-x 14 root root 5860 May 25 01:46 ..
crw------- 1 root root 162, 10 May 25 01:46 raw10
crw------- 1 oracle oinstall 162, 3 May 25 01:46 raw3
crw------- 1 oracle oinstall 162, 4 May 25 01:46 raw4
crw------- 1 oracle oinstall 162, 5 May 25 01:46 raw5
crw------- 1 root root 162, 9 May 25 01:46 raw9
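The syslog lines above name per-process diagnostics files under /tmp, and those files hold the real error. A minimal sketch for pulling the paths out of the log and printing each file could look like this. The function names `crs_diag_files` and `show_crs_diags` are made up for illustration, and `grep -o` is assumed to be available (it is in GNU grep on Linux).

```shell
#!/bin/sh
# Read syslog lines on stdin and print the unique /tmp/crsctl.<pid>
# diagnostics files they mention, one per line.
crs_diag_files() {
    grep -o '/tmp/crsctl\.[0-9][0-9]*' | sort -u
}

# Read file names on stdin and print each readable file with a header line.
show_crs_diags() {
    while read -r f; do
        if [ -r "$f" ]; then
            printf '== %s ==\n' "$f"
            cat "$f"
        fi
    done
    return 0
}
```

A possible invocation as root would be `tail -n 200 /var/log/messages | crs_diag_files | show_crs_diags`.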
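After the script was fixed so that the permissions looked like the listing below, CRS started normally. Note: make sure the script itself is correct, so that the permissions are still right after the next reboot.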
[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl
/dev/raw:
total 0
drwxr-xr-x 2 root root 140 May 25 01:46 .
drwxr-xr-x 14 root root 5860 May 25 01:46 ..
crw-r----- 1 root oinstall 162, 10 May 25 01:46 raw10
crw-r--r-- 1 oracle oinstall 162, 3 May 25 01:46 raw3
crw-r--r-- 1 oracle oinstall 162, 4 May 25 01:46 raw4
crw-r--r-- 1 oracle oinstall 162, 5 May 25 01:46 raw5
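To catch this kind of drift after a reboot, the expected owner, group, and mode can be checked explicitly. The sketch below is an assumption-laden helper, not the script used on this system: `check_dev_perm` is a hypothetical name, and it relies on GNU `stat -c`, which is present on Linux but not on HP-UX.

```shell
#!/bin/sh
# Compare one device (or file) against the expected owner, group and octal mode.
# Prints a message and returns 1 on mismatch, returns 2 if stat fails.
check_dev_perm() {
    dev=$1
    want="$2 $3 $4"                      # owner group mode, e.g. "root oinstall 640"
    have=$(stat -c '%U %G %a' "$dev") || return 2
    if [ "$have" != "$want" ]; then
        echo "MISMATCH $dev: have '$have', want '$want'"
        return 1
    fi
    return 0
}
```

With the values from the corrected listing above, the calls would be `check_dev_perm /dev/raw/raw10 root oinstall 640` and `check_dev_perm /dev/raw/raw3 oracle oinstall 644` (likewise for raw4 and raw5).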
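Case 2: The host runs HP-UX B.11.31 with HP-UX Serviceguard clusterware. After the server crashed and rebooted, the VGs were not activated, so the disk holding the OCR could not be accessed.
Fault analysis: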
rac#[/etc]ps -ef|grep crs
root 2249 1 0 Nov 5 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 29242 26214 0 16:12:54 pts/0 0:00 grep crs
rac#[/etc]ps -ef|grep init
root 1 0 0 Nov 5 ? 0:01 init
root 23 0 0 Nov 5 ? 0:00 pagetable_init_daemon
root 29368 26214 0 16:15:29 pts/0 0:00 grep init
root 2247 1 0 Nov 5 ? 0:00 /bin/sh /sbin/init.d/init.evmd run
root 2248 1 0 Nov 5 ? 0:00 /bin/sh /sbin/init.d/init.cssd fatal
root 2249 1 0 Nov 5 ? 0:00 /bin/sh /sbin/init.d/init.crsd run
root 2281 2248 0 Nov 5 ? 0:08 /bin/sh /sbin/init.d/init.cssd startcheck
root 2274 2249 0 Nov 5 ? 0:08 /bin/sh /sbin/init.d/init.cssd startcheck
root 2284 2247 0 Nov 5 ? 0:08 /bin/sh /sbin/init.d/init.cssd startcheck
rac$[/tmp]ls -lrt crsctl*
-rw-rw-rw- 1 oracle dba 155 Nov 9 15:35 crsctl.2274
-rw-rw-rw- 1 oracle dba 155 Nov 9 15:35 crsctl.2281
-rw-rw-rw- 1 oracle dba 155 Nov 9 15:35 crsctl.2284
rac$[/tmp]cat crsctl.2284
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat crsctl.2281
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat crsctl.2274
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
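The OS error number at the end of the PROC-26 line is what separates the two cases: 13 (Permission denied) in case 1 and 6 (No such device or address) here. As a rough sketch, the number can be extracted and mapped to a first thing to check. The function names and the hint wording are illustrative assumptions, and only the two errno values seen in this write-up are covered.

```shell
#!/bin/sh
# Read crsctl diagnostics text on stdin and print the trailing OS error
# number of each PROC-26 line, e.g. "... [Permission denied] [13]" -> 13.
proc26_errno() {
    sed -n 's/.*PROC-26:.*\[\([0-9][0-9]*\)\][[:space:]]*$/\1/p'
}

# Map an errno value to a troubleshooting hint.
proc26_hint() {
    case "$1" in
        13) echo "permission denied: check owner, group and mode of the OCR device" ;;
        6)  echo "no such device or address: check that the VG holding the OCR is activated" ;;
        *)  echo "errno $1 not covered here: inspect the OCR device manually" ;;
    esac
}
```

For example, `proc26_hint "$(proc26_errno < /tmp/crsctl.2284)"` would print the hint for this case.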
Check the OCR information:
nbrbdb2$[/home/oracle]ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 130852
Used space (kbytes) : 3312
Available space (kbytes) : 127540
ID : 245644703
Device/File Name : /dev/vgora/rocr0
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
nbrbdb2$[/home/oracle]ls -al /dev/vgora/rocr0
crw-r----- 1 oracle dba 64 0x020001 Jun 14 2013 /dev/vgora/rocr0
Check the same information on node 1:
rac$[/oracle/product/10.2.0/crs_1/log/rac/cssd]ls -al /dev/vgora/rocr0
crw-r----- 1 oracle dba 64 0x020001 Sep 28 2012 /dev/vgora/rocr0
rac#[/]vgdisplay
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 10
Open LV 10
Max PV 16
Cur PV 1
Act PV 1
Max PE per PV 4353
VGDA 2
PE Size (Mbytes) 32
Total PE 4343
Alloc PE 4073
Free PE 270
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglog".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglock".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vgora".
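In a long `vgdisplay` listing the non-activated VGs are easy to miss. A small filter like the one below could list just those names. It is a sketch that assumes the message format shown above and assumes the messages may arrive on stderr (hence `2>&1` in the usage); `inactive_vgs` is a hypothetical name.

```shell
#!/bin/sh
# Read vgdisplay output on stdin and print the name of every volume group
# reported as: vgdisplay: Cannot display volume group "<name>".
inactive_vgs() {
    sed -n 's/^vgdisplay: Cannot display volume group "\(.*\)"\.[[:space:]]*$/\1/p'
}
```

A possible invocation as root would be `vgdisplay 2>&1 | inactive_vgs`.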
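Resolution:
The output above shows that the VGs are not activated, which is why the OCR cannot be read or written.
After activating the VGs with the following commands, CRS returned to normal: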
#[/]vgchange -a s vgora
#[/]vgchange -a s vglog