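Environment: VirtualBox + RHEL 6.5 + 11gR2
Today I rebooted the virtual machines, and afterwards only node 1 (rac1) would start; node 2 (rac2) would not. I am a RAC beginner, so I was a bit lost: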
[oracle@rac2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Wed Sep 9 13:42:54 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+MYSHARE/orcl/spfileorcl.ora'
ORA-17503: ksfdopn:2 Failed to open file +MYSHARE/orcl/spfileorcl.ora
ORA-15077: could not locate ASM instance serving a required diskgroup
Check whether the ASM instance is up:
[oracle@rac2 ~]$ srvctl status asm -n rac2
PRCR-1070 : Failed to check if resource ora.asm is registered
Cannot communicate with crsd
[grid@rac2 ~]$ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service
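I searched online, but the results were all over the place, so I asked in a chat group. The advice was to read the logs, so I went looking for them:
As the grid user, check $ORACLE_HOME/log/rac2/alertrac2.log and the logs under $ORACLE_HOME/log/rac2/crsd/: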
[crsd(6801)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109:) in /u01/app/11.2.0/grid/log/rac2/crsd/crsd.log.
2015-09-10 02:28:04.221: [ CSSD][3984840448]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2015-09-10 02:28:04.821: [ CSSD][3999033088]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 337257663, wrtcnt, 559657, LATS 42751044, lastSeqNo 559654, uniqueness 1441762827, timestamp 1441823278/59749684
2015-09-10 02:28:04.888: [ CSSD][3994302208]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 337257663, wrtcnt, 559659, LATS 42751114, lastSeqNo 559656, uniqueness 1441762827, timestamp 1441823278/59749694
2015-09-10 02:28:05.223: [ CSSD][3984840448]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2015-09-10 02:28:05.825: [ CSSD][3999033088]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 337257663, wrtcnt, 559660, LATS 42752054, lastSeqNo 559657, uniqueness 1441762827, timestamp 1441823279/59750684
2015-09-10 02:28:05.891: [ CSSD][3994302208]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 337257663, wrtcnt, 559662, LATS 42752114, lastSeqNo 559659, uniqueness 1441762827, timestamp 1441823279/59750694
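The repeated CSSD message is the real clue here: rac2 can still see rac1's heartbeat on the shared disk, but not over the network. As a rough sketch, a few lines of Python can pull that message out of a noisy log. The function name and the regular expression are my own, written against the lines quoted above; other Grid Infrastructure versions may word the message differently.

```python
import re
from collections import Counter

# Pattern written against the CSSD lines quoted above (assumption: the
# wording is the same in the log being scanned).
MISSING_NET_HB = re.compile(
    r"node (\d+), ([^,\s]+), has a disk HB, but no network HB"
)


def nodes_missing_network_hb(lines):
    """Count, per (node number, node name), the lines that report a disk
    heartbeat without a network heartbeat."""
    counts = Counter()
    for line in lines:
        match = MISSING_NET_HB.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Feeding it the lines of the CSSD log (for example `nodes_missing_network_hb(open(path))`, with `path` pointing at whichever file contains these messages) shows at a glance which peer node is reachable on disk but silent on the interconnect.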
Then it hit me: could HB stand for "heartbeat"? The private interconnect, priv-ip? I checked right away:
[root@rac1 bin]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
186.168.100.18 rac1
186.168.100.19 rac2
186.168.100.21 rac1-vip
186.168.100.22 rac2-vip
11.168.100.11 rac1-priv
11.168.100.13 rac2-priv
186.168.100.16 scan
186.168.100.17 scan
[root@rac1 bin]# ping 11.168.100.13
PING 11.168.100.13 (11.168.100.13) 56(84) bytes of data.
From 11.168.100.11 icmp_seq=1 Destination Host Unreachable
From 11.168.100.11 icmp_seq=2 Destination Host Unreachable
From 11.168.100.11 icmp_seq=3 Destination Host Unreachable
^C
--- 11.168.100.13 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3500ms
pipe 3
[root@rac1 bin]#
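Preliminary diagnosis: a problem with the heartbeat network. Get it working, then test again.
Heartbreaking: the network was simply down. The VirtualBox network adapter settings were wrong, so I fixed them and got the priv-ip reachable again.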
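The manual ping check above can be scripted so it is quick to repeat after every reboot. This is only a sketch under my own assumptions: the helper names are hypothetical, interconnect hosts are recognised by the `-priv` suffix used in this /etc/hosts, and `ping -c 1 -W 2` is the Linux form of the command.

```python
import subprocess


def private_hosts(hosts_text, suffix="-priv"):
    """Return {hostname: ip} for every /etc/hosts name ending in `suffix`."""
    result = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            if name.endswith(suffix):
                result[name] = ip
    return result


def ping_ok(ip):
    """Send one ICMP echo with a 2-second timeout (Linux ping options)."""
    completed = subprocess.run(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0


def check_interconnect(hosts_path="/etc/hosts"):
    """Ping every private-interconnect address and return {hostname: reachable}."""
    with open(hosts_path) as handle:
        hosts = private_hosts(handle.read())
    return {name: ping_ok(ip) for name, ip in hosts.items()}
```

Calling `check_interconnect()` on each node returns a dictionary of private hostnames mapped to True or False. In the situation described here, the entry for rac2-priv would have been False when run from rac1.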
[oracle@rac2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Sep 10 03:48:56 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
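Success.
Summary: the cause was that the heartbeat network was down. Judging from what I found while searching, many different faults produce almost the same errors, so each case has to be analyzed on its own. This post describes only one possible cause.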