ORA-15077 PROC-26 CRSD Fails During CRS Startup on 11gR2 [ID 1152583.1]
Modified 04-AUG-2010 Type PROBLEM Status PUBLISHED
In this Document
Symptoms
Changes
Cause
Solution
References
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
In a 2-node RAC cluster, node 2 was rebooted manually. After the node restarted and CRS was started again, CRSD crashed with the following errors in crsd.log:
The OCR location +DG_DATA_01 is inaccessible
2010-06-27 09:58:56.869: [ OCRASM][4156924400]proprasmo: Error in open/create file in dg [DG_DATA_01]
[ OCRASM][4156924400]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2010-06-27 09:58:56.871: [ CRSOCR][4156924400] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
2010-06-27 09:58:56.871: [ CRSD][4156924400][PANIC] CRSD exiting: Could not init OCR, code: 26
The Clusterware alert log, alertracnode2.log, shows:
2010-06-27 09:45:04.759
[cssd(13087)]CRS-1713:CSSD daemon is started in clustered mode
2010-06-27 09:45:24.911
[cssd(13087)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2010-06-27 09:45:43.399
[crsd(13556)]CRS-1201:CRSD started on node racnode2.
2010-06-27 09:58:43.026
[crsd(13556)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/oracle/11.2.0/grid/log/racnode2/crsd/crsd.log.
2010-06-27 09:58:43.207
[/opt/oracle/11.2.0/grid/bin/oraagent.bin(14944)]CRS-5822:Agent '/opt/oracle/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) in /opt/oracle/11.2.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2010-06-27 09:58:43.465
[ohasd(12493)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.
...
2010-06-27 09:59:02.943
[crsd(15055)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/oracle/11.2.0/grid/log/racnode2/crsd/crsd.log.
2010-06-27 09:59:03.713
[ohasd(12493)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.
2010-06-27 09:59:03.713
[ohasd(12493)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
Changes
The node was rebooted.
Cause
This issue is caused by the VIP address already being assigned in the network because of an incorrect OS network configuration.
In crsd.log, we can see:
2010-06-27 09:49:15.743: [UiServer][1519442240] Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2672: Attempting to start 'ora.racnode2.vip' on 'racnode2']
2010-06-27 09:49:35.827: [UiServer][1519442240] Container [ Name: ORDER
MESSAGE:
TextMessage[
CRS-5005: IP Address: 10.12.14.16 is already in use in the network]
2010-06-27 09:49:35.829: [UiServer][1519442240] Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2674: Start of 'ora.racnode2.vip' on 'racnode2' failed]
2010-06-27 09:51:32.746: [UiServer][1519442240] Container [ Name: ORDER
MESSAGE:
TextMessage[Attempting to stop `ora.asm` on member `racnode2`]
2010-06-27 09:58:44.543: [ CRSOCR][1147494896] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
2010-06-27 09:58:44.543: [ CRSD][1147494896][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-06-27 09:58:44.543: [ CRSD][1147494896] Done.
So ASM and the OCR diskgroup were ONLINE and CRSD was starting resources. When CRSD tried to start the VIP, the start of ora.racnode2.vip failed because the VIP address was already in use in the network. CRSD then stopped ASM, which made the OCR location inaccessible and caused CRSD to abort.
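The failure chain described above (a CRS-5005 VIP conflict logged before CRSD aborts with PROC-26) can be spotted by scanning crsd.log. The following Python sketch is a minimal illustration, not an Oracle tool: the function name find_vip_conflicts is hypothetical, and it assumes the message texts match the crsd.log excerpt above. The sample string is an abbreviated excerpt using the racnode2-vip address from /etc/hosts.

```python
import re

# Matches the CRS-5005 message CRSD logs when a VIP address is already in use.
CRS_5005 = re.compile(r"CRS-5005: IP Address: (\d{1,3}(?:\.\d{1,3}){3}) is already in use")


def find_vip_conflicts(log_text: str) -> list[str]:
    """Return addresses reported by CRS-5005 before the first PROC-26 error.

    An empty list means the log does not show the failure chain from this note.
    """
    proc26_pos = log_text.find("PROC-26")
    if proc26_pos == -1:
        return []  # CRSD did not abort on an OCR access error
    addresses: list[str] = []
    # Only consider CRS-5005 messages logged before the OCR failure.
    for match in CRS_5005.finditer(log_text, 0, proc26_pos):
        if match.group(1) not in addresses:
            addresses.append(match.group(1))
    return addresses


# Abbreviated sample modeled on the crsd.log excerpt (hypothetical, for illustration).
SAMPLE = """\
2010-06-27 09:49:35.827: [UiServer][1519442240] Container [ Name: ORDER
MESSAGE:
TextMessage[
CRS-5005: IP Address: 10.12.14.16 is already in use in the network]
2010-06-27 09:58:44.543: [ CRSOCR][1147494896] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error
"""

print(find_vip_conflicts(SAMPLE))  # ['10.12.14.16']
```

In practice the function would be fed the contents of the crsd.log file under the Grid home's log directory for the affected node.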
Checking the network configuration, we see:
/etc/hosts
# public node names
10.12.14.13 racnode1
10.12.14.14 racnode2
#Oracle RAC VIP
10.12.14.15 racnode1-vip
10.12.14.16 racnode2-vip
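To compare these entries with what is actually configured on the interfaces, the VIP addresses first have to be extracted from /etc/hosts. The Python sketch below assumes the "<node>-vip" naming convention used in this note; parse_hosts and vip_addresses are hypothetical helper names.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map every host name and alias in /etc/hosts-style text to its IP address."""
    names: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *hostnames = line.split()
        for name in hostnames:
            names[name] = ip
    return names


def vip_addresses(hosts: dict[str, str]) -> dict[str, str]:
    """Keep only the entries that follow the "<node>-vip" naming (an assumption)."""
    return {name: ip for name, ip in hosts.items() if name.endswith("-vip")}


HOSTS = """\
# public node names
10.12.14.13 racnode1
10.12.14.14 racnode2
#Oracle RAC VIP
10.12.14.15 racnode1-vip
10.12.14.16 racnode2-vip
"""

# Prints the two VIP entries: racnode1-vip and racnode2-vip with their addresses.
print(vip_addresses(parse_hosts(HOSTS)))
```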
The ifconfig output from node 2 shows that the VIP address of racnode2 is statically assigned to eth1 as its primary address:
eth1 Link encap:Ethernet HWaddr 00:22:64:F7:0C:E8
inet addr:10.12.14.16 Bcast:10.12.14.255 Mask:255.255.248.0
inet6 addr: fe80::222:64ff:fef7:ce8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2772 errors:0 dropped:0 overruns:0 frame:0
TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:203472 (198.7 KiB) TX bytes:22689 (22.1 KiB)
Interrupt:177 Memory:f4000000-f4012100
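whereas, while CRS was down on node 2, it should have been failed over by Clusterware to node 1 and bound to node 1's public interface as an alias (eth0:<n>). The ifconfig output from node 1 shows its public interface eth0 and its own VIP on the alias eth0:1: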
eth0 Link encap:Ethernet HWaddr 00:22:64:F7:0B:22
inet addr:10.12.14.13 Bcast:10.12.14.255 Mask:255.255.248.0
inet6 addr: fe80::222:64ff:fef7:b22/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0:1 Link encap:Ethernet HWaddr 00:22:64:F7:0B:22
inet addr:10.12.14.15 Bcast:10.12.14.255 Mask:255.255.248.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169 Memory:f2000000-f2012100
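The difference between the two nodes (a VIP as the primary address of eth1 on node 2 versus a VIP on the alias eth0:1 on node 1) can be checked mechanically. The Python sketch below parses net-tools ifconfig output and flags any VIP held as the primary address of a non-alias interface. It assumes, as in the output above, that Clusterware plumbs VIPs on alias interfaces such as eth0:1; it does not catch a VIP that the OS itself configured on an alias. The function names are hypothetical.

```python
import re

# Header line of an interface block, e.g. "eth0:1    Link encap:Ethernet".
IFACE_HEADER = re.compile(r"^(\S+)\s+Link encap:")
# IPv4 address line, e.g. "inet addr:10.12.14.15  Bcast:...".
INET_ADDR = re.compile(r"inet addr:\s*(\d{1,3}(?:\.\d{1,3}){3})")


def ipv4_by_interface(ifconfig_text: str) -> dict[str, list[str]]:
    """Group the IPv4 addresses in ifconfig output by interface name."""
    result: dict[str, list[str]] = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = IFACE_HEADER.match(line)
        if header:
            current = header.group(1)
            result.setdefault(current, [])
        if current is not None:
            result[current].extend(INET_ADDR.findall(line))
    return result


def statically_plumbed_vips(ifconfig_text: str, vips: set[str]) -> list[tuple[str, str]]:
    """Return (interface, address) pairs where a VIP is an interface's primary address.

    Assumption: Clusterware plumbs VIPs as aliases such as eth0:1, so a VIP on an
    interface name without ':' most likely comes from the OS network configuration.
    """
    found: list[tuple[str, str]] = []
    for iface, addresses in ipv4_by_interface(ifconfig_text).items():
        if ":" in iface:
            continue  # alias interface, the expected place for a VIP
        found.extend((iface, addr) for addr in addresses if addr in vips)
    return found


VIPS = {"10.12.14.15", "10.12.14.16"}  # racnode1-vip and racnode2-vip from /etc/hosts

NODE2 = """\
eth1      Link encap:Ethernet  HWaddr 00:22:64:F7:0C:E8
          inet addr:10.12.14.16  Bcast:10.12.14.255  Mask:255.255.248.0
"""
NODE1 = """\
eth0      Link encap:Ethernet  HWaddr 00:22:64:F7:0B:22
          inet addr:10.12.14.13  Bcast:10.12.14.255  Mask:255.255.248.0
eth0:1    Link encap:Ethernet  HWaddr 00:22:64:F7:0B:22
          inet addr:10.12.14.15  Bcast:10.12.14.255  Mask:255.255.248.0
"""

print(statically_plumbed_vips(NODE2, VIPS))  # [('eth1', '10.12.14.16')]
print(statically_plumbed_vips(NODE1, VIPS))  # []
```

Running the same check against the output of ifconfig -a before starting CRSD gives a quick pre-flight test for this problem.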
Solution
Correct the network configuration at the OS layer. For example, in the /etc/sysconfig/network-scripts/ifcfg-eth* scripts, remove the VIP address from the ifcfg-eth1 definition.
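Before restarting the network service, it can help to confirm which ifcfg script still assigns a VIP. The Python sketch below only reads a script's KEY=VALUE lines and reports any key starting with IPADDR whose value is a VIP; it does not modify the file. The ifcfg-eth1 content shown is hypothetical, since the note does not reproduce the real file, and the helper names are illustrative.

```python
def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse the KEY=VALUE lines of an ifcfg-* script, removing optional quotes."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not KEY=VALUE
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def vips_in_ifcfg(text: str, vips: set[str]) -> list[str]:
    """Return the VIP addresses that an ifcfg script assigns through IPADDR* keys."""
    settings = parse_ifcfg(text)
    return sorted({value for key, value in settings.items()
                   if key.startswith("IPADDR") and value in vips})


# Hypothetical ifcfg-eth1 content; the note does not show the real file.
IFCFG_ETH1 = """\
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.12.14.16
NETMASK=255.255.248.0
ONBOOT=yes
"""

print(vips_in_ifcfg(IFCFG_ETH1, {"10.12.14.15", "10.12.14.16"}))  # ['10.12.14.16']
```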
Restart the network service and check the output of ifconfig -a to ensure that the VIP is not assigned to any network interface before CRSD is started (unless Clusterware has failed it over to the other node).
Restart CRSD on node 2.
References
NOTE:1050908.1 - How to Troubleshoot Grid Infrastructure Startup Issues