Applies to:
Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
Symptoms
When installing 11.2.0.2 Grid Infrastructure on a 2-node RAC cluster with a VLAN configured for the underlying network, root.sh fails with:
......
Start of resource "ora.cluster_interconnect.haip" failed
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'db1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'db1' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'db1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'db1' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Clusterware stack
Failed to start High Availability IP at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1043.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
This happens on both nodes.
Changes
New installation
Cause
The problem occurs because IP address 10.1.15.254/24 is configured on the Cisco switch as the default gateway of the private network VLAN. When HAIP probes its candidate 169.254.x.x addresses on the private adapter eth1 (10.1.15.30, MAC 00:10:3e:14:8e:19), the probes are answered from MAC address e8:b7:48:e3:10:d4, which belongs to the gateway IP 10.1.15.254. HAIP treats each answer as a MAC address conflict, and HAIP startup fails.
orarootagent_root.log shows:
2011-09-20 01:34:29.591: [ USRTHRD][1099024704] {0:0:167} HAIP: initializing to 1 interfaces
2011-09-20 01:34:29.592: [ USRTHRD][1099024704] {0:0:167} HAIP: configured to use 1 interfaces
2011-09-20 01:34:29.595: [ USRTHRD][1099024704] {0:0:167} HAIP: Updating member info HAIP1;10.1.15.0#0
2011-09-20 01:34:29.595: [ USRTHRD][1099024704] {0:0:167} InitializeHaIps[ 0] infList 'inf eth1, ip 10.1.15.30, sub 10.1.15.0'
2011-09-20 01:34:29.596: [ USRTHRD][1099024704] {0:0:167} Error in getting Key SYSTEM.network.haip.group.cluster_interconnect.interface.valid in OCR
2011-09-20 01:34:29.598: [ CLSINET][1099024704] failed to open OLR HAIP subtype SYSTEM.network.haip.group.cluster_interconnect.interface.valid key, rc=4
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} HAIP reset on new modified startup, ipSize 0 != numInf 1
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} HAIP: starting inf 'eth1', suggestedIp '', assignedIp ''
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} Thread:[NetHAWork]start {
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} Thread:[NetHAWork]start }
2011-09-20 01:34:29.598: [ USRTHRD][1119660352] {0:0:167} [NetHAWork] thread started
2011-09-20 01:34:29.598: [ USRTHRD][1119660352] {0:0:167} Arp::sCreateSocket {
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Arp::sCreateSocket }
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Starting Probe for ip 169.254.12.247
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Transitioning to Probe State
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe {
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sSend: sending type 1
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe }
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167}
PROBE: got conflicting source ip 169.254.12.247, addr e8:b7:48:e3:10:d4
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167}
PROBE: conflict detected src { 169.254.12.247, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} Starting Probe for ip 169.254.38.147
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} Transitioning to Probe State
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe {
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sSend: sending type 1
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe }
2011-09-20 01:34:30.762: [ USRTHRD][1119660352] {0:0:167}
PROBE: got conflicting source ip 169.254.38.147, addr e8:b7:48:e3:10:d4
2011-09-20 01:34:30.762: [ USRTHRD][1119660352] {0:0:167}
PROBE: conflict detected src { 169.254.38.147, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }
...
<< repeated 10 times with a different HAIP IP each time, then the start aborts:
2011-09-20 01:34:35.459: [ USRTHRD][1119660352] {0:0:167} Rate limiting attempts, numConflict 10
2011-09-20 01:35:29.501: [ AGFW][1113356608] {0:0:167} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.cluster_interconnect.haip 1 1
2011-09-20 01:35:35.708: [ora.cluster_interconnect.haip][1115457856] {0:0:167} [start] Start of HAIP aborted
2011-09-20 01:35:35.709: [ AGENT][1115457856] {0:0:167} UserErrorException: Locale is
2011-09-20 01:35:35.709: [ora.cluster_interconnect.haip][1115457856] {0:0:167} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
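To check whether your own nodes show the same pattern, the conflicting IP/MAC pairs can be pulled out of orarootagent_root.log. This is a minimal sketch: haip_conflicts and haip_conflict_macs are hypothetical helper names, and the log path in the example comment is an assumption based on the usual 11.2 layout, so adjust it to where the file lives on your system.

```shell
#!/bin/bash
# Sketch: list the IP/MAC pairs that answered HAIP's ARP probes, as recorded
# in orarootagent_root.log. Function names are hypothetical.

haip_conflicts() {
    local log="$1"
    [ -r "$log" ] || { echo "usage: haip_conflicts /path/to/orarootagent_root.log" >&2; return 2; }
    # Turn lines such as
    #   PROBE: got conflicting source ip 169.254.12.247, addr e8:b7:48:e3:10:d4
    # into "169.254.12.247 e8:b7:48:e3:10:d4"; all other lines are dropped.
    sed -n 's/.*PROBE: got conflicting source ip \([0-9.]*\), addr \([0-9a-fA-F:]*\).*/\1 \2/p' "$log"
}

# Count how many probes each distinct MAC address answered.
haip_conflict_macs() {
    haip_conflicts "$1" | awk '{ print $2 }' | sort | uniq -c
}

# Example (the log location is an assumption; adjust as needed):
#   haip_conflict_macs "$GRID_HOME/log/$(hostname -s)/agent/ohasd/orarootagent_root/orarootagent_root.log"
```

If every conflict comes from a single MAC address that does not belong to any of the host's adapters, a device on the VLAN (here, the switch) is the likely source of the replies.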
The network configuration shows that MAC address e8:b7:48:e3:10:d4 is not defined on any of the host's physical network interfaces:
$ /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:10:3E:58:3E:E7
          inet addr:10.2.14.30  Bcast:10.2.14.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe58:3ee7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4273973 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3176416 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4309493182 (4.0 GiB)  TX bytes:2326925399 (2.1 GiB)

eth1      Link encap:Ethernet  HWaddr 00:10:3E:14:8E:19
          inet addr:10.1.15.30  Bcast:10.1.15.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe14:8e19/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1441782 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1156267 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:935044730 (891.7 MiB)  TX bytes:682093588 (650.4 MiB)
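On Linux, the same "is this MAC one of ours?" check can be scripted by reading each interface's address from sysfs rather than scanning ifconfig output. A sketch only: mac_owner is a hypothetical helper, and it assumes the standard /sys/class/net/<ifname>/address layout (the optional second argument points it at another directory).

```shell
#!/bin/bash
# Sketch: print the local interface that owns a MAC address, or return 1.
# Assumes Linux sysfs; mac_owner is a hypothetical helper name.

mac_owner() {
    local want sysfs="${2:-/sys/class/net}" f mac
    [ -n "$1" ] || { echo "usage: mac_owner MAC [sysfs_dir]" >&2; return 2; }
    # Lowercase so 00:10:3E:14:8E:19 and 00:10:3e:14:8e:19 compare equal.
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    for f in "$sysfs"/*/address; do
        [ -r "$f" ] || continue
        mac=$(tr 'A-F' 'a-f' < "$f")
        if [ "$mac" = "$want" ]; then
            # The interface name is the directory that holds 'address'.
            basename "$(dirname "$f")"
            return 0
        fi
    done
    return 1
}

# Example with the MACs from this note:
#   mac_owner 00:10:3E:14:8E:19                         # eth1 on the host above
#   mac_owner e8:b7:48:e3:10:d4 || echo "not a local MAC"
```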
According to the network administrator, MAC address e8:b7:48:e3:10:d4 belongs to IP 10.1.15.254/24, which was created on the Cisco switch as the gateway IP of the private network VLAN:
#show int Vlan15
Vlan15 is up, line protocol is up
Hardware is EtherSVI, address is e8b7.48e3.10d4 (bia e8b7.48e3.10d4)
Description: Cluster
Internet address is 10.1.15.254/24
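Cisco IOS prints MAC addresses as three dot-separated groups of four hex digits, so e8b7.48e3.10d4 in the switch output is the same address as e8:b7:48:e3:10:d4 in the HAIP log. A small sketch for converting between the two notations when matching switch output against the log; cisco_to_colon_mac is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: convert Cisco IOS MAC notation (e8b7.48e3.10d4) to the
# colon-separated form used by ifconfig and the HAIP log (e8:b7:48:e3:10:d4).

cisco_to_colon_mac() {
    local hex
    # Drop the dots and lowercase the digits: e8b7.48e3.10d4 -> e8b748e310d4
    hex=$(printf '%s' "$1" | tr -d '.' | tr 'A-F' 'a-f')
    # Reject anything that is not exactly 12 hex digits.
    case "$hex" in
        ''|*[!0-9a-f]*) return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || return 1
    # Put a colon after every pair of digits, then drop the trailing colon.
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}

# Compare the switch SVI address with the MAC reported in the PROBE lines.
if [ "$(cisco_to_colon_mac e8b7.48e3.10d4)" = "e8:b7:48:e3:10:d4" ]; then
    echo "Vlan15 SVI MAC matches the HAIP conflict MAC"
fi
```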
Solution
It is recommended to place the private network on dedicated switches. If a VLAN on a Cisco switch is used for the private network instead, that VLAN does not need a gateway IP, and none should be configured.
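Once the gateway IP has been removed, and before root.sh is rerun, you can optionally check that the switch no longer answers probes for link-local addresses on the private VLAN. The sketch below sends a duplicate address detection (DAD) ARP probe, which is similar in spirit to the probe HAIP logs as Arp::sProbe but is not the HAIP code itself. It assumes the iputils version of arping (options -D, -q, -c, -I) and must be run as root; probe_link_local is a hypothetical name, and eth1 and 169.254.12.247 are used only because they appear in this note.

```shell
#!/bin/bash
# Sketch: send a DAD ARP probe for one link-local address on the private
# interface. Assumes iputils arping; run as root.

probe_link_local() {
    local ifname="$1" ip="$2"
    if [ -z "$ifname" ] || [ -z "$ip" ]; then
        echo "usage: probe_link_local IFNAME IP" >&2
        return 2
    fi
    # With -D, iputils arping exits 0 only if no host replies to the two
    # probes (-c 2). A non-zero status means a reply was received, or that
    # arping itself failed (for example when not run as root).
    if arping -D -q -c 2 -I "$ifname" "$ip"; then
        echo "free: $ip on $ifname"
    else
        echo "claimed or error: $ip on $ifname"
        return 1
    fi
}

# Example with values from this note (run as root):
#   probe_link_local eth1 169.254.12.247
```

A "claimed" result while no HAIP address is up on any node suggests that some device on the VLAN is still answering for the address.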
After removing the gateway IP 10.1.15.254/24 from the Cisco switch, deconfigure the failed Grid Infrastructure installation.
As the root user, on every node except the last one:
# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force
On the last node:
# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -lastnode
Then rerun root.sh as the root user, on the first node and, after it completes, on the second node:
# $GRID_HOME/root.sh
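For repeatability, the deconfiguration commands above can be wrapped in a small script with a dry-run mode. This is a sketch only: deconfig_node and DRY_RUN are hypothetical names, GRID_HOME must be set to the Grid home (/u01/app/11.2.0/grid in this note), and the script runs nothing beyond the rootcrs.pl commands listed above. Run it with "other" on every node except the last, then with "last" on the last node, and only then rerun root.sh.

```shell
#!/bin/bash
# Sketch: wrap the rootcrs.pl deconfig commands from this note.
# deconfig_node and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
_run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

deconfig_node() {
    local role="$1"
    [ -n "$GRID_HOME" ] || { echo "deconfig_node: GRID_HOME is not set" >&2; return 2; }
    local -a cmd=("$GRID_HOME/crs/install/rootcrs.pl" -deconfig -force)
    case "$role" in
        other) ;;                     # any node except the last one
        last)  cmd+=(-lastnode) ;;    # the last node to be deconfigured
        *)     echo "usage: deconfig_node other|last" >&2; return 2 ;;
    esac
    # rootcrs.pl must be run as root; the check is skipped in dry-run mode.
    if [ "${DRY_RUN:-0}" != 1 ] && [ "$(id -u)" -ne 0 ]; then
        echo "deconfig_node: run as root" >&2
        return 1
    fi
    _run_or_echo "${cmd[@]}"
}

# Example: preview the command for the last node without running it.
#   GRID_HOME=/u01/app/11.2.0/grid DRY_RUN=1 deconfig_node last
```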