root.sh Fails to Start HAIP as Default Gateway is Configured for Private Network VLAN (Doc ID 1366211

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

Symptoms

When installing 11.2.0.2 Grid Infrastructure on a 2-node RAC cluster whose underlying network uses a VLAN, root.sh fails with:

......
Start of resource "ora.cluster_interconnect.haip" failed
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'db1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'db1' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'db1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'db1' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Clusterware stack
Failed to start High Availability IP at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1043.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed


This happens on both nodes.

Changes

New installation

Cause

The problem occurs because IP address 10.1.15.254/24 is configured as the default gateway for the private network VLAN on the Cisco switch. This causes HAIP to retrieve MAC address e8:b7:48:e3:10:d4, which is associated with IP 10.1.15.254/24, instead of the real MAC address 00:10:3e:14:8e:19 of the private network adapter (10.1.15.30). HAIP startup then fails with a MAC address conflict.

orarootagent_root.log shows:

2011-09-20 01:34:29.591: [ USRTHRD][1099024704] {0:0:167} HAIP: initializing to 1 interfaces
2011-09-20 01:34:29.592: [ USRTHRD][1099024704] {0:0:167} HAIP: configured to use 1 interfaces
2011-09-20 01:34:29.595: [ USRTHRD][1099024704] {0:0:167} HAIP: Updating member info HAIP1;10.1.15.0#0
2011-09-20 01:34:29.595: [ USRTHRD][1099024704] {0:0:167} InitializeHaIps[ 0] infList 'inf eth1, ip 10.1.15.30, sub 10.1.15.0'
2011-09-20 01:34:29.596: [ USRTHRD][1099024704] {0:0:167} Error in getting Key SYSTEM.network.haip.group.cluster_interconnect.interface.valid in OCR
2011-09-20 01:34:29.598: [ CLSINET][1099024704] failed to open OLR HAIP subtype SYSTEM.network.haip.group.cluster_interconnect.interface.valid key, rc=4
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} HAIP reset on new modified startup, ipSize 0 != numInf 1
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} HAIP: starting inf 'eth1', suggestedIp '', assignedIp ''
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} Thread:[NetHAWork]start {
2011-09-20 01:34:29.598: [ USRTHRD][1099024704] {0:0:167} Thread:[NetHAWork]start }
2011-09-20 01:34:29.598: [ USRTHRD][1119660352] {0:0:167} [NetHAWork] thread started
2011-09-20 01:34:29.598: [ USRTHRD][1119660352] {0:0:167} Arp::sCreateSocket {
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Arp::sCreateSocket }
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Starting Probe for ip 169.254.12.247
2011-09-20 01:34:29.627: [ USRTHRD][1119660352] {0:0:167} Transitioning to Probe State
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe {
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sSend: sending type 1
2011-09-20 01:34:30.115: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe }
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} PROBE: got conflicting source ip 169.254.12.247, addr e8:b7:48:e3:10:d4
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} PROBE: conflict detected src { 169.254.12.247, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} Starting Probe for ip 169.254.38.147
2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} Transitioning to Probe State
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe {
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sSend: sending type 1
2011-09-20 01:34:30.760: [ USRTHRD][1119660352] {0:0:167} Arp::sProbe }
2011-09-20 01:34:30.762: [ USRTHRD][1119660352] {0:0:167} PROBE: got conflicting source ip 169.254.38.147, addr e8:b7:48:e3:10:d4
2011-09-20 01:34:30.762: [ USRTHRD][1119660352] {0:0:167} PROBE: conflict detected src { 169.254.38.147, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }
...
<< repeated 10 times, each with a different HAIP IP, before the start is aborted:

2011-09-20 01:34:35.459: [ USRTHRD][1119660352] {0:0:167} Rate limiting attempts, numConflict 10
2011-09-20 01:35:29.501: [ AGFW][1113356608] {0:0:167} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.cluster_interconnect.haip 1 1
2011-09-20 01:35:35.708: [ora.cluster_interconnect.haip][1115457856] {0:0:167} [start] Start of HAIP aborted
2011-09-20 01:35:35.709: [ AGENT][1115457856] {0:0:167} UserErrorException: Locale is
2011-09-20 01:35:35.709: [ora.cluster_interconnect.haip][1115457856] {0:0:167} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
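
The conflict lines in the log excerpt above can be pulled out mechanically to see which foreign MAC address keeps answering the ARP probes. The following is a minimal sketch, not an Oracle tool: the function names and the regular expression are illustrative assumptions based only on the line format shown in this excerpt, and other versions may log differently.

```python
import re
from collections import Counter

# Pattern derived from the "conflict detected" lines shown in this note
# (an assumption; it is not taken from any Oracle specification).
CONFLICT_RE = re.compile(
    r"PROBE: conflict detected src \{ (?P<ip>[\d.]+), (?P<src_mac>[0-9a-fA-F:]{17}) \}, "
    r"target \{ [\d.]+, (?P<target_mac>[0-9a-fA-F:]{17}) \}"
)


def find_probe_conflicts(lines):
    """Return (probed_ip, conflicting_mac, local_mac) for each conflict line."""
    conflicts = []
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts.append(
                (match["ip"], match["src_mac"].lower(), match["target_mac"].lower())
            )
    return conflicts


def summarize(conflicts):
    """Count how many probes each conflicting MAC address answered."""
    return Counter(src_mac for _, src_mac, _ in conflicts)


# Two conflict lines copied from the orarootagent_root.log excerpt above.
SAMPLE_LOG = [
    "2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} PROBE: conflict detected "
    "src { 169.254.12.247, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }",
    "2011-09-20 01:34:30.116: [ USRTHRD][1119660352] {0:0:167} Starting Probe for ip 169.254.38.147",
    "2011-09-20 01:34:30.762: [ USRTHRD][1119660352] {0:0:167} PROBE: conflict detected "
    "src { 169.254.38.147, e8:b7:48:e3:10:d4 }, target { 0.0.0.0, 00:10:3e:14:8e:19 }",
]

for mac, count in summarize(find_probe_conflicts(SAMPLE_LOG)).most_common():
    print(f"{mac} answered {count} probe(s)")
```

If a single MAC address answers every probe regardless of the probed 169.254.x.x address, as it does here, the responder is likely a network device rather than another cluster node.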


The network configuration shows that MAC address e8:b7:48:e3:10:d4 is not defined on any of the host's physical network interfaces:

$ /sbin/ifconfig -a

eth0 Link encap:Ethernet HWaddr 00:10:3E:58:3E:E7
     inet addr:10.2.14.30 Bcast:10.2.14.255 Mask:255.255.255.0
     inet6 addr: fe80::216:3eff:fe58:3ee7/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:4273973 errors:0 dropped:0 overruns:0 frame:0
     TX packets:3176416 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:4309493182 (4.0 GiB) TX bytes:2326925399 (2.1 GiB)


eth1 Link encap:Ethernet HWaddr 00:10:3E:14:8E:19
     inet addr:10.1.15.30 Bcast:10.1.15.255 Mask:255.255.255.0
     inet6 addr: fe80::216:3eff:fe14:8e19/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:1441782 errors:0 dropped:0 overruns:0 frame:0
     TX packets:1156267 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:935044730 (891.7 MiB) TX bytes:682093588 (650.4 MiB)



According to the network administrator, MAC address e8:b7:48:e3:10:d4 is associated with IP 10.1.15.254/24, which was created as the gateway IP for the private network VLAN on the Cisco switch.

#show int Vlan15
Vlan15 is up, line protocol is up
Hardware is EtherSVI, address is e8b7.48e3.10d4 (bia e8b7.48e3.10d4)
Description: Cluster
Internet address is 10.1.15.254/24
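
Note that the same MAC address appears in three notations in this note: lowercase with colons in the agent log, uppercase with colons in the ifconfig output, and Cisco dotted form (e8b7.48e3.10d4) on the switch. A small helper that normalizes them makes the comparison less error-prone. This is a minimal sketch; the function names and the `known` dictionary are illustrative assumptions, filled in with the addresses quoted in this note.

```python
import re


def normalize_mac(mac):
    """Return a MAC address as lowercase colon-separated hex pairs."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def owner_of(mac, known):
    """Return the name of the device that owns `mac`, or None if unknown."""
    target = normalize_mac(mac)
    for name, candidate in known.items():
        if normalize_mac(candidate) == target:
            return name
    return None


# Addresses as they appear in the ifconfig and "show int Vlan15" output above.
known = {
    "eth0": "00:10:3E:58:3E:E7",
    "eth1": "00:10:3E:14:8E:19",
    "Vlan15 (switch SVI)": "e8b7.48e3.10d4",
}

# The conflicting address from the log resolves to the switch interface,
# while the probe target resolves to the local private adapter.
print(owner_of("e8:b7:48:e3:10:d4", known))
print(owner_of("00:10:3e:14:8e:19", known))
```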

Solution

It is recommended to place the private network on dedicated switches. If a VLAN is used for the private network on a Cisco switch, no gateway is needed for that VLAN.

After removing the gateway IP 10.1.15.254/24 from the Cisco switch, deconfigure the failed Grid Infrastructure installation.

As the root user, on each node except the last:

# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force

On the last node:

# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -lastnode


Then rerun root.sh as the root user:

# $GRID_HOME/root.sh
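
The ordering of the recovery commands can be written out per node before anything is executed. The sketch below only prints a plan and runs nothing. The function name is illustrative, the second node name `db2` is hypothetical (only `db1` appears in this note), and rerunning root.sh in the same node order is an assumption based on the usual installation sequence.

```python
def recovery_plan(nodes, grid_home="$GRID_HOME"):
    """Build the ordered (node, command) list: deconfig everywhere, then root.sh."""
    if not nodes:
        raise ValueError("at least one node is required")
    deconfig = f"{grid_home}/crs/install/rootcrs.pl -deconfig -force"
    # Every node except the last gets the plain deconfig command.
    plan = [(node, deconfig) for node in nodes[:-1]]
    # The last node additionally needs -lastnode.
    plan.append((nodes[-1], deconfig + " -lastnode"))
    # Afterwards root.sh is rerun on each node, one node at a time.
    plan.extend((node, f"{grid_home}/root.sh") for node in nodes)
    return plan


# "db2" is a hypothetical name for the second node of the 2-node cluster.
for node, command in recovery_plan(["db1", "db2"]):
    print(f"[{node}] # {command}")
```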
