In this Document
Purpose
Details
Bug 10332426 - HAIP fails to start due to network mismatch
Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
Bug 16445624 - AIX: HAIP fails to start
Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0
Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number
Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected
Bug 13332363 - Wrong MTU for HAIP on Solaris
Bug 10114953 - only one HAIP is create on HP-UX
Bug 10363902 - HAIP Infiniband support for Linux and Solaris
Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3
Bug 10397652/ 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3
Bug 11077756 - allow root script to continue upon HAIP failure
Bug 12546712 - not affecting 11.2.0.3
HAIP fails to start if default gateway is configured for VLAN for private network on network switch
Bug 12425730 - HAIP does not start, 11.2.0.3 not affected
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481
11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network
References
This document lists known HAIP issues in 11gR2/12c Grid Infrastructure. Refer to note 1210883.1 for an explanation of the HAIP feature.
Issue: HAIP fails to start while running rootupgrade.sh
Symptom:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start"
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
2010-12-12 09:41:35.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
2010-12-12 09:41:40.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
Solution:
The cause is a mismatch between the private network information in OCR and on the OS. The output of the following commands should be consistent regarding network adapter name, subnet and netmask; see note 1296579.1 for what to check.
oifcfg iflist -p -n
oifcfg getif
ifconfig
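The consistency check described above can be sketched as a small script. This is only an illustration, not an Oracle-supplied tool: the function name check_private_net and the file names are hypothetical, and the column layouts are assumed to be "name subnet scope role" for "oifcfg getif" and "name subnet type netmask" for "oifcfg iflist -p -n". It compares only the adapter name and subnet; the netmask still has to be checked against ifconfig by hand.

```shell
#!/bin/sh
# Sketch only: check_private_net is a hypothetical helper, not an Oracle tool.
# Usage (file names are placeholders):
#   oifcfg getif        > getif.txt
#   oifcfg iflist -p -n > iflist.txt
#   check_private_net getif.txt iflist.txt
check_private_net() {
    # $1 = saved "oifcfg getif" output        (assumed: name subnet scope role)
    # $2 = saved "oifcfg iflist -p -n" output (assumed: name subnet type netmask)
    awk '
        # Remember every "name subnet" pair reported by the OS.
        FILENAME == ARGV[1] { os[$1 " " $2] = 1; next }
        # For each interconnect entry stored in OCR, look for the same pair.
        $4 ~ /cluster_interconnect/ {
            key = $1 " " $2
            if (key in os) print "OK " key
            else { print "MISMATCH " key; bad = 1 }
        }
        # Non-zero exit status if any interconnect entry had no match.
        END { exit bad + 0 }
    ' "$2" "$1"
}
```

A MISMATCH line means the interconnect definition in OCR has no matching adapter/subnet pair on the OS, which is the situation note 1296579.1 covers.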
Issue: HAIP fails to start on AIX
Symptom:
2014-07-21 16:38:59.240: [ USRTHRD][4372]{0:0:2} failed to create arp
2014-07-21 16:38:59.240: [ USRTHRD][4115]{0:0:2} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
Solution/Workaround:
Bug 19270660 is fixed in 12.1.0.2. Apply interim patch 19270660 if the issue is encountered.
Issue: HAIP fails to start if the root script (root.sh or rootupgrade.sh) is executed via sudo (not directly as the root user) or if the bpf device is not functioning properly
Symptom:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} failed to create arp
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} (null) category: -2, operation: ioctl, loc: bpfopen:2,os, OS error: 14, other:
OR
2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other:
OR
2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 22, other:
OR
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2
OR
Various other OS error codes can be seen as well
Solution/Workaround:
On AIX and Solaris, a command executed via sudo or a similar tool may not have the full root environment, which can cause HAIP startup to fail.
The solution is to obtain and apply patch 16445624 on AIX.
The workaround is to execute the root script (root.sh or rootupgrade.sh) directly as the real root user.
If the root script has already failed, try one or all of the following:
- reboot the node
- execute "/usr/sbin/tcpdump -D" as the root user; if the timestamp of the bpf device is not updated, delete the device and re-run the same "tcpdump -D" command
Before re-running the root script, verify whether the following exists and the timestamp is updated
Duplicate Bug 14358011
Issue: HAIP fails to start on AIX
Symptom:
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} failed to create arp
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2
...
2012-04-21 17:12:41.086: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] Start of HAIP aborted
2012-04-21 17:12:41.086: [ AGENT][3343] {0:0:2} UserErrorException: Locale is
2012-04-21 17:12:41.087: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] clsnUtils::error Exception type=2 string=CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted
Solution/Workaround:
Bug 13989181 is fixed in 11.2.0.4. Apply interim patch 13989181 if the issue is encountered.
Issue: HAIP fails to start on AIX because other system devices use the same major/minor number as the bpf devices
orarootagent_root.log shows: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 22, other: dev /dev/bpf0, ifr en15
The solution is to ensure that no other device uses the same major/minor number as a bpf device. Refer to note 1447517.1 for more details.
Issue: "oifcfg iflist -p -n" not showing HAIP on AIX
Fixed in: Expected behaviour on AIX
Symptom:
en12 10.0.1.0 global public
en13 10.1.1.0 global cluster_interconnect
en13: flags=5e080863,c0
inet 10.1.1.143 netmask 0xffffff00 broadcast 10.1.1.255
inet 169.254.228.154 netmask 0xffff0000 broadcast 169.254.255.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
..
Note HAIP exists
SQL> select * from gv$cluster_interconnects;
   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- ---
         1 en13            169.254.228.154  NO
         2 en13            169.254.55.162   NO
en12 10.0.1.0 PUBLIC 255.255.255.0
en13 10.1.1.0 PUBLIC 255.255.255.0
Note: HAIP is usually expected to be listed in this output as well; on AIX, however, it is not listed
Issue: Wrong MTU size for HAIP on Solaris, refer to note 1290585.1 for more details.
Fixed in: 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4
Issue: Only one HAIP created on HP-UX
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} Arp::sCreateSocket {
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} failed to create arp
2013-05-29 17:21:31.281: [ USRTHRD][29499] {0:0:56578} (null) category: -2, operation: ssclsi_dlpi_request, loc: dlpireq:8,na, OS error: 4, other:
The bug is fixed in 11.2.0.4. On releases earlier than 11.2.0.4, patch 10114953 is required.
OS kernel parameter dlpi_max_ub_promisc must be set to greater than 1 for the patch to be effective.
To find out value of dlpi_max_ub_promisc: kctune -v dlpi_max_ub_promisc
Refer to bug 15940367
Issue: GIPC HA disabled or HAIP fails to start if cluster interconnect is Infiniband or any other network hardware that has hardware address (MAC) longer than 6 bytes
Fixed in: 11.2.0.3 for Linux and Solaris
Symptom:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start"
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} Arp::sCreateSocket {
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} failed to create arp
2010-12-07 13:23:08.561: [ USRTHRD][3858] {0:0:62} (null) category: -2,
operation: ssclsi_aix_get_phys_addr, loc: aixgetpa:4,n, OS error: 2, other:
Issue: Many HAIPs are created after the active NIC fails in IPMP (affects Solaris only)
Fixed in: 11.2.0.3, 11.2.0.2 GI PSU3. Interim patch 10357258 exists for 11.2.0.2, and patch 11865154 for 11.2.0.2.1.
Symptom:
nxge3:2: flags=21000843mtu 1500 index 5
inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
nxge3:3: flags=21000842mtu 1500 index 5
inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
..
Note the same HAIP shows up multiple times
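A quick way to spot this symptom is to count how often each link-local address appears in the ifconfig output. The following is a minimal sketch under stated assumptions: dup_haip is a hypothetical helper name, it reads "ifconfig -a" output on standard input, and it assumes every address line starts with the word "inet" followed by the address, as in the excerpt above.

```shell
#!/bin/sh
# Sketch only: dup_haip is a hypothetical helper, not an Oracle tool.
# Usage: ifconfig -a | dup_haip
# Prints "<address> <count>" for every 169.254.x.x address that is
# configured on more than one logical interface.
dup_haip() {
    awk '
        # Count each link-local (HAIP range) address on an "inet" line.
        $1 == "inet" && $2 ~ /^169\.254\./ { count[$2]++ }
        # Report only the addresses seen more than once.
        END { for (a in count) if (count[a] > 1) print a, count[a] }
    ' | sort
}
```

No output means no HAIP address is plumbed more than once on the node where the command was run.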
Issue: HAIP does not fail over even when the private network experiences a problem (e.g. a switch port is disabled) because the OS is not providing reliable link information
Fixed in: Bug 12767231 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3
The workaround on AIX is to set the "MONITOR" flag on all private network adapters:
# ifconfig en1 monitor
# ifconfig en1
en1: flags=5e080863,2c0GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN, MONITOR>
inet 192.168.10.83 netmask 0xfffffc00 broadcast 192.168.11.255
inet 169.254.74.136 netmask 0xffff8000 broadcast 169.254.127.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
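To confirm the flag on several adapters without reading each flags line by eye, a check along the following lines may help. It is a sketch only: has_monitor_flag is a hypothetical helper name, and it assumes the flag appears as the word MONITOR on the "flags=" line of the ifconfig output, as in the example above.

```shell
#!/bin/sh
# Sketch only: has_monitor_flag is a hypothetical helper, not an AIX or Oracle tool.
# Usage: ifconfig en1 | has_monitor_flag && echo "en1: MONITOR set"
# Returns 0 if the "flags=" line contains MONITOR as a whole flag name,
# non-zero otherwise.
has_monitor_flag() {
    # Keep only the flags line, then match MONITOR delimited by the
    # start or end of the line or by "<", ">", "," or a space.
    grep 'flags=' | grep -Eq '(^|[<, ])MONITOR([>, ]|$)'
}
```

Run it against each private adapter (en1 in the example is taken from the output above) after issuing "ifconfig <adapter> monitor".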
Issue: A HAIP startup failure causes the root script to fail. The fix for this bug allows the root script to continue so that the HAIP issue can be worked on later.
Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and above
Note: the consequence is that HAIP will be disabled. Once the cause is identified and solution is implemented, HAIP needs to be enabled when there's an outage window. To enable, as root on ALL nodes:
# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs
Issue: ASM crashes because HAIP does not fail over when two or more private networks fail. Refer to note 1323995.1 for more details.
Issue: HAIP fails to start if default gateway is configured for VLAN for private network on network switch
orarootagent_root.log shows: PROBE: conflict detected src { 169.254.12.247,
The solution is to remove default gateway setting on network switch for private network (VLAN), refer to Note 1366211.1 for more details.
Issue: HAIP fails to start, gipcd.log shows rank 0 or "-1" for private network
Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and onward, refer to note 1374360.1 for details.
HAIP not running could affect instance startup. Refer to Note 1383737.1 for details.
HAIP will not be enabled on Solaris 11 if IPMP is configured for the private network. This is by design. Refer to Note 1512141.1.