In this Document
Symptoms
Changes
Cause
Solution
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 to 12.1.0.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.
This issue impacts environments that do not have multicast enabled for the private network in the following situations:
New installations of Oracle Grid Infrastructure 11.2.0.2 where multicast is not enabled on 230.0.1.0
Upgrades to Oracle Grid Infrastructure 11.2.0.2 from a pre-11.2.0.2 release where multicast is not enabled on 230.0.1.0 or 224.0.0.251
Installation of GI PSU 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 where multicast is not enabled on 230.0.1.0 or 224.0.0.251
Installation or upgrade to 12.1.0.1.0 where multicast is not enabled on 230.0.1.0 or 224.0.0.251
SYMPTOMS
If multicast-based communication is not enabled as required, either on the nodes of the cluster or on the network switches used for the private interconnect, root.sh (run as part of a fresh installation of Oracle Grid Infrastructure 11.2.0.2) or rootupgrade.sh (run as part of an upgrade to Oracle Grid Infrastructure 11.2.0.2) succeeds only on the first node of the cluster and fails on subsequent nodes with the symptoms shown below:
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/crs/11.2.0.2/crs/install/crsconfig_lib.pm line 1016.
/u01/app/crs/11.2.0.2/perl/bin/perl -I/u01/app/crs/11.2.0.2/perl/lib -I/u01/app/crs/11.2.0.2/crs/install /u01/app/crs/11.2.0.2/crs/install/rootcrs.pl execution failed
Note: The symptoms and the required diagnostics are the same whether an upgrade or a fresh installation of Oracle Grid Infrastructure 11.2.0.2 is performed. This issue is also documented in the Oracle Database Readme 11g Release 2, Section 2.39 - "Open Bugs", under BUG: 9974223.
Note: This issue also impacts the following 11.2.0.3 PSUs: 11.2.0.3.5, 11.2.0.3.6 and 11.2.0.3.7, as well as 12.1.0.1 installations, if multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses (at least one of the two must be enabled and functional). With 11.2.0.3, GI was enhanced to utilize broadcast or multicast to bootstrap. However, the 11.2.0.3.5 GI PSU introduced a new issue that effectively disables the broadcast functionality (Bug 16547309).
Symptom verification
To verify that the Oracle CSS daemon fails to start in clustered mode because of a multicast issue on the network, review the ocssd.log file (located under $GI_HOME/log/<nodename>/cssd/ocssd.log). If joining the cluster fails because of such an issue, the following can be observed:
1. When CSS starts in clustered mode to join an existing cluster, we will see an entry in the CSSD log indicating that CSS will attempt to establish communication with a peer in the cluster. For this analysis, we see in the CSSD log for node2 that communication is attempted with node1, which looks similar to:
2010-09-16 23:13:14.862: [GIPCHGEN][1107937600] gipchaNodeCreate: adding new node 0x2aaab408d4a0 { host 'node1', haName 'CSS_ttoprf10cluster', srcLuid 54d7bb0e-ef4a0c7e, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 9563084, flags 0x0 }
2. Shortly after the above log entry we will see an attempt to establish communication to node1 from node2 via multicast address 230.0.1.0, port 42424 on the private interconnect (here: 192.168.1.2):
2010-09-16 23:13:14.862: [GIPCHTHR][1106360640] gipchaWorkerUpdateInterface: created remote interface for node 'node1', haName 'CSS_mycluster', inf 'mcast://230.0.1.0:42424/192.168.1.2'
3. If the communication can be established successfully, we will see a log entry on node2 containing "gipchaLowerProcessAcks: ESTABLISH finished" for the peer node (node1). If the communication cannot be established, we will not see this log entry. Instead, we will see an entry indicating that the network communication cannot be established. This entry will look similar to the one shown below:
2010-09-16 23:13:15.839: [ CSSD][1087465792]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 180134562, wrtcnt, 8627, LATS 9564064, lastSeqNo 8624, uniqueness 1284701023, timestamp 1284703995/10564774
The above log entry indicates that CSSD is unable to establish network communication on the interface used for the private interconnect. In this particular case, the issue was that multicast communication on the 230.0.1.0 IP was blocked on the network used as the private interconnect.
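The three checks above can be sketched as a small log scan. This is a minimal illustration, not an Oracle-supplied tool: the function name summarize_cssd_log and the "multicast_suspected" heuristic are assumptions made for this example, and the patterns are copied only from the log excerpts quoted in this note, so other releases may word these entries differently.

```python
import re

# Patterns copied from the sample ocssd.log entries quoted in this note.
IFACE_RE = re.compile(
    r"created remote interface for node '([^']+)'.*?"
    r"inf 'mcast://([\d.]+):(\d+)/([\d.]+)'"
)
ESTABLISHED = "gipchaLowerProcessAcks: ESTABLISH finished"
NO_NETWORK_HB = "has a disk HB, but no network HB"


def summarize_cssd_log(lines):
    """Count the entries described in steps 2 and 3 (hypothetical helper)."""
    summary = {"interfaces": [], "established": 0, "no_network_hb": 0}
    for line in lines:
        match = IFACE_RE.search(line)
        if match:
            node, group, port, local_ip = match.groups()
            summary["interfaces"].append((node, group, int(port), local_ip))
        if ESTABLISHED in line:
            summary["established"] += 1
        if NO_NETWORK_HB in line:
            summary["no_network_hb"] += 1
    # Assumed heuristic: disk heartbeat seen, but no ESTABLISH entry at all.
    summary["multicast_suspected"] = (
        summary["no_network_hb"] > 0 and summary["established"] == 0
    )
    return summary
```

To use it, pass the lines of the ocssd.log from the node that fails to join, for example summarize_cssd_log(open(path, errors="replace")). A positive result only narrows the search; it does not prove that multicast is the cause.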
CHANGES
New installations of Oracle Grid Infrastructure 11.2.0.2
Upgrades of a previous release to Oracle Grid Infrastructure 11.2.0.2
Installation of the 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 GI PSUs where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses
New installations of Oracle Grid Infrastructure 12.1.0.1 where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses
Upgrades of a previous release to Oracle Grid Infrastructure 12.1.0.1 where multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses
Note: 11.2.0.4 Grid Infrastructure is not impacted by this issue.
CAUSE
If Cluster Verify (cluvfy) passed the network checks on all nodes of the cluster, or if the symptoms described above are observed as part of an upgrade to Oracle Grid Infrastructure 11.2.0.2 (which means that the current release does not encounter such communication issues), the most likely cause is that multicast is not enabled on the network used as the private interconnect.
Background information for 11.2.0.2
Oracle Grid Infrastructure 11.2.0.2 introduces a new feature called "Redundant Interconnect Usage", which provides an Oracle internal mechanism to make use of physically redundant network interfaces for the Oracle (private) interconnect. As part of this new feature, multicast-based communication on the private interconnect is used to establish communication with peers in the cluster on each startup of the stack on a node. Once the connection with the peers in the cluster has been established, the communication is switched back to unicast. By default, the 230.0.1.0 address (port 42424) on the private interconnect network is used for multicasting. Another IP can be enabled using the patch mentioned below, if it is determined that using the 230.0.1.0 IP causes the multicast communication to fail. Multicasting on either of these IPs and the respective port must, however, be enabled and functioning across the network and on each node meant to be part of the cluster. If multicasting is not enabled as required, nodes will fail to join the cluster with the symptoms discussed.
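To illustrate what "multicast enabled on 230.0.1.0, port 42424" means in practice, the following is a minimal sketch of a generic UDP multicast probe built on the standard socket API. It is not how CSSD or GIPC implement their bootstrap and it is not an Oracle-supplied tool; the function names build_mreq, listen and send, the payload, and the timeout are assumptions made for this example. Only the group address and port come from this note.

```python
import ipaddress
import socket
import struct

GROUP = "230.0.1.0"  # default bootstrap group named in this note
PORT = 42424         # port named in this note


def build_mreq(group, local_ip):
    """Pack the 8-byte ip_mreq: group address followed by local interface address."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))


def listen(local_ip, group=GROUP, port=PORT, timeout=10.0):
    """Join the group on the interface owning local_ip and wait for one datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, local_ip))
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(1024)
            return sender[0], data
        except socket.timeout:
            return None  # nothing arrived: multicast may be blocked


def send(local_ip, payload=b"mcast-probe", group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the group through the interface owning local_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(local_ip))
        return sock.sendto(payload, (group, port))
```

A possible use is to call listen("192.168.1.2") on one node and send("192.168.1.1") on another, each with its own private interconnect address (the second address here is hypothetical). If the listener returns None while unicast traffic between the same addresses works, multicast on that group is probably being filtered. Binding port 42424 may fail while the clusterware stack is running on the node; passing a different port still exercises the same multicast group.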
Background information for 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 GI PSUs and 12.1.0.1
With 11.2.0.3, GI was enhanced to utilize broadcast or multicast (on the 230.0.1.0 or 224.0.0.251 addresses) to bootstrap. However, the 11.2.0.3.5 GI PSU introduced a new issue that effectively disables the broadcast functionality (Bug 16547309). Note that most networks support multicast on the 224.0.0.251 address without any special configuration, so the odds of this being an issue for 11.2.0.3.5 - 11.2.0.3.7 and 12.1.0.1 are greatly reduced.
Note: The Oracle CSS daemon may fail to establish network communication with peer nodes for reasons other than the multicast problem on the private interconnect discussed in this note. Therefore, if you determine that multicasting is not the root cause of such issues on your system, refer to Note: 1054902.1 for general network communication troubleshooting.
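One likely reason why 224.0.0.251 tends to work without special configuration is that it belongs to the 224.0.0.0/24 local network control block, which switches commonly flood to all ports on the segment rather than filtering it through IGMP snooping. This explanation is an assumption added here, not a statement from Oracle. The sketch below only classifies the two addresses named in this note; the function name multicast_scope and its labels are hypothetical.

```python
import ipaddress

# 224.0.0.0/24 is the IANA "Local Network Control Block" (link-local multicast).
LOCAL_CONTROL_BLOCK = ipaddress.ip_network("224.0.0.0/24")


def multicast_scope(address):
    """Return a coarse label for an IPv4 address (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not multicast"
    if ip in LOCAL_CONTROL_BLOCK:
        return "link-local control block"
    return "outside link-local control block"


for addr in ("230.0.1.0", "224.0.0.251"):
    print(addr, "->", multicast_scope(addr))
```

Under this classification, 230.0.1.0 falls outside the link-local block, which is consistent with it being the address more likely to be dropped by switches that restrict multicast forwarding.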