Oracle Clusterware Cannot Start on all Nodes: Network communication with node NAME missing for PCT% of timeout interval
In this Document
  Purpose
  Troubleshooting Steps
    Step 1. Basic connectivity
    Step 2. Advanced connectivity checks
    Step 3. Known bugs
Purpose

This note is a troubleshooting guide for the following situation: Oracle Clusterware cannot be started on all nodes at once. For example, in a 2-node cluster, Oracle Clusterware on the second node won't start, or attempting to start the clusterware on the second node causes the first node's clusterware to shut down.
In the clusterware alert log ($GRID_HOME/log/...), warning messages CRS-1612, CRS-1611, or CRS-1610 ("Network communication with node NAME(n) missing for PCT% of timeout interval") are reported.

Oracle Clusterware cannot be up on two (or more) nodes if those nodes cannot communicate with each other over the interconnect. The CRS-1612, CRS-1611, and CRS-1610 messages warn that ocssd on one node cannot communicate with ocssd on the other node(s) over the interconnect. If this persists for the full timeout interval (usually thirty seconds - reference: Document 294430.1), Oracle Clusterware is designed to evict one of the nodes.
Therefore, the issue that requires troubleshooting in such a case is why the nodes cannot communicate over the interconnect.
Troubleshooting Steps

Step 1. Basic connectivity

Follow the steps in Note 1054902.1 to validate the network connectivity:
Note 1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
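For example, a quick manual spot check from one node (the interface name, node names, and IP addresses below are placeholders; substitute your own):

    # confirm the private interface is up and has the expected address
    /sbin/ifconfig eth1

    # ping the other node's private IP from the private interface
    ping -c 3 -I eth1 192.168.1.2

    # confirm private name resolution, if interconnect names are used
    nslookup node2-priv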
Note: If the problem is intermittent, also conduct the TCP/IP communication test from the following My Oracle Support document:
Note 1445075.1 - Node reboot or eviction: How to check if your private interconnect CRS can transmit network heartbeats
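For an intermittent problem, a simple timestamped ping loop such as the sketch below (the private IP, deadline, and interval are assumptions) records exactly when the interconnect stops responding:

    # append one line every 5 seconds noting whether node2's private IP answered
    while true; do
      echo "$(date) $(ping -c 1 -w 2 192.168.1.2 > /dev/null 2>&1 && echo OK || echo FAILED)"
      sleep 5
    done >> /tmp/interconnect_ping.log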
Step 2. Advanced connectivity checks

1. Firewall
Any firewall must be turned off on the private network. If unsure whether there is a firewall between the nodes, use a packet-monitoring tool such as ipmon or Wireshark to confirm.
Linux: Turn off iptables completely on all nodes and test. If clusterware on all nodes can come up when iptables is turned off completely, but cannot come up when iptables is running, then the IP packet filter rules need adjusting to allow ALL traffic between the private interconnects of all the nodes.
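A minimal example, assuming a Linux release that manages the firewall through the iptables init service (service names differ on other platforms):

    # stop the firewall immediately and keep it off across reboots (run on every node)
    service iptables stop
    chkconfig iptables off

    # review the packet filter rules before and after the change
    iptables -L -n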
2. Multicast
In 11.2.0.2 (only), multicast must be configured on either 230.0.1.0 or 224.0.0.251 for Clusterware startup. Follow the steps in Document 1212703.1 to check multicast communication.
Reference: Grid Infrastructure 11.2.0.2 Installation or Upgrade may fail due to Multicasting Requirement (Doc ID 1212703.1)
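As a quick sanity check on each node, list the multicast group memberships on the private interface (eth1 below is an assumption; Document 1212703.1 also provides a dedicated test script):

    # show which multicast groups each interface has joined
    netstat -g

    # or, equivalently, for one specific interface
    ip maddr show dev eth1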
3. Jumbo Frames Configuration
If Jumbo Frames are configured, check to make sure they are configured properly.
a) Check the MTU on the private interconnect interface(s) of each node:
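For example (illustrative output, abbreviated; the interface name and addresses are placeholders):

    /sbin/ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:16:3E:11:11:11
              inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1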
Note: In the above example MTU is set to 1500 for eth0
b) If MTU > 1500 on any interface, follow the steps in Note 1085638.1 to check if Jumbo Frames are properly configured.
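One common spot check (the address and payload size below are placeholders) is to send a large packet across the interconnect with fragmentation prohibited; on a working 9000-byte Jumbo Frames path, a payload close to the MTU should get through:

    # -M do prohibits fragmentation; 8192 bytes should pass end to end on a 9000-byte MTU path
    ping -c 3 -M do -s 8192 192.168.1.2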
4. Third-party mDNS daemons running
HAIP uses mDNS. If any third-party mDNS daemons are running, such as avahi or Bonjour, they can remove the HAIP addresses and prevent cluster communication. Make sure that no third-party mDNS daemons are running on the servers.
Note 1501093.1 - CSSD Fails to Join the Cluster After Private Network Recovered if avahi Daemon is up and Running
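On Linux, for example (assuming the avahi-daemon init service; adjust for your platform):

    # check whether an mDNS daemon is running
    ps -ef | grep -i [a]vahi

    # stop it now and disable it across reboots
    service avahi-daemon stop
    chkconfig avahi-daemon off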
5. Advanced UDP checks
Please refer to the steps in the following document to check UDP communication over the interconnect:
Note 563566.1 - Troubleshooting gc block lost and Poor Network Performance in a RAC Environment
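For a first look at UDP health on each node, the kernel's protocol statistics often reveal losses (field names vary by kernel version; eth1 below is an assumption):

    # UDP statistics: watch for "packet receive errors" and "receive buffer errors"
    netstat -su

    # per-interface error and drop counters on the private interface
    /sbin/ifconfig eth1 | grep -i -E "errors|dropped"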
Step 3. Known bugs

If no problems were found after reviewing all of the above, check the following list of known issues:
Document 1488378.1 - List of gipc defects that prevent GI from starting/joining after private network is restored or node rebooted