In a RAC environment, the VIP listener reports:
WARNING: Subscription for node down event still pending
in listener.log even though the ONS service is running:
crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
...............
ora.ons
ONLINE ONLINE rorac1
ONLINE ONLINE rorac2
These warnings appear every minute or so, showing that the listener cannot connect to the ONS server.
Additionally, incoming connections start to fail and the following messages are reported in listener.log:
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux Error: 24: Too many open files
or
TNS-12518: TNS:listener could not hand off client connection
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Solaris Error: 24: Too many open files
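To see how often each symptom occurs, the listener log can be scanned for the two messages quoted above. The following is a minimal sketch, not part of the original note: the function name scan_listener_log is hypothetical, and the path to listener.log must be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: count the two symptoms in a listener log.
#   $1 = path to listener.log (supplied by the caller)
scan_listener_log() {
    log=$1
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    pending=$(grep -c 'Subscription for node down event still pending' "$log" || true)
    toomany=$(grep -c 'Error: 24: Too many open files' "$log" || true)
    echo "pending_subscription=$pending too_many_open_files=$toomany"
}

# Usage (path is an example only):
#   scan_listener_log /path/to/listener.log
```

A steadily growing pending_subscription count followed by the first too_many_open_files entries matches the sequence described here.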
The lsnrctl utility will no longer be able to connect to the listener: lsnrctl status or lsnrctl services may hang or return an error.
Once the problem begins, the listener itself may either hang or crash.
Investigations show that the OS limit for open files is high enough:
/etc/security/limits.conf
oracle soft nofile 131072
oracle hard nofile 131072
$ ulimit -n
131072
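The value printed by ulimit -n belongs to the current shell, which is not necessarily the limit the running listener inherited. On Linux, the limit of a specific process can be read from /proc/<pid>/limits. The sketch below is an addition to the original note: the function name proc_nofile_limit is hypothetical, and it works on Linux only.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): print the soft "Max open files" limit
# of a running process.
#   $1 = PID, for example the tnslsnr PID from "ps -ef | grep tnslsnr"
proc_nofile_limit() {
    # Line layout: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Usage:
#   proc_nofile_limit <tnslsnr OSPID>
```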
Increasing the file descriptor limit will delay the onset of this problem, but it will eventually occur again.
Running the lsof utility against the listener process shows a huge number of TCP file descriptors in use.
ps -ef | grep tnslsnr
lsof -p <tnslsnr OSPID>
Solaris
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tnslsnr 12945 oracle 39u IPv4 0x601713ca780 0t104 TCP *:* (IDLE)
tnslsnr 12945 oracle 39u IPv4 0x601713ca780 0t104 TCP *:* (IDLE)
..........
Linux
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
tnslsnr 20977 oracle 16u sock 0,4 266915036 can't identify protocol
tnslsnr 20977 oracle 16u sock 0,4 266915036 can't identify protocol
..........
The number of sockets opened by the listener process increases over time.
The listener is leaking sockets, and these file descriptors keep accumulating until the process reaches its open file limit.
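The growth can be tracked by counting the leaked entries in lsof output at intervals. This is a minimal sketch that assumes the leaked descriptors look like the two samples above ("can't identify protocol" on Linux, "TCP *:* (IDLE)" on Solaris). The function name count_leaked_sockets is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count leaked-looking socket entries in lsof output
# read from stdin. The two patterns are taken from the samples above.
count_leaked_sockets() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    grep -c -e "can't identify protocol" -e 'TCP \*:\* (IDLE)' || true
}

# Usage:
#   lsof -p <tnslsnr OSPID> | count_leaked_sockets
# Repeating this every few minutes and comparing the counts shows whether
# the number of leaked descriptors is still climbing.
```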
No changes to the system.
This was investigated in unpublished bug:
BUG 14067245 - STBH: LISTENER REACHED FD MAX LIMIT : 131072
Bug 14067245 was closed as duplicate of unpublished bug:
BUG 13078786 - LISTENER GOES DOWN SUDDENLY W/ LINUX ERROR: 32: BROKEN PIPE
Bug 13078786 affects 11.2.0.1, 11.2.0.2 and 11.2.0.3
The fix will be included in 11.2.0.4
Several backports exist for 11.2.0.2 and 11.2.0.3 so Patch 13078786 can be applied.
As a workaround, if the application does not use FCF or FAN, the listener's subscription to ONS can be disabled using either of the following:
1) Add SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF to listener.ora
2) Change the 'localport' value in $GI_HOME/opmn/conf/ons.config to a value other than 6100
Both require a reload/restart of the listener.
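Option 1 could be scripted along the following lines. This is only a sketch: the function name disable_node_down_subscription is hypothetical, and the caller must pass the listener.ora file that the listener actually reads, together with the listener name. Back up the file before changing it.

```shell
#!/bin/sh
# Hypothetical helper: add SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF
# to a listener.ora file unless the parameter is already present.
#   $1 = path to listener.ora (assumption: the file your listener reads)
#   $2 = listener name, e.g. LISTENER
disable_node_down_subscription() {
    file=$1
    param="SUBSCRIBE_FOR_NODE_DOWN_EVENT_$2"
    if grep -qi "^[[:space:]]*${param}[[:space:]]*=" "$file" 2>/dev/null; then
        echo "$param already set in $file"
    else
        printf '%s=OFF\n' "$param" >> "$file"
        echo "added $param=OFF to $file"
    fi
}

# Usage (path and listener name are examples only):
#   disable_node_down_subscription /path/to/listener.ora LISTENER
# Then reload or restart the listener so the change takes effect,
# for example: lsnrctl reload LISTENER
```

The check before appending keeps the function safe to run more than once against the same file.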