Sat Jul 27 02:06:59 2019
skgxpvfynet: mtype: 61 process 155678 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/newmdb/nexxxdb1/trace/nexxxdb1_ora_155678.trc (incident=968005):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2

很早以前遇到过IPC这样的问题无数,这个就是咱们现在大内存的主机下,缓冲区碎片整理单元有点小,并且回环地址的mtu值有问题。我们按照官方MOS文档指示进行更改即可;

  1. On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers reducing the probability of a low-buffer-space conditions.

For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742

On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.

On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here 'n' is the number of NUMA nodes.

  1. Additionally, the MTU value should be modified as below

#ifconfig lo mtu 16436

To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :

MTU=16436
Save the file and restart the network service to load the changes

#service network restart

Note : While making the changes in CRS nodes, if network is restarted while CRS is up, it can hung CRS. So cluster services should be stopped prior to the network restart.