12.1.0.2 interface doesnt maintain connectivity across cluster nodes (Part 1)

Part 2: http://blog.itpub.net/29582917/viewspace-2122704/

While installing Oracle RAC, selecting the network interfaces in the GUI raised an error: the cluster nodes could not reach each other.
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes
Database version: 12.1.0.2
Operating system: HP-UX 11.31

[Figure 1: OUI screenshot of the INS-41112 error]

Initial analysis of likely causes:
1. The /etc/hosts file
2. SSH user equivalence -- besides the public hostnames, equivalence must also hold for the private (interconnect) hostnames
3. A gateway or firewall in the path (especially common on Linux)
   After checking, nothing was wrong on the database side: the hosts files and user equivalence were both correct, and we had installed Oracle RAC on HP-UX in production many times before, so a configuration mistake was unlikely. The on-site HP engineers confirmed that HP-UX has no firewall, and hence no Linux-style command to disable one.
   The on-site network engineers likewise insisted there were no security restrictions on the network, and clearing out some possibly interfering leftover configuration did not help either. The problem looked like a stalemate: every party said their side was fine.
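The SSH equivalence check (item 2 above) can be redone by hand. The following is a minimal sketch; the `*-priv` names are hypothetical placeholders for whatever private-interconnect aliases your /etc/hosts defines:

```shell
#!/bin/bash
# Re-verify ssh user equivalence for the interconnect aliases as well as the
# public ones. The *-priv hostnames below are hypothetical -- substitute the
# private hostnames from your /etc/hosts. Run as the grid owner on every
# node; each line must complete without a password prompt.
for host in xcywa1 xcywa2 xcywa1-priv xcywa2-priv; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: equivalence OK"
    else
        echo "$host: equivalence FAILED"
    fi
done
```

Any `FAILED` line (including for the private aliases) would explain an installer connectivity complaint on its own.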
Digging further into the cause:

The lan16 interface is on the public network and lan17 is the interconnect (heartbeat). The installer did not flag the interconnect, and the heartbeat cabling is a confirmed direct connection between the nodes! lan16, the public side, passes through a 10 Gb switch, so the problem looked like a network-side security restriction or a firewall.

Running the pre-installation verification script produced the following errors:

 
xcywa2:/u01/media/grid> ./runcluvfy.sh comp nodecon -i lan16    -n xcywa1,xcywa2   -verbose


Verifying node connectivity 


Checking node connectivity...


Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  xcywa2                                passed                  
  xcywa1                                passed                  


Verification of the hosts config file successful




Interface information for node "xcywa2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 lan16  10.241.8.11     10.241.8.0      10.241.8.11     10.241.8.254    8A:98:AD:92:B3:BD 1500  




Interface information for node "xcywa1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 lan16  10.241.8.10     10.241.8.0      10.241.8.10     10.241.8.254    F2:97:2C:48:01:C1 1500  




Check: Node connectivity using interfaces on subnet "10.241.8.0"


Check: Node connectivity of subnet "10.241.8.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  xcywa2[10.241.8.11]             xcywa1[10.241.8.10]             yes             
Result: Node connectivity passed for subnet "10.241.8.0" with node(s) xcywa2,xcywa1




Check: TCP connectivity of subnet "10.241.8.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  xcywa2 : 10.241.8.11            xcywa2 : 10.241.8.11            passed          
  xcywa1 : 10.241.8.10            xcywa2 : 10.241.8.11            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa1"
Connection timed out
  xcywa2 : 10.241.8.11            xcywa1 : 10.241.8.10            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa2"
Connection timed out
  xcywa1 : 10.241.8.10            xcywa1 : 10.241.8.10            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "239" while executing exectask on node "xcywa1"
Connection refused
Result: TCP connectivity check failed for subnet "10.241.8.0"


Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.241.8.0".
Subnet mask consistency check passed.

Result: Node connectivity check failed

Verification of node connectivity was unsuccessful on all the specified nodes. 
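The errno values in the PRVG-11850 messages above are themselves informative: on HP-UX, 238 is ETIMEDOUT and 239 is ECONNREFUSED, matching the "Connection timed out" / "Connection refused" text cluvfy prints. A timeout usually means the TCP SYN was silently dropped (firewall or routing), while a refusal means the peer answered but nothing was listening on that port. The same distinction can be probed by hand; this is only a sketch, and it assumes a bash built with /dev/tcp support plus a `timeout` utility (neither is guaranteed on stock HP-UX 11.31):

```shell
#!/bin/bash
# Probe a TCP endpoint the way cluvfy's check does, so a "refused" failure
# can be told apart from a silent "timed out" one.
probe() {
    local ip=$1 port=$2
    if timeout 5 bash -c "exec 3<>/dev/tcp/$ip/$port" 2>/dev/null; then
        echo "$ip:$port open"
    else
        echo "$ip:$port unreachable (refused or timed out)"
    fi
}
probe 127.0.0.1 1   # port 1 is normally closed, so this reports unreachable
```

Run the probe from each node against the other node's interface IP to see which failure mode you actually hit.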

xcywa2:/u01/media/grid> ./runcluvfy.sh comp nodecon -i lan17    -n xcywa1,xcywa2   -verbose


Verifying node connectivity 


Checking node connectivity...


Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  xcywa2                                passed                  
  xcywa1                                passed                  


Verification of the hosts config file successful




Interface information for node "xcywa2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 lan17  10.10.10.11     10.10.10.0      10.10.10.11     10.241.8.254    D6:2A:4D:6B:98:8C 1500  




Interface information for node "xcywa1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 lan17  10.10.10.10     10.10.10.0      10.10.10.10     10.241.8.254    F2:69:5F:F0:72:53 1500  




Check: Node connectivity using interfaces on subnet "10.10.10.0"


Check: Node connectivity of subnet "10.10.10.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  xcywa2[10.10.10.11]             xcywa1[10.10.10.10]             yes             
Result: Node connectivity passed for subnet "10.10.10.0" with node(s) xcywa2,xcywa1




Check: TCP connectivity of subnet "10.10.10.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  xcywa2 : 10.10.10.11            xcywa2 : 10.10.10.11            passed          
  xcywa1 : 10.10.10.10            xcywa2 : 10.10.10.11            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa1"
Connection timed out
  xcywa2 : 10.10.10.11            xcywa1 : 10.10.10.10            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa2"
Connection timed out
  xcywa1 : 10.10.10.10            xcywa1 : 10.10.10.10            failed          


ERROR: 
PRVG-11850 : The system call "connect" failed with error "239" while executing exectask on node "xcywa1"
Connection refused
Result: TCP connectivity check failed for subnet "10.10.10.0"


Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.10.10.0".
Subnet mask consistency check passed.

Result: Node connectivity check failed

Verification of node connectivity was unsuccessful on all the specified nodes.

Here both the public and the interconnect checks fail, yet the GUI installer only complained about node connectivity on the public NIC. Which result was accurate? That puzzled me, and made it even harder to judge whether the fault lay with the network or with the operating system.
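Since the installer and cluvfy disagreed, I found it easiest to treat the per-interface cluvfy runs as the ground truth and compare just their verdict lines. A small helper sketch (the log file names are hypothetical -- capture each runcluvfy.sh run's output to a file first):

```shell
#!/bin/bash
# Reduce a saved cluvfy transcript to its verdict and error lines so the
# per-interface results (lan16 vs lan17) can be compared side by side.
summarize() {
    grep -E '^(Result:|ERROR:|PRVG-|PRVF-)' "$1"
}
# Hypothetical file names -- capture them yourself, e.g.:
#   ./runcluvfy.sh comp nodecon -i lan16 -n xcywa1,xcywa2 -verbose > lan16_nodecon.log
for log in lan16_nodecon.log lan17_nodecon.log; do
    if [ -f "$log" ]; then summarize "$log"; fi
done
```

With both summaries side by side it is immediately clear that TCP connectivity failed on both subnets, whatever the GUI chose to report.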

Searching MOS for the error messages turned up the document below.
Up front: this note only supplied the verification command and offered nothing toward diagnosing the problem itself. It ends by saying to consult further documents based on what the per-interface check returns -- but the runs above never produced the PRVF-7617 error it points to.


[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)


In this Document

Purpose

Details

References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

The note lists problems, solutions or workarounds that's related to the following 11gR2 GI OUI error:

[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.


The following can also be part of the output:

PRVG-11850 : The system call "string" failed with error "number" while executing exectask on node "racnode"

 

DETAILS


[INS-41112] is a high level error number, the workarounds/solutions depend on the error code from lower layer, however, [INS-41112] does tell which interface is having the issue:

CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.

## >> in this case, it's eth1 that's having the connectivity issue



To find out lower layer error code, execute the following as grid user:

runcluvfy.sh comp nodecon -i <interface> -n <node1>,<node2> -verbose



Refer to the following once CVU reports real error code:

  • PRVF-7617

Refer to note 1335136.1 for details.

  • PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"

Refer to note 1429104.1 for details.

  • Version of exectask could not be retrieved from node "racnode1"

The cause is the installation files in staged area are corrupted, download again and install

REFERENCES

NOTE:1429104.1 - PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
NOTE:1335136.1 - PRVF-7617: TCP connectivity check failed for subnet

 


Let's also look at the content of NOTE 1335136.1 - PRVF-7617: TCP connectivity check failed for subnet.

It is mostly a list of known bugs, plus guidance that errors on subnets unrelated to the clusterware can be ignored.

 


PRVF-7617: TCP connectivity check failed for subnet (Doc ID 1335136.1)


In this Document

Purpose

Details

 

Known Issues

 

To verify manually

 

 
When to ignore the error?

References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

The note is to list problems, solutions or workarounds that's related to the following error:

PRVF-7617: TCP connectivity check failed for subnet

OR

PRVF-7617 : Node connectivity between "racnode1 : 10.10.10.148" and "racnode2 : 10.10.10.149" failed


TCP connectivity check failed for subnet "10.10.10.0"

 
OR 
 
 
PRVF-7616 : Node connectivity failed for subnet "10.10.16.0" between "racnode1 - eth5 : 10.10.16.109" and "racnode2 - eth5 : 10.10.16.121"

Result: Node connectivity failed for subnet "10.10.16.0"



When the error happens, likely OUI will report: 

[INS-41110] Specified network interface doesnt maintain connectivity across cluster nodes.
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.



DETAILS

 

Known Issues

 

  • bug 12849377 - CVU should check only selected network interfaces (ignore "do not use")

CVU checks network interfaces that's marked "do not use", fixed in 11.2.0.3 GI PSU1

  • bug 9952812 - CVU SHOULD RETURN WARNING INSTEAD OF FATAL ERROR FOR VIRBR0

Happens on Linux if network adapter virbr0 exists, fixed in 11.2.0.3. 

The fix introduces new CVU parameter (-network) to check only specified networks:

runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -networks "eth*" -verbose

 

  • bug 11903488 - affects Solaris only, fixed 11.2.0.3

As Solaris does not support the socket option SO_RCVTIMEO, TCP server fails to start:

In this example, racnode1 is nodename and 10.1.0.11 is the IP to test connectivity:

/tmp/CVU__/exectask.sh -runTCPserver racnode1 10.1.0.11

location:prvnconss1 opname:free port unavailable category:0 DepInfo: 99
Exectask:runTCPServer failed
..
Error running TCP server

The fix for bug 11903488 also removes the 49900-50000 port range so that exectask.sh -chkTCPclient uses the first available port

 

  • bug 12353524 - affects hp-ux only, fixed in 11.2.0.3


location:prvnconcc3 opname:client to server connection fail
category:0 otherInfo: Client to server connection failed, errno: 227 
DepInfo: 227 -1
Exectask:chkTCPClient failed 1
Error checking TCP communication 
1

 

  • bug 12608083 - affects Windows only, fixed in 11.2.0.3

When more than one network interface are on the same subnet, it is possible that the wrong interface is used to verify TCP connectivity.

 

  • bug 10106374 - affects Windows only, fixed in 11.2.0.2

Refer to note 1286394.1 for details.

 

  • bug 16953470 - affects Solaris only, happens when "hostmodel" is set to strong

CVU trace:

[7041@racnode1] [Thread-408] [ 2013-06-13 12:41:17.772 GMT+04:00 ] [StreamReader.run:65] OUTPUT>/usr/sbin/ping -i 192.168.169.2 192.168.169.2 3 /usr/sbin/ping: sendto Network is unreachable

Manually run the "ping -i" command, receives same error

To find out current "hostmodel":

ipadm show-prop -p hostmodel ip 
PROTO PROPERTY PERM CURRENT PERSISTENT DEFAULT POSSIBLE 
ipv6 hostmodel rw weak weak weak strong, src-priority, weak
ipv4 hostmodel rw weak weak weak strong, src-priority, weak

To change hostmodel: 

ipadm set-prop -p hostmodel=weak ipv4 
ipadm set-prop -p hostmodel=weak ipv6

The workaround is to set hostmodel to weak  

In addition, Solaris bug 16827053 is open to fix on OS level.

 

  • bug 17043435 

The bug is closed as duplicate of internal bug 17070860 which is fixed in 11.2.0.4

 

To verify manually


Repeat the following for each interface as grid user:

runcluvfy.sh comp nodecon -i <interface> -n <node1>,<node2> -verbose

Sample output

Check: Node connectivity for interface "eth1"
Result: Node connectivity passed for interface "eth1"

Check: TCP connectivity of subnet "10.64.131.0"
Result: TCP connectivity check passed for subnet "10.64.131.0"

Result: Node connectivity check passed

  

 
When to ignore the error?

If the error happened on network that's not related to Oracle Clusterware, it can be ignored, i.e. if happened on administrative network and not affecting anything, it can be ignored. 

 

At this point we still had no definite answer as to where the problem lay: the network, or HP's operating system? See the second post in this series!

Source: ITPUB blog, http://blog.itpub.net/29582917/viewspace-2122703/. Please cite the source when reprinting; unauthorized reproduction may incur legal liability.

