Resolving an SAP PI Cluster System Failure

The documentation has already been delivered to the customer; here is a summary:


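When the SAP PI service was started on both node1 and node2 of the MSCS cluster, MSCS failed: all PI resource groups switched back and forth between node1 and node2, which also forced the Oracle OFS resources and the MSCS resources to fail over. PI uses a lot of memory (30 GB), and after about 8 of these automatic back-and-forth failovers the public NIC went down and up repeatedly and crashed.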


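Since MSCS failover and OFS resource failover themselves worked without problems, I checked the MSCS cluster configuration parameters: no problems.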

Checked the operating system for patches known to be harmful to MSCS: none found.

Checked the network settings and the NIC properties (TOE offload, RSS, speed): no problems.

Checked the patches for the scalable networking (receive-side scaling) features in Windows 2003 R2 SP2: already installed, no problems.

Checked the gigabit switch environment for the public and private networks: no problems.


Finally I found:

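The HP NC375i NIC driver on both node1 and node2 is the latest version, 556, while the NIC firmware is version 527 on node1 and 534 on node2. After checking, I confirmed that firmware 527 does not match driver 556. Problem found.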
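The comparison above can be sketched as a small consistency check. This is only an illustration, not the procedure used on site: the `inventory` dictionary and the `find_mismatches` helper are hypothetical, and the version strings are typed in by hand from the values quoted in the text rather than read from the NICs.

```python
from collections import defaultdict

# Hypothetical inventory, filled in by hand from the versions quoted in the text.
inventory = {
    "node1": {"driver": "556", "firmware": "527"},
    "node2": {"driver": "556", "firmware": "534"},
}


def find_mismatches(inventory):
    """Return {component: {node: version}} for components that differ between nodes."""
    by_component = defaultdict(dict)
    for node, components in inventory.items():
        for name, version in components.items():
            by_component[name][node] = version
    # Keep only components where the nodes do not all report the same version.
    return {
        name: versions
        for name, versions in by_component.items()
        if len(set(versions.values())) > 1
    }


for name, versions in find_mismatches(inventory).items():
    print(f"{name} differs across nodes: {versions}")
```

A check like this only shows that the two nodes disagree; whether a given firmware works with a given driver still has to be confirmed against the vendor's release notes.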

Fix: since the driver is already the latest, there is no need to reinstall it; just flash the firmware.


C:\SWSetup\SP50817>nxflash_x64.exe -i private --all
0/8 - Init
*** Currently in flash ***
Board Type : HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
Firmware Version : 4.0.534
MAC Address 0 : 68:B5:99:C4:B2:B8
MAC Address 1 : 68:B5:99:C4:B2:B9
MAC Address 2 : 68:B5:99:C4:B2:BA
MAC Address 3 : 68:B5:99:C4:B2:BB
Serial Number :
NIC binary romimage found in C:\SWSetup\SP50817
Rom Image : C:\SWSetup\SP50817\phantom_romimage
1/8 - Extracting Romimage
Firmware version From Board: 4.0.534
Firmware version From Romimage: 4.0.539
WARNING: This operation will take the NIC offline.
Do you wish to upgrade? (Y/N) y
Disabling devices
Disabling devices
Disabling devices
Disabling devices
Driver Loaded in Quiesce mode
2/8 - Restoring License
100% - DONE
100% - DONE
No vNIC property area in romimage
No VPD area in romimage
3/8 - Calculating MD5
100% - DONE
4/8 - Backing up current flash
100% - DONE
Backup file : "flashbackup__v4.0.534_Sat-Oct-13-22-06-50-2012" - completed successfully.
5/8 - Updating flash
WARNING: This is a very sensitive operation.
Do not interrupt until operation is complete.
setting up the flash_write
100% - DONE
6/8 - Verifying Flash MD5
Flashing completed successfully.
Reboot system for firmware to take effect
Enabling devices
Enabling devices
Enabling devices
Enabling devices
Driver Loaded in Normal mode
7/8 - Performing cleanup
8/8 - Finished

C:\SWSetup\SP50817>

On machine 2 (node2)



C:\SWSetup\SP50817>nxflash_x64.exe -i private --all
0/8 - Init
*** Currently in flash ***
Board Type : HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
Firmware Version : 4.0.527
MAC Address 0 : 68:B5:99:B3:3C:58
MAC Address 1 : 68:B5:99:B3:3C:59
MAC Address 2 : 68:B5:99:B3:3C:5A
MAC Address 3 : 68:B5:99:B3:3C:5B
Serial Number :
NIC binary romimage found in C:\SWSetup\SP50817
Rom Image : C:\SWSetup\SP50817\phantom_romimage
1/8 - Extracting Romimage
Firmware version From Board: 4.0.527
Firmware version From Romimage: 4.0.539
WARNING: This operation will take the NIC offline.
Do you wish to upgrade? (Y/N) y
Disabling devices
Disabling devices
Disabling devices
Disabling devices
Driver Loaded in Quiesce mode
2/8 - Restoring License
100% - DONE
100% - DONE
No vNIC property area in romimage
No VPD area in romimage
3/8 - Calculating MD5
100% - DONE
4/8 - Backing up current flash
100% - DONE
Backup file : "flashbackup__v4.0.527_Sat-Oct-13-21-06-28-2012" - completed successfully.
5/8 - Updating flash
WARNING: This is a very sensitive operation.
Do not interrupt until operation is complete.
setting up the flash_write
100% - DONE
6/8 - Verifying Flash MD5
Flashing completed successfully.
Reboot system for firmware to take effect
Enabling devices
Enabling devices
Enabling devices
Enabling devices
Driver Loaded in Normal mode
7/8 - Performing cleanup
8/8 - Finished

C:\SWSetup\SP50817>
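When the same flash has to be run on several nodes, the version lines in the captured output can be checked mechanically. The following is a minimal sketch that assumes the console output was saved as text; `parse_versions` and `flash_succeeded` are hypothetical helper names, and the patterns are taken only from the lines visible in the logs above, not from any documented output format of the tool.

```python
import re

# Patterns copied from the lines visible in the captured console output.
BOARD_RE = re.compile(r"Firmware version From Board:\s*(\S+)")
IMAGE_RE = re.compile(r"Firmware version From Romimage:\s*(\S+)")


def _to_tuple(version):
    """Turn '4.0.534' into (4, 0, 534) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_versions(log_text):
    """Return (board_version, romimage_version) as integer tuples."""
    board = BOARD_RE.search(log_text)
    image = IMAGE_RE.search(log_text)
    if board is None or image is None:
        raise ValueError("firmware version lines not found in log")
    return _to_tuple(board.group(1)), _to_tuple(image.group(1))


def flash_succeeded(log_text):
    """True if the log contains the success line printed after verification."""
    return "Flashing completed successfully." in log_text


sample = """Firmware version From Board: 4.0.534
Firmware version From Romimage: 4.0.539
6/8 - Verifying Flash MD5
Flashing completed successfully.
"""

board, image = parse_versions(sample)
print("upgrade needed:", board < image)
print("flash succeeded:", flash_succeeded(sample))
```

Comparing the versions as integer tuples instead of strings avoids ordering mistakes once a component reaches a different number of digits.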

Problem solved!

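I later confirmed with purchasing that the two machines were bought half a year apart and are not from the same batch. Machine 2 is the development machine; machine 1, the production machine, was purchased half a year later, and during implementation the development and production machines were combined into the MSCS PI cluster.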

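It seems whoever implemented MSCS was careless and unreliable. Windows enterprise environments call for more meticulous work and a higher level of technical skill, because many errors cannot be solved by digging into the kernel: I can't read crash dump files or fire up windbg every time a problem comes up. Of course, that remains the last resort.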
