X3950: NMI, PCI, and LOG errors after shutting down through the MGMT port

Symptoms

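The LOG LED was lit on the light path diagnostics panel of an IBM X3950 server (machine type 8878). I logged in through the MGMT port on the rear of the server at http://192.168.70.125 (the management IP of the MGMT port is 192.168.70.125; user name: USERID, password: PASSW0RD. Note that the password is not simply "password" in uppercase: the 0 is the digit zero, as in 01230). There I found the error log from an earlier boot and cleared it with the button at the lower right. To turn off the LOG LED on the light path diagnostics panel itself, the server has to be shut down and its power cords unplugged.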

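After clearing the log, I chose to power off immediately from the power management options on the management page, unplugged both of the server's power cords, waited a moment, then plugged them back in and powered the server on. This time three LEDs were lit on the light path diagnostics panel: NMI, PCI, and LOG. All of the server's fans were also running at 97%-100%, very loudly and without letting up.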

I logged in to the MGMT port again and checked the log, which showed the following errors:

22     WARN       SERVPROC      01/03/12   19:06:39   Software NMI

23     ERR SERVPROC      01/03/12   19:06:36   Address of special cycle DPE on PCI primary Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0xfd00 Vend.ID=0x10df Status=0xc238 DevFun#=0x8

24     ERR SERVPROC      01/03/12   19:06:36   System Error PCI Bus

25     ERR SERVPROC      01/03/12   19:06:36   SMI handler has reported a PCI SERR.

26     ERR SERVPROC      01/03/12   19:06:36   Uncorrectable ECC error on PCI primary Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0xfd00 Vend.ID=0x10df Status=0xc238 DevFun#=0x8

27     ERR SERVPROC      01/03/12   19:06:35   Parity Error PCI Bus

28     ERR SERVPROC      01/03/12   19:06:35   SMI handler has reported a PCI PERR.

29     ERR SERVPROC      01/03/12   19:06:35   Additional uncorrectable ECC error on PCI primary Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0xfd00 Vend.ID=0x10df Status=0xc238 DevFun#=0x8

30     ERR SERVPROC      01/03/12   19:06:35   Parity Error PCI Bus

31     ERR SERVPROC      01/03/12   19:06:35   SMI handler has reported a PCI PERR.

32     ERR SERVPROC      01/03/12   19:06:35   Device signaled SERR on PCI primary. Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0x 2a 1 Vend.ID=0x1014 Status=0x64b0 DevFun#=0x0

33     ERR SERVPROC      01/03/12   19:06:35   System Error PCI Bus

34     ERR SERVPROC      01/03/12   19:06:35   SMI handler has reported a PCI SERR.

35     ERR SERVPROC      01/03/12   19:06:35   PCI Bus SERR# Detected Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0x 2a 1 Vend.ID=0x1014 Status=0x64b0 DevFun#=0x0

36     ERR SERVPROC      01/03/12   19:06:34   System Error PCI Bus

37     ERR SERVPROC      01/03/12   19:06:34   SMI handler has reported a PCI SERR.
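To make entries like these easier to compare, here is a minimal Python sketch that pulls the numeric fields out of one log line and names the error bits set in the Status value. It assumes that "Status=" holds the standard PCI Status register, which the log itself does not confirm; the function names are made up for illustration. Dev.ID is deliberately skipped, because in some entries it appears as "0x 2a 1" and cannot be parsed reliably.

```python
import re

# Error-related bits of the conventional PCI Status register.
# ASSUMPTION: the RSA log's "Status=" value is that register.
PCI_STATUS_BITS = {
    15: "Detected Parity Error",
    14: "Signaled System Error",
    13: "Received Master Abort",
    12: "Received Target Abort",
    11: "Signaled Target Abort",
    8: "Master Data Parity Error",
}

# Fields that appear as key=value pairs in the log lines above.
FIELD_RE = re.compile(
    r"(Chassis#|Slot#|Bus#|Vend\.ID|Status|DevFun#)=(0x[0-9a-fA-F]+|\d+)"
)


def parse_fields(line):
    """Return the key=value fields of one log line as a dict of ints."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        fields[key] = int(value, 16) if value.startswith("0x") else int(value)
    return fields


def decode_status(status):
    """List the names of the error bits set in a PCI Status value."""
    return [
        name
        for bit, name in sorted(PCI_STATUS_BITS.items(), reverse=True)
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    sample = (
        "23     ERR SERVPROC      01/03/12   19:06:36   Address of special "
        "cycle DPE on PCI primary Chassis#=1 Slot#=2 Bus#=4 Dev.ID=0xfd00 "
        "Vend.ID=0x10df Status=0xc238 DevFun#=0x8"
    )
    fields = parse_fields(sample)
    print(fields)
    print(decode_status(fields["Status"]))
```

Under that assumption, Status=0xc238 has bits 15 and 14 set (parity error detected, system error signaled), and Status=0x64b0 has bits 14 and 13 set (system error signaled, master abort received), which matches the PERR and SERR messages in the log.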

 

Searching for the cause turned up the following:

PCIe Unsupported Request and Fatal Flow Control errors that generate PCI SERR and Software NMI events in the RSA log have been reported and investigated. These events occur intermittently during manual or scheduled restarts in Microsoft Windows Server 2003.

The root cause was determined to be memory read/write requests that were inadvertently sent to the on-board Broadcom devices after the devices had already been put into the PCIe D3hot low-power state in preparation for the restart.

A fix was provided in the Broadcom driver to reject any memory requests to the onboard Broadcom devices when they are in the D3hot state. The fix is included in Broadcom driver version 4.6.55 or higher as seen in the Broadcom Advanced Control Suite (BACS). See the image below for an example of how to see the driver version in BACS.
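Checking whether an installed driver already contains the fix comes down to comparing version numbers. The sketch below assumes the version string read from BACS is plain dotted decimal, such as "4.6.55"; the function names are hypothetical, and only the 4.6.55 threshold comes from the tip.

```python
# Minimum Broadcom driver version that contains the D3hot fix, per the tip.
FIXED_IN = (4, 6, 55)


def parse_version(text):
    """Turn a dotted version string such as '4.6.55' into a tuple of ints.

    ASSUMPTION: the string is purely numeric and dot-separated.
    """
    return tuple(int(part) for part in text.strip().split("."))


def has_d3hot_fix(driver_version):
    """Return True if driver_version is 4.6.55 or higher."""
    return parse_version(driver_version) >= FIXED_IN


if __name__ == "__main__":
    for version in ("4.4.15", "4.6.55", "5.0.13"):
        print(version, has_d3hot_fix(version))
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "4.10.0" sorts before "4.6.55" as text.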

The system may be any of the following IBM servers:

· System x3850 M2, type 7141, any model

· System x3850 M2, type 7144, any model

· System x3850 M2, type 7233, any model

· System x3850 M2, type 7234, any model

· System x3950 M2, type 7141, any model

· System x3950 M2, type 7233, any model

· System x3950 M2, type 7234, any model

This tip is not option specific.

· The Windows device driver for the on-board Broadcom 5709 is affected.

The system is configured with at least one of the following:

· Microsoft Windows 2003 Server for 32-bit Servers, any service pack

· Microsoft Windows 2003 Server for 64-bit Servers, any service pack

· Microsoft Windows 2003 Server, EE x64, any service pack

· Microsoft Windows 2003 Server, x64 Edition, any service pack
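The server and machine-type lists above amount to a lookup table, so whether the tip covers a given machine can be checked mechanically. This is a rough sketch with hypothetical names; the table holds only the model and type pairs listed in the tip.

```python
# Model -> machine types covered by the tip, copied from the lists above.
AFFECTED_TYPES = {
    "x3850 M2": {"7141", "7144", "7233", "7234"},
    "x3950 M2": {"7141", "7233", "7234"},
}


def tip_applies(model, machine_type):
    """Return True if the (model, machine type) pair is listed in the tip."""
    return machine_type in AFFECTED_TYPES.get(model, set())


if __name__ == "__main__":
    print(tip_applies("x3950 M2", "7141"))
    print(tip_applies("x3950", "8878"))
```

The server described in the symptoms section is machine type 8878, which does not appear in the tip's list, so it is worth confirming that the tip really applies before relying on it.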

Note: This does not imply that the network operating system will work under all combinations of hardware and software.

Please see the compatibility page for more information:

http://www.ibm.com/servers/eserver/serverproven/compat/us/

Solution

This symptom is resolved in the Broadcom Windows driver available for download at the following URL:

http://www.ibm.com/support/docview.wss?uid=psg1MIGR-5070012

 

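My conclusion: powering the server off through the remote management page triggered the flow control errors described above, which left the network adapter in a high-power state, and that is why all of the fans were running at full speed.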

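The fix is to update the network adapter driver. However, when I searched the IBM website for the X3950 Broadcom driver, the link was dead, and when I called the IBM 400 hotline I was told the server was out of warranty, so I was out of luck. Instead, I uninstalled the network adapter driver on the server and rescanned for hardware, unplugged every network cable from the server, cleared the log, powered the server off, unplugged the power cords, and then started it again. The fans went back to normal, and all of the LEDs on the light path diagnostics panel were off.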
