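Background: an rx8620 could not start after being powered down.
The server has one cell board and three BPS (bulk power supplies), which is the minimum configuration.
Symptom: power to the rx8620 is normal, but the machine still does not start after powering on through the MP -> CM -> PE option, and the PWR LED on the front panel stays off.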
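Basics:
1. Faults on an rx8620 can be diagnosed from the MP card (the console), combining its menu options to narrow down the point of failure.
2. A few useful options: VFP (virtual front panel, shows the state of the machine), CM->PS (shows the power status of each component), SL (shows the event logs), plus the status LEDs on each component (each state has its own meaning).
The details of this real case follow.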
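1. Power cables A0 and A1 are plugged into the rear sockets of the machine, and the measured voltage is fine.
2. The standby pwr LED on the front panel is lit, which means the 3.3 V supply is present.
3. Ran PE->T under the CM menu of the MP card to power on; the machine showed no reaction.
4. Checked the event logs through SL (SEL, FPL); no related entries were recorded.
5. The steps performed and their output are shown below.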
MP MAIN MENU:
CO: Consoles
VFP: Virtual Front Panel (partition status)
CM: Command Menu
CL: Console Logs
SL: Show Event Logs
HE: Help
X: Exit Connection
[mp] MP> cm
Enter HE to get a list of available commands
[mp] MP:CM> pe
This command controls power enable to a hardware device.
T - Cabinet
C - Cell
P - IO Chassis
X - Complex
R - Partition
Select Device: t
The power state is ON for Cabinet 0.
In what state do you want the power? (ON/OFF) on
[mp] MP:CM>
MP MAIN MENU:
CO: Consoles
VFP: Virtual Front Panel (partition status)
CM: Command Menu
CL: Console Logs
SL: Show Event Logs
HE: Help
X: Exit Connection
[mp] MP> cm
Enter HE to get a list of available commands
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.
The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
| | | | | | |
| | | Sys | | IO | Bulk Pwr |
|Cab| MP |Bkpln| Cells |Chassis| Supplies |
| # |M S| |0 1 2 3| 0 1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |* | * |* | * * |* * * * |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
T - Cabinet
S - System Backplane
G - MP (Core I/O)
P - IO Chassis
C - Cell
Select Device: t
HW status for rx8620 cabinet : NO FAILURE DETECTED
Power switch is on
Right Door is closed
Top Door is closed
Left Door is closed
Total Power Available 4000 VA
Total Power Needed 1104 VA
Power Redundancy : redundant
Power Viability : viable
Power Status
---------------+-----+-------+-------+-----------+
| | | | |
| Sys | | IO | Bulk Pwr |
|Bkpln| Cells |Chassis| Supplies |
| |0 1 2 3| 0 1 |0 1 2 3 4 5|
---------------+-----+-------+-------+-----------+
Populated | * |* | * * |* * * * |
Enabled | * | | |* * * * |
Power OK | * | | |* * * * |
Warning/Fault | | | | |
Attention LED | | | | |
AC Line status:
Line A0 Present
Line B0 NOT PRESENT
Line A1 Present
Line B1 NOT PRESENT
-- Press <CR> to continue, or 'Q' to Quit --
Front Fan Speed : normal
Rear Fan speed : normal
I/O Bay Fan Speed : normal
Temperature state : normal
Main Fan Redundancy : redundant
I/O Fan Redundancy : redundant
Overtemp Shutdown Enabled
| BPS | PCI |
| Fans | Fans |
| 0 1 2 3 4 5 | 0 1 2 3 4 5 |
+-----------+-------------+-------------+
Populated | * * * * | * * * * * * |
Failing | | |
Failed | | |
| Standby/Main Fans |
| 1 1 1 1 1 1 1 1 1 1 2 |
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 |
+-----------+-------------------------------------------+
Populated | * * * * * * * * * * * * * * * * * * * * * |
Failing | |
Failed | |
| Cell Fans |
| CPU C |
| 0 1 2 3 C |
+-----------+------------+
Cell 0 | |
Populated | * |
Failing | |
Failed | |
[mp] MP:CM>
[mp] MP:CM> cm
Unknown Command. Type HE for a list of commands.
[mp] MP:CM> pe
This command controls power enable to a hardware device.
T - Cabinet
C - Cell
P - IO Chassis
X - Complex
R - Partition
Select Device: p
Enter IO Chassis number: 0
The power state is OFF for PCI Domain 0.
In what state do you want the power? (ON/OFF) on
[mp] MP:CM> pe
This command controls power enable to a hardware device.
T - Cabinet
C - Cell
P - IO Chassis
X - Complex
R - Partition
Select Device: p
Enter IO Chassis number: 1
The power state is OFF for PCI Domain 1.
In what state do you want the power? (ON/OFF) on
[mp] MP:CM> pe
This command controls power enable to a hardware device.
T - Cabinet
C - Cell
P - IO Chassis
X - Complex
R - Partition
Select Device: c
Enter cell number: 0
The power state is OFF for Cell 0.
In what state do you want the power? (ON/OFF) on
Cell board power disabled - incompatible processors/cell.
Error attempting to power ON : RtnCodeFailure
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.
The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
| | | | | | |
| | | Sys | | IO | Bulk Pwr |
|Cab| MP |Bkpln| Cells |Chassis| Supplies |
| # |M S| |0 1 2 3| 0 1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |* | * |* | * * |* * * * |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
T - Cabinet
S - System Backplane
G - MP (Core I/O)
P - IO Chassis
C - Cell
Select Device: p
Enter IO Chassis number: 0
HW status for IO Chassis 0 : No Fault Detected
Local Power Monitor Version is 2.000
Power is on, no fault
Power Module's Brick 0 VRM 1 VRM 3
Present : * * *
Module OK : * * *
Enabled : * * *
[mp] MP:CM> pe
This command controls power enable to a hardware device.
T - Cabinet
C - Cell
P - IO Chassis
X - Complex
R - Partition
Select Device: p
Enter IO Chassis number: 1
The power state is ON for PCI Domain 1.
In what state do you want the power? (ON/OFF) on
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.
The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
| | | | | | |
| | | Sys | | IO | Bulk Pwr |
|Cab| MP |Bkpln| Cells |Chassis| Supplies |
| # |M S| |0 1 2 3| 0 1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |* | * |* | * * |* * * * |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
T - Cabinet
S - System Backplane
G - MP (Core I/O)
P - IO Chassis
C - Cell
Select Device: p
Enter IO Chassis number: 1
HW status for IO Chassis 1 : No Fault Detected
Local Power Monitor Version is 2.000
Power is on, no fault
Power Module's Brick 1 VRM 2 VRM 4
Present : * * *
Module OK : * * *
Enabled : * * *
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.
The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
| | | | | | |
| | | Sys | | IO | Bulk Pwr |
|Cab| MP |Bkpln| Cells |Chassis| Supplies |
| # |M S| |0 1 2 3| 0 1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |* | * |* | * * |* * * * |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
T - Cabinet
S - System Backplane
G - MP (Core I/O)
P - IO Chassis
C - Cell
Select Device: c
Enter cell number: 0
HW status for Cell 0 : FAILURE DETECTED
Power status : NOT VIABLE, no fault
Boot is blocked
PDH memory is not shared
Processor Compatibility : FAULT
RIO cable status : UNDEFINED
RIO cable connection physical location : cannot be determined
Core cell is INVALID
Attention Led is off
PDHC status Leds : ----
CPU Module Slot 0 1 2 3
Populated
Local 48V Good
Power Enabled
Power Good
(* - True, P - Processor, T - Terminator)
DIMMs populated:
0 . . . 4 . . . 8 . . .12 . . .
1 1 1 1 1 1
VRM's 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Present :
Enabled :
Pwr Good :
Front Side Bus Freq. : 250 MHz
CPU Core Freq. : 0 MHz
CPU Part Number :
System Boot Rom (SFW) firmware rev 0.000
PDH controller (PDHC) firmware rev 0.000, built THU JAN 01 00:00:00 1970
MICE revision is 0.0
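Note the key lines in the output above (marked in red in the original post): both PCI domains power on normally, but CELL0 cannot be powered on. Checking the power supplies again still turned up no problem.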
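When the PS output has been saved to a text file, the abnormal lines can also be picked out by a script rather than by eye. The following is only a minimal sketch: the function name `find_faults` and the keyword list are assumptions made for illustration, taken from the strings seen in this transcript, and are not an official list of MP status values.

```python
import re

# Keywords treated as abnormal. Assumption: chosen from this transcript only.
BAD_WORDS = ("failure detected", "fault", "not viable", "invalid", "blocked", "error")

# Healthy readings such as "no fault", "No Fault Detected", "NO FAILURE DETECTED".
HEALTHY = re.compile(r"\bno (?:fault|failure)(?: detected)?", re.IGNORECASE)


def find_faults(capture: str) -> list[str]:
    """Return the lines of a saved console capture that report a problem."""
    hits = []
    for line in capture.splitlines():
        text = line.strip()
        # Skip table rows (e.g. the "Warning/Fault | | |" label row).
        if not text or "|" in text:
            continue
        # Remove healthy phrases first so they do not match the "fault" keyword.
        cleaned = HEALTHY.sub("", text).lower()
        if any(word in cleaned for word in BAD_WORDS):
            hits.append(text)
    return hits


if __name__ == "__main__":
    sample = """HW status for IO Chassis 0 : No Fault Detected
Power is on, no fault
HW status for Cell 0 : FAILURE DETECTED
Power status : NOT VIABLE, no fault
Boot is blocked
Processor Compatibility : FAULT
Core cell is INVALID
Attention Led is off
"""
    for hit in find_faults(sample):
        print(hit)
```

Run against the Cell 0 section of the capture, this lists only the lines that point at the cell, which is the same conclusion reached by reading the output manually.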
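In this situation only CELL0 is installed, so no cross test is possible. With several spare parts on hand, an engineer could cross test to identify the faulty part and then replace it.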
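The cross test mentioned above is a one-part-at-a-time swap: replace one suspect with a known-good spare, retry the power-on, and stop at the first swap that clears the fault. A minimal sketch of that procedure follows. The function `isolate_by_swapping` and its callback are hypothetical names for illustration; the callback stands in for the manual step of swapping the part and retrying.

```python
from typing import Callable, Iterable, Optional


def isolate_by_swapping(
    suspects: Iterable[str],
    fault_clears_after_swap: Callable[[str], bool],
) -> Optional[str]:
    """Swap suspects one at a time; return the first part whose swap clears the fault.

    Returns None if no single swap clears the fault, which suggests either
    more than one bad part or a suspect missing from the list.
    """
    for part in suspects:
        if fault_clears_after_swap(part):
            return part
    return None


if __name__ == "__main__":
    # Hypothetical outcome: only replacing the cell board clears the fault.
    order = ["BPS", "cell board", "CPU", "memory"]
    print(isolate_by_swapping(order, lambda part: part == "cell board"))
```

Ordering the suspects from most to least likely keeps the number of swaps small.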
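The faulty part was finally determined to be the CELL board.
Some engineers may ask: why not suspect the CPU or the memory?
My reasoning: the CELL itself cannot be powered on here. Even if other components may also be damaged, we normally replace the most likely part first and then consider other faults.
Resolution: replaced the CELL board, and the problem was solved.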
This article comes from the “学步” blog; please contact the author before reposting.