HP rx8620 fails to power on after shutdown: a diagnosis and repair example

Background: an rx8620 would not start after being powered down.
            The server has one cell board and three BPS (minimum configuration).
 
 
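Symptom: AC power to the rx8620 is normal, but the machine still does not start after power is enabled from the MP (CM -> PE), and the front-panel PWR LED stays off.
Basics:
1. rx8620 hardware faults can be diagnosed from the MP card (the console) by combining its commands to narrow down the fault.
2. A few useful commands: VFP (virtual front panel, shows the machine status), CM->PS (shows the power status of each component), and SL (shows the event logs). The status LEDs on each component also help, since each state has its own meaning.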
 
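The steps below walk through this real case.
1. Power cables A0 and A1 were plugged into the rear sockets of the machine, and the measured voltage was fine.
2. The front-panel standby PWR LED was on, which means the 3.3 V standby power was present.
3. Running PE -> T from the MP card's CM menu produced no power-on reaction from the machine.
4. The event logs under SL (SEL, FPL) contained no related entries.
5. The commands that were run and their results follow.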
 MP MAIN MENU:
         CO: Consoles
        VFP: Virtual Front Panel (partition status)
         CM: Command Menu
         CL: Console Logs
         SL: Show Event Logs
         HE: Help
          X: Exit Connection
[mp] MP> cm

                Enter HE to get a list of available commands
 
[mp] MP:CM> pe
This command controls power enable to a hardware device.
    T - Cabinet
    C - Cell
    P - IO Chassis
    X - Complex
    R - Partition
        Select Device: t
    The power state is ON for Cabinet 0.
    In what state do you want the power? (ON/OFF) on
[mp] MP:CM>
    MP MAIN MENU:
         CO: Consoles
        VFP: Virtual Front Panel (partition status)
         CM: Command Menu
         CL: Console Logs
         SL: Show Event Logs
         HE: Help
          X: Exit Connection
[mp] MP> cm

                Enter HE to get a list of available commands
 
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.

The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
|   |    |     |       |       |           |
|   |    | Sys |       |  IO   | Bulk Pwr  |
|Cab| MP |Bkpln| Cells |Chassis| Supplies  |
| # |M  S|     |0 1 2 3| 0   1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |*   |  *  |*      | *   * |* * * *    |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
    T - Cabinet
    S - System Backplane
    G - MP (Core I/O)
    P - IO Chassis
    C - Cell
        Select Device: t
HW status for rx8620 cabinet : NO FAILURE DETECTED
Power switch is on
Right Door is closed
Top Door is closed
Left Door is closed
Total Power Available 4000 VA
Total Power Needed 1104 VA
Power Redundancy : redundant
Power Viability : viable

 Power Status
---------------+-----+-------+-------+-----------+
               |     |       |       |           |
               | Sys |       |  IO   | Bulk Pwr  |
               |Bkpln| Cells |Chassis| Supplies  |
               |     |0 1 2 3| 0   1 |0 1 2 3 4 5|
---------------+-----+-------+-------+-----------+
 Populated     |  *  |*      | *   * |* * * *    |
 Enabled       |  *  |       |       |* * * *    |
 Power OK      |  *  |       |       |* * * *    |
 Warning/Fault |     |       |       |           |
 Attention LED |     |       |       |           |

 AC Line status:
  Line A0 Present
  Line B0 NOT PRESENT
  Line A1 Present
  Line B1 NOT PRESENT
               -- Press <CR> to continue, or 'Q' to Quit --
Front Fan Speed   : normal
Rear Fan speed    : normal
I/O Bay Fan Speed : normal
Temperature state : normal
Main Fan Redundancy   : redundant
I/O Fan Redundancy    : redundant
Overtemp Shutdown Enabled

            |     BPS     |     PCI     |
            |     Fans    |     Fans    |
            | 0 1 2 3 4 5 | 0 1 2 3 4 5 |
+-----------+-------------+-------------+
  Populated | * * * *     | * * * * * * |
  Failing   |             |             |
  Failed    |             |             |
            |              Standby/Main Fans            |
            |                     1 1 1 1 1 1 1 1 1 1 2 |
            | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 |
+-----------+-------------------------------------------+
  Populated | * * * * * * * * * * * * * * * * * * * * * |
  Failing   |                                           |
  Failed    |                                           |
            |  Cell Fans |
            |   CPU    C |
            | 0 1 2 3  C |
+-----------+------------+
 Cell 0     |            |
  Populated |          * |
  Failing   |            |
  Failed    |            |
[mp] MP:CM>
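
The Power Status table above already points at the problem: Cell 0 and both I/O chassis are populated but not enabled. When such output is captured to a file, that comparison can be done mechanically. The following is only a minimal sketch, assuming the table layout matches the one shown in this session; the function names and the `GROUPS` labels are hypothetical and are not part of any HP tool.

```python
# Sample rows copied from the PS -> T output above (index line plus status rows).
POWER_STATUS = """\
               |     |0 1 2 3| 0   1 |0 1 2 3 4 5|
 Populated     |  *  |*      | *   * |* * * *    |
 Enabled       |  *  |       |       |* * * *    |
 Power OK      |  *  |       |       |* * * *    |
"""

# Column groups in table order (labels chosen for this sketch).
GROUPS = ["Sys Bkpln", "Cell", "IO Chassis", "BPS"]


def parse_row(index_line, row_line):
    """Map (group, slot) to True/False depending on whether the row has a '*' there."""
    index_fields = index_line.split("|")[1:-1]
    row_fields = row_line.split("|")[1:-1]
    marks = {}
    for group, idx, row in zip(GROUPS, index_fields, row_fields):
        if not any(ch.isdigit() for ch in idx):
            # The system backplane column has no slot numbers: treat it as slot 0.
            marks[(group, 0)] = "*" in row
            continue
        for pos, ch in enumerate(idx):
            if ch.isdigit():
                # A '*' directly below a slot digit marks that slot.
                marks[(group, int(ch))] = pos < len(row) and row[pos] == "*"
    return marks


def populated_but_not_enabled(table_text):
    """Return the (group, slot) pairs that are populated but have no power enabled."""
    lines = table_text.splitlines()
    index_line = lines[0]
    rows = {}
    for line in lines[1:]:
        label = line.split("|")[0].strip()
        rows[label] = parse_row(index_line, line)
    return [key for key, present in rows["Populated"].items()
            if present and not rows["Enabled"][key]]


print(populated_but_not_enabled(POWER_STATUS))
```

For the table in this case the sketch reports Cell 0 and I/O chassis 0 and 1, which are exactly the devices powered on individually with PE in the next steps.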
[mp] MP:CM> cm
Unknown Command.  Type HE for a list of commands.
[mp] MP:CM> pe
This command controls power enable to a hardware device.
    T - Cabinet
    C - Cell
    P - IO Chassis
    X - Complex
    R - Partition
        Select Device: p
    Enter IO Chassis number: 0
    The power state is OFF for PCI Domain 0.
    In what state do you want the power? (ON/OFF) on
[mp] MP:CM> pe
This command controls power enable to a hardware device.
    T - Cabinet
    C - Cell
    P - IO Chassis
    X - Complex
    R - Partition
        Select Device: p
    Enter IO Chassis number: 1
    The power state is OFF for PCI Domain 1.
    In what state do you want the power? (ON/OFF) on
[mp] MP:CM> pe
This command controls power enable to a hardware device.
    T - Cabinet
    C - Cell
    P - IO Chassis
    X - Complex
    R - Partition
        Select Device: c
    Enter cell number: 0
    The power state is OFF for Cell 0.
    In what state do you want the power? (ON/OFF) on

Cell board power disabled - incompatible processors/cell.
Error attempting to power ON : RtnCodeFailure
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.

The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
|   |    |     |       |       |           |
|   |    | Sys |       |  IO   | Bulk Pwr  |
|Cab| MP |Bkpln| Cells |Chassis| Supplies  |
| # |M  S|     |0 1 2 3| 0   1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |*   |  *  |*      | *   * |* * * *    |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
    T - Cabinet
    S - System Backplane
    G - MP (Core I/O)
    P - IO Chassis
    C - Cell
        Select Device: p
    Enter IO Chassis number: 0
HW status for IO Chassis 0 : No Fault Detected
Local Power Monitor Version is 2.000
Power is on, no fault
Power Module's   Brick 0  VRM 1  VRM 3
  Present      :   *        *      *
  Module OK    :   *        *      *
  Enabled      :   *        *      *
[mp] MP:CM> pe
This command controls power enable to a hardware device.
    T - Cabinet
    C - Cell
    P - IO Chassis
    X - Complex
    R - Partition
        Select Device: p
    Enter IO Chassis number: 1
    The power state is ON for PCI Domain 1.
    In what state do you want the power? (ON/OFF) on
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.

The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
|   |    |     |       |       |           |
|   |    | Sys |       |  IO   | Bulk Pwr  |
|Cab| MP |Bkpln| Cells |Chassis| Supplies  |
| # |M  S|     |0 1 2 3| 0   1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |*   |  *  |*      | *   * |* * * *    |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
    T - Cabinet
    S - System Backplane
    G - MP (Core I/O)
    P - IO Chassis
    C - Cell
        Select Device: p
    Enter IO Chassis number: 1
HW status for IO Chassis 1 : No Fault Detected
Local Power Monitor Version is 2.000
Power is on, no fault
Power Module's   Brick 1  VRM 2  VRM 4
  Present      :   *        *      *
  Module OK    :   *        *      *
  Enabled      :   *        *      *
[mp] MP:CM> ps
Display detailed status of the selected MP bus device.

The following MP bus devices were found:
+---+----+-----+-------+-------+-----------+
|   |    |     |       |       |           |
|   |    | Sys |       |  IO   | Bulk Pwr  |
|Cab| MP |Bkpln| Cells |Chassis| Supplies  |
| # |M  S|     |0 1 2 3| 0   1 |0 1 2 3 4 5|
+---+----+-----+-------+-------+-----------+
| 0 |*   |  *  |*      | *   * |* * * *    |
+---+----+-----+-------+-------+-----------+
You may display detailed power and hardware status for the following items:
    T - Cabinet
    S - System Backplane
    G - MP (Core I/O)
    P - IO Chassis
    C - Cell
        Select Device: c
    Enter cell number: 0
HW status for Cell 0 : FAILURE DETECTED
Power status : NOT VIABLE, no fault
Boot is blocked
PDH memory is not shared
Processor Compatibility : FAULT
RIO cable status : UNDEFINED
RIO cable connection physical location : cannot be determined
Core cell is INVALID
Attention Led is off
PDHC status Leds :  ----
CPU Module Slot    0 1 2 3
 Populated               
 Local 48V Good          
 Power Enabled           
 Power Good              
   (* - True, P - Processor, T - Terminator)
 
DIMMs populated:
0 . . . 4 . . . 8 . . .12 . . .
                               
                                  1 1 1 1 1 1
VRM's           1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
 Present      :                             
 Enabled      :                             
 Pwr Good     :                             

Front Side Bus Freq.    : 250 MHz
CPU Core Freq.          : 0 MHz
CPU Part Number         :
System Boot Rom (SFW) firmware rev   0.000
PDH controller (PDHC) firmware rev 0.000, built THU JAN 01 00:00:00 1970
MICE revision is 0.0
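Note the key lines in the output above (highlighted in red in the original post): power to both PCI domains (I/O chassis 0 and 1) is normal, but Cell 0 cannot be powered on ("Cell board power disabled - incompatible processors/cell", "Power status : NOT VIABLE", "Processor Compatibility : FAULT"). Checking the power supplies again still showed no problem.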
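
In a long MP session the lines that matter are easy to miss. As a rough sketch, a captured log can be scanned for the fault wording that appears in this transcript. The keyword list below is an assumption drawn only from this session, not an exhaustive list of MP status strings, and `suspicious_lines` is a hypothetical helper name.

```python
import re

# Excerpt of the PS -> C output for Cell 0 shown above.
CELL_STATUS = """\
HW status for Cell 0 : FAILURE DETECTED
Power status : NOT VIABLE, no fault
Boot is blocked
PDH memory is not shared
Processor Compatibility : FAULT
RIO cable status : UNDEFINED
Core cell is INVALID
Attention Led is off
"""

# Case-sensitive on purpose: "no fault" and "No Fault Detected" must not match.
# The lookbehind skips the healthy "NO FAILURE DETECTED" cabinet line.
FAULT_PATTERN = re.compile(
    r"(?<!NO )FAILURE DETECTED|NOT VIABLE|\bFAULT\b|INVALID|Boot is blocked"
)


def suspicious_lines(log_text):
    """Return the lines of a captured MP log that contain a fault keyword."""
    return [line.strip() for line in log_text.splitlines()
            if FAULT_PATTERN.search(line)]


for line in suspicious_lines(CELL_STATUS):
    print(line)
```

Run against the Cell 0 excerpt, this keeps the failure, power viability, blocked boot, processor compatibility and core cell lines, and skips the neutral ones. The healthy cabinet and I/O chassis status lines from earlier in the session produce no matches.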
 
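In this case only Cell 0 is installed, so cross-testing was not possible. With spare parts on hand, an engineer can cross-test to identify the faulty part and then replace it.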
 
The faulty part was finally determined to be the cell board.
 
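Some engineers may ask: why not suspect the CPUs or the memory?
My reasoning: the cell itself cannot be powered on, so even if other parts may also be damaged, we normally replace the most likely part first and then look at any remaining faults.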
 
 
Resolution: replacing the cell board fixed the problem.
  
 

This article comes from the “学步” blog; please contact the author before reposting!
