1. Fault Information
Log summary (# dmesg):
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 929070 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x006c59a9.5c742270
Mar 20 11:56:06 shtifme AFSR 0x00000002
Mar 20 11:56:06 shtifme Fault_PC 0xf840a860 Esynd 0x01b1 Slot B: J8000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 992219 kern.info] [AFT0] errID 0x006c59a9.5c742270 Corrected Memory Error on Slot B: J8000 is Intermittent
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 603939 kern.info] [AFT0] errID 0x006c59a9.5c742270 Data Bit 70 was in error and corrected
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 923891 kern.info] [AFT2] errID 0x006c59a9.5c742270 E$tag PA=0x000000b1.e324a940 does not match AFAR=0x000000b1.39a4a940
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 777630 kern.info] [AFT2] errID 0x006c59a9.5c742270 PA=0x000000b1.e324a940
Mar 20 11:56:06 shtifme E$tag 0x00000163.c600082c E$state_0 Modified
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000080 0x00000000.00000000 ECC 0x03e
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00010000.058af8e0 0x00000000.00000000 ECC 0x0ca
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000300.08efa958 ECC 0x04c
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 923891 kern.info] [AFT2] errID 0x006c59a9.5c742270 E$tag PA=0x000000b1.e224a940 does not match AFAR=0x000000b1.39a4a940
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 777630 kern.info] [AFT2] errID 0x006c59a9.5c742270 PA=0x000000b1.e224a940
Mar 20 11:56:06 shtifme E$tag 0x00000163.c400082c E$state_0 Modified
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.0c3f0000 0x00000000.00000000 ECC 0x074
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000080 0x00000000.00000000 ECC 0x03e
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00010000.0588d6be 0x00000000.00000000 ECC 0x04d
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 923891 kern.info] [AFT2] errID 0x006c59a9.5c742270 E$tag PA=0x000000b1.e1a4a940 does not match AFAR=0x000000b1.39a4a940
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 777630 kern.info] [AFT2] errID 0x006c59a9.5c742270 PA=0x000000b1.e1a4a940
Mar 20 11:56:06 shtifme E$tag 0x00000163.c300082c E$state_0 Modified
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000700.12e4a918 0x00000300.956b2000 ECC 0x080
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000001.00000000 0x00000000.00000000 ECC 0x128
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x02020000.0587c5ad ECC 0x097
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 923891 kern.info] [AFT2] errID 0x006c59a9.5c742270 E$tag PA=0x000000b1.e2a4a940 does not match AFAR=0x000000b1.39a4a940
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 777630 kern.info] [AFT2] errID 0x006c59a9.5c742270 PA=0x000000b1.e2a4a940
Mar 20 11:56:06 shtifme E$tag 0x00000163.c500082c E$state_0 Modified
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000000 0x00000000.00000080 ECC 0x02c
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00020000.0589e7cf ECC 0x078
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 263638 kern.info] [AFT2] errID 0x006c59a9.5c742270 PA=0x000000b1.39a4a940
Mar 20 11:56:06 shtifme L2$tag 0x000000b1.39a05192 L2$state Exclusive
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x00) 0x00000000.f840a840 0xf844bf10.00000000 ECC 0x1fc
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x10) 0x00000001.a7c00d58 0x00000000.00001538 ECC 0x02b
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x20) 0xab64a9b0.a7c01280 0xa7c01260.a7c01270 ECC 0x167
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x30) 0x00000018.ff010025 0x00180019.00030000 ECC 0x0ec
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 929717 kern.info] [AFT2] D$ data not available
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 335345 kern.info] [AFT2] I$ data not available
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 481181 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU17 User Data Access at TL=0, errID 0x006c59a9.76af3ecc
Mar 20 11:56:06 shtifme AFSR 0x00000004
Mar 20 11:56:06 shtifme Fault_PC 0xf84165bc Esynd 0x01e9 Slot B: J7900 J7901 J8001 J8000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 120151 kern.notice] [AFT1] errID 0x006c59a9.76af3ecc Two Bits were in error
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 349853 kern.info] [AFT2] errID 0x006c59a9.76af3ecc E$tag PA=0x000000a1.58a01b40 does not match AFAR=0x000000b1.39a01b40
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 833182 kern.info] [AFT2] errID 0x006c59a9.76af3ecc PA=0x000000a1.58a01b40
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 215305 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU2 Privileged Data Access at TL=0, errID 0x006c59a9.799ae668
Mar 20 11:56:06 shtifme AFSR 0x00100006
Mar 20 11:56:06 shtifme Fault_PC 0x105009c Esynd 0x01e9 Slot B: J7900 J7901 J8001 J8000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 605688 kern.notice] [AFT1] errID 0x006c59a9.799ae668 Two Bits were in error
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 828494 kern.info] [AFT2] errID 0x006c59a9.799ae668 E$tag PA=0x000000b1.d8247940 does not match AFAR=0x000000b1.d9a47940
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 373058 kern.info] [AFT2] errID 0x006c59a9.799ae668 PA=0x000000b1.d8247940
Mar 20 11:56:07 shtifme E$tag 0x00000163.b0000bdc E$state_0 Modified
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00010000.05038102 0x00000000.00000000 ECC 0x168
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000300.09c32e70 ECC 0x147
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000700.127d8378 0x00000700.0987ba40 ECC 0x1f0
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 828494 kern.info] [AFT2] errID 0x006c59a9.799ae668 E$tag PA=0x000000b1.d7247940 does not match AFAR=0x000000b1.d9a47940
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 373058 kern.info] [AFT2] errID 0x006c59a9.799ae668 PA=0x000000b1.d7247940
Mar 20 11:56:07 shtifme E$tag 0x00000163.ae000bdc E$state_0 Modified
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000080 0x00000000.00000000 ECC 0x03e
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00010000.05015ee0 0x00000000.00000000 ECC 0x138
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000300.0edd2678 ECC 0x1dc
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 828494 kern.info] [AFT2] errID 0x006c59a9.799ae668 E$tag PA=0x000000b0.b5a47940 does not match AFAR=0x000000b1.d9a47940
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 373058 kern.info] [AFT2] errID 0x006c59a9.799ae668 PA=0x000000b0.b5a47940
Mar 20 11:56:07 shtifme E$tag 0x00000161.6b000bd8 E$state_0 Invalid
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x6346756e.6374696f 0x6e3d3130.31342c00 ECC 0x051
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000088.4e202920 0x19d708c8.69636549 ECC 0x037
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0xffffffff.79726974 0x1a03e810.00000000 ECC 0x080
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x19ee3fb8.00000015 0x00000000.00000000 ECC 0x1ba
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 828494 kern.info] [AFT2] errID 0x006c59a9.799ae668 E$tag PA=0x000000b1.d7a47940 does not match AFAR=0x000000b1.d9a47940
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 373058 kern.info] [AFT2] errID 0x006c59a9.799ae668 PA=0x000000b1.d7a47940
Mar 20 11:56:07 shtifme E$tag 0x00000163.af000bdc E$state_0 Modified
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.00000000 0x00020000.05026ff1 ECC 0x067
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000300.08c95848 0x00000700.0d87f420 ECC 0x03c
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 421436 kern.info] [AFT2] errID 0x006c59a9.799ae668 PA=0x000000b1.d9a47940
Mar 20 11:56:07 shtifme L2$tag 0x000000b1.d9a058c2 L2$state Exclusive
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x00) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x10) 0x00000300.08c94018 0x00000000.00000000 ECC 0x0a5
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 803991 kern.info] [AFT2] L2$Data (0x20) 0x00000700.094d6cc8 0x00000700.0f89dbf8 ECC 0x0ab
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 637060 kern.info] [AFT2] L2$Data (0x30) 0x00000700.0ae47900 0x00000700.0ae47950 ECC 0x104 *Bad* Esynd=0x1e9
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 929717 kern.info] [AFT2] D$ data not available
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 444155 kern.warning] WARNING: [AFT1] Corrected system bus (CE) Event detected by CPU2 at TL=0, errID 0x006c59a9.799ae668
Mar 20 11:56:07 shtifme AFSR 0x00100006
Mar 20 11:56:07 shtifme Fault_PC 0x105009c Esynd 0x01e9 INVALID
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 629729 kern.warning] WARNING: [AFT1] WDU Event detected by CPU2 at TL=0, errID 0x006c59a9.799ae668
Mar 20 11:56:07 shtifme AFSR 0x00000020
Mar 20 11:56:07 shtifme Fault_PC 0x105009c Esynd 0x01e9
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 605688 kern.notice] [AFT1] errID 0x006c59a9.799ae668 Two Bits were in error
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 835873 kern.info] [AFT3] errID 0x006c59a9.799ae668: cannot schedule clearing of error on page 0x000000b1.d9a46000; page not in VM system
Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 946861 kern.warning] WARNING: [AFT1] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x006c59a9.7a7bfb1c
Mar 20 11:56:07 shtifme AFSR 0x00000002
Mar 20 11:56:07 shtifme Fault_PC 0x0 Esynd 0x0058 Slot B: J8000
Mar 20 11:56:08 shtifme SUNW,UltraSPARC-IV+: [ID 301897 kern.notice] [AFT1] errID 0x006c59a9.7a7bfb1c Data Bit 68 was in error and corrected
Mar 20 11:56:08 shtifme unix: [ID 836849 kern.notice]
Mar 20 11:56:08 shtifme ^Mpanic[cpu2]/thread=300095d4680:
Mar 20 11:56:08 shtifme unix: [ID 184892 kern.notice] [AFT1] errID 0x006c59a9.799ae668 UE CE WDU Error(s)
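Most of the log is cache-dump detail, so it can help to reduce it to one record per event header. The following is a minimal Python sketch, not a vendor tool: the regular expression is an assumption inferred from the lines in this log only, and the names `EVENT_RE` and `parse_events` are hypothetical. The sample text holds three event headers copied from the log above.

```python
import re

# Event headers copied from the log above; cache-dump detail lines are left out.
LOG = """\
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 929070 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x006c59a9.5c742270
Mar 20 11:56:06 shtifme AFSR 0x00000002
Mar 20 11:56:06 shtifme Fault_PC 0xf840a860 Esynd 0x01b1 Slot B: J8000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 481181 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU17 User Data Access at TL=0, errID 0x006c59a9.76af3ecc
Mar 20 11:56:06 shtifme AFSR 0x00000004
Mar 20 11:56:06 shtifme Fault_PC 0xf84165bc Esynd 0x01e9 Slot B: J7900 J7901 J8001 J8000
Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 215305 kern.warning] WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU2 Privileged Data Access at TL=0, errID 0x006c59a9.799ae668
Mar 20 11:56:06 shtifme AFSR 0x00100006
Mar 20 11:56:06 shtifme Fault_PC 0x105009c Esynd 0x01e9 Slot B: J7900 J7901 J8001 J8000
"""

# Pattern inferred from this log only (an assumption, not a documented format).
# The location part is optional because some events print none.
EVENT_RE = re.compile(
    r"\[AFT\d\] (?P<kind>[^\[\]]+?) Event detected by CPU(?P<cpu>\d+).*?"
    r"errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+).*?"
    r"Esynd (?P<esynd>0x[0-9a-f]+)"
    r"(?: Slot (?P<board>[A-Z]): (?P<dimms>J\d+(?: J\d+)*))?",
    re.DOTALL,
)


def parse_events(text):
    """Return one dict per AFT event header found in the text."""
    events = []
    for m in EVENT_RE.finditer(text):
        events.append({
            "kind": m["kind"],
            "cpu": int(m["cpu"]),
            "errid": m["errid"],
            "esynd": m["esynd"],
            "board": m["board"],  # None when the log prints no location
            "dimms": m["dimms"].split() if m["dimms"] else [],
        })
    return events


events = parse_events(LOG)
for ev in events:
    print(ev["errid"], f"CPU{ev['cpu']}", ev["kind"], ev["board"], ev["dimms"])
```

Because the pattern matches across line breaks, it should also work on the log when it is pasted as a single run of text.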
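2. Fault Localization
The log reports errors at the following locations:
Slot A: DIMM slot J8000 associated with CPU0, and DIMM slots J7900, J7901, J8001 and J8000 associated with CPU2;
Slot B: DIMM slots J7900, J7901, J8001 and J8000 associated with CPU17.
Because CPU0 and CPU2 are on the same board in slot A, eight DIMM slots are involved in total:
Slot A: J7900, J7901, J8001, J8000
Slot B: J7900, J7901, J8001, J8000
The faults are ECC check errors. Eight DIMM slots are implicated, but that does not mean all eight DIMMs are faulty. Run the power-on self-test (POST) to pinpoint the fault, then replace the DIMM in the slot that POST identifies.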
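The count of eight candidate slots follows from merging the overlapping slot lists per board. The short Python sketch below reproduces that arithmetic; the board assignment is copied from the analysis above rather than re-derived from the log, and the name `candidate_slots` is hypothetical.

```python
# Findings as stated in the analysis above: (board slot, CPU, DIMM slots).
findings = [
    ("A", 0, ["J8000"]),
    ("A", 2, ["J7900", "J7901", "J8001", "J8000"]),
    ("B", 17, ["J7900", "J7901", "J8001", "J8000"]),
]


def candidate_slots(findings):
    """Merge the DIMM slots per board so that repeated slots count once."""
    by_board = {}
    for board, _cpu, dimms in findings:
        by_board.setdefault(board, set()).update(dimms)
    return by_board


candidates = candidate_slots(findings)
total = sum(len(slots) for slots in candidates.values())

for board in sorted(candidates):
    print(f"Slot {board}:", " ".join(sorted(candidates[board])))
print("Candidate DIMM slots:", total)
```

J8000 on board A appears under both CPU0 and CPU2 but is counted once, which gives four slots per board and eight overall.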
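3. Preparation
Preparation checklist
Type | Item | Status
Hardware | One laptop | Ready
Hardware | One serial cable | Ready
Hardware | One flat-head and one Phillips screwdriver | Ready
Hardware | One anti-static wrist strap | Ready
Hardware | Four new memory modules (DIMMs) | Ready
Software | |
Other | |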
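4. Operations
Operation list
No. | Operation | Details | Status
1 | Confirm system shutdown | Advise the customer to back up applications and business data |
2 | Run POST diagnostics | Pinpoint the exact location of the system fault |
3 | Put on the anti-static wrist strap | Confirm the wrist strap is worn and connected to an unpainted part of the cabinet |
4 | Disconnect power | Disconnect the primary and secondary power supplies |
5 | Remove the service access cover | |
6 | Remove the processor board | |
7 | Place the removed processor board on an anti-static surface | |
8 | Open and remove the processor front cover | |
9 | Confirm the position of the memory to be replaced | |
10 | Take the memory out of its anti-static packaging | |
11 | Install the memory | |
12 | Reinstall the processor board | |
13 | Confirm the fault is resolved | Confirm the newly replaced hardware raises no alarms |
13 | Confirm the fault is resolved | Confirm the new hardware is ready in the system |
13 | Confirm the fault is resolved | Have the user confirm that applications and business data are unaffected |
14 | Wrap up | Clean up the site and finish the work |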
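For step 13, one way to check that the replaced hardware raises no alarms is to look for AFT event headers logged after the replacement time. The Python sketch below is an illustration under stated assumptions: the names `HEADER_RE` and `events_after` are hypothetical, the year is a placeholder because the log lines do not record one, and the cutoff time is invented. The sample lines are reused from the fault log above only to exercise the filter.

```python
import re
from datetime import datetime

# Matches only event header lines; pattern inferred from the log above.
HEADER_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) .*"
    r"\[AFT\d\] (?P<kind>[^\[\]]+?) Event detected by CPU(?P<cpu>\d+)"
)


def events_after(log_lines, replaced_at, year):
    """Return (time, kind, cpu) for each event header newer than replaced_at."""
    hits = []
    for line in log_lines:
        m = HEADER_RE.match(line)
        if not m:
            continue
        # The log omits the year, so the caller supplies it.
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        if when > replaced_at:
            hits.append((when, m["kind"], int(m["cpu"])))
    return hits


YEAR = 2000  # placeholder: the log does not state the year
SAMPLE = [
    "Mar 20 11:56:06 shtifme SUNW,UltraSPARC-IV+: [ID 929070 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x006c59a9.5c742270",
    "Mar 20 11:56:06 shtifme AFSR 0x00000002",
    "Mar 20 11:56:07 shtifme SUNW,UltraSPARC-IV+: [ID 629729 kern.warning] WARNING: [AFT1] WDU Event detected by CPU2 at TL=0, errID 0x006c59a9.799ae668",
]

# Hypothetical cutoff, chosen only so that one sample line falls after it.
cutoff = datetime(YEAR, 3, 20, 11, 56, 6)
for when, kind, cpu in events_after(SAMPLE, cutoff, YEAR):
    print(when.isoformat(), kind, f"CPU{cpu}")
```

An empty result for the period after the replacement would support the "no alarms" check. It does not replace the POST result or the user's own confirmation.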
5. Reference Information