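Environment:
One Superdome SX2000 server (HP-UX 11.23) with an external MSA60 enclosure holding two root disks. Their device files are c2t3d0 / c3t3d0 (PV link) and c2t4d0 / c3t4d0 (PV link).
Fault description:
One of the root disks, c2t3d0, reported a media error in the event log. After the disk was replaced following the normal procedure, the lvlnboot information could not be updated. (With incorrect lvlnboot information, the machine may fail to boot after a reboot or crash.)
Analysis:
Before the root disk was replaced, vgdisplay -v vg00 printed the following: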
hostname#[/]vgdisplay -v vg00
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 10
Open LV 10
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 4356
VGDA 4
PE Size (Mbytes) 32
Total PE 8712
Alloc PE 4087
Free PE 4625
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
(detailed LV information omitted)
--- Physical volumes ---
PV Name /dev/dsk/c2t3d0s2
PV Name /dev/dsk/c3t3d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 4356
Autoswitch On
Proactive Polling On
PV Name /dev/dsk/c3t4d0s2
PV Name /dev/dsk/c2t4d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 269
Autoswitch On
Proactive Polling On
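As the output shows, vg00 contains two PVs: c2t3d0s2 with alternate link c3t3d0s2, and c3t4d0s2 with alternate link c2t4d0s2. Because c2t3d0 has a media error, it has to be replaced.
Before the replacement, lvlnboot -v printed the following. Can you spot anything abnormal?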
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
/dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0) //this line should also be marked "-- Boot Disk"
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Root: lvol3 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Swap: lvol2 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Dump: lvol2 on: /dev/dsk/c2t3d0s2, 0
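The missing flag is easy to overlook by eye, so it can help to check for it mechanically. The following is a minimal sketch, not an HP tool: the function name find_unflagged_boot_pvs is made up for this example, and it assumes the lvlnboot -v layout shown above (alternate links announced in "Current path ..." lines, PV entries followed by a hardware path in parentheses). It prints every primary PV that is not marked "-- Boot Disk".

```sh
#!/bin/sh
# Sketch only: list primary PVs in `lvlnboot -v` output that lack "-- Boot Disk".
# find_unflagged_boot_pvs is a hypothetical helper name, not an HP-UX command.
find_unflagged_boot_pvs() {
    awk '
        # Remember every path that lvlnboot reports as an alternate link.
        /is an alternate link/ {
            path = $3
            gsub(/"/, "", path)
            alt[path] = 1
            next
        }
        # PV list entries look like: /dev/dsk/cXtYdZs2 (hw/path) [-- Boot Disk]
        $1 ~ /^\/dev\/dsk\// && $2 ~ /^[(]/ {
            if (!($1 in alt) && $0 !~ /-- Boot Disk/) print $1
        }
    '
}

# Sample input copied from the lvlnboot -v output before the disk replacement.
find_unflagged_boot_pvs <<'EOF'
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
/dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c2t3d0s2
EOF
```

With the sample above the only path printed is /dev/dsk/c3t4d0s2. On a live system the input would come from something like lvlnboot -v 2>&1 piped into the function. A root VG may legitimately contain non-boot data disks, so a printed path is only a problem if that PV is supposed to hold a boot mirror.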
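Next came the normal root disk replacement steps: populating the EFI partition, mirroring the LVs, and so on. The new root disk's device files are c2t6d0 / c3t6d0 (PV link). After the replacement, the PVs in vg00 were: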
--- Physical volumes ---
PV Name /dev/dsk/c3t4d0s2
PV Name /dev/dsk/c2t4d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 269
Autoswitch On
Proactive Polling On
PV Name /dev/dsk/c3t6d0s2
PV Name /dev/dsk/c2t6d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 4356
Autoswitch On
Proactive Polling On
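At this point the disk replacement itself was finished, and it was time to run lvlnboot -R. Before running it, lvlnboot -v printed the following (it is already apparent here that something is wrong with the layout of the c3t4d0 disk):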
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
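This output means the operating system currently has no valid lvlnboot information: the swap, dump, root, and boot LVs are all undefined. If the machine crashed or was rebooted in this state, it would not be able to boot, and without a backup the operating system might have to be reinstalled.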
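Since a reboot in this state is so dangerous, a quick scripted check before any planned restart is worth having. The sketch below assumes the "No ... Logical Volume configured" wording shown in the output above; the function name missing_boot_definitions is invented for this example. It prints the name of each definition (Boot, Root, Swap, Dump) that lvlnboot -v reports as not configured.

```sh
#!/bin/sh
# Sketch only: report which boot definitions `lvlnboot -v` says are missing.
# missing_boot_definitions is a hypothetical helper name, not an HP-UX command.
missing_boot_definitions() {
    # Keep only the word between "No" and "Logical Volume configured".
    sed -n 's/^[[:space:]]*No \([A-Za-z]*\) Logical Volume configured.*$/\1/p'
}

# Sample input copied from the lvlnboot -v output taken before lvlnboot -R.
missing_boot_definitions <<'EOF'
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
EOF
```

For the sample above it prints Boot, Root, Swap, and Dump, one per line. Empty output means none of these messages appeared, which is a necessary but not sufficient sign that the boot definitions are intact.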
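Running lvlnboot -R still did not update the lvlnboot information, and setting the definitions one at a time (lvlnboot -r, lvlnboot -b, and so on) failed as well. The output was: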
hostname#[/]lvlnboot -r /dev/vg00/lvol3
lvlnboot: Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.
hostname#[/]lvlnboot -d /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.
hostname#[/]lvlnboot -s /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.
hostname#[/]lvlnboot -R
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
Root LV not yet configured !! Mirror information will not be displayed
Boot: lvol1 on: /dev/dsk/c3t6d0s2
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
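After many attempts the problem was still there. This machine was the customer's production system running a critical service, so downtime was absolutely not acceptable. Eventually I carefully reread one sentence in the error output above: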
Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.
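Roughly, this says that the partition c3t4d0s2, on which lvol3 resides, is not a bootable PV, that is, not a valid Boot Disk. Why does the system not consider it a valid Boot Disk? In fact, this was visible from the very beginning: even before the repair, the lvlnboot output showed the Boot Disk flag on only one disk (see the annotated line in the first lvlnboot output above).
After detailed checking and following up on the support case, all other causes (LV mirroring, the EFI partition, and so on) were ruled out. The root cause turned out to be that c3t4d0 had been initialized without the -B option when it was originally added to vg00: the command run at the time was pvcreate /dev/rdsk/c3t4d0s2, whereas it should have been pvcreate -B /dev/rdsk/c3t4d0s2. Without -B the disk has no BDRA (Boot Data Reserved Area), so the operating system does not treat it as a Boot Disk, and the lvlnboot information could never be synchronized.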
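Solution:
Remove this disk (c2t4d0s2 and its PV link c3t4d0s2) from vg00, after first removing the mirror copies of all LVs from it. Then run pvcreate -B on it again, add it back to vg00, and re-mirror all LVs in vg00. This resolved the problem.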
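The steps above can be sketched as a command sequence. The script below is deliberately a dry run: it only prints the commands and runs nothing, so the plan can be reviewed first. It is an illustration, not the exact commands used in this case. The function name print_rebuild_plan is made up, only lvol1 to lvol3 are listed because the other LVs of vg00 are omitted in this write-up, and it assumes every LV already has a fully synchronized copy on the other disk (c3t6d0s2) before the mirror on c3t4d0s2 is removed.

```sh
#!/bin/sh
# Sketch only: print (do not run) the commands for re-creating a root-VG PV
# with a BDRA. print_rebuild_plan is a hypothetical helper name.
# Usage: print_rebuild_plan VG PRIMARY_PATH ALTERNATE_PATH LV...
print_rebuild_plan() {
    vg=$1; primary=$2; alternate=$3
    shift 3
    # pvcreate works on the raw device file: /dev/dsk/... -> /dev/rdsk/...
    raw=$(printf '%s\n' "$primary" | sed 's#^/dev/dsk/#/dev/rdsk/#')

    # 1. Drop the mirror copy of every LV from the disk.
    for lv in "$@"; do
        echo "lvreduce -m 0 /dev/$vg/$lv $primary"
    done
    # 2. Remove both paths (alternate link and primary) from the VG.
    echo "vgreduce /dev/$vg $alternate"
    echo "vgreduce /dev/$vg $primary"
    # 3. Re-create the PV as bootable. Adding -f is an assumption that only
    #    applies if pvcreate refuses because of an old LVM header.
    echo "pvcreate -B $raw"
    # 4. Add the disk back, primary path first, then the alternate link.
    echo "vgextend /dev/$vg $primary $alternate"
    # 5. Re-mirror the LVs in the order given (boot, swap, root first).
    for lv in "$@"; do
        echo "lvextend -m 1 /dev/$vg/$lv $primary"
    done
    # 6. Refresh and verify the boot definitions.
    echo "lvlnboot -R"
    echo "lvlnboot -v"
}

print_rebuild_plan vg00 /dev/dsk/c3t4d0s2 /dev/dsk/c2t4d0s2 lvol1 lvol2 lvol3
```

Printing the plan instead of executing it is a deliberate choice for a production root VG: each line can be checked against the current vgdisplay -v output before anything is typed.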
After the fix, lvlnboot -v printed the following normal output:
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0) -- Boot Disk
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Root: lvol3 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Swap: lvol2 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Dump: lvol2 on: /dev/dsk/c3t6d0s2, 0
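As the "-- Boot Disk" flags in the output above show, the system now marks both disks as Boot Disk, and the lvlnboot information is back to normal as well.
This concludes the fault handling.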