HP VA7400 Storage Fault Diagnosis: Data Recovery May Be Possible
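Environment: VA7400
Two disk enclosures with 14 disks each, 28 disks in total, configured as two RAID groups. Each RAID group uses AutoRAID (RAID 0+1).
One VG cannot be read. Whenever certain fixed files in an LV of this VG are read, the host hangs and the array scans its disks continuously; the hardware has already determined that more than one disk has bad sectors. The VG consists of two LUNs, one on each of the array's two RAID groups. We ran dd tests: reading the LUN on one RAID group with dd works normally, but reading the LUN on the other RAID group with dd hangs the host while the array keeps scanning its disks (the same symptom as when reading the data in that VG).
So we can now be sure that one of the array's two RAID groups is completely healthy and the other is faulty, and that one of the VG's two LUNs happens to sit on the faulty RAID group.
In addition, the faulty RAID group had two disks fail at the same time (as reported by the controller).
The data we need is on that same VG.
The hardware log and the LUN distribution information are attached for your reference.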
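{This article was written by Qin Tingliang (覃廷良), chief engineer at Dasi (达思). Please credit the source when reposting (http://www.bnuol.com, Dasi Data Recovery Technology Blog)}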
The following is an excerpt from the log:
SUB-SYSTEM SETTINGS
RAID Level:___________________________HPAutoRAID
Auto Format Drive:____________________On
Hang Detection:_______________________On
Capacity Depletion Threshold:_________100%
Queue Full Threshold Maximum:_________4096
Enable Optimize Policy:_______________True
Enable Manual Override:_______________False
Manual Override Destination:__________False
Read Cache Disable:___________________False
Rebuild Priority:_____________________Low
Security Enabled:_____________________False
Shutdown Completion:__________________0
Subsystem Type ID:____________________1
Unit Attention:_______________________True
Volume Set Partition (VSpart):________False
Write Cache Enable:___________________True
Write Working Set Interval:___________8640
Enable Prefetch:______________________False
Disable Secondary Path Presentation:__False
Enclosure at M
Enclosure ID__________________________0
Enclosure Status______________________Failed
Enclosure Type________________________HP StorageWorks Virtual Array 7400
Node WWN______________________________50060b000014e7d6
FRU HW COMPONENT IDENTIFICATION ID STATUS
===========================================================================
M Enclosure 00SG223J0074 Failed
M/P1 Power Supply 94020HE00808 Good
M/P2 Power Supply 94020HE00717 Good
M/MP1 MidPlane 000601310041 Good
M/C2 Controller 00PR05B50445 Good
M/C2.H1 Host Port
M/C2.J1 BackEnd Port
M/C2.B1 Battery 40133:MOLTECHPS:NI2040:2002/7/19 Good
M/C2.PM1 Processor HP:A6189A:HP19 Good
M/C2.M1 DIMM 512 Good
M/C1 Controller Failed
M/D1 Disk 3EK1NM33 Good
M/D2 Disk 3EK0MF81 Good
M/D3 Disk 3EK1NXQ6 Good
M/D4 Disk 3HZ0G1QD Good
M/D5 Disk 3EK1NQEM Good
M/D6 Disk 3EK1NX69 Good
M/D7 Disk 3EK1NMZT Good
M/D8 Disk 3EK10AZS Good
M/D9 Disk 3KP17QL80000 Good
M/D10 Disk 3HZ92CQ9 Good
M/D11 Disk 3EK1KDSJ Good
M/D12 Disk 3HZ0MVX7 Good
M/D13 Disk 3EK24C4H Good
M/D14 Disk 3EK1NHSA Good
Enclosure at JA0
Enclosure ID__________________________0
Enclosure Status______________________Good
Enclosure Type________________________HP StorageWorks Disk System DS2405
Node WWN______________________________50060b0000195066
FRU HW COMPONENT IDENTIFICATION ID STATUS
===========================================================================
JA0 Enclosure SG22200001 Good
JA0/MP1 MidPlane SG22200001 Good
JA0/P1 Power Supply 62020FD01285 Good
JA0/P2 Power Supply 62020FD01267 Good
JA0/C2 LCC R25DK1444151 Good
JA0/C2.H1 Front Port
JA0/D1 Disk 3EK1MCCP Good
JA0/D2 Disk 3EK01ZQN Good
JA0/D3 Disk 3EK1NJNS Good
JA0/D4 Disk 3EK1NL2T Good
JA0/D5 Disk 3EK1NFRN Good
JA0/D6 Disk 3EK1N23S Good
JA0/D7 Disk 3EK1NLZL Good
JA0/D8 Disk 3EK1NFJM Good
JA0/D9 Disk 3EK1SBD8 Good
JA0/D10 Disk 3HZY5F6L Good
JA0/D11 Disk 3EK1NVJZ Good
JA0/D12 Disk 3EK1NQ2J Good
JA0/D13 Disk 3EK1NLX5 Good
JA0/D14 Disk 3EK16N2S Good
Disk at JA0/D9:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 73.4G
Product ID:___________________________ST373405FC
Product Revision:_____________________HP09
Data Capacity:________________________66.757 GB (140000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________8
Node WWN:_____________________________20000004cfa1a362
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________000027C200000003
Serial Number:________________________3EK1SBD8
Firmware Revision:____________________HP09
Recovery Maps are on this disk.
Disk at JA0/D13:
Status:_______________________________Good
Disk State:___________________________Included
Vendor ID:____________________________HP 73.4G
Product ID:___________________________ST373405FC
Product Revision:_____________________HP09
Data Capacity:________________________66.757 GB (140000000 blocks)
Block Length:_________________________520 bytes
Address:______________________________12
Node WWN:_____________________________20000004cf98f82c
Initialize State:_____________________Ready
Redundancy Group:_____________________1
Volume Set Serial Number:_____________000027C200000003
Serial Number:________________________3EK1NLX5
Firmware Revision:____________________HP09
Recovery Maps are on this disk.
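A first look at the log shows that the disks in the HP VA7400 are formatted with 520-byte blocks
(Block Length:_________________________520 bytes). To recover the data, every disk must first be imaged, and the RAID must then be reassembled from the images.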
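To make the 520-byte formatting concrete, here is a minimal Python sketch of turning a raw 520-byte-per-block image into a 512-byte-per-block image. It assumes that each block carries 512 bytes of user data followed by 8 extra bytes at the end; the log does not state where the extra bytes sit, so that placement is an assumption, and the names `strip_520_to_512` and `data_capacity_gib` are made up for illustration. The capacity helper only checks that the log's "66.757 GB (140000000 blocks)" figure is consistent with 512 data bytes per block.

```python
SECTOR_RAW = 520    # block length reported in the log
SECTOR_DATA = 512   # ASSUMPTION: user data per block; the remaining 8 bytes are a trailer


def strip_520_to_512(raw: bytes) -> bytes:
    """Keep the first 512 bytes of every 520-byte block (trailer placement is assumed)."""
    if len(raw) % SECTOR_RAW != 0:
        raise ValueError("input is not a whole number of 520-byte blocks")
    out = bytearray()
    for off in range(0, len(raw), SECTOR_RAW):
        out += raw[off:off + SECTOR_DATA]
    return bytes(out)


def data_capacity_gib(blocks: int) -> float:
    """Capacity in GiB if each block holds 512 bytes of user data."""
    return blocks * SECTOR_DATA / 2**30


if __name__ == "__main__":
    # Block count taken from the log entry for JA0/D9.
    print(round(data_capacity_gib(140000000), 3))
```

With 140000000 blocks the helper gives about 66.757, which matches the "Data Capacity" line in the log. That agreement supports the 512-bytes-of-data reading but says nothing about what the 8 extra bytes contain.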
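The HP VA7400 uses AutoRAID and then carves LUNs out of it. LUN space is not allocated linearly; instead, a block map records the addresses allocated to each LUN. Even if the RAID is reassembled exactly as it was, the LUN's space allocation still cannot be fully determined. To work it out, the disks that hold the metadata must be examined and analyzed. Usually two disks store the metadata (these disks are marked "Recovery Maps are on this disk."). Apart from the engineers who designed and developed the HP VA series, nobody can obtain accurate information about how this metadata is stored without a test environment in which to study it.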
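The difference between linear allocation and map-based allocation can be illustrated with a small Python sketch. Everything in it is hypothetical: the real VA7400 metadata layout is not public, so the chunk size `CHUNK_BLOCKS`, the map structure, and the function `resolve` are invented purely to show why a LUN address cannot be translated without the map.

```python
from typing import Dict, Tuple

CHUNK_BLOCKS = 512  # HYPOTHETICAL allocation unit, in blocks

# HYPOTHETICAL block map: LUN chunk index -> (disk slot, first block of the chunk on that disk)
BlockMap = Dict[int, Tuple[str, int]]


def resolve(block_map: BlockMap, lun_block: int) -> Tuple[str, int]:
    """Translate a LUN block address into (disk slot, disk block) through the map."""
    chunk, offset = divmod(lun_block, CHUNK_BLOCKS)
    if chunk not in block_map:
        # A missing or damaged entry leaves the address untranslatable.
        raise KeyError(f"LUN block {lun_block} has no map entry")
    disk, start = block_map[chunk]
    return disk, start + offset


if __name__ == "__main__":
    example_map: BlockMap = {0: ("JA0/D3", 4096), 1: ("JA0/D7", 0)}
    print(resolve(example_map, 513))
```

In this toy map, LUN block 513 falls in chunk 1 at offset 1 and therefore resolves to block 1 on the disk in slot JA0/D7. Consecutive LUN chunks can land on unrelated disks and offsets, which is why reassembling the RAID alone does not reveal where a LUN's data lives.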
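Judging from the fault information, it is quite likely that a metadata disk has developed a problem, so that the information held by the controller no longer matches the information on disk. When a LUN is read, the map information is then inaccurate or an address overflows, and a hang or an automatic reboot is the inevitable result.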
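With the likely cause identified, the next step is to verify whether these two metadata disks are actually healthy, and to start the data recovery work from there.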
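As a starting point for that verification, the metadata disks can be picked out of the log text automatically. The following Python sketch assumes the log follows the layout shown in the excerpt ("Disk at <slot>:" headers, "Key:____Value" lines, and the marker sentence "Recovery Maps are on this disk."); the function name `find_recovery_map_disks` is made up for illustration.

```python
import re
from typing import Dict, List


def find_recovery_map_disks(log_text: str) -> List[Dict[str, object]]:
    """Return the per-disk records that carry the 'Recovery Maps' marker."""
    disks: List[Dict[str, object]] = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        header = re.match(r"Disk at (\S+):$", line)
        if header:
            # Start a new record for each "Disk at <slot>:" header.
            current = {"slot": header.group(1), "recovery_maps": False}
            disks.append(current)
            continue
        if current is None:
            continue  # still in the part of the log before the first disk record
        if line == "Recovery Maps are on this disk.":
            current["recovery_maps"] = True
            continue
        field = re.match(r"([^:_]+):_+(.*)$", line)
        if field:
            # "Key:_____Value" lines become fields of the current disk record.
            current[field.group(1).strip()] = field.group(2).strip()
    return [d for d in disks if d["recovery_maps"]]


if __name__ == "__main__":
    sample = """Disk at JA0/D9:
Status:_______________________________Good
Serial Number:________________________3EK1SBD8
Recovery Maps are on this disk.
"""
    for disk in find_recovery_map_disks(sample):
        print(disk["slot"], disk["Status"], disk["Serial Number"])
```

In the excerpt above, the two marked disks are JA0/D9 and JA0/D13, and the controller reports both as Good. The status field alone therefore does not settle the question, and a fuller check such as reading each disk image end to end would probably be needed.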