1) Environment
OS: Red Hat Enterprise Linux 4.6 x86
Cluster: RHCS (2 nodes)
Multipath software: EMC PowerPath 5.1 for Linux
Storage: EMC AX4-5, EMC CX300
The AX4-5 presents one LUN to the hosts and the CX300 presents two.
2) Problem description
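After the LUN mappings were configured on the arrays, the two nodes were rebooted one after the other. Both nodes detected the mapped storage units (LUNs). Running fdisk -l to see the device names the host OS gave the LUNs showed that the two nodes named them differently: node1 saw emcpowera, emcpowerc and emcpowerd, while node2 saw emcpowera, emcpowerb and emcpowerc. From the sizes of the allocated LUNs, node1 emcpowera corresponds to node2 emcpowera, node1 emcpowerc to node2 emcpowerb, and node1 emcpowerd to node2 emcpowerc.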
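Matching LUNs by size, as done here, only works while every LUN has a distinct size. Below is a minimal Python sketch of that check. It assumes the per-node sizes have already been collected (for example from fdisk -l). match_by_size is a hypothetical helper, and the sizes in the usage section are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def match_by_size(node1: dict[str, int], node2: dict[str, int]) -> dict[str, str]:
    """Pair each node1 device name with the node2 device name of the same size.

    Raises ValueError when two devices on one node share a size (size alone
    cannot tell them apart) or when the nodes do not see the same sizes.
    """

    def by_size(node: dict[str, int]) -> dict[int, str]:
        groups: dict[int, list[str]] = defaultdict(list)
        for name, size in node.items():
            groups[size].append(name)
        ambiguous = {size: names for size, names in groups.items() if len(names) > 1}
        if ambiguous:
            raise ValueError(f"devices with identical sizes: {ambiguous}")
        return {size: names[0] for size, names in groups.items()}

    sizes1, sizes2 = by_size(node1), by_size(node2)
    if sizes1.keys() != sizes2.keys():
        raise ValueError("the two nodes do not see the same set of LUN sizes")
    # Walk the sizes in ascending order so the result is deterministic.
    return {sizes1[size]: sizes2[size] for size in sorted(sizes1)}


if __name__ == "__main__":
    # Hypothetical LUN sizes in GB; the article does not give the real ones.
    node1 = {"emcpowera": 50, "emcpowerc": 100, "emcpowerd": 200}
    node2 = {"emcpowera": 50, "emcpowerb": 100, "emcpowerc": 200}
    print(match_by_size(node1, node2))
```

If two LUNs share a size, the Logical device ID shown by powermt display dev=all is the safer key.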
Because the two nodes form a cluster, configuring shared storage requires both nodes to see each LUN under the same device name.
3) Analysis and troubleshooting
The device names on node2 were the expected ones. node1 was the problem: for some reason it had lost emcpowerb.
On node1 we ran
powermt display dev=all
emcpadm getfreepseudos -n 5
and found that emcpowerb did not appear in the list of free pseudo names on node1.
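With the application about to go live, there was no time for deeper analysis. Two ideas came up at the time: first, remove the paths detected on node1 and reboot to see whether that helped; second, manually rename node2's devices to match node1's.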
Approach 1:
powermt remove dev=all     # remove all currently detected paths
powermt config             # rediscover the paths
powermt display dev=all
reboot
The problem remained.
Approach 2, on node2:
emcpadm getfreepseudos     # showed that emcpowerd was free
emcpadm renamepseudo -s emcpowerc -t emcpowerd
emcpadm renamepseudo -s emcpowerb -t emcpowerc
powermt save
reboot
After this, both nodes saw emcpowera, emcpowerc and emcpowerd, and the problem was solved.
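The order of the two renames matters. emcpowerc has to move to the free name emcpowerd before emcpowerb can take over emcpowerc. Below is a minimal Python sketch that works out a safe order for any set of renames. plan_renames is a hypothetical helper. The printed command assumes the emcpadm renamepseudo -s/-t syntax of PowerPath 5.x, so check it against the installed emcpadm.

```python
from __future__ import annotations


def plan_renames(renames: dict[str, str], in_use: set[str],
                 spare: list[str] | None = None) -> list[tuple[str, str]]:
    """Order pseudo-name renames so that every target name is free when it is used.

    renames: current pseudo name -> desired pseudo name on the node being fixed
    in_use:  pseudo names currently in use on that node
    spare:   free names that may be borrowed to break a cycle such as b->c, c->b
    """
    pending = {src: dst for src, dst in renames.items() if src != dst}
    used = set(in_use)
    targets = set(pending.values())
    if len(targets) != len(pending):
        raise ValueError("two devices cannot receive the same name")
    if not set(pending) <= used:
        raise ValueError("every source name must currently be in use")
    blocked = {dst for dst in targets if dst in used and dst not in pending}
    if blocked:
        raise ValueError(f"names held by devices that are not being renamed: {blocked}")
    free = [n for n in (spare or []) if n not in used and n not in targets]

    steps: list[tuple[str, str]] = []
    while pending:
        ready = sorted(src for src, dst in pending.items() if dst not in used)
        if ready:
            src = ready[0]
            dst = pending.pop(src)
        else:
            # Every target is still occupied, so the renames form a cycle:
            # park one device on a spare name and finish its move later.
            if not free:
                raise ValueError("a rename cycle needs a spare free pseudo name")
            src = min(pending)
            dst = free.pop(0)
            pending[dst] = pending.pop(src)
        steps.append((src, dst))
        used.discard(src)
        used.add(dst)
    return steps


if __name__ == "__main__":
    # node2 in this article: emcpowerb -> emcpowerc and emcpowerc -> emcpowerd.
    plan = plan_renames({"emcpowerb": "emcpowerc", "emcpowerc": "emcpowerd"},
                        {"emcpowera", "emcpowerb", "emcpowerc"})
    for src, dst in plan:
        # Assumed PowerPath 5.x syntax; verify with your emcpadm version.
        print(f"emcpadm renamepseudo -s {src} -t {dst}")
```

When the renames form a cycle (for example b->c and c->b), no target is ever free. In that case the sketch first parks one device on a spare free name, which is why it helps to check emcpadm getfreepseudos before renaming.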
4) Conclusion
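Earlier, while node1 was being used for testing, an emcpowerb device had existed on it. After that device was removed, the PowerPath configuration database was not updated, so the name emcpowerb still appeared to be taken.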
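A stale entry like this shows up as a hole in the names a node presents. Below is a minimal Python sketch that lists such holes from the device names seen on a node (for example the /dev/emcpower* names in fdisk -l output). skipped_pseudos is a hypothetical helper, and it only handles the single-letter names emcpowera to emcpowerz.

```python
import re
import string

# A whole-disk pseudo device with a single-letter suffix, optionally under /dev/.
NAME = re.compile(r"emcpower([a-z])$")


def skipped_pseudos(names: list[str]) -> list[str]:
    """Return emcpower names missing below the highest name a node actually shows.

    Anything that is not a single-letter emcpower name (partitions such as
    emcpowera1, or names beyond emcpowerz) is ignored.
    """
    letters = set()
    for name in names:
        match = NAME.search(name)
        if match:
            letters.add(match.group(1))
    if not letters:
        return []
    highest = max(letters)
    return ["emcpower" + ch for ch in string.ascii_lowercase
            if ch < highest and ch not in letters]
```

For node1 in this case, skipped_pseudos(["/dev/emcpowera", "/dev/emcpowerc", "/dev/emcpowerd"]) returns ["emcpowerb"]. For node2 it returns an empty list.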
Afterwards I found related articles describing a fix that forcibly deletes the PowerPath configuration files. The steps are:
Stop the PowerPath service:
# /etc/init.d/PowerPath stop
Back up the current configuration files:
# cp /etc/powermt.custom /etc/powermt.custom.old_config
# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config
# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config
Delete the PowerPath configuration files:
# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx
Start the PowerPath service again:
# /etc/init.d/PowerPath start
Save the PowerPath configuration:
# powermt save
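The backup and delete steps can also be scripted so that nothing is removed until all three files have been copied. Below is a minimal Python sketch. It assumes it is run as root with PowerPath already stopped. backup_and_remove is a hypothetical helper, and the etc parameter exists only so it can be tried on a scratch directory first.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The three files named in the steps above.
CONFIG_FILES = ("powermt.custom", "emcp_devicesDB.dat", "emcp_devicesDB.idx")


def backup_and_remove(etc: Path = Path("/etc"), suffix: str = ".old_config") -> list[Path]:
    """Copy each PowerPath config file to <name><suffix>, then delete the originals.

    Nothing is deleted until every file that exists has been copied, so a
    failed copy leaves the configuration untouched. Run only with PowerPath stopped.
    """
    present = [etc / name for name in CONFIG_FILES if (etc / name).is_file()]
    backups = []
    for path in present:
        backup = path.with_name(path.name + suffix)
        shutil.copy2(path, backup)  # also keeps permissions and timestamps
        backups.append(backup)
    for path in present:
        path.unlink()
    return backups
```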
5) References
In some cases, during installation of PowerPath and device reconfiguration, a server may skip a few "emcpower" devices due to devices that were removed. PowerPath keeps track of devices and makes sure that the emcpower device names remain the same regardless of the underlying Linux /dev/sd# device.
Fix (steps for PowerPath 4.x):
1) Make sure all I/O is stopped and all of the file systems to the array are unmounted.
2) Stop PowerPath
# /etc/init.d/PowerPath stop
3) Make a backup copy of the current PowerPath custom file just in case
# cp /etc/powermt.custom /etc/powermt.custom.old_config
4) Make a backup copy of the current PowerPath config dat file...just in case
# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config
5) Make a backup copy of the current PowerPath config idx file...just in case
# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config
6) Remove the old config files
# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx
7) Remove the /etc/emc/archive directory.
# rm -r /etc/emc/archive
8) Start PowerPath
# /etc/init.d/PowerPath start
9) Save the new configuration
# powermt save
In some cases with PowerPath 4.x this process will clean up the PowerPath devices but they still will not be discovered in Bus-Target-LUN order so if you are trying to synchronize emcpower device numbers between two cluster nodes it may not work. In this case it is recommended that you present the devices to the node one at a time in the order you want them to appear.
Root cause 2: Devices were not added to the nodes in the same order.
Fix (steps for PowerPath 4.x):
Use the emcpadm command to change the emcpower pseudo devices to the desired names.
In order to "fix" the discrepancy between the two nodes the emcpadm command can be used.
1) Use the command below to determine which emcpower devices are already in use:
# emcpadm getused
PowerPath pseudo device names in use:
Pseudo Device Name    Major#    Minor#
emcpowera             232       0
emcpowerb             232       16
emcpowerc             232       32
emcpowerd             232       48
emcpowere             232       64
emcpowerg             232       96
2) Use the command below to determine which emcpower devices are available:
# emcpadm getfree -n 5 -b emcpowera
PowerPath pseudo device names not in use:
Pseudo Device Name    Major#    Minor#
emcpowerf             232       80
emcpowerh             232       112
emcpoweri             232       128
emcpowerj             232       144
emcpowerk             232       160
3) Use the command below to rename a device:
# emcpadm rename -s emcpowerg -t emcpowerf
4) The "emcpadm getused" command can now be run again to check the devices after the rename:
# emcpadm getused
PowerPath pseudo device names in use:
Pseudo Device Name    Major#    Minor#
emcpowera             232       0
emcpowerb             232       16
emcpowerc             232       32
emcpowerd             232       48
emcpowere             232       64
emcpowerf             232       80
5) Note: To make sure that the actual volumes match between the two cluster nodes, run the "powermt display dev=all" command on each node in the cluster and compare the output.
# powermt display dev=all
Pseudo name=emcpowerf
CLARiiON ID=WRE00021500573 [Linux103]
Logical device ID=6006016022470A0084D8358B528BD911 [LUN 10]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats -
##   HW Path                I/O Paths    Interf.   Mode    State  Q-IOs errors
==============================================================================
   2 lpfc                      sdg       SP A0     active  alive      0      0
   3 lpfc                      sdm       SP B0     active  alive      0      0
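Comparing the two nodes by eye gets tedious with many LUNs. Below is a minimal Python sketch that pairs each pseudo name with its Logical device ID from saved powermt display dev=all output, then reports the LUNs that are named differently on the two nodes. pseudo_to_lun and mismatches are hypothetical helpers, and the parsing assumes the line layout shown above.

```python
from __future__ import annotations

import re

PSEUDO = re.compile(r"^\s*Pseudo name=(\S+)")
LUN_ID = re.compile(r"^\s*Logical device ID=(\w+)")


def pseudo_to_lun(output: str) -> dict[str, str]:
    """Map each pseudo name in powermt display dev=all output to its Logical device ID."""
    mapping: dict[str, str] = {}
    current: str | None = None
    for line in output.splitlines():
        if m := PSEUDO.match(line):
            current = m.group(1)
        elif current is not None and (m := LUN_ID.match(line)):
            # The ID is the hex string before the " [LUN n]" suffix.
            mapping[current] = m.group(1)
            current = None
    return mapping


def mismatches(node1_output: str, node2_output: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {Logical device ID: (node1 name, node2 name)} for every LUN named differently.

    A LUN seen on only one node appears with None for the other node.
    """
    by_id1 = {lun: name for name, lun in pseudo_to_lun(node1_output).items()}
    by_id2 = {lun: name for name, lun in pseudo_to_lun(node2_output).items()}
    return {lun: (by_id1.get(lun), by_id2.get(lun))
            for lun in sorted(by_id1.keys() | by_id2.keys())
            if by_id1.get(lun) != by_id2.get(lun)}
```

An empty result means every LUN has the same pseudo name on both nodes.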