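Environment:
Two Huawei storage arrays: an 18500 as primary storage and a 5300 as backup storage. The multipath software is Huawei's own (UltraPath).
The operating system is RHEL 6.5, clustered with RHCS. The application is Oracle, running in active/standby mode.
A mirrored volume has already been built on /dev/sdb and /dev/sdc, one LUN from each array (this continues the previous post, "Converting an LVM2 linear volume to a mirrored volume").
Symptom:
After both nodes were rebooted, the names /dev/sdb and /dev/sdc no longer referred to the same LUNs on both nodes: they were swapped between node1 and node2.
On node1: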
Disk /dev/sdb: 644.2 GB, 644245094400 bytes
255 heads, 63 sectors/track, 78325 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdc: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
On node2:
Disk /dev/sdb: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdc: 644.2 GB, 644245094400 bytes
255 heads, 63 sectors/track, 78325 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
On node2 the names are in the correct order, so the service is running on node2:
Cluster Status for db @ Wed Apr 18 17:00:02 2018
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, rgmanager
node2 2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:db-oracle node2 started
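The names of the multipath-aggregated devices now need to be made correct and identical on both nodes. udev is used to do this; the procedure is as follows.
Step 1: freeze the RHCS service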
[root@node1 ~]# clustat
Cluster Status for db @ Thu Apr 19 14:49:00 2018
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, Local, rgmanager
node2 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:db-oracle node2 started
[root@node1 ~]# clusvcadm -Z db-oracle
Local machine freezing service:db-oracle...Success
[root@node1 ~]# clustat
Cluster Status for db @ Thu Apr 19 14:49:19 2018
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, Local, rgmanager
node2 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:db-oracle node2 started [Z]
[root@node1 ~]#
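Before touching device names it may be worth confirming from a script that the freeze really took effect. The following is a minimal sketch, assuming the `[Z]` flag appears on the service line of `clustat` exactly as in the output above; the function name `is_frozen` is made up for illustration.

```shell
#!/bin/bash
# is_frozen SERVICE
# Reads clustat output on stdin and succeeds (exit 0) only if the line for
# service:SERVICE carries the [Z] (frozen) flag.
is_frozen() {
    awk -v svc="service:$1" '
        $1 == svc && /\[Z\]/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example (run on a cluster node):
#   clustat | is_frozen db-oracle && echo "db-oracle is frozen"
```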
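Step 2: use udev to pin the LUN names on both nodes
Note:
Multipath software has to do two things: (1) aggregate the paths and present each LUN under a unique name on every node; (2) keep that name consistent across the nodes.
On both nodes, /dev/sdb and /dev/sdc are the names produced after multipath aggregation. Huawei's iSCSI multipath software only did the first part here: the names are unique, but the order in which they are assigned differs from node to node. That is a real shortcoming for multipath software, and it has to be fixed by hand.
First, look at the attributes of the two LUNs on both nodes: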
[root@node1 ~]# udevadm info --query=env --path=/block/sdb
UDEV_LOG=3
DEVPATH=/devices/up_primary/up_adapter/host15/target15:0:0/15:0:0:1/block/sdb
MAJOR=8
MINOR=16
DEVNAME=/dev/sdb
......
ID_MODEL=XSG1
......
ID_REVISION=4303
......
ID_SERIAL=36a8ca7b100808f4e5f3ad82e00000068
......
[root@node1 ~]# udevadm info --query=env --path=/block/sdc
UDEV_LOG=3
DEVPATH=/devices/up_primary/up_adapter/host15/target15:0:1/15:0:1:1/block/sdc
MAJOR=8
MINOR=32
DEVNAME=/dev/sdc
......
ID_MODEL=HVS85T
......
ID_REVISION=4101
......
ID_SERIAL=364482e5100a340a72a7803c900000079
......
[root@node2 ~]# udevadm info --query=env --path=/block/sdb
UDEV_LOG=3
DEVPATH=/devices/up_primary/up_adapter/host15/target15:0:0/15:0:0:1/block/sdb
MAJOR=8
MINOR=16
DEVNAME=/dev/sdb
......
ID_MODEL=HVS85T
......
ID_REVISION=4101
......
ID_SERIAL=364482e5100a340a72a7803c900000079
......
[root@node2 ~]# udevadm info --query=env --path=/block/sdc
UDEV_LOG=3
DEVPATH=/devices/up_primary/up_adapter/host15/target15:0:1/15:0:1:1/block/sdc
MAJOR=8
MINOR=32
DEVNAME=/dev/sdc
......
ID_MODEL=XSG1
......
ID_REVISION=4303
......
ID_SERIAL=36a8ca7b100808f4e5f3ad82e00000068
......
Comparing the output shows that:
the ID_SERIAL of /dev/sdb on node1 equals the ID_SERIAL of /dev/sdc on node2
the ID_SERIAL of /dev/sdc on node1 equals the ID_SERIAL of /dev/sdb on node2
node1
/dev/sdb
ID_SERIAL=36a8ca7b100808f4e5f3ad82e00000068
/dev/sdc
ID_SERIAL=364482e5100a340a72a7803c900000079
node2
/dev/sdb
ID_SERIAL=364482e5100a340a72a7803c900000079
/dev/sdc
ID_SERIAL=36a8ca7b100808f4e5f3ad82e00000068
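The comparison above was done by eye; it could also be scripted. This is a rough sketch, assuming the same `udevadm info --query=env --path=/block/<dev>` invocation used above and only the two devices sdb and sdc. The function names `get_prop` and `serial_map` are illustrative, not part of any tool.

```shell
#!/bin/bash
# get_prop KEY
# Reads "udevadm info --query=env" output on stdin and prints the value of
# the first KEY=value line.
get_prop() {
    awk -v key="$1" 'index($0, key "=") == 1 && !seen++ { print substr($0, length(key) + 2) }'
}

# serial_map
# Prints "device ID_SERIAL" for sdb and sdc on the local node.
serial_map() {
    local dev
    for dev in sdb sdc; do
        printf '%s %s\n' "$dev" \
            "$(udevadm info --query=env --path=/block/$dev | get_prop ID_SERIAL)"
    done
}

# Example: run serial_map on node1 and node2 and compare the two listings.
```

If the two listings differ, the same name points at different LUNs on the two nodes, which is the situation described above.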
Step 3: shut down the database
SYS@nbora2> shutdown immediate;
[oracle@node2 ~]$ lsnrctl stop
Step 4: with the unique identifiers found in step 2 and the database and listener stopped in step 3, udev can now be configured
On node1:
[root@node1 ~]# more /etc/udev/rules.d/99-z-multipath.rules
KERNEL=="sd*",SUBSYSTEM=="block",ENV{ID_MODEL}=="HVS85T",ENV{ID_REVISION}=="4101",ENV{ID_SERIAL}=="364482e5100a340a72a7803c900000079",NAME="sdb"
KERNEL=="sd*",SUBSYSTEM=="block",ENV{ID_MODEL}=="XSG1",ENV{ID_REVISION}=="4303",ENV{ID_SERIAL}=="36a8ca7b100808f4e5f3ad82e00000068",NAME="sdc"
node2 uses the same rules: copy /etc/udev/rules.d/99-z-multipath.rules to /etc/udev/rules.d on node2.
Then run the following command on each node to re-apply all udev rules:
start_udev
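Note:
At first I matched only ENV{ID_SERIAL} in the udev rules on both nodes, and the names were still mixed up after a reboot. I then added two more match keys, ENV{ID_MODEL} and ENV{ID_REVISION}, and with those the two LUNs are identified uniquely on both nodes.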
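Since the rule file must be identical on both nodes and each rule repeats the same three match keys, the lines could be generated rather than typed. The sketch below simply prints a rule in the same form as the file shown above; `make_rule` is a hypothetical helper name, and the values come from the `udevadm info` output in step 2.

```shell
#!/bin/bash
# make_rule MODEL REVISION SERIAL NAME
# Prints one udev rule that matches a LUN on ID_MODEL, ID_REVISION and
# ID_SERIAL and assigns it the kernel name NAME.
make_rule() {
    printf 'KERNEL=="sd*",SUBSYSTEM=="block",ENV{ID_MODEL}=="%s",ENV{ID_REVISION}=="%s",ENV{ID_SERIAL}=="%s",NAME="%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Example: regenerate the two rules used in this post.
#   make_rule HVS85T 4101 364482e5100a340a72a7803c900000079 sdb
#   make_rule XSG1   4303 36a8ca7b100808f4e5f3ad82e00000068 sdc
```

Redirecting the output of the two example calls into a file and copying that one file to both nodes keeps the rules from drifting apart.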
[root@node1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[root@node1 ~]# fdisk -l /dev/sdc
Disk /dev/sdc: 644.2 GB, 644245094400 bytes
255 heads, 63 sectors/track, 78325 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[root@node2 rules.d]# fdisk -l /dev/sdb
Disk /dev/sdb: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[root@node2 rules.d]# fdisk -l /dev/sdc
Disk /dev/sdc: 644.2 GB, 644245094400 bytes
255 heads, 63 sectors/track, 78325 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
As shown, /dev/sdb and /dev/sdc are now identical on both nodes.
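The size check above can be turned into a small assertion that is easy to repeat after every reboot. This is only a sketch: it assumes the `Disk /dev/...: <size>, <bytes> bytes` header line printed by this version of fdisk, and the names `disk_bytes` and `size_ok` are invented for the example.

```shell
#!/bin/bash
# disk_bytes
# Reads "fdisk -l DEVICE" output on stdin and prints the size in bytes taken
# from the first "Disk /dev/..." header line.
disk_bytes() {
    awk '/^Disk \/dev\// && !seen++ { print $(NF-1) }'
}

# size_ok EXPECTED_BYTES
# Succeeds when the fdisk output on stdin reports exactly EXPECTED_BYTES.
size_ok() {
    [ "$(disk_bytes)" = "$1" ]
}

# Example, using the sizes seen in this post:
#   fdisk -l /dev/sdb | size_ok 536870912000 && echo "sdb OK"
#   fdisk -l /dev/sdc | size_ok 644245094400 && echo "sdc OK"
```

Size is only a convenient proxy here because the two LUNs happen to differ in size; comparing ID_SERIAL is the stricter check.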
Step 5: unfreeze the cluster service
[root@node2 ~]# clusvcadm -U db-oracle
Then test a relocation:
clusvcadm -r db-oracle -m node1
[root@node1 ~]# clustat
Cluster Status for db @ Thu Apr 19 15:17:53 2018
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, Local, rgmanager
node2 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:db-oracle node1 started
The relocation works: the service moved to node1 normally.
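To confirm the relocation from a script rather than by reading the table, the owner column can be pulled out of `clustat`. A minimal sketch, assuming the column layout shown above (service name, owner, state); `service_owner` is a hypothetical helper name.

```shell
#!/bin/bash
# service_owner SERVICE
# Reads clustat output on stdin and prints the owner column of the line for
# service:SERVICE.
service_owner() {
    awk -v svc="service:$1" '$1 == svc { print $2 }'
}

# Example:
#   [ "$(clustat | service_owner db-oracle)" = "node1" ] && echo "relocated to node1"
```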
Check the LVM information on both nodes:
[root@node1 ~]# lvs -a -o name,copy_percent,devices vg_oracle
LV Cpy%Sync Devices
lv_oracle 100.00 lv_oracle_mimage_0(0),lv_oracle_mimage_1(0)
[lv_oracle_mimage_0] /dev/sdb(0)
[lv_oracle_mimage_1] /dev/sdc(0)
[lv_oracle_mlog] /dev/sdc(127999)
[root@node2 ~]# lvs -a -o name,copy_percent,devices vg_oracle
LV Cpy%Sync Devices
lv_oracle 100.00 lv_oracle_mimage_0(0),lv_oracle_mimage_1(0)
[lv_oracle_mimage_0] /dev/sdb(0)
[lv_oracle_mimage_1] /dev/sdc(0)
[lv_oracle_mlog] /dev/sdc(127999)
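The `lvs` output above can also be checked mechanically, for example before allowing another failover test. The following sketch assumes the column order produced by `-o name,copy_percent,devices` as used above; `mirror_synced` is an illustrative name.

```shell
#!/bin/bash
# mirror_synced LV
# Reads "lvs -a -o name,copy_percent,devices VG" output on stdin and succeeds
# only if the line for LV reports a copy percentage of 100.00.
mirror_synced() {
    awk -v lv="$1" '
        $1 == lv { ok = ($2 == "100.00") }
        END { exit(ok ? 0 : 1) }
    '
}

# Example:
#   lvs -a -o name,copy_percent,devices vg_oracle | mirror_synced lv_oracle \
#       && echo "mirror fully synced"
```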
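At this point the linear volume has been converted to a mirrored volume, the LUN naming mix-up caused by Huawei's multipath software is resolved, and the cluster relocation test has passed.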