首先介绍一下什么是多路径(multi-path)?先说说多路径功能产生的背景,在多路径功能出现之前,主机上的硬盘是直接挂接到一个总线(PCI)上,路径是一对一的关系,也就是一条路径指向一个硬盘或是存储设备,这样的一对一关系对于操作系统而言,处理相对简单,但是缺少了可靠性。当出现了光纤通道网络(Fibre Channle)也就是通常所说的SAN网络时,或者由iSCSI组成的IPSAN环境时,由于主机和存储之间通过光纤通道交换机或者多块网卡及IP来连接时,构成了多对多关系的IO通道,也就是说一台主机到一台存储设备之间存在多条路径。当这些路径同时生效时,I/O流量如何分配和调度,如何做IO流量的负载均衡,如何做主备。这种背景下多路径软件就产生了。
多路径的主要功能就是和存储设备一起配合实现如下功能:
1.故障的切换和恢复
2.IO流量的负载均衡
3.磁盘的虚拟化
在linux操作系统中,RedHat和Suse的2.6的内核中都自带了免费的多路径软件包,ESX操作系统下也是自带了免费的多路径功能,而windows操作系统下,就需要购买一个叫MPIO的软件lience才能使用multi-path多路径功能。其他windows和ESX操作系统下的多路径 功能都是图形化界面比较简单这里就不多做介绍了,在这里就是介绍一下linux环境下如何配置multi-path多路径功能。
1、device-mapper-multipath:即multipath-tools。主要提供multipathd和multipath等工具和 multipath.conf等配置文件。这些工具通过device mapper的ioctr的接口创建和配置multipath,设备创建的多路径设备映射会在/dev /mapper中。
2、 device-mapper:主要包括两大部分:内核部分和用户部分。内核部分主要由device mapper核心(dm.ko)和一些target driver(md-multipath.ko)。核心完成设备的映射,而target根据映射关系和自身特点具体处理从mappered device 下来的i/o。同时,在核心部分,提供了一个接口,用户通过ioctr可和内核部分通信,以指导内核驱动的行为,比如如何创建mappered device,这些divece的属性等。linux device mapper的用户空间部分主要包括device-mapper这个包。其中包括dmsetup工具和一些帮助创建和配置mappered device的库。这些库主要抽象,封装了与ioctr通信的接口,以便方便创建和配置mappered device。multipath-tool的程序中就需要调用这些库。
[root@liujing ~]# multipath –ll 查看多路径状态
Mar 10 19:18:28 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:18:28 | A sample multipath.conf file is located at
Mar 10 19:18:28 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:18:28 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
Mar 10 19:18:28 | DM multipath kernel driver not loaded ----DM模块没有加载
如果模块没有加载成功请使用下列命初始化DM,或重启系统
---Use the following commands to initialize and start DM for the first time:
# modprobe dm-multipath
# modprobe dm-round-robin
# service multipathd start
# multipath –v2
初始化完了之后再通过multipath -ll命令查看是否加载成功
[root@liujing ~]# multipath -ll
Mar 10 19:21:14 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:21:14 | A sample multipath.conf file is located at
Mar 10 19:21:14 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:21:14 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
DM multipath kernel driver not loaded ----这个提示没了说明DM模块已加载成功。
从上面的提示可以看到,DM模块是成功加载,但是/etc/下没有multipath.conf 配置文件,下一步介绍如何配置multipath.conf 文件。
2. 配置multipath:
通过vi命令创建一个Multipath的配置文件路径是/etc/multipath.conf ,在配置文件中添加multipath正常工作的最简配置如下:
vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
defaults {
user_friendly_names yes
path_grouping_policy multibus
failback immediate
no_path_retry fail
}
编辑完成后保存配置,同时通过命令:
# /etc/init.d/multipathd start #开启mulitipath服务
如果出现无法开启服务的情况,没有提示OK的话如下:
[root@liujing mapper]# service multipathd start
Starting multipathd daemon: 没有提示OK
重新开关一下服务就可以解决了。
[root@liujing mapper]# /etc/init.d/multipathd stop
Stopping multipathd daemon: [ OK ]
[root@localhost mapper]# /etc/init.d/multipathd start
Starting multipathd daemon: [ OK ] -----提示OK 正常开启服务
通过命令查看:
[root@liujing mapper]# multipath -ll
mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN ----创建了一个lun
size=3.5G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=4 status=active
|- 1:0:0:0 sdb 8:16 active ready running ----多路径下的两个盘符sdb和sde.
`- 2:0:0:0 sde 8:64 active ready running
目录/dev/mapper/ 下多了两个文件夹mpatha 和mpathap1。
[root@liujing mapper]# cd /dev/mapper/
[root@liujing mapper]# ls
control mpatha mpathap1
同时fdisk –l的命令下也多了两个设备标识:
没有配置多路径时:
[root@liujing~]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
两个CAN网卡获取到同一盘符:
/dev/sde和/dev/sdb.
配置后多了/dev/mapper/mpatha和/dev/mapper/mpathap1:
[root@localhost mapper]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpatha: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/mapper/mpathap1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpathap1: 3773 MB, 3773441024 bytes
255 heads, 63 sectors/track, 458 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Alignment offset: 1024 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/mpathap1 doesn't contain a valid partition table
# multipath -F #删除现有路径 两个新的路径就会被删除
# multipath -v2 #格式化路径 格式化后又出现
3. multipath磁盘的基本操作
要对多路径软件生成的磁盘进行操作直接操作/dev/mapper/目录下的磁盘就行.
在对多路径软件生成的磁盘进行分区之前最好运行一下pvcreate命令:
# pvcreate /dev/mapper/mpatha
# fdisk /dev/mapper/mpatha 分区时用这个目录/dev/mapper/mpatha
用fdisk对多路径软件生成的磁盘进行分区保存时会有一个报错,此报错不用理会.
# ls -l /dev/mapper/
[root@liujing mnt]# ls -l /dev/mapper/
total 0
crw-rw----. 1 root root 10, 58 Mar 10 19:10 control
lrwxrwxrwx. 1 root root 7 Mar 10 20:28 mpatha -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 10 20:33 mpathap1 -> ../dm-1
的mpathap1就是我们对multipath磁盘进行的分区
# mkfs.ext4 /dev/mapper/mpathap1 #对mpath1p1分区格式化成ext4文件系统
# mount /dev/mapper/mpathap1 /mnt/ #挂载mpathap1分区
格式化和挂载时用/dev/mapper/mpathap1
4. 分区磁盘:
上面有提到分区时用目录/dev/mapper/mpatha
[root@liujing~]# fdisk /dev/mapper/mpatha
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xac956c3a.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n------------------------新建分区
Command action
e extended
p primary partition (1-4)
p-----------------------------主分区
Partition number (1-4): 1
First cylinder (1-1016, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1016, default 1016):
Using default value 1016
Command (m for help): w ---------------------写入列表相当于保存
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
注:如果同一台设备的两个node挂同样的盘符,另一个盘符还需要再次写入w就行。不需要n了。
5. 格式化:
[root@liujing ~]# mkfs.ext4 /dev/mapper/mpathap1
mke2fs 1.41.12 (17-May-2010)
/dev/sdd1 alignment is offset by 1024 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=16 blocks
230608 inodes, 921250 blocks
46062 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=943718400
29 block groups
32768 blocks per group, 32768 fragments per group
7952 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
6. 挂载 /dev/mapper/mpathap1 到 /mnt
[root@liujing ~]# mount /dev/mapper/mpathap1 /mnt
三、multipath的高级配置之前的配置都是用multipath的默认配置来完成multipath,比如映射设备的名称,multipath负载均衡的方法都是默认设置。那有没有按照我们自己定义的方法来配置multipath呢,答案是OK。
Attribute |
Description |
|||||||||
wwid |
Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory for this section of themultipath.conf file. |
|||||||||
alias |
Specifies the symbolic name for the multipath device to which themultipath attributes apply. If you are using user_friendly_names, do not set this value tompathn; this may conflict with an automatically assigned user friendly name and give you incorrect device node names. |
|||||||||
path_grouping_policy |
|
|||||||||
path_selector |
|
|||||||||
failback |
|
|||||||||
prio |
|
|||||||||
no_path_retry |
|
|||||||||
rr_min_io |
Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older that 2.6.31. Newer systems should userr_min_io_rq. The default value is 1000. |
|||||||||
rr_min_io_rq |
Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1. |
|||||||||
rr_weight |
If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined byrr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal. |
|||||||||
flush_on_last_del |
If set to yes, then multipath will disable queueing when the last path to a device has been deleted. |
在我本地的一个完整的高级配置如下:
[root@liujing ~]# vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
multipaths {
multipath {
wwid 360a98000646650724434697454546156
alias mpathb_fcoe
path_grouping_policy multibus
#path_checker "directio"
prio "random"
path_selector "round-robin 0"
}
}
devices {
device {
vendor "NETAPP"
product "LUN"
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#path_checker "directio"
#path_selector "round-robin 0"
failback immediate
no_path_retry fail
}
}
其中 wwid,vendor,product, getuid_callout这些参数可以通过:multipath -v3命令来获取。如果在/etc/multipath.conf中有设定各wwid 别名,别名会覆盖此设定。
可以使用dd命令来对设备进行读写操作,并同时通过iostat来查看I/0状态,流量从哪个路径出去:
DD命令:dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072 在上面我们已经把磁盘挂载在/MNT文件夹下所以我们在读写磁盘时直接对/mnt文件夹直接读写就可以了。
如果想对磁盘重复读写可以用如下语句:
[root@liujing ~]# for ((i=1;i<=5;i++));do dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072 2>&1|grep MB;done; ---重复读写5次这个值可以根据自己测试需求修改。
另一个控制台输入iostat 2 10查看IO读写状态:
可以看到sdc和sdd是两个多路径的盘符,流量均匀的负载在两条路径中,负载均衡很成功。
将其中一条路径的端口down掉,所有流量会直接切换到另一个路径中。