Deploying SPDK NVMe-oF on Virtual Machines with SoftRoCE

About the Author

Wan Qun is a storage software engineer at Intel, working primarily on SPDK software testing.

Background

In a previously published article《…》.

Why do we need to deploy SPDK NVMe-oF on virtual machines? The reason is clear: as hosts gain ever more powerful cores, we can make more efficient use of those resources, and NVMe-oF functional test cases do not require much compute or memory, so virtual machines are well suited to NVMe-oF functional testing. Next, we discuss how to achieve this with virtual machines and SoftRoCE.

Setting Up the SoftRoCE Environment

Set up two Ubuntu 16.04 virtual machines running the 4.15.0-041500-generic kernel. Both VMs are attached through the Vagrantfile to a bridge named vboxnet0 on the host. The topology is shown below:


Figure 1: Topology for deploying NVMe-oF on two virtual machines with SoftRoCE
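For reference, the network attachment in such a Vagrantfile can be sketched as follows. This is a minimal sketch, not the original file: the host-only network name vboxnet0 comes from the topology above, auto_config is disabled because the guest IPs are assigned manually with ifconfig later, and the second VM would use 172.28.128.4 instead.

config.vm.network "private_network", ip: "172.28.128.12", name: "vboxnet0", auto_config: false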

1. On the host, you can list all known virtual machines with the following vagrant command:

# vagrant global-status --prune

id       name   provider   state   directory                             

--------------------------------------------------------------------------

440cdef  sss    virtualbox running /home/yidong/spdk_init/scripts/vagrant

c5e4e5a  www    virtualbox running /home/yidong/spdk/scripts/vagrant    

The above shows information about all known Vagrant environments on this machine. This data is cached and may not be completely up-to-date (use "vagrant global-status --prune" to prune invalid entries). To interact with any of the machines, you can go to that directory and run Vagrant, or you can use the ID directly with Vagrant commands from any directory. For example:

"vagrant destroy 1a2b3c4d"

2. Then you can log in to a virtual machine with the following command:

# vagrant ssh 440cdef

Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.15.0-041500-generic x86_64)

 * Documentation:  https://help.ubuntu.com

 * Management:     https://landscape.canonical.com

 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:

    http://www.ubuntu.com/business/services/cloud

43 packages can be updated.

0 updates are security updates.

Last login: Tue Aug  7 01:48:33 2018 from 10.0.2.2

3. Next, follow the steps in the previous article《…》to finish preparing the environment; the rxe_cfg tool used below must be available. A quick sanity check is sketched after this step.
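As that sanity check (a sketch; on a 4.15 kernel the SoftRoCE driver is the in-tree rdma_rxe module), confirm that the kernel side of SoftRoCE is present:

modprobe rdma_rxe

lsmod | grep rdma_rxe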

Use Case: NVMe-oF

Notes:

1. NVMe-oF is used as the example.

2. Here we run fio and perf either in RDMA mode or against the locally mapped device.

01  Steps for running the NVMe-oF test with fio

NVMe-oF Target (VM 1)

1. Clone the latest SPDK:

git clone https://github.com/spdk/spdk.git

2. Build SPDK with RDMA support:

./configure --with-rdma

make -j 64

3. Unload the mlx4_ib driver and load the NVMe-oF RDMA drivers:

modprobe -rv mlx4_ib

modprobe nvme_rdma

modprobe nvme_fabrics
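To confirm that the fabrics modules are loaded, a quick check:

lsmod | grep nvme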

4. Configure SoftRoCE on the non-RDMA NIC:

rxe_cfg start

rxe_cfg add enp0s8 (here we use an e1000 NIC)

ifconfig enp0s8 172.28.128.12
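At this point an rxe device should exist on top of enp0s8. You can verify this with rxe_cfg and, assuming the libibverbs utilities are installed, also list it as an RDMA device:

rxe_cfg status

ibv_devices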

5. Run the nvmf_tgt application with the following commands:

cd spdk

./app/nvmf_tgt/nvmf_tgt -c app/nvmf_tgt/nvmf.conf.in

The nvmf.conf.in file is shown below for reference.

[Global]

[Nvmf]

  MaxQueuesPerSession 4

  AcceptorPollRate 10000

[Nvme]

  TransportId "trtype:PCIe traddr:0000:0e:00.0" Nvme0

  RetryCount 4

  Timeout 0

  ActionOnTimeout None

  AdminPollRate 100000

  HotplugEnable No

[Subsystem1]

  NQN nqn.2016-06.io.spdk:cnode1

  Listen RDMA 172.28.128.12:4420

  AllowAnyHost yes

  Host nqn.2016-06.io.spdk:init

  SN SPDK00000000000001

  Namespace Nvme0n1 
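In this file, the TransportId line in the [Nvme] section attaches the local PCIe NVMe controller at 0000:0e:00.0 as Nvme0, and [Subsystem1] exports its namespace Nvme0n1 under the NQN nqn.2016-06.io.spdk:cnode1, listening for RDMA connections on 172.28.128.12:4420, the address assigned to the SoftRoCE interface in step 4.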

6. When the nvmf_tgt application is running successfully, you will see output like the following:

# ./app/nvmf_tgt/nvmf_tgt -c ../nvmf.conf

Starting SPDK v18.07 / DPDK 18.05.0 initialization...

[ DPDK EAL parameters: nvmf -c 0x1 --legacy-mem --file-prefix=spdk_pid5808 ]

EAL: Detected 2 lcore(s)

EAL: Detected 1 NUMA nodes

EAL: Multi-process socket /var/run/dpdk/spdk_pid5808/mp_socket

EAL: Probing VFIO support...

app.c: 530:spdk_app_start: *NOTICE*: Total cores available: 1

reactor.c: 718:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1

reactor.c: 492:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0

bdev_nvme.c:1268:bdev_nvme_library_init: *WARNING*: Timeout (in seconds) was renamed to TimeoutUsec (in microseconds)

bdev_nvme.c:1269:bdev_nvme_library_init: *WARNING*: Please update your configuration file

EAL: PCI device 0000:00:0e.0 on NUMA socket 0

EAL:   probe driver: 80ee:4e56 spdk_nvme

NVMe-oF Initiator (VM 2)

1. Unload the mlx4_ib driver and load the NVMe-oF RDMA host drivers:

modprobe -rv mlx4_ib

modprobe nvme_rdma

modprobe nvme_fabrics

2. Configure SoftRoCE on the non-RDMA NIC:

rxe_cfg start

rxe_cfg add enp0s8 (here we use an e1000 NIC)

ifconfig enp0s8 172.28.128.4
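Optionally, before connecting, you can confirm that the target is reachable over SoftRoCE. SPDK's nvmf_tgt exposes a discovery subsystem by default, which nvme-cli can query (assuming nvme-cli is installed):

nvme discover -t rdma -a 172.28.128.12 -s 4420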

3. Connect to the subsystem:

nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 172.28.128.12 -s 4420

4. Run the fio job:

fio fio_softroce.job

The fio job file fio_softroce.job is as follows:

[global]

invalidate=1

norandommap=1

thread=1

rw=randrw

runtime=10

ioengine=libaio

direct=1

bs=4096

size=1G

iodepth=128

group_reporting

time_based=1

[job0]

filename=/dev/nvme0n1
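Before launching the full job, a short read-only smoke test can confirm that the fabrics device is usable. This one-liner is illustrative and not part of the original procedure:

fio --name=smoke --filename=/dev/nvme0n1 --rw=read --bs=4k --size=16M --ioengine=libaio --direct=1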

5. When fio completes successfully, you will see output like the following:

# fio fio_softroce.job

job0: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128

fio-2.2.10

Starting 1 thread

open path: No such file or directory

Error getting slave device numbers.: No such file or directory

Jobs: 1 (f=1): [m(1)] [1.1% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 18m:59s]           

job0: (groupid=0, jobs=1): err= 0: pid=6477: Tue Aug  7 01:26:12 2018

  read : io=6388.0KB, bw=534202B/s, iops=130, runt= 12245msec

    slat (usec): min=3, max=25613, avg=568.48, stdev=2447.25

    clat (usec): min=600, max=2540.4K, avg=40228.82, stdev=227386.16

     lat (usec): min=710, max=2557.8K, avg=40797.85, stdev=227690.07

    clat percentiles (usec):

     |  1.00th=[  964],  5.00th=[ 1336], 10.00th=[ 1656], 20.00th=[ 2160],

     | 30.00th=[ 2608], 40.00th=[ 5856], 50.00th=[ 9280], 60.00th=[12736],

     | 70.00th=[15424], 80.00th=[18048], 90.00th=[23936], 95.00th=[30592],

     | 99.00th=[995328], 99.50th=[2277376], 99.90th=[2473984], 99.95th=[2539520],

     | 99.99th=[2539520]

    bw (KB  /s): min=   64, max= 3696, per=100.00%, avg=1770.71, stdev=1051.85

  write: io=5944.0KB, bw=497072B/s, iops=121, runt= 12245msec

    slat (usec): min=4, max=23449, avg=656.70, stdev=2460.03

    clat (msec): min=3, max=2988, avg=416.84, stdev=605.87

     lat (msec): min=3, max=2988, avg=417.50, stdev=605.89

    clat percentiles (msec):

     |  1.00th=[    7],  5.00th=[   21], 10.00th=[   85], 20.00th=[  119],

     | 30.00th=[  151], 40.00th=[  182], 50.00th=[  217], 60.00th=[  265],

     | 70.00th=[  334], 80.00th=[  465], 90.00th=[  709], 95.00th=[ 2376],

     | 99.00th=[ 2573], 99.50th=[ 2638], 99.90th=[ 2704], 99.95th=[ 2999],

     | 99.99th=[ 2999]

    bw (KB  /s): min=    0, max= 2416, per=100.00%, avg=1335.50, stdev=846.13

    lat (usec) : 750=0.10%, 1000=0.55%

    lat (msec) : 2=8.27%, 4=10.12%, 10=8.08%, 20=18.52%, 50=7.82%

    lat (msec) : 100=3.15%, 250=21.51%, 500=12.81%, 750=3.99%, 1000=0.94%

    lat (msec) : 2000=0.03%, >=2000=4.12%

  cpu          : usr=0.00%, sys=1.80%, ctx=1999, majf=0, minf=1

  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.3%, 16=0.5%, 32=1.0%, >=64=98.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%

     issued    : total=r=1597/w=1486/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=128

 

Run status group 0 (all jobs):

   READ: io=6388KB, aggrb=521KB/s, minb=521KB/s, maxb=521KB/s, mint=12245msec, maxt=12245msec

  WRITE: io=5944KB, aggrb=485KB/s, minb=485KB/s, maxb=485KB/s, mint=12245msec, maxt=12245msec

 

Disk stats (read/write):

  nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

6. Disconnect from the subsystem:

nvme disconnect -d /dev/nvme0n1

02  Steps for running perf against the NVMe-oF target in RDMA mode

NVMe-oF Target (VM 1)

The steps are exactly the same as for fio.

NVMe-oF Initiator (VM 2)

1. Disconnect the existing nvme device:

nvme disconnect -d /dev/nvme0n1

2. Run the SPDK setup script:

./scripts/setup.sh
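scripts/setup.sh reserves hugepages and rebinds local NVMe devices from the kernel driver to a userspace-capable driver. In a small VM the defaults may need tuning: the hugepage reservation can be sized with the HUGEMEM variable (in MB), and the current binding can be inspected with the status subcommand:

HUGEMEM=1024 ./scripts/setup.sh

./scripts/setup.sh status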

3. Run the perf test in RDMA mode:

./examples/nvme/perf/perf -q 1 -w randread -s 4096 -t 10 -r 'trtype:RDMA adrfam:IPv4 traddr:172.28.128.12 trsvcid:4420'

4. When perf completes successfully, you will see output like the following:

# ./examples/nvme/perf/perf -q 1 -w randread -s 4096 -t 10 -r 'trtype:RDMA adrfam:IPv4 traddr:172.28.128.12 trsvcid:4420'

Starting SPDK v18.10-pre / DPDK 18.05.0 initialization...

[ DPDK EAL parameters: perf -c 0x1 --no-pci --legacy-mem --file-prefix=spdk_pid5138 ]

EAL: Detected 2 lcore(s)

EAL: Detected 1 NUMA nodes

EAL: Multi-process socket /var/run/dpdk/spdk_pid5138/mp_socket

EAL: Probing VFIO support...

Initializing NVMe Controllers

Attaching to NVMe over Fabrics controller at 172.28.128.12:4420: nqn.2016-06.io.spdk:cnode1

Attached to NVMe over Fabrics controller at 172.28.128.12:4420: nqn.2016-06.io.spdk:cnode1

Associating SPDK bdev Controller (SPDK00000000000001  ) with lcore 0

Initialization complete. Launching workers.

Starting thread on core 0

========================================================

                                                                                            Latency(us)

Device Information                                     :       IOPS       MB/s    Average        min        max

SPDK bdev Controller (SPDK00000000000001  ) from core 0:    3389.30      13.24     294.97     143.31   17376.58

========================================================

Total                                                  :    3389.30      13.24     294.97     143.31   17376.58

03  Steps for running perf on the locally mapped device

NVMe-oF Target (VM 1)

The steps are exactly the same as for fio.

NVMe-oF Initiator (VM 2)

1. Connect to the subsystem:

nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 172.28.128.12 -s 4420

2. List the connected nvme devices:

nvme list

3. Run the perf test on the local device:

./examples/nvme/perf/perf /dev/nvme0n1 -q 1 -w randread -s 4096 -t 10

4. When perf completes successfully, you will see output like the following:

# ./examples/nvme/perf/perf /dev/nvme0n1 -q 1 -w randread -s 4096 -t 10

Starting SPDK v18.10-pre / DPDK 18.05.0 initialization...

[ DPDK EAL parameters: perf -c 0x1 --legacy-mem   --file-prefix=spdk_pid5122 ]

EAL: Detected 2 lcore(s)

EAL: Detected 1 NUMA nodes

EAL: Multi-process socket /var/run/dpdk/spdk_pid5122/mp_socket

EAL: Probing VFIO support...

Initializing NVMe Controllers

EAL: PCI device 0000:00:0e.0 on NUMA socket 0

EAL:   probe driver: 80ee:4e56   spdk_nvme

Attaching to NVMe Controller at 0000:00:0e.0

Attached to NVMe Controller at 0000:00:0e.0 [80ee:4e56]

Associating ORCL-VBOX-NVME-VER12 (VB1234-56789        ) with lcore 0

Associating /dev/nvme0n1 with lcore 0

Initialization complete. Launching workers.

Starting thread on core 0

========================================================

                                                                                              Latency(us)

Device Information                                     :       IOPS       MB/s      Average        min        max

/dev/nvme0n1                                from core   0:       0.50       0.00 2286420.25 2223980.88 2307051.46

ORCL-VBOX-NVME-VER12 (VB1234-56789          ) from core 0:   11283.60      44.08     101.63      33.21 1474178.36

========================================================

Total                                                    :   11284.10      44.08     202.94      33.21 2307051.46

5. Disconnect from the subsystem:

nvme disconnect -d /dev/nvme0n1
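If you ran scripts/setup.sh earlier, the local NVMe controller is still bound to the userspace driver. To hand it back to the kernel once testing is done, use the reset subcommand:

./scripts/setup.sh reset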

Q & A

1. After the NVMe-oF initiator has connected to the NVMe-oF target, checking the attached namespaces with the lsblk command shows the following. You can also check the namespaces with nvme list.

# lsblk

lsblk: nvme0c1n1: unknown device name

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sdb      8:16   0  10M  0 disk

sda      8:0    0  40G  0 disk

└─sda1   8:1    0  40G  0 part /

2. On the NVMe-oF initiator, if running the perf command "./examples/nvme/perf/perf /dev/nvme0n1 -q 1 -s 4096 -w randread -t 10" fails as shown below, you can reboot the virtual machine to resolve the problem (an alternative sketch follows the log).

# ./examples/nvme/perf/perf /dev/nvme0n1 -q 1 -s 4096 -w randread -t 10

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: perf -c 0x1 --file-prefix=spdk_pid8197 ]

EAL: Detected 2 lcore(s)

EAL: No free hugepages reported in hugepages-2048kB

EAL: FATAL: Cannot get hugepage information.

EAL: Cannot get hugepage information.

Failed to initialize DPDK

Unable to initialize SPDK env

./examples/nvme/perf/perf: errors occured
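Rebooting works because it leaves plenty of free, unfragmented memory for the hugepage reservation. As an alternative sketch (assuming 2 MB hugepages and root privileges), you can often reclaim and re-reserve the pages without a reboot:

sync

echo 3 > /proc/sys/vm/drop_caches

echo 1024 > /proc/sys/vm/nr_hugepages

grep Huge /proc/meminfo (HugePages_Free should now be non-zero)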

