These notes record fixes for common SPDK errors, so the same detours are not repeated.

Reads/writes not aligned to 512 B

Symptom:

nvme_qpair.c: 137:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:191 nsid:1 lba:0 len:65536

nvme_qpair.c: 306:nvme_qpair_print_completion: *NOTICE*: INVALID FIELD (00/02) sqid:1 cid:191 cdw0:0 sqhd:0002 p:1 m:0 dnr:1

Solution: inspect the application's run log:

TRACE: 09-12 10:45:02:   * 0 common/spdknvme_io.c:296] OP: Write, Offset:0, Size: 13

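The SPDK NVMe interface requires reads and writes to be aligned to 512 B. The traced request has Size: 13, which is not a multiple of 512. After the I/O was changed to be 512 B aligned, the error above disappeared.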
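
The alignment rule above can be checked before a request is submitted. The following is a minimal sketch, not SPDK code: `io_is_sector_aligned` and `round_up_to_sector` are hypothetical helper names, and the 512 B sector size is an assumption taken from this note. A real program would more likely query the namespace (for example with `spdk_nvme_ns_get_sector_size()`) than hard-code the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed logical block size, taken from the note above.
 * Hypothetical constant; query the namespace in real code. */
#define SECTOR_SIZE 512u

/* Hypothetical helper: returns true only when the request is non-empty
 * and both its byte offset and byte size are whole multiples of the
 * sector size. */
static bool io_is_sector_aligned(uint64_t offset, uint64_t size,
                                 uint32_t sector_size)
{
    if (sector_size == 0 || size == 0) {
        return false;
    }
    return (offset % sector_size == 0) && (size % sector_size == 0);
}

/* Hypothetical helper: rounds a byte count up to the next multiple of
 * the sector size, e.g. to size a padded buffer for a short write. */
static uint64_t round_up_to_sector(uint64_t size, uint32_t sector_size)
{
    return (size + sector_size - 1) / sector_size * sector_size;
}
```

With these helpers, the request from the trace (Offset:0, Size: 13) would be rejected by `io_is_sector_aligned(0, 13, SECTOR_SIZE)`, and `round_up_to_sector(13, SECTOR_SIZE)` gives the padded size to use instead.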

I/O buffers not allocated from hugepage memory

Symptom:

starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status

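Solution:
Memory used for SPDK reads and writes must be allocated from hugepages through the DPDK EAL, which maps that memory into user space. Ordinary memory cannot be used for DMA, so the NVMe hardware queues cannot access it directly. Check that every buffer passed to the read/write interface is allocated from hugepages. A check confirmed that this was the problem here; after the fix the program ran normally.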

Hugepage initialization failure

Symptom:

 Starting SPDK v19.04-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: append_demo -c 0x8 --base-virtaddr=0x200000000000 --file-prefix=spdk0 --proc-type=auto ]
EAL: Detected 32 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Auto-detected process type: PRIMARY
EAL: Multi-process socket /var/run/dpdk/spdk0/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Cannot allocate memzone list
EAL: FATAL: Cannot init memzone

EAL: Cannot init memzone

 EAL: Cannot init memzone

Failed to initialize DPDK
Unable to initialize Spdk env

Analysis and solution:
Check whether any hugepages are actually available:

 [root@szw scripts]# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
0
[root@szw scripts]# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
0

Sure enough, no hugepages are left. Allocate them again:

 cd  spdk/script ; ./all_setup.sh config
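
The manual `cat` check above could also be done by the program itself before it initializes the SPDK environment, so that a missing hugepage pool is reported with a clearer message. The following is a minimal sketch under that assumption: `read_hugepage_count` is a hypothetical helper, not part of SPDK or DPDK, and it only parses the sysfs files shown above.

```c
#include <stdio.h>

/* Hypothetical helper: reads the page count stored in a sysfs file such as
 * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.
 * Returns the count, or -1 if the file cannot be opened or parsed. */
static long read_hugepage_count(const char *path)
{
    FILE *fp = fopen(path, "r");
    long count = -1;

    if (fp == NULL) {
        return -1;
    }
    /* The file holds a single non-negative decimal number. */
    if (fscanf(fp, "%ld", &count) != 1 || count < 0) {
        count = -1;
    }
    fclose(fp);
    return count;
}
```

A caller could read both the 2048kB and 1048576kB files and refuse to start when neither reports a value greater than zero, which matches the situation in the log above.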

Controller in failed state

 Starting SPDK v19.04-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: append_demo -c 0x8 --base-virtaddr=0x200000000000 --file-prefix=spdk0 --proc-type=auto ]
EAL: Detected 32 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Auto-detected process type: PRIMARY
EAL: Multi-process socket /var/run/dpdk/spdk0/mp_socket
TRACE: 08-13 19:24:01:   * 0 baidu/bce/cds/common/spdk_nvme_io.cpp:534] Initializing NVMe Controllers

TRACE: 08-13 19:24:01:   * 0 baidu/bce/cds/common/spdk_nvme_io.cpp:182] Attaching to 0000:b0:00.0

nvme_ctrlr.c:2170:nvme_ctrlr_process_init: *ERROR*: Initialization timed out in state 2
nvme_ctrlr.c: 496:nvme_ctrlr_fail: *ERROR*: ctrlr 0000:b0:00.0 in failed state.
nvme.c: 423:nvme_init_controllers: *ERROR*: Failed to initialize SSD: 0000:b0:00.0
nvme_ctrlr.c: 553:nvme_ctrlr_shutdown: *ERROR*: did not shutdown within 10000 milliseconds

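Solution: another process is already bound to the disk in SPDK mode, so the current process cannot attach to it. Kill all processes that are using the SPDK-mode disk, then restart the program.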