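Following the example in the official TVM tutorial "Compile YOLO-V2 and YOLO-V3 in DarkNet Models", this post deploys YOLO-DarkNet on TVM and benchmarks the TVM-optimized model against running DarkNet directly. YOLO-V3 is used as the example throughout.
Operating system: Ubuntu 18.04
Backend: virtual machine CPU
Clone DarkNet from GitHub and build it: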
git clone https://github.com/pjreddie/darknet.git
cd darknet
make
When make finishes, run
./darknet
If the following prompt appears, the installation succeeded:
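To test YOLO we need the officially trained weights file yolov3.weights, available at https://pjreddie.com/media/files/yolov3.weights
Downloads from the official site are slow, so you may want to fetch the file in advance from a trusted mirror inside China. Put the weights file in the darknet directory.
Run the following command: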
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
The output looks like this:
The detection result is saved to predictions.jpg:
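Because YOLO runs on the CPU here, and the virtual machine's CPU is not very powerful, this detection took 32.126040 s.
Next we reproduce the same result on TVM and measure the time.
The sample code in this part comes from the official tutorial Compile YOLO-V2 and YOLO-V3 in DarkNet Models; I added timing code to it to measure performance.
Before running the test program, set up the environment by installing CFFI and OpenCV (cv2):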
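The figure above is the time reported for this one run. For repeated measurements it can be convenient to time the whole command from Python. The following is only a minimal sketch: `time_command` is a hypothetical helper, and the wall-clock time it returns covers the entire process (including loading the weights), so it is not expected to match the number darknet prints.

```python
import subprocess
import sys
import time

def time_command(cmd, cwd=None):
    """Run cmd once; return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, cwd=cwd,
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, completed.returncode

# Demonstration with a trivial command so the sketch runs anywhere.
# For the benchmark, the command would instead be
# ["./darknet", "detect", "cfg/yolov3.cfg", "yolov3.weights", "data/dog.jpg"]
# with cwd set to the darknet directory.
seconds, code = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, {seconds:.3f} s")
```

Calling the helper in a loop and averaging the results gives a wall-clock counterpart to the repeated tests reported later.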
pip install cffi
pip install opencv-python
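Once the environment is configured, the detection program can be run.
TVM's official sample code uses the download_testdata function to download the required files and return their local paths. Because the weights file downloads very slowly from the official site, it is best to download it in advance and place it in the .tvm_test_data/darknet/ directory under your home directory. Occasional network problems can also make download_testdata fail and abort the program, so once all required files have been downloaded you can assign their paths to the variables directly, which also makes each run faster.
So that downloads do not distort the measurements, the program is timed in three separate stages: stage 1 loads the network, converts it to Relay, and compiles it; stage 2 loads the test image, runs inference, and applies NMS; stage 3 draws the detections.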
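Instead of commenting the download calls in and out by hand, the choice between a local file and a download can be made at run time. The sketch below is an assumption-laden illustration, not part of TVM: `cached_or_download` is a hypothetical helper, and the default cache root is simply the ~/.tvm_test_data directory mentioned above.

```python
import os

def cached_or_download(filename, module, download, cache_root=None):
    """Return the local path of filename if it is already cached,
    otherwise return whatever the download() callable returns."""
    root = cache_root or os.path.join(os.path.expanduser("~"), ".tvm_test_data")
    local_path = os.path.join(root, module, filename)
    if os.path.isfile(local_path):
        return local_path   # no network access needed
    return download()       # fall back to the caller-supplied downloader

# Intended use inside the script (requires TVM, shown as a comment only):
# cfg_path = cached_or_download(
#     CFG_NAME, "darknet",
#     lambda: download_testdata(CFG_URL, CFG_NAME, module="darknet"))
```

Passing the downloader as a callable keeps the helper independent of TVM, so it can be checked without a network connection.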
# Import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt
import sys
# Import tvm, relay
import tvm
from tvm import relay
from ctypes import *
from tvm.contrib.download import download_testdata
from tvm.relay.testing.darknet import __darknetffi__
import tvm.relay.testing.yolo_detection
import tvm.relay.testing.darknet
import datetime
# Set the model name
MODEL_NAME = 'yolov3'
######################################################################
# Download the required files
# -----------------------
# This part downloads the cfg and weights files.
CFG_NAME = MODEL_NAME + '.cfg'
WEIGHTS_NAME = MODEL_NAME + '.weights'
REPO_URL = 'https://github.com/dmlc/web-data/blob/master/darknet/'
CFG_URL = REPO_URL + 'cfg/' + CFG_NAME + '?raw=true'
WEIGHTS_URL = 'https://pjreddie.com/media/files/' + WEIGHTS_NAME
cfg_path = download_testdata(CFG_URL, CFG_NAME, module="darknet")
# Example of assigning the path directly; the same pattern is used below.
# cfg_path = "/home/ztj/.tvm_test_data/darknet/yolov3.cfg"
weights_path = download_testdata(WEIGHTS_URL, WEIGHTS_NAME, module="darknet")
# weights_path = "/home/ztj/.tvm_test_data/darknet/yolov3.weights"
# Download and load the DarkNet library
if sys.platform in ['linux', 'linux2']:
    DARKNET_LIB = 'libdarknet2.0.so'
    DARKNET_URL = REPO_URL + 'lib/' + DARKNET_LIB + '?raw=true'
elif sys.platform == 'darwin':
    DARKNET_LIB = 'libdarknet_mac2.0.so'
    DARKNET_URL = REPO_URL + 'lib_osx/' + DARKNET_LIB + '?raw=true'
else:
    err = "Darknet lib is not supported on {} platform".format(sys.platform)
    raise NotImplementedError(err)
lib_path = download_testdata(DARKNET_URL, DARKNET_LIB, module="darknet")
# lib_path = "/home/ztj/.tvm_test_data/darknet/libdarknet2.0.so"
# ******timepoint1-start*******
start1 = datetime.datetime.now()
# ******timepoint1-start*******
DARKNET_LIB = __darknetffi__.dlopen(lib_path)
net = DARKNET_LIB.load_network(cfg_path.encode('utf-8'), weights_path.encode('utf-8'), 0)
dtype = 'float32'
batch_size = 1
data = np.empty([batch_size, net.c, net.h, net.w], dtype)
shape_dict = {'data': data.shape}
print("Converting darknet to relay functions...")
mod, params = relay.frontend.from_darknet(net, dtype=dtype, shape=data.shape)
######################################################################
# Import the graph into Relay
# -------------------------
# Compile the model
target = 'llvm'
target_host = 'llvm'
ctx = tvm.cpu(0)
data = np.empty([batch_size, net.c, net.h, net.w], dtype)
shape = {'data': data.shape}
print("Compiling the model...")
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod,
                                     target=target,
                                     target_host=target_host,
                                     params=params)
[neth, netw] = shape['data'][2:] # Current image shape is 608x608
# ******timepoint1-end*******
end1 = datetime.datetime.now()
# ******timepoint1-end*******
######################################################################
# Load the test image
# -----------------
test_image = 'dog.jpg'
print("Loading the test image...")
img_url = REPO_URL + 'data/' + test_image + '?raw=true'
# img_path = download_testdata(img_url, test_image, "data")
img_path = "/home/ztj/.tvm_test_data/data/dog.jpg"
# ******timepoint2-start*******
start2 = datetime.datetime.now()
# ******timepoint2-start*******
data = tvm.relay.testing.darknet.load_image(img_path, netw, neth)
######################################################################
# Run on TVM
# ----------------------
# The procedure is the same as in the other examples
from tvm.contrib import graph_runtime
m = graph_runtime.create(graph, lib, ctx)
# Set the inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# Run
print("Running the test image...")
m.run()
# Get the outputs
tvm_out = []
if MODEL_NAME == 'yolov2':
    layer_out = {}
    layer_out['type'] = 'Region'
    # Get the region layer attributes (n, out_c, out_h, out_w, classes, coords, background)
    layer_attr = m.get_output(2).asnumpy()
    layer_out['biases'] = m.get_output(1).asnumpy()
    out_shape = (layer_attr[0], layer_attr[1]//layer_attr[0],
                 layer_attr[2], layer_attr[3])
    layer_out['output'] = m.get_output(0).asnumpy().reshape(out_shape)
    layer_out['classes'] = layer_attr[4]
    layer_out['coords'] = layer_attr[5]
    layer_out['background'] = layer_attr[6]
    tvm_out.append(layer_out)
elif MODEL_NAME == 'yolov3':
    for i in range(3):
        layer_out = {}
        layer_out['type'] = 'Yolo'
        # Get the YOLO layer attributes (n, out_c, out_h, out_w, classes, total)
        layer_attr = m.get_output(i*4+3).asnumpy()
        layer_out['biases'] = m.get_output(i*4+2).asnumpy()
        layer_out['mask'] = m.get_output(i*4+1).asnumpy()
        out_shape = (layer_attr[0], layer_attr[1]//layer_attr[0],
                     layer_attr[2], layer_attr[3])
        layer_out['output'] = m.get_output(i*4).asnumpy().reshape(out_shape)
        layer_out['classes'] = layer_attr[4]
        tvm_out.append(layer_out)
# Run detection and draw the bounding boxes.
thresh = 0.5
nms_thresh = 0.45
img = tvm.relay.testing.darknet.load_image_color(img_path)
_, im_h, im_w = img.shape
dets = tvm.relay.testing.yolo_detection.fill_network_boxes((netw, neth), (im_w, im_h), thresh,
1, tvm_out)
last_layer = net.layers[net.n - 1]
tvm.relay.testing.yolo_detection.do_nms_sort(dets, last_layer.classes, nms_thresh)
# ******timepoint2-end*******
end2 = datetime.datetime.now()
# ******timepoint2-end*******
coco_name = 'coco.names'
coco_url = REPO_URL + 'data/' + coco_name + '?raw=true'
font_name = 'arial.ttf'
font_url = REPO_URL + 'data/' + font_name + '?raw=true'
coco_path = download_testdata(coco_url, coco_name, module='data')
font_path = download_testdata(font_url, font_name, module='data')
# coco_path = "/home/ztj/.tvm_test_data/data/coco.names"
# font_path = "/home/ztj/.tvm_test_data/data/arial.ttf"
# ******timepoint3-start*******
start3 = datetime.datetime.now()
# ******timepoint3-start*******
with open(coco_path) as f:
    content = f.readlines()
names = [x.strip() for x in content]
tvm.relay.testing.yolo_detection.draw_detections(font_path, img, dets, thresh, names, last_layer.classes)
# ******timepoint3-end*******
end3 = datetime.datetime.now()
# ******timepoint3-end*******
print(end1-start1)
print(end2-start2)
print(end3-start3)
plt.imshow(img.transpose(1, 2, 0))
plt.show()
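The listing above brackets each stage with paired datetime.datetime.now() calls. A more compact way to record the same three stages is sketched below, using a small context manager built on time.perf_counter(), which is a monotonic clock intended for measuring intervals. The names `timed` and `stage_times` are hypothetical, and the time.sleep calls only stand in for the real work of each stage.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by stage name.
stage_times = {}

@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_times[stage] = time.perf_counter() - start

# Placeholders: replace each sleep with the corresponding part of the script.
with timed("stage1_load_and_compile"):
    time.sleep(0.01)   # load_network, from_darknet, relay.build
with timed("stage2_inference"):
    time.sleep(0.01)   # load_image, m.run(), fill_network_boxes, do_nms_sort
with timed("stage3_draw"):
    time.sleep(0.01)   # reading coco.names, draw_detections

for name, seconds in stage_times.items():
    print(f"{name}: {seconds:.3f} s")
```

Because the duration is stored in a finally clause, a stage that raises an exception still leaves its elapsed time in `stage_times`.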
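Running DarkNet directly and running on TVM were each repeated ten times, giving the following results:
In this test, the average time was 28.196 s when run directly and 22.266 s on TVM, an average speedup of 1.27x. No other hardware platforms have been tested yet.
In the single-image test, TVM is therefore about 1.27x faster. The timing data also shows that STAGE1 of the TVM test code, that is, importing the model into Relay and compiling it, takes the longest, while loading the test image and running detection on it takes comparatively little time. The next step is to test with multiple images; see the follow-up article: Multi-Image Performance Comparison of YOLO-DarkNet on TVM.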
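As a quick check of the arithmetic, the speedup is the ratio of the two average times. The sketch below uses only the two averages quoted above; the function names are hypothetical.

```python
def speedup(baseline_s, optimized_s):
    """Ratio of baseline time to optimized time (greater than 1 means faster)."""
    return baseline_s / optimized_s

def time_saved_fraction(baseline_s, optimized_s):
    """Fraction of the baseline time that the optimized run saves."""
    return (baseline_s - optimized_s) / baseline_s

darknet_avg = 28.196   # seconds, average of the direct runs
tvm_avg = 22.266       # seconds, average of the TVM runs

print(f"speedup: {speedup(darknet_avg, tvm_avg):.2f}x")                      # speedup: 1.27x
print(f"time saved: {time_saved_fraction(darknet_avg, tvm_avg) * 100:.1f}%")  # time saved: 21.0%
```

A 1.27x speedup is the same result as a run time that is about 21% shorter; the two figures describe one measurement in different ways.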