Training a watermelon detection model with yolov4-tiny and deploying it to the RKNPU: single-threaded Python inference at 35+ FPS
I used the darknet from here.
Its README explains data preparation, cfg modification, the training steps, and so on in detail, so I won't repeat them here.
Below is an annotated cfg, to make it easier to modify it for training and to tune the parameters.
[net]
# Testing ### testing mode
# Training ### training mode
# batch=64
# subdivisions=16
---------------------------------------------------------------------------------------------------------
batch=64 ### Number of images fed to the network per iteration, also called the batch
### size. Increasing it lets the network finish an epoch in fewer iterations.
### With a fixed maximum iteration count, a larger batch lengthens training
### time but finds the gradient-descent direction more reliably. If you have
### enough GPU memory, you can increase it to improve utilization. The right
### value takes repeated experimentation: too small and training may not
### converge well; too large and it may get stuck in a local optimum.
subdivisions=32 ### An interesting parameter: instead of pushing the whole batch through the
### network at once, the batch is split into `subdivisions` parts that are run
### one after another and then accumulated to count as one iteration. This
### lowers GPU memory usage. Setting it to 1 pushes the whole batch through at
### once; 2 pushes half at a time. batch/subdivisions is the number of samples
### actually fed to the trainer in one pass (here 64/32 = 2 images), so if
### memory is tight, the batch is split into `subdivisions` sub-batches.
---------------------------------------------------------------------------------------------------------
width=320 ### input image width
height=320 ### input image height
channels=3 ### input image channels: 3 for RGB color, 1 for grayscale, 4 for RGBA (the A channel is transparency)
### The three parameters above describe the network input. width and height
### set the resolution the network sees, which affects precision; they can
### only be set to multiples of 32.
---------------------------------------------------------------------------------------------------------
momentum=0.9 ### Momentum, the momentum term used by gradient-based optimizers; it affects
### how fast gradient descent moves toward the optimum. 0.9 is the recommended setting.
decay=0.0005 ### Weight decay, used to prevent overfitting. As a network starts to overfit,
### its weights tend to grow, so to avoid this, every iteration scales each
### weight down by some small factor. This is equivalent to adding a penalty
### term to the error function; the usual penalty is the sum of all squared
### weights times a decay constant, which drives the weights toward small
### absolute values. The larger decay is, the stronger the suppression of overfitting.
angle=180 ### Image rotation augmentation, in degrees. With angle=5, for example, new
### training images are generated by rotating randomly within -5 to 5 degrees.
---------------------------------------------------------------------------------------------------------
saturation = 1.5
exposure = 1.5 ### Range of saturation and exposure augmentation; in tiny-yolo-voc.cfg they
### vary between 1x and 1.5x, and between 1/1.5x and 1x.
hue=.1 ### Hue augmentation range; -0.1 to 0.1 in tiny-yolo-voc.cfg.
### Each iteration generates new training images by varying angle, saturation, exposure, and hue.
---------------------------------------------------------------------------------------------------------
learning_rate=0.001 ### Initial learning rate. If training diverges, lower it; if learning hits a
### plateau (the loss stops changing), lowering it can also help.
### The learning rate determines how fast the parameters move toward the
### optimum. If it is too large, you will likely overshoot the optimum so that
### the loss fails to converge or even diverges; if it is too small,
### optimization is inefficient, takes a very long time to converge, and is
### more likely to get stuck in a local optimum (for non-convex functions
### reaching the global optimum is not guaranteed). A good learning rate
### converges as fast as possible while still converging, and finding one
### takes experimentation: set it relatively large at the start so the weights
### change quickly, then reduce it manually after a certain number of epochs.
### In the original YOLO training, the network trains for 160 epochs with an
### initial learning rate of 0.001, divided by 10 at epochs 60 and 90.
burn_in=1000 ### While the iteration count is below burn_in, the learning rate is updated
### by one rule (a warm-up ramp); only above burn_in does the `policy` rule take over.
max_batches = 50000 ### Maximum number of iterations; training stops once max_batches is reached.
policy=steps ### Learning-rate policy; the piecewise steps policy is the most common.
### Available policies include constant, steps, exp, poly, step, sig, random.
steps=100, 25000, 35000 ### Iteration counts at which the learning rate changes.
scales=10,.1,.1 ### Ratios by which the learning rate changes; steps and scales work as a
### pair. For example, with learning_rate 0.001, steps 100,25000,35000 and
### scales 10,.1,.1: from iteration 0-100 the learning rate is the original
### 0.001; from 100-25000 it is 10x the original, i.e. 0.01; from 25000-35000
### it is 0.1x the current value, i.e. 0.001; and from 35000 to the final
### iteration it is again 0.1x the current value, i.e. 0.0001. Lowering the
### learning rate as iterations increase lets the model learn more
### effectively, i.e. drive the training loss down further. (See the
### learning-rate sketch after this listing.)
[convolutional]
batch_normalize=1 ### whether to apply batch normalization
filters=32 ### number of output feature maps
size=3 ### kernel size
stride=1 ### convolution stride
pad=1 ### if pad is 0, the padding is given by the `padding` parameter; if pad is 1, padding is size/2
activation=leaky
......
......
[convolutional]
size=1
stride=1
pad=1
filters=27 ### In the last convolutional layer before each [region/yolo] layer,
### filters = (classes + coords + 1) * anchors_num, where anchors_num is the
### number of mask entries for that layer (if there is no mask, anchors_num =
### num). The 5 stands for the 5 predicted values per box, tx, ty, tw, th, to
### in the paper, so here filters = 3 * (5 + classes) = 3 * (5 + 4) = 27.
activation=linear ### activation function; options include logistic, loggy, relu, elu, relie,
### plse, hardtan, lhtan, linear, ramp, leaky, tanh, stair
[yolo] ### in YOLOv2 the yolo layer was called the region layer
mask = 6,7,8 ### indices of the anchors this layer is responsible for: this layer predicts
### the 7th, 8th, and 9th anchor boxes; each yolo layer actually predicts only
### the 3 anchors selected by its mask
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
### Initial widths and heights of the prior boxes (w first, then h); the total
### count is num*2. The YOLOv2 author obtained the anchors with k-means, which
### essentially finds the most common box shapes and speeds up convergence
### (see the k-means sketch after this listing). If anchors are not set, the
### default is 0.5.
classes=4 ### number of object classes the network must recognize
num=9 ### how many boxes each grid cell predicts; must match the number of anchors.
### To use more anchors, increase num; if Obj tends to 0 during training after
### increasing num, try increasing object_scale.
jitter=.3 ### adds noise through random jitter to suppress overfitting
ignore_thresh = .5 ### threshold deciding whether the IoU error is counted: if the IoU is greater
### than ignore_thresh, the IoU error is not added to the cost function
truth_thresh = 1
random=1 ### random=1 enables multi-scale training, randomly training on images of
### different sizes; with 0, training always uses the configured input size.
### Set it to 0 if GPU memory is small.
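To make the interaction of learning_rate, burn_in, policy=steps, steps and scales concrete, here is a minimal sketch of how darknet computes the current learning rate under the steps policy. It assumes darknet's default power=4 for the burn-in ramp; step_lr is my own helper name, not a darknet function.

def step_lr(iteration, base_lr=0.001, burn_in=1000,
            steps=(100, 25000, 35000), scales=(10, .1, .1)):
    # warm-up: ramp the learning rate up during the first burn_in iterations
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** 4
    # steps policy: multiply by the matching scale each time a step is passed
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration >= step:
            lr *= scale
    return lr

# e.g. step_lr(50) ~ 6e-9 (still warming up), step_lr(20000) = 0.01,
#      step_lr(30000) = 0.001, step_lr(40000) = 0.0001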
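And here is a minimal sketch of the k-means idea behind the anchors, assuming the ground-truth box widths and heights (in network-input pixels) are already loaded into a numpy array. For simplicity this uses plain Euclidean k-means; the YOLOv2 paper actually clusters with an IoU-based distance, and kmeans_anchors is my own helper name.

import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    # wh: (N, 2) array of ground-truth box widths and heights
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every box to its nearest center
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if (assign == j).any():
                centers[j] = wh[assign == j].mean(axis=0)
    # darknet lists anchors sorted by area, smallest first
    return centers[np.argsort(centers.prod(axis=1))]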
After training you end up with yolov4-tiny.cfg and yolov4-tiny_final.weights, which are then converted to yolov4-tiny.rknn. I hit quite a few pitfalls during conversion, but if you are careful at each step it is not a big problem.
from rknn.api import RKNN

if __name__ == '__main__':
    # Create RKNN object
    rknn = RKNN()

    # Preprocessing: mean 0, scale by 255 (inputs mapped to 0-1), keep RGB order
    rknn.config(channel_mean_value='0 0 0 255', reorder_channel='0 1 2')

    # Load the darknet cfg and weights
    print('--> Loading model')
    ret = rknn.load_darknet(model='./yolov4-tiny.cfg', weight='./yolov4-tiny_final.weights')
    if ret != 0:
        print('load darknet model failed.')
        exit(ret)
    print('done')

    # Build with INT8 quantization; dataset.txt lists the calibration images,
    # one path per line. Note: with pre_compile=True the exported .rknn can
    # only run on the device, not in the PC simulator.
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='./dataset.txt', pre_compile=True)
    if ret != 0:
        print('build model failed.')
        exit(ret)
    print('done')

    # Export the converted model
    ret = rknn.export_rknn('./yolov4-tiny.rknn')
    if ret != 0:
        print('export rknn model failed.')
        exit(ret)
    print('done')

    rknn.release()
    exit(0)
My model is single-class detection with 320x320 input. For multi-class detection, change the corresponding class count and GRID0/GRID1, with LISTSIZE = classes + 5. When I have time I may write a multi-threaded or C++ version, which should push the speed further.
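For example, a hypothetical 4-class model at 416x416 input (values of my own choosing, not from this project) would change the constants like this:

NUM_CLS = 4                     # number of classes
LISTSIZE = NUM_CLS + 5          # x, y, w, h, objectness + class scores = 9
GRID0 = 416 // 32               # coarse head: 13x13
GRID1 = 416 // 16               # fine head: 26x26
CLASSES = ('a', 'b', 'c', 'd')  # placeholder class names

The full single-class script follows.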
import cv2
import time
import numpy as np
from rknn.api import RKNN
GRID0 = 10       # 320/32: coarse yolo head
GRID1 = 20       # 320/16: fine yolo head
GRID2 = 52       # unused here (apparently left over from a 3-head 416 model)
LISTSIZE = 6     # x, y, w, h, objectness + NUM_CLS class scores
SPAN = 3         # anchors per yolo head
NUM_CLS = 1
MAX_BOXES = 500
OBJ_THRESH = 0.5
NMS_THRESH = 0.6
# CLASSES = ("person", "bicycle", "car","motorbike ","aeroplane ","bus ","train","truck ","boat","traffic light",
# "fire hydrant","stop sign ","parking meter","bench","bird","cat","dog ","horse ","sheep","cow","elephant",
# "bear","zebra ","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite",
# "baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife ",
# "spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza ","donut","cake","chair","sofa",
# "pottedplant","bed","diningtable","toilet ","tvmonitor","laptop ","mouse ","remote ","keyboard ","cell phone","microwave ",
# "oven ","toaster","sink","refrigerator ","book","clock","vase","scissors ","teddy bear ","hair drier", "toothbrush ")
CLASSES = ('watermelon',)
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def process(input, mask, anchors):
    # pick the anchors this head is responsible for
    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)
    box_class_probs = sigmoid(input[..., 5:])

    # YOLO box decoding: sigmoid for centers, exp * anchor for sizes
    box_xy = sigmoid(input[..., :2])
    box_wh = np.exp(input[..., 2:4])
    box_wh = box_wh * anchors

    # build the grid of cell offsets
    col = np.tile(np.arange(0, grid_w), grid_h).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_w)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(SPAN, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(SPAN, axis=-2)
    grid = np.concatenate((col, row), axis=-1)

    box_xy += grid
    box_xy /= (grid_w, grid_h)
    # normalize w/h by the network input size (320 here, not the 416 of the
    # demo this script was adapted from)
    box_wh /= (320, 320)
    box_xy -= (box_wh / 2.)
    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs
def filter_boxes(boxes, box_confidences, box_class_probs):
"""Filter boxes with object threshold.
# Arguments
boxes: ndarray, boxes of objects.
box_confidences: ndarray, confidences of objects.
box_class_probs: ndarray, class_probs of objects.
# Returns
boxes: ndarray, filtered boxes.
classes: ndarray, classes for boxes.
scores: ndarray, scores for boxes.
"""
box_scores = box_confidences * box_class_probs
box_classes = np.argmax(box_scores, axis=-1)
box_class_scores = np.max(box_scores, axis=-1)
pos = np.where(box_class_scores >= OBJ_THRESH)
boxes = boxes[pos]
classes = box_classes[pos]
scores = box_class_scores[pos]
return boxes, classes, scores
def nms_boxes(boxes, scores):
"""Suppress non-maximal boxes.
# Arguments
boxes: ndarray, boxes of objects.
scores: ndarray, scores of objects.
# Returns
keep: ndarray, index of effective boxes.
"""
x = boxes[:, 0]
y = boxes[:, 1]
w = boxes[:, 2]
h = boxes[:, 3]
areas = w * h
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x[i], x[order[1:]])
yy1 = np.maximum(y[i], y[order[1:]])
xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])
w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
inter = w1 * h1
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= NMS_THRESH)[0]
order = order[inds + 1]
keep = np.array(keep)
return keep
def yolov4_post_process(input_data):
    # yolov3
    # masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    # anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
    #            [59, 119], [116, 90], [156, 198], [373, 326]]
    # yolov3-tiny
    # masks = [[3, 4, 5], [0, 1, 2]]
    # anchors = [[10, 14], [23, 27], [37, 58], [81, 82], [135, 169], [344, 319]]
    # yolov4 (this tiny model has only two heads, so the zip() below consumes
    # only the first two masks: [6, 7, 8] for the 10x10 head, [3, 4, 5] for
    # the 20x20 head)
    masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors = [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], [72, 146], [142, 110], [192, 243], [459, 401]]
boxes, classes, scores = [], [], []
for input,mask in zip(input_data, masks):
b, c, s = process(input, mask, anchors)
b, c, s = filter_boxes(b, c, s)
boxes.append(b)
classes.append(c)
scores.append(s)
boxes = np.concatenate(boxes)
classes = np.concatenate(classes)
scores = np.concatenate(scores)
nboxes, nclasses, nscores = [], [], []
for c in set(classes):
inds = np.where(classes == c)
b = boxes[inds]
c = classes[inds]
s = scores[inds]
keep = nms_boxes(b, s)
nboxes.append(b[keep])
nclasses.append(c[keep])
nscores.append(s[keep])
if not nclasses and not nscores:
return None, None, None
boxes = np.concatenate(nboxes)
classes = np.concatenate(nclasses)
scores = np.concatenate(nscores)
return boxes, classes, scores
def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.
    # Arguments
        image: original image.
        boxes: ndarray, boxes of objects (normalized x, y, w, h).
        scores: ndarray, scores of objects.
        classes: ndarray, classes of objects.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box (normalized) left,top,right,bottom: [{}, {}, {}, {}]'.format(x, y, x + w, y + h))

        # scale the normalized box back to image pixels
        x *= image.shape[1]
        y *= image.shape[0]
        w *= image.shape[1]
        h *= image.shape[0]
        left = max(0, np.floor(x + 0.5).astype(int))
        top = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (left, top - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)
if __name__ == '__main__':
    # Create RKNN object
    rknn = RKNN()

    # Load the converted RKNN model
    print('--> Loading model')
    ret = rknn.load_rknn('./yolov4-tiny.rknn')
    if ret != 0:
        print('load rknn model failed')
        exit(ret)
    print('done')

    # Set inputs
    im_file = './0010031120190807065741.jpg'
    img = cv2.imread(im_file)
    orig_img = cv2.resize(img, (320, 320))
    img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)

    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rk1808')
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    t1 = time.time()
    outputs = rknn.inference(inputs=[img])
    print("rknn infer time:", time.time() - t1)
    rknn.release()

    # Reshape the two output heads to (SPAN, LISTSIZE, grid, grid):
    # outputs[0] is the 10x10 (coarse) head, outputs[1] the 20x20 (fine) head.
    input1_data = np.reshape(outputs[1], (SPAN, LISTSIZE, GRID1, GRID1))
    input2_data = np.reshape(outputs[0], (SPAN, LISTSIZE, GRID0, GRID0))

    # Transpose to (grid, grid, SPAN, LISTSIZE); the coarse head goes first so
    # it is paired with mask [6, 7, 8] in yolov4_post_process.
    input_data = []
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))

    boxes, classes, scores = yolov4_post_process(input_data)

    if boxes is not None:
        draw(orig_img, boxes, scores, classes)
    cv2.imshow("results", orig_img)
    cv2.waitKey(0)
    print('done')
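To back up the 35+ FPS single-threaded figure, a minimal timing sketch like the one below can be used; measure_fps is my own helper, and it assumes rknn and img have been initialized as in the script above (call it after init_runtime() and before rknn.release()). The iteration count n is arbitrary.

import time

def measure_fps(rknn, img, n=100):
    # run n inferences on the same preprocessed image and report the average
    t0 = time.time()
    for _ in range(n):
        rknn.inference(inputs=[img])
    dt = (time.time() - t0) / n
    print('avg infer time: {:.1f} ms, {:.1f} FPS'.format(dt * 1000, 1 / dt))

Note this measures inference only; post-processing and drawing add a little extra time on top.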