Training a yolov4-tiny model on your own data and deploying it to the RK NPU for inference

Table of Contents

  • Preface
  • 1. Installing the darknet training framework
  • 2. Model conversion
  • 3. RKNN inference


Preface

This post trains a watermelon detection model with yolov4-tiny and deploys it to the RK NPU for inference; a single-threaded Python pipeline runs at over 35 FPS.

1. Installing the darknet training framework

I used the darknet from here.
Its README covers data preparation, cfg modification, the training steps and so on in detail, so this post won't repeat them.
Below is an annotated cfg to make modification and parameter tuning easier during training.

[net]
# Testing                    ###    testing mode
# Training                   ###    training mode
# batch=64
# subdivisions=16
 
---------------------------------------------------------------------------------------------------------
batch=64                     ###    Number of images fed to the network per iteration
                             ###    (the batch size). Increasing it lets the network
                             ###    finish an epoch in fewer iterations. With the max
                             ###    number of iterations fixed, a larger batch lengthens
                             ###    training time but finds the gradient-descent
                             ###    direction more reliably. If you have enough GPU
                             ###    memory, increase it to improve utilization. The
                             ###    value needs tuning by trial: too small and training
                             ###    may not converge well, too large and it may get
                             ###    stuck in a local optimum.
 
subdivisions=32              ###    Splits each batch into this many sub-batches that
                             ###    are pushed through the network one after another;
                             ###    only after all of them finish does one iteration
                             ###    count as complete. This lowers GPU memory usage.
                             ###    With subdivisions=1 the whole batch is fed at once;
                             ###    with 2, half the batch at a time.


                             ###    batch/subdivisions is the number of samples sent
                             ###    into the trainer at once; if memory is not large
                             ###    enough, the batch is split into subdivisions
                             ###    sub-batches.
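
                             ###    Example: batch=64 with subdivisions=32 means
                             ###    64/32 = 2 images are processed per forward pass.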
---------------------------------------------------------------------------------------------------------
width=320                    ###    input image width
height=320                   ###    input image height
channels=3                   ###    input image channels: 3 for RGB color images, 1 for
                             ###    grayscale, 4 for RGBA (A is the alpha channel)

                             ###    These three parameters describe the network input.
                             ###    width and height set the resolution the network sees
                             ###    and therefore affect precision; they can only be set
                             ###    to multiples of 32.
---------------------------------------------------------------------------------------------------------
momentum=0.9                 ###    Momentum term of the gradient-descent optimizer;
                             ###    it affects how fast the weights move toward the
                             ###    optimum. The recommended setting is 0.9.

decay=0.0005                 ###    Weight decay, used to prevent overfitting. As a
                             ###    network overfits, its weights tend to grow, so each
                             ###    iteration shrinks every weight by a small factor;
                             ###    the larger decay is, the stronger the suppression
                             ###    of overfitting. It is equivalent to adding a penalty
                             ###    term to the loss, commonly the sum of all squared
                             ###    weights times a decay constant, which pushes the
                             ###    weights toward small absolute values.
 
angle=180                    ###    Rotation augmentation range in degrees; e.g. with
                             ###    angle=5, new images are generated with a random
                             ###    rotation in -5~5
---------------------------------------------------------------------------------------------------------
saturation = 1.5
exposure = 1.5               ###    Saturation and exposure augmentation ranges: factors
                             ###    of 1~1.5 and 1/1.5~1, as in tiny-yolo-voc.cfg

hue=.1                       ###    Hue augmentation range, -0.1~0.1 in tiny-yolo-voc.cfg


                             ###    Each iteration generates new training images by
                             ###    varying angle, saturation, exposure and hue.
---------------------------------------------------------------------------------------------------------
 
learning_rate=0.001          ###    Initial learning rate. If training diverges, lower
                             ###    it; if learning plateaus and the loss stops falling,
                             ###    lowering it can also help.

                             ###    The learning rate decides how fast the parameters
                             ###    move toward the optimum. Too large, and training may
                             ###    overshoot the optimum so the loss fails to converge
                             ###    or even diverges; too small, and optimization is
                             ###    inefficient, taking a long time to converge and
                             ###    easily getting stuck in a local optimum (for
                             ###    non-convex functions a global optimum cannot be
                             ###    guaranteed). A good learning rate converges as fast
                             ###    as possible while still converging, and finding it
                             ###    takes repeated trials: start fairly large so the
                             ###    weights change quickly, then manually lower it after
                             ###    some number of epochs. In YOLO training the network
                             ###    runs for 160 epochs with an initial rate of 0.001,
                             ###    divided by 10 at epochs 60 and 90.
 
 
burn_in=1000                 ###    For iterations below burn_in the learning rate is
                             ###    ramped up with its own schedule; only beyond burn_in
                             ###    does the configured policy take over.

max_batches = 50000          ###    Maximum number of iterations; training stops once
                             ###    max_batches is reached.
policy=steps                 ###    Learning-rate policy, usually the step-wise 'steps'.
                             ###    Available policies: constant, steps, exp, poly,
                             ###    step, sig, random.
 
steps=100, 25000, 35000      ###    Iterations at which the learning rate changes
scales=10,.1,.1              ###    Factors applied at those iterations; the two options
                             ###    work as a pair. For example, with
                             ###    learning_rate=0.001, steps=100,25000,35000 and
                             ###    scales=10,.1,.1: during iterations 0-100 the
                             ###    learning rate is the original 0.001; during
                             ###    100-25000 it is 10x that, i.e. 0.01; during
                             ###    25000-35000 it is 0.1x the current value, i.e.
                             ###    0.001; and from 35000 to the last iteration it is
                             ###    again 0.1x the current value, i.e. 0.0001. Lowering
                             ###    the learning rate as iterations grow lets the model
                             ###    learn more effectively, i.e. drive the training loss
                             ###    down further.
 
[convolutional]
batch_normalize=1            ###    whether to apply batch normalization
filters=32                   ###    number of output feature maps
size=3                       ###    convolution kernel size
stride=1                     ###    convolution stride
pad=1                        ###    if pad=0, the padding is given by the padding
                             ###    parameter; if pad=1, padding is size/2
activation=leaky
 
......
......
 
[convolutional]
size=1
stride=1
pad=1
filters=27                    ###    In the last convolutional layer before each
                              ###    [region/yolo] layer,
                              ###    filters=(classes+coords+1)*anchors_num, where
                              ###    anchors_num is the number of mask entries of that
                              ###    layer (without a mask, anchors_num=num). The 5 in
                              ###    coords+1=5 stands for the five predictions tx, ty,
                              ###    tw, th, to from the paper.
                              ###    Here: 3*(5+classes) = 3*(5+4) = 27.

activation=linear             ###    activation function; options include logistic,
                              ###    loggy, relu, elu, relie, plse, hardtan, lhtan,
                              ###    linear, ramp, leaky, tanh, stair
 
 
[yolo]                        ###    in YOLOv2 the yolo layer was called the region layer
mask = 6,7,8                  ###    indices of the anchors this layer is responsible
                              ###    for: this layer predicts anchor boxes 7, 8 and 9.
                              ###    Each yolo layer really predicts only the 3 anchors
                              ###    selected by its mask.

anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
                              ###    Initial widths and heights of the anchor boxes,
                              ###    as w,h pairs (num*2 values in total). The YOLOv2
                              ###    author obtained them with k-means, essentially
                              ###    finding which box shapes occur most often, which
                              ###    speeds up convergence.

classes=4                     ###    number of object classes the network must detect
num=9                         ###    anchors per grid cell, matching the number of
                              ###    anchor pairs. To use more anchors, raise num; if
                              ###    Obj then tends toward 0 during training, try
                              ###    raising object_scale.

jitter=.3                     ###    suppresses overfitting by adding noise through
                              ###    random jitter/crops

ignore_thresh = .5            ###    decides whether the IoU error is counted: if a
                              ###    prediction's IoU exceeds this threshold, its IoU
                              ###    error is not added to the cost function

truth_thresh = 1

random=1                      ###    1 enables multi-scale training on randomly resized
                              ###    inputs; 0 trains at the configured input size every
                              ###    time. Set it to 0 if GPU memory is small.
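
To make the two relationships that are easiest to get wrong concrete, here is a small standalone sketch (plain Python, values taken from the cfg above; it ignores the burn_in ramp-up):

def yolo_conv_filters(num_classes, anchors_per_layer=3, coords=4):
    # filters of the conv layer before each [yolo] layer:
    # (classes + box coords + objectness) * anchors predicted by the layer
    return (num_classes + coords + 1) * anchors_per_layer

def learning_rate_at(iteration, base_lr=0.001,
                     steps=(100, 25000, 35000), scales=(10, .1, .1)):
    # 'steps' policy: multiply the current rate by scales[i]
    # once the iteration count passes steps[i]
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration >= step:
            lr *= scale
    return lr

print(yolo_conv_filters(4))      # 27, matching filters=27 above
print(learning_rate_at(50))      # 0.001
print(learning_rate_at(10000))   # 0.01
print(learning_rate_at(30000))   # 0.001
print(learning_rate_at(40000))   # 0.0001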

2. Model conversion

Training produces yolov4-tiny.cfg and yolov4-tiny.weights, which are then converted to yolov4-tiny.rknn. I hit quite a few pitfalls during conversion, but if you are careful at every step it is manageable.

from rknn.api import RKNN

if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN()

    # Load the darknet cfg and trained weights
    print('--> Loading model')
    ret = rknn.load_darknet(model='./yolov4-tiny.cfg', weight='./yolov4-tiny_final.weights')
    if ret != 0:
        print('load darknet model failed.')
        exit(ret)
    print('done')

    # Inputs are normalized to [0, 1] (divide by 255); channel order kept as RGB
    rknn.config(channel_mean_value='0 0 0 255', reorder_channel='0 1 2')

    # Build with quantization; dataset.txt lists the calibration images.
    # pre_compile=True speeds up model loading on the device, but a
    # pre-compiled model can no longer run in the PC simulator.
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='./dataset.txt', pre_compile=True)
    if ret != 0:
        print('build model failed.')
        exit(ret)
    print('done')

    ret = rknn.export_rknn('./yolov4-tiny.rknn')
    if ret != 0:
        print('export rknn model failed.')
        exit(ret)
    print('done')
    exit(0)
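
The dataset.txt passed to do_quantization is a plain-text file with one image path per line; the quantizer uses these images for calibration. A minimal sketch to generate it (the ./quant_images directory is an assumed example path):

import glob

# Write one calibration image path per line, as rknn.build() expects
with open('./dataset.txt', 'w') as f:
    for path in sorted(glob.glob('./quant_images/*.jpg')):
        f.write(path + '\n')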

3. RKNN inference

My model is a single-class detector with a 320x320 input. For multi-class detection, change the class count and GRID0/GRID1 accordingly, with LISTSIZE = classes + 5. When I find time I will write a multi-threaded or C++ version, which should push the speed further.
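
For reference, these constants follow directly from the input size and the class count; a small helper (the function name is mine, not part of any API) shows how to recompute them for a different setup:

def tiny_head_params(input_size, num_classes):
    # yolov4-tiny has two detection heads, at strides 32 and 16
    grid0 = input_size // 32      # coarse head: 320 // 32 = 10
    grid1 = input_size // 16      # fine head:   320 // 16 = 20
    listsize = num_classes + 5    # 4 box coords + objectness + class scores
    return grid0, grid1, listsize

print(tiny_head_params(320, 1))   # (10, 20, 6) -> GRID0, GRID1, LISTSIZE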

import numpy as np
import cv2
import time
from rknn.api import RKNN

GRID0 = 10                   # 320 // 32: grid of the coarse detection head
GRID1 = 20                   # 320 // 16: grid of the fine detection head
GRID2 = 52                   # unused for the two-head tiny model
LISTSIZE = 6                 # NUM_CLS + 5 (tx, ty, tw, th, objectness)
SPAN = 3                     # anchors predicted per grid cell
NUM_CLS = 1
MAX_BOXES = 500
OBJ_THRESH = 0.5
NMS_THRESH = 0.6

# CLASSES = ("person", "bicycle", "car","motorbike ","aeroplane ","bus ","train","truck ","boat","traffic light",
#            "fire hydrant","stop sign ","parking meter","bench","bird","cat","dog ","horse ","sheep","cow","elephant",
#            "bear","zebra ","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite",
#            "baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife ",
#            "spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza ","donut","cake","chair","sofa",
#            "pottedplant","bed","diningtable","toilet ","tvmonitor","laptop	","mouse	","remote ","keyboard ","cell phone","microwave ",
#            "oven ","toaster","sink","refrigerator ","book","clock","vase","scissors ","teddy bear ","hair drier", "toothbrush ")
CLASSES = ('watermelon',)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def process(input, mask, anchors):
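    # Decode one YOLO head: 'input' has shape (grid_h, grid_w, SPAN, LISTSIZE),
    # where the last axis holds tx, ty, tw, th, objectness and the class scores.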

    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    box_xy = sigmoid(input[..., :2])
    box_wh = np.exp(input[..., 2:4])
    box_wh = box_wh * anchors

    col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)

    col = col.reshape(grid_h, grid_w, 1, 1).repeat(SPAN, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(SPAN, axis=-2)
    grid = np.concatenate((col, row), axis=-1)

    box_xy += grid
    box_xy /= (grid_w, grid_h)
    box_wh /= (320, 320)  # normalize by the network input size (320x320 here)
    box_xy -= (box_wh / 2.)
    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs

def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes with object threshold.

    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.

    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    box_scores = box_confidences * box_class_probs
    box_classes = np.argmax(box_scores, axis=-1)
    box_class_scores = np.max(box_scores, axis=-1)
    pos = np.where(box_class_scores >= OBJ_THRESH)

    boxes = boxes[pos]
    classes = box_classes[pos]
    scores = box_class_scores[pos]

    return boxes, classes, scores

def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2]
    h = boxes[:, 3]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov4_post_process(input_data):
    # yolov3
    # masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    # anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
    #          [59, 119], [116, 90], [156, 198], [373, 326]]
    # yolov3-tiny
    # masks = [[3, 4, 5], [0, 1, 2]]
    # anchors = [[10, 14], [23, 27], [37, 58], [81, 82], [135, 169], [344, 319]]

    # yolov4 anchors. Only the first two masks are consumed here, because zip()
    # below stops at the two heads the tiny model provides in input_data.
    masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors = [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], [72, 146], [142, 110], [192, 243], [459, 401]]

    boxes, classes, scores = [], [], []
    for input,mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Arguments
        image: original image.
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.
        classes: ndarray, classes of objects.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box   # normalized left, top, width, height
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('normalized box left,top,right,bottom: [{}, {}, {}, {}]'.format(x, y, x+w, y+h))
        x *= image.shape[1]
        y *= image.shape[0]
        w *= image.shape[1]
        h *= image.shape[0]
        left = max(0, np.floor(x + 0.5).astype(int))
        top = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))

        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (left, top - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)

if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN()

    # Load RKNN model
    print('--> Loading model')
    ret = rknn.load_rknn('./yolov4-tiny.rknn')

    if ret != 0:
        print('load rknn model failed')
        exit(ret)
    print('done')

    # Set inputs
    im_file = './0010031120190807065741.jpg'
    img = cv2.imread(im_file)
    orig_img = cv2.resize(img, (320,320))
    img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)

    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime(target='rk1808')
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    # Inference
    print('--> Running model')
    t1 = time.time()
    outputs = rknn.inference(inputs=[img])
    print("rknn infer time:", time.time() - t1)
    rknn.release()
    
    # outputs[0] is the coarse 10x10 head, outputs[1] the fine 20x20 head
    # input0_data = np.reshape(outputs[2], (SPAN, LISTSIZE, GRID0, GRID0))
    input1_data = np.reshape(outputs[1], (SPAN, LISTSIZE, GRID1, GRID1))
    input2_data = np.reshape(outputs[0], (SPAN, LISTSIZE, GRID0, GRID0))

    input_data = []
    # transpose to (grid_h, grid_w, SPAN, LISTSIZE); coarse head first to match masks
    # input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))


    boxes, classes, scores = yolov4_post_process(input_data)

    if boxes is not None:
        draw(orig_img, boxes, scores, classes)

    cv2.imshow("results",orig_img)
    cv2.waitKeyEx(0)
    print('done')
    # exit(0)
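
Note that init_runtime(target='rk1808') assumes an RK1808 device is attached (or that the script runs on the device itself); for a different RKNPU platform, change the target accordingly.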
