Training YOLO on Your Own Dataset with Darknet

Official Darknet installation guide: https://pjreddie.com/darknet/yolo/
This article is largely based on https://www.cnblogs.com/nanzhao/p/Sailon.html

1. Download and Installation

git clone https://github.com/pjreddie/darknet.git
cd darknet

Edit the relevant build options in the Makefile:

GPU=1      # 1 = build with CUDA GPU support, 0 = CPU only
CUDNN=1    # 1 = build with cuDNN acceleration (requires GPU=1)
OPENCV=0   # 1 = build with OpenCV (needed for image display and video)
OPENMP=0   # 1 = enable multi-core CPU parallelism via OpenMP
DEBUG=0    # 1 = build with debug symbols

After editing, run make to compile.
Download the pretrained YOLOv3 weights:

wget https://pjreddie.com/media/files/yolov3.weights

Run a test detection:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

Example detection output:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    .......
  105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
  106 detection
truth_thresh: Using default '1.000000'
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 0.029329 seconds.
dog: 99%
truck: 93%
bicycle: 99%

2. Training on a VOC Dataset

Darknet expects a .txt label file for each image, with one line per annotated box in the image, in the following format:

<object-class> <x> <y> <width> <height>

where x, y, width, and height are relative to the image's width and height (i.e., normalized to [0, 1]).
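For example, an image containing a single class-0 object centered in the frame and covering 30% of the width and 40% of the height would have this one-line label file (illustrative values):

0 0.5 0.5 0.3 0.4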
To generate these files, run the voc_label.py script found in Darknet's scripts/ directory.

Four places in voc_label.py need to be changed for your own dataset:

import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]  # replace with your own dataset's year/split pairs
classes = ["head", "eye", "nose"]     # change to your own class names

def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)
def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))  # the VOCdevkit dataset is assumed to sit in the current directory
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w')
    tree=ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
wd = getcwd()
for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):
        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()   
os.system("cat 2007_train.txt 2007_val.txt > train.txt")     # change to the splits of your own dataset that you want to train on
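As a quick sanity check of convert() (the numbers below are hypothetical), a VOC box given as (xmin, xmax, ymin, ymax) on a 640x480 image maps to normalized (x, y, w, h) like this:

# Hypothetical example: 640x480 image, box xmin=100, xmax=300, ymin=120, ymax=360
print(convert((640, 480), (100.0, 300.0, 120.0, 360.0)))
# -> approximately (0.3109, 0.4979, 0.3125, 0.5)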

Run it:

#wget https://pjreddie.com/media/files/voc_label.py
python voc_label.py

This script generates all the required files; most of them are label files written into VOCdevkit/VOC2007/labels/.
The directory then looks like this:

pc:~/darknet/scripts$ ls
2007_test.txt   dice_label.sh        imagenet_label.sh  VOCdevkit_original
2007_train.txt  gen_tactic.sh        train.txt          voc_label.py
2007_val.txt    get_coco_dataset.sh  VOCdevkit

The text file 2007_train.txt lists the images in that year's train split.
Darknet needs one such text file listing all the images you want to train on.

Modifying the Cfg for Pascal Data

Modify the cfg/voc.data config file to point to your own data:

pc:~/darknet/cfg$ cat voc.data
classes= 3    # change to your own number of classes
train  = /home/learner/darknet/data/voc/train.txt   # change to your own path, e.g. /home/learner/darknet/scripts/train.txt
valid  = /home/learner/darknet/data/voc/2007_test.txt   # change to your own path, e.g. /home/learner/darknet/scripts/2007_test.txt
names = /home/learner/darknet/data/voc.names  # see voc.names below
backup = /home/learner/darknet/backup   # change to your own path; trained weights are written here

Modify voc.names, one class per line, e.g.:

head
eye
nose
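Before training, it is worth checking that every image path listed in the train/valid list files actually exists. A minimal sketch (adjust the list file names to the paths you set in voc.data):

import os

for list_file in ["train.txt", "2007_test.txt"]:
    paths = open(list_file).read().split()
    missing = [p for p in paths if not os.path.exists(p)]
    print("%s: %d images, %d missing" % (list_file, len(paths), len(missing)))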

Replace the VOC data directory referenced above with your own VOC dataset.

Download Pretrained Convolutional Weights

Use the darknet53 model weights pretrained on ImageNet:

wget https://pjreddie.com/media/files/darknet53.conv.74

Modify the file cfg/yolov3-voc.cfg:

[net]
# Testing
batch=64
subdivisions=32   # images processed per step = batch/subdivisions; if GPU memory is insufficient, increase subdivisions
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 50200  # total number of training iterations
policy=steps
steps=40000,45000  # iterations at which the learning rate is scaled down
scales=.1,.1



[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

######################

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24   # filters = 3 * (classes + 5); here 3 * (3 + 5) = 24
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=3    # change to your own number of classes
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 61



[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24    # filters = 3 * (classes + 5); here 3 * (3 + 5) = 24
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=3  # change to your own number of classes
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 36



[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24    # filters = 3 * (classes + 5); here 3 * (3 + 5) = 24
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=3   # change to your own number of classes
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1
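The filters value flagged in the comments above (the [convolutional] layer immediately before each of the three [yolo] layers) always follows the same rule; here is a one-off computation for your own class count (a sketch, not part of Darknet):

num_classes = 3          # your class count
anchors_per_scale = 3    # length of the mask list in each [yolo] block
filters = anchors_per_scale * (num_classes + 5)   # 5 = x, y, w, h, objectness
print(filters)           # -> 24 for 3 classes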

Start Training

Single-GPU training (optionally select a GPU with -i <gpu_id>): ./darknet detector train <data_cfg> <model_cfg> <weights>

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

Multi-GPU training, passing the GPU IDs in the form 0,1,2,3: ./darknet detector train <data_cfg> <model_cfg> <weights> -gpus <ids>

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74 -gpus 0,1,2,3
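Darknet writes checkpoints into the backup directory configured in voc.data as training progresses; if a run is interrupted, it can be resumed from the latest checkpoint instead of the ImageNet weights. The .backup file name below is an assumption based on the cfg name used above:

./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc.backup -gpus 0,1,2,3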

3. Testing

Testing a single image (with your own model)

  • To display results for a single image, Darknet must be compiled with OpenCV support: ./darknet detector test (the run below was done without OpenCV support)
  • In the cfg file, batch and subdivisions must both be set to 1.
  • At test time you can also pass the -thresh and -hier flags to set the corresponding thresholds (see the example after the log below).
learner@learner-pc:~/darknet$ ./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_20000.weights Eminem.jpg
layer     filters    size              input                output
conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32  0.177 BFLOPs
conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64  1.595 BFLOPs
res    1                 208 x 208 x  64   ->   208 x 208 x  64
conv    128  3 x 3 / 2   208 x 208 x  64   ->   104 x 104 x 128  1.595 BFLOPs
conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64  0.177 BFLOPs
conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128  1.595 BFLOPs
res    5                 104 x 104 x 128   ->   104 x 104 x 128
conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64  0.177 BFLOPs
conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128  1.595 BFLOPs
res    8                 104 x 104 x 128   ->   104 x 104 x 128
conv    256  3 x 3 / 2   104 x 104 x 128   ->    52 x  52 x 256  1.595 BFLOPs
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   12                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   15                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   18                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   21                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   24                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   27                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   30                  52 x  52 x 256   ->    52 x  52 x 256
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
res   33                  52 x  52 x 256   ->    52 x  52 x 256
conv    512  3 x 3 / 2    52 x  52 x 256   ->    26 x  26 x 512  1.595 BFLOPs
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   37                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   40                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   43                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   46                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   49                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   52                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   55                  26 x  26 x 512   ->    26 x  26 x 512
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
res   58                  26 x  26 x 512   ->    26 x  26 x 512
conv   1024  3 x 3 / 2    26 x  26 x 512   ->    13 x  13 x1024  1.595 BFLOPs
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
res   62                  13 x  13 x1024   ->    13 x  13 x1024
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
res   65                  13 x  13 x1024   ->    13 x  13 x1024
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
res   68                  13 x  13 x1024   ->    13 x  13 x1024
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
res   71                  13 x  13 x1024   ->    13 x  13 x1024
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512  0.177 BFLOPs
conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
conv     24  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x  24  0.008 BFLOPs
yolo
route  79
conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256  0.044 BFLOPs
upsample            2x    13 x  13 x 256   ->    26 x  26 x 256
route  85 61
conv    256  1 x 1 / 1    26 x  26 x 768   ->    26 x  26 x 256  0.266 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256  0.177 BFLOPs
conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512  1.595 BFLOPs
conv     24  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x  24  0.017 BFLOPs
yolo
route  91
conv    128  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 128  0.044 BFLOPs
upsample            2x    26 x  26 x 128   ->    52 x  52 x 128
route  97 36
conv    128  1 x 1 / 1    52 x  52 x 384   ->    52 x  52 x 128  0.266 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128  0.177 BFLOPs
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
conv     24  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x  24  0.033 BFLOPs
yolo
Loading weights from backup/yolov3-voc_20000.weights...Done!
Eminem.jpg: Predicted in 0.049594 seconds.
eye: 91%
head: 83%
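To trade precision against recall at test time, pass a different detection threshold with the -thresh flag mentioned above, for example:

./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_20000.weights Eminem.jpg -thresh 0.25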

Batch-Testing Images and Saving Results to a Custom Folder

In yolov3-voc.cfg (in the cfg folder), batch and subdivisions must both be 1.
Replace the void test_detector function in detector.c (in the examples folder) with the code below (note: three places must be changed to your own paths):

void test_detector(char *datacfg, char *cfgfile, char *weightfile, char *filename, float thresh, float hier_thresh, char *outfile, int fullscreen)
{
    list *options = read_data_cfg(datacfg);
    char *name_list = option_find_str(options, "names", "data/names.list");
    char **names = get_labels(name_list);
 
    image **alphabet = load_alphabet();
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    srand(2222222);
    double time;
    char buff[256];
    char *input = buff;
    float nms=.45;
    int i=0;
    while(1){
        if(filename){
            strncpy(input, filename, 256);
            image im = load_image_color(input,0,0);
            image sized = letterbox_image(im, net->w, net->h);
        //image sized = resize_image(im, net->w, net->h);
        //image sized2 = resize_max(im, net->w);
        //image sized = crop_image(sized2, -((net->w - sized2.w)/2), -((net->h - sized2.h)/2), net->w, net->h);
        //resize_network(net, sized.w, sized.h);
            layer l = net->layers[net->n-1];
 
 
            float *X = sized.data;
            time=what_time_is_it_now();
            network_predict(net, X);
            printf("%s: Predicted in %f seconds.\n", input, what_time_is_it_now()-time);
            int nboxes = 0;
            detection *dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, 0, 1, &nboxes);
            //printf("%d\n", nboxes);
            //if (nms) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
            if (nms) do_nms_sort(dets, nboxes, l.classes, nms);
                draw_detections(im, dets, nboxes, thresh, names, alphabet, l.classes);
                free_detections(dets, nboxes);
            if(outfile)
             {
                save_image(im, outfile);
             }
            else{
                save_image(im, "predictions");
#ifdef OPENCV
                cvNamedWindow("predictions", CV_WINDOW_NORMAL); 
                if(fullscreen){
                cvSetWindowProperty("predictions", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
                }
                show_image(im, "predictions");
                cvWaitKey(0);
                cvDestroyAllWindows();
#endif
            }
            free_image(im);
            free_image(sized);
            if (filename) break;
         } 
        else {
            printf("Enter Image Path: ");
            fflush(stdout);
            input = fgets(input, 256, stdin);
            if(!input) return;
            strtok(input, "\n");
   
            list *plist = get_paths(input);
            char **paths = (char **)list_to_array(plist);
             printf("Start Testing!\n");
            int m = plist->size;
            if(access("/home/learner/darknet/data/out",0)==-1){  // change "/home/learner/darknet/data" to your own path
                if (mkdir("/home/learner/darknet/data/out",0777)){  // change "/home/learner/darknet/data" to your own path
                    printf("failed to create the output directory!\n");
                }
            }
            for(i = 0; i < m; ++i){
                char *path = paths[i];
                image im = load_image_color(path,0,0);
                image sized = letterbox_image(im, net->w, net->h);
                //image sized = resize_image(im, net->w, net->h);
                //image sized2 = resize_max(im, net->w);
                //image sized = crop_image(sized2, -((net->w - sized2.w)/2), -((net->h - sized2.h)/2), net->w, net->h);
                //resize_network(net, sized.w, sized.h);
                layer l = net->layers[net->n-1];

                float *X = sized.data;
                time=what_time_is_it_now();
                network_predict(net, X);
                printf("Try Very Hard:");
                printf("%s: Predicted in %f seconds.\n", path, what_time_is_it_now()-time);
                int nboxes = 0;
                detection *dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, 0, 1, &nboxes);
                //printf("%d\n", nboxes);
                //if (nms) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
                if (nms) do_nms_sort(dets, nboxes, l.classes, nms);
                draw_detections(im, dets, nboxes, thresh, names, alphabet, l.classes);
                free_detections(dets, nboxes);
                if(outfile){
                    save_image(im, outfile);
                }
                else{
                    char b[2048];
                    sprintf(b,"/home/learner/darknet/data/out/%s",GetFilename(path));  // change "/home/learner/darknet/data" to your own path
                    save_image(im, b);
                    printf("save %s successfully!\n",GetFilename(path));
#ifdef OPENCV
                    cvNamedWindow("predictions", CV_WINDOW_NORMAL);
                    if(fullscreen){
                        cvSetWindowProperty("predictions", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
                    }
                    show_image(im, "predictions");
                    cvWaitKey(0);
                    cvDestroyAllWindows();
#endif
                }
                free_image(im);
                free_image(sized);
                if (filename) break;
            }
        }
    }
}

Also add the following header files to detector.c:

#include <unistd.h>  /* Many POSIX functions (but not all, by a large margin) */
#include <fcntl.h>   /* open(), creat() - and fcntl() */

And add a *GetFilename(char *p) helper near the top of the file:

#include "darknet.h"
#include <sys/stat.h>   // header to add
#include <stdio.h>
#include <time.h>
#include <sys/types.h>  // header to add
static int coco_ids[] = {1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,77,78,79,80,81,82,84,85,86,87,88,89,90};

char *GetFilename(char *p)
{
    static char name[20]={""};      // note: returns a pointer to a static buffer
    char *q = strrchr(p,'/') + 1;   // file name component after the last '/'
    strncpy(name,q,6);              // copies only the first 6 characters; adjust this length to your file names
    return name;
}

Then rebuild in the darknet directory:

make clean
make

Run the batch test with the following command:

./darknet detector test cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_20000.weights
layer     filters    size              input                output
conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    .......
conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
detection
Loading weights from backup/yolov3-voc_20000.weights...Done!
Enter Image Path:

At the Enter Image Path: prompt, enter the path to your txt file (the file listing the paths of all your prepared test images). You can simply reuse the path after valid in voc.data, e.g.:

/home/xxx/darknet/data/voc/2007_test.txt

All annotated images are then saved into the data/out folder.

Generating Prediction Results

  • ./darknet detector valid
  • In yolov3-voc.cfg (in the cfg folder), batch and subdivisions must both be 1.
  • Results are written under the directory given by results, in files named comp4_det_test_[class].txt; if results is not set, it defaults to ./results.

Run the following (the terminal only reports timing; detections are saved to ./results/comp4_det_test_[class name].txt):
./darknet detector valid cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_20000.weights
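If memory serves, each line of a comp4_det_test_[class].txt file follows the VOC submission format: image id, confidence score, then xmin, ymin, xmax, ymax in pixels. A minimal sketch to skim high-confidence detections from one such file (the file name is an example):

with open("results/comp4_det_test_head.txt") as f:
    for line in f:
        image_id, score, xmin, ymin, xmax, ymax = line.split()
        if float(score) > 0.5:
            print(image_id, score, xmin, ymin, xmax, ymax)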

Video Detection

Run:

./darknet detector demo cfg/voc.data cfg/yolov3-voc.cfg backup/yolov3-voc_40000.weights Cow_video.mp4

For video detection with output saving, see https://blog.csdn.net/cgt19910923/article/details/80525366.
Video testing goes through detector demo; the main change is to the demo function in demo.c.

./darknet detector demo ./cfg/voc.data ./cfg/yolov3-voc.cfg ./results/yolov3-voc_final.weights 1.mp4 -gpus 0,1

Define a video-saving function in image.c:

void save_video(image p, CvVideoWriter *mVideoWriter)
{
	image copy = copy_image(p);
	if(p.c == 3) rgbgr_image(copy);
	int x,y,k;
	IplImage *disp = cvCreateImage(cvSize(p.w,p.h), IPL_DEPTH_8U, p.c);
	int step = disp->widthStep;
	for(y = 0; y < p.h; ++y){
		for(x = 0; x < p.w; ++x){
			for(k= 0; k < p.c; ++k){
				disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(copy,x,y,k)*255); 
			}
		}
	}
	cvWriteFrame(mVideoWriter,disp);
	cvReleaseImage(&disp);
	free_image(copy);
}

Modify demo.c:

#define DEMO 1
#define SAVEVIDEO

#ifdef OPENCV

#ifdef SAVEVIDEO
static CvVideoWriter *mVideoWriter;
#endif

void demo(char *cfgfile, char *weightfile, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg_frames, float hier, int w, int h, int frames, int fullscreen)
{
    //demo_frame = avg_frames;
    image **alphabet = load_alphabet();
    demo_names = names;
    demo_alphabet = alphabet;
    demo_classes = classes;
    demo_thresh = thresh;
    demo_hier = hier;
    printf("Demo\n");
    net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    pthread_t detect_thread;
    pthread_t fetch_thread;

    srand(2222222);

    int i;
    demo_total = size_network(net);
    predictions = calloc(demo_frame, sizeof(float*));
    for (i = 0; i < demo_frame; ++i){
        predictions[i] = calloc(demo_total, sizeof(float));
    }
    avg = calloc(demo_total, sizeof(float));

    if(filename){
        printf("video file: %s\n", filename);
        cap = cvCaptureFromFile(filename);
#ifdef SAVEVIDEO
        if(cap){
            //int mfps = cvGetCaptureProperty(cap,CV_CAP_PROP_FPS); // local video file: the FPS could be read from the file instead
            int mfps = 200;
            mVideoWriter=cvCreateVideoWriter("Output.avi",CV_FOURCC('M','J','P','G'),mfps,cvSize(cvGetCaptureProperty(cap,CV_CAP_PROP_FRAME_WIDTH),cvGetCaptureProperty(cap,CV_CAP_PROP_FRAME_HEIGHT)),1);
        }
#endif
    }else{
        cap = cvCaptureFromCAM(cam_index);
#ifdef SAVEVIDEO
        if(cap){
            //int mfps = cvGetCaptureProperty(cap,CV_CAP_PROP_FPS); // webcam: the reported FPS is unreliable, so set it manually
            int mfps = 25; // the output video FPS; set it here
            mVideoWriter=cvCreateVideoWriter("Output_webcam.avi",CV_FOURCC('M','J','P','G'),mfps,cvSize(cvGetCaptureProperty(cap,CV_CAP_PROP_FRAME_WIDTH),cvGetCaptureProperty(cap,CV_CAP_PROP_FRAME_HEIGHT)),1);
        }
#endif
        if(w){
            cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_WIDTH, w);
        }
        if(h){
            cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_HEIGHT, h);
        }
        if(frames){
            cvSetCaptureProperty(cap, CV_CAP_PROP_FPS, frames);
        }
    }

    if(!cap) error("Couldn't connect to webcam.\n");

    buff[0] = get_image_from_stream(cap);
    buff[1] = copy_image(buff[0]);
    buff[2] = copy_image(buff[0]);
    buff_letter[0] = letterbox_image(buff[0], net->w, net->h);
    buff_letter[1] = letterbox_image(buff[0], net->w, net->h);
    buff_letter[2] = letterbox_image(buff[0], net->w, net->h);
    ipl = cvCreateImage(cvSize(buff[0].w,buff[0].h), IPL_DEPTH_8U, buff[0].c);

    int count = 0;
    if(!prefix){
        cvNamedWindow("Demo", CV_WINDOW_NORMAL);
        if(fullscreen){
            cvSetWindowProperty("Demo", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
        } else {
            cvMoveWindow("Demo", 0, 0);
            cvResizeWindow("Demo", 1352, 1013);
        }
    }

    demo_time = what_time_is_it_now();

    while(!demo_done){
        buff_index = (buff_index + 1) %3;
        if(pthread_create(&fetch_thread, 0, fetch_in_thread, 0)) error("Thread creation failed");
        if(pthread_create(&detect_thread, 0, detect_in_thread, 0)) error("Thread creation failed");
        if(!prefix){
#ifdef SAVEVIDEO
            save_video(buff[0],mVideoWriter);
#endif
            fps = 1./(what_time_is_it_now() - demo_time);
            demo_time = what_time_is_it_now();
            display_in_thread(0);
        }else{
            char name[256];
            sprintf(name, "%s_%08d", prefix, count);
#ifdef SAVEVIDEO
            save_video(buff[0],mVideoWriter);
#else
            save_image(buff[(buff_index + 1)%3], name);
#endif
        }
        pthread_join(fetch_thread, 0);
        pthread_join(detect_thread, 0);
        ++count;
    }
}

This setup tests a recorded video; mfps is the frame rate of the saved output, which is written to Output.avi.

Implementing recall

Replace the validate_detector_recall function in examples/detector.c with the version below, and add the datacfg parameter at the call site. Recall here is correct/total: the fraction of ground-truth boxes matched by some detection with IoU above iou_thresh (0.5).

void validate_detector_recall(char *datacfg, char *cfgfile, char *weightfile)
{
    /*
    network net = parse_network_cfg_custom(cfgfile, 1);    // set batch=1
    if (weightfile) {
        load_weights(&net, weightfile);
    }
    //set_batch_network(&net, 1);
    fuse_conv_batchnorm(net);
    srand(time(0));
    */
    network *net = load_network(cfgfile, weightfile, 0);
    set_batch_network(net, 1);
    fprintf(stderr, "Learning Rate: %g, Momentum: %g, Decay: %g\n", net->learning_rate, net->momentum, net->decay);
    srand(time(0));
    //list *plist = get_paths("data/coco_val_5k.list");
    list *options = read_data_cfg(datacfg);
    char *valid_images = option_find_str(options, "valid", "data/train.txt");
    list *plist = get_paths(valid_images);
    char **paths = (char **)list_to_array(plist);

    //layer l = net.layers[net.n - 1];
    layer l = net->layers[net->n-1];
    int j, k;

    int m = plist->size;
    int i = 0;

    float thresh = .001;
    float iou_thresh = .5;
    float nms = .4;

    int total = 0;
    int correct = 0;
    int proposals = 0;
    float avg_iou = 0;

    for (i = 0; i < m; ++i) {
        char *path = paths[i];
        image orig = load_image_color(path, 0, 0);
        // image orig = load_image(path, 0, 0, net.c);
        image sized = resize_image(orig, net->w, net->h);
        char *id = basecfg(path);
        network_predict(net, sized.data);
        int nboxes = 0;
        detection *dets = get_network_boxes(net, sized.w, sized.h, thresh, .5, 0, 1, &nboxes);
        if (nms) do_nms_obj(dets, nboxes, 1, nms);

        char labelpath[4096];
       // replace_image_to_label(path, labelpath);
        find_replace(path, "images", "labels", labelpath);
        find_replace(labelpath, "JPEGImages", "labels", labelpath);
        find_replace(labelpath, ".jpg", ".txt", labelpath);
        find_replace(labelpath, ".JPEG", ".txt", labelpath);
        int num_labels = 0;
        box_label *truth = read_boxes(labelpath, &num_labels);
        for (k = 0; k < nboxes; ++k) {
            if (dets[k].objectness > thresh) {
                ++proposals;
            }
        }
        for (j = 0; j < num_labels; ++j) {
            ++total;
            box t = { truth[j].x, truth[j].y, truth[j].w, truth[j].h };
            float best_iou = 0;
            for (k = 0; k < nboxes; ++k) {  // the key part: find the best-IoU detection for this ground-truth box
                float iou = box_iou(dets[k].bbox, t);
                if (dets[k].objectness > thresh && iou > best_iou) {
                    best_iou = iou;
                }
            }
            avg_iou += best_iou;
            if (best_iou > iou_thresh) {
                ++correct;
            }
        }
        //fprintf(stderr, " %s - %s - ", paths[i], labelpath);
        fprintf(stderr, "%5d %5d %5d\tRPs/Img: %.2f\tIOU: %.2f%%\tRecall:%.2f%%\n", i, correct, total, (float)proposals / (i + 1), avg_iou * 100 / total, 100.*correct / total);
        free(id);
        free_image(orig);
        free_image(sized);
    }
}

Change the call site as follows:

validate_detector_recall(datacfg, cfg, weights);

============================================
Computing mAP: download the third-party tool, then rebuild and run

git clone https://github.com/LianjiLi/yolo-compute-map.git

Modify validate_detector() in darknet/examples/detector.c:

char *valid_images = option_find_str(options, "valid", "./data/2007_test.txt");  // change to your own test file path

if(!outfile) outfile = "comp4_det_test_";
        fps = calloc(classes, sizeof(FILE *));
        for(j = 0; j < classes; ++j){
            snprintf(buff, 1024, "%s/%s.txt", prefix, names[j]);  // drop the outfile argument and its matching %s
            fps[j] = fopen(buff, "w");

Run in the darknet folder:

./darknet detector valid cfg/voc.data cfg/yolov3-tiny.cfg backup/yolov3tiny_164000.weights   (change to your own model path)

Then run python compute_mAP.py in the yolo-compute-map folder.
The test.txt read by compute_mAP.py must contain file names only: no absolute paths and no extensions.
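Since voc_label.py produced list files holding absolute paths with .jpg extensions, a small helper (my own sketch; adjust the file names) can derive the bare-name test.txt that compute_mAP.py expects:

import os

with open("2007_test.txt") as src, open("test.txt", "w") as dst:
    for line in src:
        name = os.path.splitext(os.path.basename(line.strip()))[0]
        if name:
            dst.write(name + "\n")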

References

Batch-testing YOLOv3 images and saving to a custom folder:
https://blog.csdn.net/mieleizhi0522/article/details/79989754

YOLO official site:
https://pjreddie.com/darknet/yolo/

DarkNet-YOLOv3 training on your own dataset (Ubuntu 16.04 + CUDA 8.0):
https://zhuanlan.zhihu.com/p/35490655

DarkNet-YOLO user guide:
https://clavichord93.wordpress.com/2017/05/11/darknetyolo-shi-yong-zhi-nan/

Source code:
https://github.com/pjreddie/darknet

Training YOLOv3 on your own data (GPU version):
https://blog.csdn.net/u012135425/article/details/80294884

A fairly comprehensive walkthrough:
https://blog.csdn.net/runner668/article/details/80579063

Video detection:
https://blog.csdn.net/cgt19910923/article/details/80525366
https://blog.csdn.net/sinat_33718563/article/details/79964758

YOLOv3: improving training and detection:
https://github.com/AlexeyAB/darknet

Recall implementation reference:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Overall references:
https://clavichord93.wordpress.com/2017/05/11/darknetyolo-shi-yong-zhi-nan/
https://blog.csdn.net/eloise_29/article/details/70215338

Further reading:
https://pjreddie.com/darknet/yolo/#demo
https://blog.csdn.net/helloworld1213800/article/details/79749359
https://blog.csdn.net/cgt19910923/article/details/79725875
https://blog.csdn.net/jahonn/article/details/80824014
https://blog.csdn.net/lilai619/article/details/79695109
https://blog.csdn.net/sinat_33718563/article/details/79964758
https://maozezhong.github.io/2018/04/29/yolo系列/yolo系列(1):使用yolov3检测红绿灯/
https://zhuanlan.zhihu.com/p/35490655
https://clavichord93.wordpress.com/2017/05/11/darknetyolo-shi-yong-zhi-nan/
https://github.com/pjreddie/darknet
https://blog.csdn.net/Patrick_Lxc/article/details/80615433
https://blog.csdn.net/zhangjunbob/article/details/52769381

Visualizing Darknet's shallow features: https://www.cnblogs.com/pprp/p/10146355.html

AlexeyAB's optimization tips: https://www.cnblogs.com/pprp/p/10204480.html

How to do classification with Darknet: https://www.cnblogs.com/pprp/p/10342335.html

Darknet loss visualization tool: https://www.cnblogs.com/pprp/p/10248436.html

How to redesign the YOLO network structure: https://pprp.github.io/2018/09/20/tricks.html

Detailed YOLO improvement notes: https://pprp.github.io/2018/06/20/yolo.html

SSD key source-code analysis:
https://zhuanlan.zhihu.com/p/25100992
