Runtime environment: Ubuntu 16.04 + CUDA 8 + cuDNN 6 + OpenCV 3.1,
or Windows 10 + MSVS 2015 + CUDA 9.1 + cuDNN 7 + OpenCV 3.1
On Ubuntu, install and configure Darknet by following:
https://pjreddie.com/darknet/install/
On Windows 10, install and configure Darknet by following the link below (the code from the link above needs manual changes to run on Windows; AlexeyAB has already done that work for us):
https://github.com/AlexeyAB/darknet#how-to-compile-on-windows
Dataset download link: Baidu Netdisk, extraction password: gorp
First, a preview of the results:
1.1 Preparation
The Annotations directory holds the XML files, one per image; each file records the position and class of every labeled object in that image (how to generate them is covered later).
Each XML file's content looks like this:
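A minimal VOC-style annotation, with a hypothetical file name and box coordinates, might look like the sketch below; the fields shown are exactly the ones the conversion script in this section reads (size, name, difficult, bndbox):

```xml
<annotation>
    <filename>000001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>damage</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>100</xmin>
            <ymin>120</ymin>
            <xmax>300</xmax>
            <ymax>360</ymax>
        </bndbox>
    </object>
</annotation>
```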
import os

def main(src, dest):
    # Write each image's base file name (without extension), one per line
    with open(dest, 'w') as f:
        for name in os.listdir(src):
            base_name = os.path.basename(name)
            file_name = base_name.split('.')[0]
            f.write('%s\n' % file_name)

if __name__ == '__main__':
    TrainDir = 'VOCdevkit/VOC2007/JPEGImages'
    target = 'VOCdevkit/VOC2007/ImageSets/Main/train.txt'
    main(TrainDir, target)
train.txt looks like this; it is simply each image file name with the extension stripped.
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = [('2007', 'train')]  # directory name VOC2007, train set only
classes = ["damage", "polluted"]  # my two classes: damage and polluted

def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) in pixels to the YOLO
    # format (x_center, y_center, width, height), normalized to [0, 1]
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x, y, w, h)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()
for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % year):
        os.makedirs('VOCdevkit/VOC%s/labels/' % year)
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()
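As a sanity check of the coordinate conversion above, here is the same convert logic applied to a hypothetical 640×480 image with a box (xmin=100, xmax=300, ymin=120, ymax=360):

```python
def convert(size, box):
    # Same math as the convert function above: VOC (xmin, xmax, ymin, ymax)
    # in pixels -> YOLO (x_center, y_center, width, height), normalized
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0 * dw
    y = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return (x, y, w, h)

# Hypothetical 640x480 image, box xmin=100, xmax=300, ymin=120, ymax=360:
# center = (200, 240) -> (200/640, 240/480) = (0.3125, 0.5)
# size   = (200, 240) -> (200/640, 240/480) = (0.3125, 0.5)
print(convert((640, 480), (100, 300, 120, 360)))
```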
The labels files:
At this point the data is fully prepared! Ready to start training.
2.1 Modifying the configuration files
2.1.1 Modify cfg/yolo-voc.2.0.cfg
The parameters at the top of the file are easy to understand and need no changes; the later ones describe the network structure.
[net]
batch=64 parameters are updated once per batch of samples.
subdivisions=8 if GPU memory is insufficient, each batch is split into subdivisions sub-batches of size batch/subdivisions.
In the Darknet code, batch/subdivisions is itself referred to as batch.
height=416 input image height
width=416 input image width
channels=3 input image channels
momentum=0.9 momentum
decay=0.0005 weight-decay regularization, to prevent overfitting
angle=0 generate more training samples by rotating images
saturation = 1.5 generate more training samples by adjusting saturation
exposure = 1.5 generate more training samples by adjusting exposure
hue=.1 generate more training samples by adjusting hue
learning_rate=0.0001 initial learning rate
max_batches = 45000 training stops once max_batches is reached
policy=steps learning-rate adjustment policy; available policies: CONSTANT, STEP, EXP, POLY, STEPS, SIG, RANDOM
steps=100,25000,35000 batch counts at which the learning rate is adjusted
scales=10,.1,.1 learning-rate multipliers, applied cumulatively
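To make the steps/scales interaction concrete, here is a small sketch (my own illustration, not Darknet code) of how the schedule above evolves: each scale is multiplied in cumulatively once the batch count passes the matching step.

```python
def lr_at(batch, base_lr=0.0001,
          steps=(100, 25000, 35000), scales=(10, .1, .1)):
    # Apply each scale cumulatively once its step threshold is passed
    lr = base_lr
    for step, scale in zip(steps, scales):
        if batch >= step:
            lr *= scale
    return lr

for b in (50, 200, 30000, 40000):
    print(b, lr_at(b))
# Batches before 100 train at 0.0001; after batch 100 the rate rises to
# 0.001 (x10), drops back to 0.0001 after batch 25000 (x0.1), and to
# 0.00001 after batch 35000 (x0.1 again).
```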
What needs changing is the later part of the file. Since I am only classifying two classes, classes changes from 20 to 2, and filters changes from 125 to 35.
filters = (classes + coords + 1) * num, where num is the number of anchor boxes predicted per cell; each anchor box predicts 1 confidence score, 2 class scores, and 4 bounding-box position/size values, so filters = 5*(1+2+4) = 35.
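The arithmetic can be checked in a few lines (a standalone sketch of the rule above):

```python
classes = 2   # damage, polluted
coords = 4    # x, y, w, h per bounding box
num = 5       # anchor boxes predicted per cell
filters = (classes + coords + 1) * num   # the +1 is the per-box confidence
print(filters)  # 35; with the original 20 VOC classes this gives (20+4+1)*5 = 125
```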
[convolutional]
size=1
stride=1
pad=1
filters=35 was 125
activation=linear
[region]
anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
bias_match=1
classes=2 was 20
coords=4
num=5
softmax=1
jitter=.2
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=0
2.1.2 Modify data/voc.names
Fill in your own classes, one per line; mine are damage and polluted.
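For the two classes used here, data/voc.names simply contains:

```
damage
polluted
```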
2.1.3 Modify cfg/voc.data
Change classes to 2; set train to the absolute path of 2007_train.txt; set names to the absolute path of voc.names; backup is the directory where models are saved during training, and you can choose it yourself.
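Putting that together, cfg/voc.data would look roughly like this (the paths are placeholders; substitute your own absolute paths):

```
classes = 2
train   = /path/to/darknet/2007_train.txt
names   = /path/to/darknet/data/voc.names
backup  = /path/to/darknet/backup/
```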
2.2 Start training
The training commands are as follows. If you want to visualize the training data for later analysis, read section 2.3 before launching training.
On Windows:
With a pretrained model (darknet19_448.conv.23 can be downloaded from the Darknet website):
.\darknet.exe detector train .\cfg\voc.data .\cfg\yolo-voc.2.0.cfg .\darknet19_448.conv.23
Without a pretrained model:
.\darknet.exe detector train .\cfg\voc.data .\cfg\yolo-voc.2.0.cfg
On Ubuntu (identical to Windows apart from the .exe suffix):
./darknet detector train ./cfg/voc.data ./cfg/yolo-voc.2.0.cfg ./darknet19_448.conv.23
./darknet detector train ./cfg/voc.data ./cfg/yolo-voc.2.0.cfg
If training is interrupted, go to the backup directory and run the following command to resume from a saved model (this example resumes from the model saved after 1000 iterations):
.\darknet.exe detector train .\cfg\voc.data .\cfg\yolo-voc.2.0.cfg backup\yolo-voc.2.0_1000.weights
2.3 Visualizing the log data
Append 2>&1 | tee train_log.txt (name the file whatever you like) to the training command to save the log:
./darknet detector train ./cfg/voc.data ./cfg/yolo-voc.2.0.cfg 2>&1 | tee train_log.txt
Saving the log produces two files: the first holds the network-loading and checkpoint-saving information, while train_log.txt holds the training information. It looks like this:
def extract_log(log_file, new_log_file, key_word):
    f = open(log_file)
    train_log = open(new_log_file, 'w')
    for line in f:
        # Skip the multi-GPU synchronization log lines
        if 'Syncing' in line:
            continue
        # Skip lines containing divide-by-zero (nan) values
        if 'nan' in line:
            continue
        if key_word in line:
            train_log.write(line)
    f.close()
    train_log.close()

extract_log('train_log.txt', 'train_log_loss.txt', 'images')  # train_log_loss.txt is used to plot the loss curve
extract_log('train_log.txt', 'train_log_iou.txt', 'IOU')      # train_log_iou.txt is used to plot the IOU curve
The train_loss_visualization.py script plots the loss curve.
The script is shown below; just place it in the same directory as the extracted loss log:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

lines = 9873  # total number of lines in the extracted loss log
# Skip the first 1000 lines, then keep only every 10th line to thin the plot
result = pd.read_csv('train_log_loss.txt',
                     skiprows=[x for x in range(lines) if ((x % 10 != 9) | (x < 1000))],
                     error_bad_lines=False,
                     names=['loss', 'avg', 'rate', 'seconds', 'images'])
print(result.head())

# Each field has the form "0.123456 avg"; extract the numeric part
result['loss'] = result['loss'].str.split(' ').str.get(1)
result['avg'] = result['avg'].str.split(' ').str.get(1)
result['rate'] = result['rate'].str.split(' ').str.get(1)
result['seconds'] = result['seconds'].str.split(' ').str.get(1)
result['images'] = result['images'].str.split(' ').str.get(1)

result['loss'] = pd.to_numeric(result['loss'])
result['avg'] = pd.to_numeric(result['avg'])
result['rate'] = pd.to_numeric(result['rate'])
result['seconds'] = pd.to_numeric(result['seconds'])
result['images'] = pd.to_numeric(result['images'])

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(result['avg'].values, label='avg_loss')
# ax.plot(result['loss'].values, label='loss')
ax.legend(loc='best')
ax.set_title('The loss curves')
ax.set_xlabel('batches')
fig.savefig('avg_loss')
Running train_loss_visualization.py produces avg_loss.png in the script's directory.
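For reference, the skiprows expression in the script keeps only every 10th log line and drops the first 1000 (the noisy early iterations). A quick standalone check, assuming the same 9873-line log:

```python
lines = 9873
# The complement of the skiprows set: rows that actually get read
kept = [x for x in range(lines)
        if not ((x % 10 != 9) | (x < 1000))]
print(kept[:3])   # the first surviving line numbers: 1009, 1019, 1029
print(len(kept))  # 887 rows survive out of 9873
```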