SSD (Single Shot MultiBox Detector)
https://arxiv.org/abs/1512.02325
This post records how to train the open-source SSD implementation from GitHub on the WIDER FACE dataset.
For installing Caffe, see: Ubuntu 16.04 Caffe installation steps (detailed walkthrough).
Download the source
git clone https://github.com/weiliu89/caffe.git
The repository contains several branches, and the ssd branch holds the SSD implementation, so enter the repo and switch branches first:
cd caffe
git checkout -b ssd origin/ssd
Then build the project. For Makefile.config, it is best to reuse the Makefile.config from a Caffe build that has already compiled on your machine, since it already carries your local configuration; using Makefile.config.example directly may cause errors.
cp Makefile.config.example Makefile.config
make all -j8
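Before running make, a few settings in Makefile.config are worth double-checking. The option names below come from the stock Caffe config file; the paths are examples and must be adjusted to your machine:

```makefile
# Uncomment to build with cuDNN acceleration (requires CUDA + cuDNN installed)
USE_CUDNN := 1
# Enable if you plan to use Python-defined layers
WITH_PYTHON_LAYER := 1
# Point these at your local Python and numpy headers (example paths)
PYTHON_INCLUDE := /usr/include/python2.7 \
                  /usr/lib/python2.7/dist-packages/numpy/core/include
```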
Add the caffe-ssd Python directory to the environment variables:
vim ~/.bashrc
export PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH
source ~/.bashrc
# If it does not take effect, open a new terminal
Compile pycaffe
make pycaffe -j8
Only after this will the dataset packaging step below work.
Some problems may come up along the way; solutions are given below.
The WIDER FACE dataset was collected by Shuo Yang et al. at the Chinese University of Hong Kong. It contains 32,203 images with 393,703 labeled faces that exhibit large variations in scale, pose, make-up, and illumination, organized around 61 event categories.
Download the training set, validation set, test set, and annotation files separately.
Example images from the dataset
For the VOC data format, see "PASCAL VOC dataset analysis".
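Each entry in the annotation file wider_face_*_bbx_gt.txt is a relative image path, a face count, and one line per face whose first four fields are x, y, w, h (attribute fields follow). A minimal sketch of parsing one entry and converting boxes to VOC-style corners (the sample text below is illustrative, not real dataset content):

```python
# Parse a single WIDER FACE ground-truth entry (illustrative sample) and
# convert x,y,w,h boxes to VOC-style (xmin, ymin, xmax, ymax).
sample = """0--Parade/0_Parade_marchingband_1_849.jpg
2
449 330 122 149 0 0 0 0 0 0
78 221 7 8 0 0 0 0 0 0
"""

lines = sample.strip().split("\n")
filename = lines[0]
num_faces = int(lines[1])
bboxes = []
for face_line in lines[2:2 + num_faces]:
    x, y, w, h = map(int, face_line.split()[:4])
    bboxes.append((x, y, x + w, y + h))  # VOC uses corner coordinates

print(filename)  # 0--Parade/0_Parade_marchingband_1_849.jpg
print(bboxes)    # [(449, 330, 571, 479), (78, 221, 85, 229)]
```

This x,y,w,h to corner conversion is exactly what the conversion script below performs for every face.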
import cv2
import shutil
from xml.dom.minidom import Document


def writexml(filename, saveimg, bboxes, xmlpath):
    """
    Write one image's annotations in the VOC XML style.
    :param filename: image filename
    :param saveimg: the image data with shape [H, W, C]
    :param bboxes: bounding boxes as (xmin, ymin, xmax, ymax)
    :param xmlpath: xml file save path
    :return: None
    """
    doc = Document()
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)

    folder = doc.createElement('folder')
    folder_name = doc.createTextNode('widerface')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)

    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)

    source = doc.createElement('source')
    annotation.appendChild(source)
    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('wider face Database'))
    source.appendChild(database)
    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)
    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)

    owner = doc.createElement('owner')
    annotation.appendChild(owner)  # was missing in the original listing
    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('kinhom'))
    owner.appendChild(name_o)

    size = doc.createElement('size')
    annotation.appendChild(size)
    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
    size.appendChild(width)
    size.appendChild(height)
    size.appendChild(depth)

    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)

    for i in range(len(bboxes)):
        bbox = bboxes[i]
        objects = doc.createElement('object')
        annotation.appendChild(objects)
        object_name = doc.createElement('name')
        object_name.appendChild(doc.createTextNode('face'))
        objects.appendChild(object_name)
        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)
        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('1'))
        objects.appendChild(truncated)
        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)
        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)
        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[3])))
        bndbox.appendChild(ymax)

    with open(xmlpath, 'w') as f:
        f.write(doc.toprettyxml(indent=' '))
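A quick way to sanity-check the annotations writexml produces is to parse one back and inspect its boxes. A minimal sketch using only the standard library, with an inline sample in the same VOC layout (the values are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation snippet in the VOC layout that writexml emits.
xml_text = """<annotation>
  <filename>0_Parade_marchingband_1_849.jpg</filename>
  <size><width>1024</width><height>768</height><depth>3</depth></size>
  <object>
    <name>face</name>
    <bndbox><xmin>449</xmin><ymin>330</ymin><xmax>571</xmax><ymax>479</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(xml_text)
boxes = []
for obj in root.iter('object'):
    bb = obj.find('bndbox')
    boxes.append(tuple(int(bb.find(k).text)
                       for k in ('xmin', 'ymin', 'xmax', 'ymax')))

print(boxes)  # [(449, 330, 571, 479)]
```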
# wider face dataset folder path
rootdir = "E:/dataset/wider_face"


def convertimgset(img_set):
    imgdir = rootdir + "/WIDER_" + img_set + "/images"
    gtfilepath = rootdir + "/wider_face_split/wider_face_" + img_set + "_bbx_gt.txt"
    fwrite = open(rootdir + "/ImageSets/Main/" + img_set + ".txt", 'w')
    index = 0
    with open(gtfilepath, 'r') as gtfiles:
        while index < 1000:  # set to `while True` to convert the full set
            filename = gtfiles.readline()[:-1]
            if filename == "":
                break  # end of file
            imgpath = imgdir + "/" + filename
            img = cv2.imread(imgpath)
            if img is None:  # cv2.imread returns None when the image cannot be read
                break
            numbbox = int(gtfiles.readline())
            bboxes = []
            for i in range(numbbox):
                line = gtfiles.readline()
                lines = line.split()
                lines = lines[0:4]
                # convert x, y, w, h to xmin, ymin, xmax, ymax
                bbox = (int(lines[0]), int(lines[1]),
                        int(lines[0]) + int(lines[2]),
                        int(lines[1]) + int(lines[3]))
                bboxes.append(bbox)
            filename = filename.replace("/", "_")
            if len(bboxes) == 0:
                print("no face")
                continue
            cv2.imwrite("{}/JPEGImages/{}".format(rootdir, filename), img)
            fwrite.write(filename.split('.')[0] + '\n')
            xmlpath = '{}/Annotations/{}.xml'.format(rootdir, filename.split('.')[0])
            writexml(filename, img, bboxes, xmlpath)
            if index % 100 == 0:
                print("success NO." + str(index))
            index += 1
    print(img_set + " total: " + str(index))
    fwrite.close()


if __name__ == "__main__":
    img_sets = ['train', 'val']
    for img_set in img_sets:
        print("handling " + img_set)
        convertimgset(img_set)
    # SSD's VOC scripts expect trainval.txt and test.txt
    shutil.move(rootdir + "/ImageSets/Main/" + "train.txt",
                rootdir + "/ImageSets/Main/" + "trainval.txt")
    shutil.move(rootdir + "/ImageSets/Main/" + "val.txt",
                rootdir + "/ImageSets/Main/" + "test.txt")
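After the script runs, the dataset root should follow the VOC-style layout that the SSD scripts expect. A sketch (the Annotations, ImageSets/Main, and JPEGImages folders must exist before running the script):

```text
wider_face/
├── Annotations/          # generated VOC XML files
├── ImageSets/
│   └── Main/
│       ├── trainval.txt  # renamed from train.txt
│       └── test.txt      # renamed from val.txt
├── JPEGImages/           # copied images (slashes in names replaced by "_")
├── WIDER_train/
├── WIDER_val/
└── wider_face_split/
```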
Create a new folder widerface under $SSD_ROOT/data, and copy the three files create_data.sh, create_list.sh, and labelmap_voc.prototxt from $SSD_ROOT/data/VOC0712 into widerface.
First edit labelmap_voc.prototxt, which maps each class name to its numeric label. Since WIDER FACE only detects faces, there are just two classes: face and background.
item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "face"
  label: 1
  display_name: "face"
}
Then edit create_list.sh:
line 3: set root_dir to the directory that contains the dataset. Note that this is the parent directory: if the dataset lives in /dataset/wider_face, root_dir should be /dataset.
line 6: `for dataset in trainval test` corresponds to the two files in ImageSets/Main; running the script produces trainval.txt and test.txt.
line 13: `for name in wider_face` corresponds to the wider_face folder inside /dataset.
Running create_list.sh generates three files: test_name_size.txt, test.txt, and trainval.txt.
Finally, edit create_data.sh:
lines 7-8:
data_root_dir="/dataset"
dataset_name="widerface"  # matches the folder just created
Run create_data.sh; on success, a widerface folder is generated under /dataset containing the corresponding lmdb databases.
The training configuration lives under $SSD_ROOT/examples/ssd. Create a new folder widerface there, copy ssd_pascal.py into it, and edit that copy.
Lines 82-84 set the training and test data paths. The path examples/widerface/widerface_trainval_lmdb is a symlink created automatically when the lmdb data was generated, pointing at the actual lmdb files.
# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "examples/widerface/widerface_trainval_lmdb"
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "examples/widerface/widerface_test_lmdb"
line 266: number of classes; background plus face, so it is 2
num_classes = 2
lines 236-246: model name, save directories, etc.
# The name of the model. Modify it if you want.
model_name = "VGG_wider_face_{}".format(job_name)
# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/wider_face/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/wider_face/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/wider_face/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/wider_face/{}/Main".format(os.environ['HOME'], job_name)
lines 258-263: paths for the name-size file, the pretrained model, and the label map
# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/widerface/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/widerface/labelmap_voc.prototxt"
line 332: GPU setting; separate multiple GPUs with commas
gpus = "0,1"
line 359: number of test images
num_test_image = 1000
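num_test_image must match the number of entries actually written to test.txt, or evaluation will misbehave. A quick check (a sketch; the helper name and the temp-file demo are illustrative):

```python
import os
import tempfile


def count_entries(list_file):
    """Count non-blank lines in an ImageSets list file."""
    with open(list_file) as f:
        return sum(1 for line in f if line.strip())


# Demo with a small inline list written to a temp file:
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write("0_Parade_marchingband_1_849\n0_Parade_Parade_0_904\n")
tmp.close()
print(count_entries(tmp.name))  # 2
os.remove(tmp.name)
```

In practice you would point count_entries at data/widerface/test.txt (or wherever your test list ended up) and copy the result into num_test_image.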
line 536: no pretrained model is used here, so the weights argument is left empty
train_src_param = '' #'--weights="{}" \\\n'.format(pretrain_model)
Run ssd_pascal.py:
python examples/ssd/widerface/ssd_pascal.py
If you see the loss being printed, training has started successfully, as below: