git clone https://github.com/chuanqi305/ssd.git caffe-ssd
cd caffe-ssd
git checkout ssd
In src/caffe/proto/caffe.proto, add two new fields to message LayerParameter:

optional ReLU6Parameter relu6_param = 100000; // relu6
optional ConvolutionDepthwiseParameter convolution_depthwise_param = 151; // convolution_depthwise

and append the corresponding message definitions:
// relu6
message ReLU6Parameter {
  optional float negative_slope = 1 [default = 0];
}
// convolution_depthwise
message ConvolutionDepthwiseParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting
  // holes. (Kernel dilation is sometimes referred to by its use in the
  // algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1
  // For 2D convolution only, the *_h and *_w versions may also be used to
  // specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)
  optional uint32 group = 5 [default = 1]; // The group size for group conv
  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];
  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}
MobileNet-SSD depends on the caffe-ssd we just configured (after editing caffe.proto above, remember to rebuild Caffe so the new layer parameters take effect).
GitHub download: https://github.com/chuanqi305/MobileNet-SSD
Place this folder under caffe-ssd/examples.
The MobileNet-SSD-master/template folder contains the .prototxt templates we will train from. This is where the network gets adapted: the dataset path, the input image size, the batch_size, and other such parameters can all be modified here.
Then, in the MobileNet-SSD folder, use gen_model.sh to turn the files in template into training files. My dataset detects 2 kinds of objects plus the background, so the argument is 3. After it runs successfully, an example folder is generated automatically, containing the files used for training.
sh gen_model.sh num_class+1
The generated example folder contains MobileNetSSD_train.prototxt, MobileNetSSD_test.prototxt, and MobileNetSSD_deploy.prototxt.
Create the folder data/VOCdevkit/Hisense under your home directory, laid out as follows:
VOCdevkit
——Hisense
————Annotations    # all the .xml annotation files go here
————ImageSets
——————Main         # train.txt and val.txt go here
————JPEGImages     # all the image files go here
In Main, train.txt lists the training set and val.txt lists the validation set.
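If it helps, the folder skeleton above can be created with a few lines of Python (a sketch; the base path is a placeholder and should point at your own home directory):

import os

base = '/home/xxx/data/VOCdevkit/Hisense'  # placeholder; use your own path
for sub in ('Annotations', 'ImageSets/Main', 'JPEGImages'):
    os.makedirs(os.path.join(base, sub), exist_ok=True)  # also creates parent folders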
The dataset is split into a training set and a test set; use a small Python script, classify.py, to generate the .txt files yourself:
import os
import random

trainval_percent = 0.8  # train+val : test ratio, here 8 : 2
train_percent = 0.7     # train takes 0.7 of trainval
fdir = '/home/xxx/data/VOCdevkit/Hisense/ImageSets/Main/'      # adjust to your path
xmlfilepath = '/home/xxx/data/VOCdevkit/Hisense/Annotations/'  # adjust to your path
txtsavepath = fdir

total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(fdir + 'trainval.txt', 'w')
ftest = open(fdir + 'test.txt', 'w')
ftrain = open(fdir + 'train.txt', 'w')
fval = open(fdir + 'val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
After running it, the following four .txt files appear in the ImageSets/Main/ folder: trainval.txt, train.txt, val.txt, and test.txt.
Next, convert the images into the LMDB files used for training.
Create a Hisense folder under caffe-ssd/data, and copy create_data.sh, create_list.sh, and labelmap_voc.prototxt from the VOC0712 folder into it.
labelmap_voc.prototxt defines the mapping between numeric labels and class names. My dataset has two object classes plus the background, so it is a 3-class problem; the labels in my xml files are a and b. The name entries here must match the labels used in your own xml annotations. So the content just needs to be changed to the following:
item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "a"
  label: 1
  display_name: "a"
}
item {
  name: "b"
  label: 2
  display_name: "b"
}
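Since the name entries above must match the object labels in the xml annotations exactly, it is worth checking them before converting. A small sketch (ann_dir is a placeholder path):

import os
import xml.etree.ElementTree as ET

ann_dir = '/home/xxx/data/VOCdevkit/Hisense/Annotations/'  # placeholder; use your own path
names = set()
for fname in os.listdir(ann_dir):
    if fname.endswith('.xml'):
        for obj in ET.parse(os.path.join(ann_dir, fname)).findall('object'):
            names.add(obj.find('name').text)  # collect every object label used
print(names)  # should be exactly {'a', 'b'} for the labelmap above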
create_list.sh takes the four .txt files we just produced in /ImageSets/Main/ and converts them into the list format needed for the lmdb conversion, which makes the next step, running create_data.sh, straightforward. The parts to modify are mainly the commented lines below; just adjust the paths and names to your own setup.
#!/bin/bash
root_dir=$HOME/data/VOCdevkit/  # dataset root directory
sub_dir=ImageSets/Main          # path of the Main folder created above
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
for dataset in trainval test    # training and test sets
do
  dst_file=$bash_dir/$dataset.txt
  if [ -f $dst_file ]
  then
    rm -f $dst_file
  fi
  for name in Hisense           # dataset name
  do
    if [[ $dataset == "test" && $name == "VOC2012" ]]
    then
      continue
    fi
    echo "Create list for $name $dataset..."
    dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

    img_file=$bash_dir/$dataset"_img.txt"
    cp $dataset_file $img_file
    sed -i "s/^/$name\/JPEGImages\//g" $img_file
    sed -i "s/$/.jpg/g" $img_file

    label_file=$bash_dir/$dataset"_label.txt"
    cp $dataset_file $label_file
    sed -i "s/^/$name\/Annotations\//g" $label_file
    sed -i "s/$/.xml/g" $label_file

    paste -d' ' $img_file $label_file >> $dst_file

    rm -f $label_file
    rm -f $img_file
  done
  # Generate image name and size information.
  if [ $dataset == "test" ]
  then
    $bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
  fi
  # Shuffle trainval file.
  if [ $dataset == "trainval" ]
  then
    rand_file=$dst_file.random
    cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
    mv $rand_file $dst_file
  fi
done
In the caffe-ssd/data/Hisense directory, run:
./create_list.sh
Three .txt files are then generated in the Hisense folder: test.txt, trainval.txt, and test_name_size.txt.
Next, build the LMDB dataset. Edit create_data.sh as follows (again, mainly the commented lines):
cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$cur_dir/../..

cd $root_dir

redo=1
data_root_dir="$HOME/data/VOCdevkit"  # dataset root directory
dataset_name="Hisense"                # dataset name
mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"  # path to the label map
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name
done
In the caffe-ssd/data/Hisense directory, run:
./create_data.sh
This creates an lmdb folder under /home/xxx/data/VOCdevkit/Hisense/, containing two LMDB databases: Hisense_trainval_lmdb and Hisense_test_lmdb.
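To sanity-check the conversion, the number of entries in each database can be read with the lmdb Python package (assuming it is installed; the path below is a placeholder):

import lmdb

db_path = '/home/xxx/data/VOCdevkit/Hisense/lmdb/Hisense_trainval_lmdb'  # placeholder path
env = lmdb.open(db_path, readonly=True, lock=False)
with env.begin() as txn:
    print('samples in', db_path, ':', txn.stat()['entries'])  # should match the trainval list length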
Write the corresponding lmdb paths into the data layers of MobileNetSSD_train.prototxt and MobileNetSSD_test.prototxt in the example folder generated earlier by gen_model.sh. In MobileNetSSD_train.prototxt:
data_param {
  source: "/home/boyun/Hisense/data/lmdb/Hisense_trainval_lmdb"
  batch_size: 24
  backend: LMDB
}
and in MobileNetSSD_test.prototxt:
data_param {
  source: "/home/boyun/Hisense/data/lmdb/Hisense_test_lmdb"
  batch_size: 8
  backend: LMDB
}
Then edit the training hyperparameters and paths in solver_train.prototxt. The meaning of each hyperparameter is annotated below:
train_net: "example/MobileNetSSD_train.prototxt"  # network used for training
test_net: "example/MobileNetSSD_test.prototxt"    # network used for testing
test_iter: 673            # number of test batches per test pass
test_interval: 1000       # run a test pass every 1000 iterations
base_lr: 0.0005           # initial learning rate
display: 10               # print the loss every 10 iterations
max_iter: 52000           # total number of training iterations
lr_policy: "multistep"    # drop the learning rate at each stepvalue
gamma: 0.5                # multiply the learning rate by 0.5 at each stepvalue
weight_decay: 0.00005     # L2 regularization strength
snapshot: 1000            # save a snapshot every 1000 iterations
snapshot_prefix: "snapshot/mobilenet"  # path/prefix for saved snapshots
solver_mode: GPU          # train on the GPU
debug_info: false         # do not print layer-by-layer debug info
snapshot_after_train: true    # save a final snapshot when training ends
test_initialization: false    # skip the test pass before training starts
average_loss: 10          # display the loss averaged over the last 10 iterations
stepvalue: 20000          # learning rate drops to 0.00025 here
stepvalue: 40000          # learning rate drops to 0.000125 here
iter_size: 1              # accumulate gradients over 1 batch (effective batch = batch_size * iter_size)
type: "RMSProp"           # optimizer
eval_type: "detection"    # detection-style evaluation
ap_version: "11point"     # PASCAL VOC 11-point average precision
Finally, edit train.sh: set the absolute path of your caffe build (no change is needed if your files are laid out as described here), the pretrained weights, and the GPU id.
#!/bin/sh
if ! test -f example/MobileNetSSD_train.prototxt ;then
echo "error: example/MobileNetSSD_train.prototxt does not exist."
echo "please use the gen_model.sh to generate your own model."
exit 1
fi
mkdir -p snapshot
../../build/tools/caffe train -solver="solver_train.prototxt" \
-weights="mobilenet_iter_73000.caffemodel" \
-gpu 0
In the /home/xxx/caffe-ssd/examples/MobileNet-SSD-master directory, run:
./train.sh
to start training. The output looks like this:
I0801 16:50:40.645901 7363 caffe.cpp:251] Starting Optimization
I0801 16:50:40.645918 7363 solver.cpp:294] Solving MobileNet-SSD
I0801 16:50:40.645923 7363 solver.cpp:295] Learning Rate Policy: multistep
I0801 16:50:40.649744 7363 blocking_queue.cpp:50] Data layer prefetch queue empty
I0801 16:50:42.626246 7363 solver.cpp:243] Iteration 0, loss = 12.7011
I0801 16:50:42.626294 7363 solver.cpp:259] Train net output #0: mbox_loss = 12.7011 (* 1 = 12.7011 loss)
I0801 16:50:42.626328 7363 sgd_solver.cpp:138] Iteration 0, lr = 0.0005
I0801 16:51:07.516057 7363 solver.cpp:243] Iteration 10, loss = 7.21989
I0801 16:51:07.516435 7363 solver.cpp:259] Train net output #0: mbox_loss = 5.77298 (* 1 = 5.77298 loss)
I0801 16:51:07.516465 7363 sgd_solver.cpp:138] Iteration 10, lr = 0.0005
Modify the configuration at the top of demo.py:
import os
import sys

caffe_root = '/home/xxx/caffe-ssd/'  # caffe path
sys.path.insert(0, caffe_root + 'python')
import caffe

net_file = 'deploy.prototxt'                     # deploy network definition
caffe_model = 'mobilenet_iter_12000.caffemodel'  # trained model weights
test_dir = "images"                              # directory of test images

if not os.path.exists(caffe_model):
    print(caffe_model + " does not exist")
    exit()
if not os.path.exists(net_file):
    print(net_file + " does not exist")
    exit()
net = caffe.Net(net_file, caffe_model, caffe.TEST)

CLASSES = ('background',  # labels, in the same order as the labelmap
           'a', 'b')
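The rest of demo.py then preprocesses each image, runs a forward pass, and draws the detections. For reference, a minimal standalone pass looks roughly like this (a sketch, not the repo's exact code; it assumes the net and CLASSES defined above, a 300x300 input, the usual MobileNet-SSD preprocessing of mean 127.5 and scale 0.007843, and an output blob named detection_out):

import cv2
import numpy as np

def detect(image_path, conf_thresh=0.5):   # hypothetical helper, not from demo.py
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    blob = cv2.resize(img, (300, 300)).astype(np.float32)
    blob = (blob - 127.5) * 0.007843       # normalize to roughly [-1, 1]
    blob = blob.transpose((2, 0, 1))       # HWC -> CHW
    net.blobs['data'].reshape(1, 3, 300, 300)
    net.blobs['data'].data[...] = blob
    out = net.forward()['detection_out']   # shape (1, 1, N, 7)
    for _, label, score, xmin, ymin, xmax, ymax in out[0, 0]:
        if score >= conf_thresh:
            print(CLASSES[int(label)], float(score),
                  int(xmin * w), int(ymin * h), int(xmax * w), int(ymax * h))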
To speed up the model at run time, the author merges the BN layers into the preceding convolution layers, so the BN computation is eliminated entirely; this can give a small boost to detection speed.
Open merge_bn.py, make sure the caffe path inside it points to your own installation, then run:
python merge_bn.py --model example/MobileNetSSD_deploy.prototxt --weights mobilenet_iter_12000.caffemodel
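For reference, the merge is conceptually simple: a BatchNorm plus Scale pair that follows a convolution can be absorbed into that convolution's weights and bias. A rough sketch of the math (illustrative only; merge_bn.py in the repo does the actual model surgery):

import numpy as np

def fold_bn(conv_w, conv_b, bn_mean, bn_var, gamma, beta, eps=1e-5):
    # y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
    #   = conv'(x) + b'  with the per-output-channel factor below
    factor = gamma / np.sqrt(bn_var + eps)
    new_w = conv_w * factor.reshape(-1, 1, 1, 1)  # scale each output filter
    new_b = (conv_b - bn_mean) * factor + beta    # fold mean and shift into the bias
    return new_w, new_b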
Done!