Training AlexNet on Your Own Dataset
References
This post focuses on how to turn your own dataset into something Caffe can consume; the specifics of network design, parameter tuning, and the pros and cons of different architectures are left for later. It follows the referenced material closely, so treat it as a record of my own hands-on run-through.
A fair amount of the reference material needs to be taken with care: for example, the network definition given there is incomplete, and no code for resizing the images is provided.
Goal
Having previously gotten a rough idea of how a Caffe network is put together, this post walks through how to actually train on your own dataset (which is arguably the whole point of learning Caffe).
Creating your own image dataset
First, resize every image in the dataset to 256x256.
(For whatever reason, one of the resize snippets I tried did not work.)
The method comes from 薛开宇's notes.
Write a small script:
for name in /path/to/imagenet/val/*.JPEG; do
convert -resize 256x256\! $name $name # convert is not installed by default; the system will tell you which package provides it, just install it
done
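If the convert loop above does not work for you, here is a minimal Python sketch of my own that does the same thing with Pillow (assuming Pillow is installed and the same directory layout; it overwrites the images in place):
# resize_all.py -- alternative to the convert loop above (assumes Pillow: pip install Pillow)
import glob
from PIL import Image

for name in glob.glob('/path/to/imagenet/val/*.JPEG'):
    img = Image.open(name).convert('RGB')
    img = img.resize((256, 256))  # force exactly 256x256, like "256x256!" in convert
    img.save(name)                # overwrite in place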
Next, turn the image dataset into a label txt file.
inputTxt.py
import argparse
import sys, os

# Append one "image_path label" line to the output txt file
def writeouput(result, path):
    if os.path.isfile(path):
        f = open(path, 'a')   # file already exists: append
    else:
        f = open(path, 'w')   # first call: create the file
    f.write(result + '\n')
    f.close()
    print(result + " " + "success")

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "image_name",
        help="Path of the input image."
    )
    parser.add_argument(
        "image_label",
        help="Class label to record for the image."
    )
    args = parser.parse_args()
    writeouput(args.image_name + " " + args.image_label,
               '/home/demo/caffe/MyCaffe/DataFile/inputTxt/HereIsInputTxt.txt')

if __name__ == '__main__':
    main(sys.argv)
Shell driver script:
#!/bin/bash
image_path='/home/demo/caffe/data/FINALPaper/corel1000/*.jpg'
image_label='0'
count=0
for image_name in $image_path;
do python inputTxt.py $image_name $image_label;
count=$((count+1))
echo $count
done
Compared with the reference I added a counter so you can watch the progress while it runs.
Note: arithmetic in shell is written as count=$((count+1)); there must be no spaces around the equals sign.
This produces the label txt file at the location we specified.
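Each line of that file is just the image path, a space, and the label; with the driver script above a line looks roughly like this (the filename is illustrative):
/home/demo/caffe/data/FINALPaper/corel1000/some_image.jpg 0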
Splitting into training and test sets
import random

dataSet = []
fileIn = open('/home/demo/caffe/MyCaffe/DataFile/inputTxt/HereIsInputTxt.txt')
for line in fileIn.readlines():
    dataSet.append(line.strip())
fileIn.close()
random.shuffle(dataSet)
pos = int(len(dataSet) * .75)   # 75% train, 25% test
traindata = dataSet[:pos]
testdata = dataSet[pos:]
# These paths must match $DATA/pictrain.txt and $DATA/pictest.txt in create_imagenet.sh below
f = open('/home/demo/caffe/MyCaffe/DataFile/inputTxt/pictrain.txt', 'w')
for row in traindata:
    f.write(row + '\n')
f.close()
f = open('/home/demo/caffe/MyCaffe/DataFile/inputTxt/pictest.txt', 'w')
for row in testdata:
    f.write(row + '\n')
f.close()
Remember to create pictrain.txt and pictest.txt by hand first (the directory must exist), otherwise the script will error out.
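As a quick sanity check of my own (paths as above), count the lines in the two files and make sure they add up to the size of the dataset:
# Count the entries in the train/test lists (paths taken from the split script above)
n_train = sum(1 for _ in open('/home/demo/caffe/MyCaffe/DataFile/inputTxt/pictrain.txt'))
n_test = sum(1 for _ in open('/home/demo/caffe/MyCaffe/DataFile/inputTxt/pictest.txt'))
print(n_train, n_test, n_train + n_test)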
Generating the lmdb files
Before reading the script below, let's go over what we actually did above and which parameters need to be changed.
What we did above was simply an automated way of collecting the paths of the images we are about to use.
In the script below:
EXAMPLE is where the lmdb files will be written
DATA is the directory holding the txt files we just generated that list the training and test images
TOOLS is where the conversion tools live
TRAIN_DATA_ROOT and VAL_DATA_ROOT are prepended to the image paths listed in the txt files, so since we wrote absolute paths earlier, setting them to the root directory / is enough
Then just make sure the names of the two txt files referenced in the script match the ones we generated.
Also be warned that the tool is unforgiving: the target folder must not already contain the lmdb directories it is about to create (not even empty ones), or the conversion will fail.
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e
EXAMPLE=/home/demo/caffe/MyCaffe/MyWorks/Test
DATA=/home/demo/caffe/MyCaffe/DataFile/inputTxt
TOOLS=/home/demo/caffe/build/tools
TRAIN_DATA_ROOT=/
VAL_DATA_ROOT=/
# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/pictrain.txt \
    $EXAMPLE/train_lmdb
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/pictest.txt \
    $EXAMPLE/test_lmdb
echo "Done."
Then just run the script.
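If you want to confirm the lmdb really contains what you expect, a short sketch like the following can read back one record; this is my own check, assuming the python lmdb package and pycaffe are importable, with the paths used above:
# Peek at the first record of the freshly created train_lmdb
import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('/home/demo/caffe/MyCaffe/MyWorks/Test/train_lmdb', readonly=True)
with env.begin() as txn:
    key, value = next(txn.cursor().iternext())
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    print(key, datum.channels, datum.height, datum.width, datum.label)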
Computing the mean of the images
Use make_imagenet_mean.sh from the imagenet example to compute the mean over the training images. Subtracting the mean from each image before training and testing improves both speed and accuracy, which is why almost every model includes this step.
Where does the mean come from? It is simply the average over all training samples; once computed it is saved to a mean file, and later tests just subtract that stored mean instead of recomputing it for the test images.
At its core the script calls the compute_image_mean tool under $Caffe_Root/build/tools, so there is nothing to be afraid of: these scripts were written by others and are easy enough to follow if you read them carefully.
It simply runs compute_image_mean over the training data in train_lmdb (the folder created in the previous step) and produces a file called imagenet_mean.binaryproto (you can name this file whatever you like).
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12
EXAMPLE=/home/demo/caffe/MyCaffe/MyWorks/Test
DATA=/home/demo/caffe/MyCaffe/MyWorks/Test
TOOLS=/home/demo/caffe/build/tools
$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
$DATA/imagenet_mean.binaryproto
echo "Done."
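To make sure the mean file was written correctly, you can load it from Python and check its shape; this is my own sanity check, assuming pycaffe is on the PYTHONPATH:
# Load imagenet_mean.binaryproto and convert it to a numpy array
import caffe
from caffe.proto import caffe_pb2

blob = caffe_pb2.BlobProto()
with open('/home/demo/caffe/MyCaffe/MyWorks/Test/imagenet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)
print(mean.shape)  # expect something like (1, 3, 256, 256)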
Setting up the network and training parameters
This is the part you will keep coming back to tune. Different networks use different training parameters, but the procedure is the same, so at this stage focus on how things are set up and run rather than on the specific values. In a later post I will try to write a more intuitive program to wrap up this whole sequence of fiddly steps.
Create an AlexNet.prototxt (some parameters inside need changing, in particular the data layer paths and the num_output of the final fc8 layer, which should equal the number of classes in your own dataset). It is easy to find; if you can't even find this, there is no point continuing.
Some blogs do not reproduce the full definition, so it is best to pull it from the official Caffe repository on GitHub yourself.
Also note that this is not the file used for visualization; for that the data part has to be rewritten, but such versions are generally available online and can simply be downloaded.
name: "AlexNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_file: "/home/demo/caffe/MyCaffe/MyWorks/Test/imagenet_mean.binaryproto"
}
data_param {
source: "/home/demo/caffe/MyCaffe/MyWorks/Test/train_lmdb"
batch_size: 256
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 227
mean_file: "/home/demo/caffe/MyCaffe/MyWorks/Test/imagenet_mean.binaryproto"
}
data_param {
source: "/home/demo/caffe/MyCaffe/MyWorks/Test/test_lmdb"
batch_size: 50
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
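Before training, it can save time to check that the prototxt at least parses; here is a small sketch of my own using the protobuf text format (assumes pycaffe and protobuf are importable, path as above):
# Parse AlexNet.prototxt and list its layers
from google.protobuf import text_format
from caffe.proto import caffe_pb2

net = caffe_pb2.NetParameter()
with open('/home/demo/caffe/MyCaffe/MyWorks/Test/AlexNet.prototxt') as f:
    text_format.Merge(f.read(), net)
print([layer.name for layer in net.layer])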
Create a solver.prototxt
net: "/home/demo/caffe/MyCaffe/MyWorks/Test/AlexNet.prototxt" # path to the network definition
test_iter: 100 # number of test iterations; the test batch size is 50, so 100*50 should cover the test set
test_interval: 100 # run a test pass every 100 training iterations
base_lr: 0.01 # initial learning rate
lr_policy: "step" # learning rate decay policy
gamma: 0.1 # factor the learning rate is multiplied by at each step
stepsize: 500 # step length: every 500 iterations the learning rate is multiplied by gamma
display: 20 # print progress every 20 iterations; keep it small while tuning, raise it for real training runs
max_iter: 1000 # maximum number of iterations
momentum: 0.9 # momentum
weight_decay: 0.0005 # weight decay
snapshot: 200 # save a caffemodel snapshot (the trained model) every 200 iterations
snapshot_prefix: "snap_prefix_test" # prefix/location for the saved snapshots
solver_mode: GPU # train on the GPU
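For reference, with lr_policy "step" Caffe multiplies the learning rate by gamma every stepsize iterations, i.e. lr = base_lr * gamma^floor(iter/stepsize). A quick sketch of what that gives with the values above:
# Learning rate under the "step" policy with the solver values above
base_lr, gamma, stepsize = 0.01, 0.1, 500
for it in (0, 499, 500, 999, 1000):
    print(it, base_lr * gamma ** (it // stepsize))  # 0.01 up to iter 499, then 0.001, ...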
Finally, for convenience, write a small shell script to launch training:
start_train.sh
#!/usr/bin/env sh
#set -e
cd /home/demo/caffe
./build/tools/caffe train --solver=/home/demo/caffe/MyCaffe/MyWorks/Test/solver.prototxt
If you hit an out of memory error, lower the batch_size in the data layers.
And that's it. From here on, to train something different you just adapt the prototxt files accordingly.