Contents:
- Overview
- Installation
- Data layers
- Dataset preparation
- Training
- Multinode distributed training
- Fine-tuning
- Testing
- Feature extraction and visualization
- Using the Python API
- Debugging
- Examples
- Uses of Caffe
- Further reading
Caffe is a deep learning framework developed by BVLC. It is written in C++ and CUDA C++ and provides Python and Matlab interfaces. The framework is well suited to convolutional neural networks (CNNs), recurrent neural networks (RNNs), and multilayer perceptrons, and there are now forks with support for detection, classification, segmentation, and Spark compatibility.
Caffe optimized for Intel architecture (Caffe-Intel) integrates the Intel Math Kernel Library (Intel MKL) 2017 and is optimized for Advanced Vector Extensions (AVX)-2 and AVX-512, with support for Intel Xeon and Intel Xeon Phi processors. It therefore keeps all the strengths of BVLC Caffe while running efficiently on Intel architecture, and it can train in a distributed fashion across many nodes. This document explains how to build Caffe optimized for Intel architecture, train models on one or more compute nodes, and deploy networks. It also details several Caffe features, such as fine-tuning, extracting and visualizing the features of different models, and the Caffe Python API.
Terminology:
- weights - also called kernels, filters, templates, or feature extractors
- blob - also called a tensor; an N-dimensional data structure that holds data, gradients, or weights (including biases)
- units - also called neurons; apply a non-linearity to the data blob
- feature maps - also called channels
- testing - also called inference, classification, scoring, or deployment
- model - also called a topology or architecture
To get familiar with Caffe quickly:
- Install it
- Train and test LeNet on MNIST
- Test a trained model, e.g., bvlc_googlenet.caffemodel, on a few images such as cat and fish-bike
- Fine-tune an existing model on the Cats vs Dogs Challenge
The installation steps below are for Ubuntu 14.04; for other Linux distributions and OS X, BVLC provides official installation instructions.
sudo apt-get update &&
sudo apt-get -y install build-essential git cmake &&
sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev &&
sudo apt-get -y install libopencv-dev libhdf5-serial-dev protobuf-compiler &&
sudo apt-get -y install --no-install-recommends libboost-all-dev &&
sudo apt-get -y install libgflags-dev libgoogle-glog-dev liblmdb-dev &&
sudo apt-get -y install libatlas-base-dev
On Ubuntu 16.04, the HDF5 headers and libraries must be linked as follows:
find . -type f -exec sed -i -e 's^"hdf5.h"^"hdf5/serial/hdf5.h"^g' -e 's^"hdf5_hl.h"^"hdf5/serial/hdf5_hl.h"^g' '{}' \;
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
On CentOS 7, install the following dependencies:
sudo yum -y update &&
sudo yum -y groupinstall "Development Tools" &&
sudo yum -y install wget cmake git &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel &&
sudo yum -y install snappy-devel opencv-devel atlas-devel &&
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel
# The following steps are only required if some packages failed to install
# add EPEL repository then install missing packages
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ivh epel-release-latest-7.noarch.rpm
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel
# if packages are still not found--download and install/build the packages, e.g.,
# snappy:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
# atlas:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
# opencv:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local ..
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make all -j $NUM_THREADS
sudo make install -j $NUM_THREADS
# optional (not required for Caffe)
# other useful repositories for CentOS are RepoForge and IUS:
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
sudo rpm -Uvh rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
wget https://rhel7.iuscommunity.org/ius-release.rpm
sudo rpm -Uvh ius-release*.rpm
What the dependencies provide (source):
- boost - a C++ library that supplies math functions and shared pointers
- glog, gflags - logging and command-line utilities; essential for debugging
- leveldb, lmdb - database IO; used to prepare the datasets
- protobuf - used to define data structures efficiently
- BLAS (Basic Linear Algebra Subprograms) - matrix multiplication, matrix addition, and similar operations, provided here by Intel MKL; alternatives include ATLAS and OpenBLAS
The Caffe installation guide notes that MKL gives the best performance on CPUs. For best performance, use Intel MKL 2017, available free of charge in the Intel® Parallel Studio XE 2017 Beta. After installing it, set up the environment as follows (adjust the paths to your installation):
echo 'source /opt/intel/bin/compilervars.sh intel64' >> ~/.bashrc
# alternatively edit /mkl/bin/mklvars.sh replacing INSTALLDIR in
# CPRO_PATH= with the actual mkl path: CPRO_PATH=
# echo 'source /mkl/bin/mklvars.sh intel64' >> ~/.bashrc
Clone and prepare Caffe optimized for Intel architecture:
cd ~
# For BVLC caffe use:
# git clone https://github.com/BVLC/caffe.git
# For intel caffe use:
git clone https://github.com/intel/caffe.git
cd caffe
echo "export CAFFE_ROOT=`pwd`" >> ~/.bashrc
source ~/.bashrc
cp Makefile.config.example Makefile.config
# Open Makefile.config and modify it (see comments in the Makefile)
vi Makefile.config
Edit Makefile.config:
# To run on CPU only and to avoid installing CUDA installers, uncomment
CPU_ONLY := 1
# To use MKL, replace atlas with mkl as follows
# (make sure that the BLAS_DIR and BLAS_LIB paths are correct)
BLAS := mkl
BLAS_DIR := $(MKLROOT)/include
BLAS_LIB := $(MKLROOT)/lib/intel64
# To use MKL2017 DNN primitives as the default engine, uncomment
# (however leave it commented if using multinode training)
# USE_MKL2017_AS_DEFAULT_ENGINE := 1
# To customize the compiler choice, uncomment and set the following
# CUSTOM_CXX := g++
# To train on multiple nodes, uncomment and verify the path
# USE_MPI := 1
# CXX := /usr/bin/mpicxx
On Ubuntu 16.04, also edit the Makefile:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
and create the links:
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
On CentOS 7 with the ATLAS library (instead of the recommended MKL), edit the Makefile:
# Change this line
LIBRARIES += cblas atlas
# to
LIBRARIES += satlas
Build Caffe-Intel:
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
# To save the output stream to file makestdout.log use this instead
# make -j $NUM_THREADS 2>&1 | tee makestdout.log
Alternatively, build with cmake:
mkdir build
cd build
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on /path/to/caffe
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
Install the Python dependencies:
# These steps are OPTIONAL but highly recommended to use the Python interface
sudo apt-get -y install gfortran python-dev python-pip
cd ~/caffe/python
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo pip install scikit-image #depends on other packages
sudo ln -s /usr/include/python2.7/ /usr/local/include/python2.7
sudo ln -s /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ \
/usr/local/include/python2.7/numpy
cd ~/caffe
make pycaffe -j $NUM_THREADS
echo "export PYTHONPATH=$CAFFE_ROOT/python" >> ~/.bashrc
source ~/.bashrc
Other optional steps:
# These steps are OPTIONAL to test caffe
make test -j $NUM_THREADS
make runtest #"YOU HAVE DISABLED TESTS" output is OK
# This step is OPTIONAL to disable cam hardware OpenCV driver
# alternatively, the user can skip this and ignore the harmless
# libdc1394 error that may occasionally appear
sudo ln /dev/null /dev/raw1394
This optional section describes the various data layer types; it is not required for learning Caffe. It is based on the official Caffe material and on src/caffe/proto/caffe.proto.
Data enters Caffe through data layers, which sit at the bottom of the network and are defined in a prototxt file. (More information on prototxt files appears in the Training section.) Data can come from efficient databases (LevelDB or LMDB), directly from memory, from HDF5 files on disk, or from common image formats.
Common input preprocessing (mean subtraction, scaling, random cropping, mirroring, etc.) is specified via transform_param (not all data layer types support it; HDF5, for example, does not). If the data has already been transformed during preparation, these transforms are unnecessary. A typical transformation specification:
transform_param {
# randomly horizontally mirror the image
mirror: 1
# crop a `crop_size` x `crop_size` patch:
# - at random during training
# - from the center during testing
crop_size: 227
# subtract the mean value: these mean_values can equivalently be replaced with a mean.binaryproto file as
# mean_file: name_of_mean_file.binaryproto
mean_value: 104
mean_value: 117
mean_value: 123
}
Here the images are cropped, mirrored, and mean-centered. For other transformations, see the TransformationParameter message in src/caffe/proto/caffe.proto.
LMDB (Lightning Memory-Mapped Database) and LevelDB are efficient input formats, but they are only suitable for 1-of-K classification tasks. Because Caffe reads them very efficiently, they are the recommended formats for such tasks.
data_param
Required
- source - the path to the directory containing the database
- batch_size - the number of inputs to process at a time
Optional
- backend [default LEVELDB] - choose LEVELDB or LMDB
- rand_skip - the number of inputs to skip at the beginning; useful for async SGD
For details, see the DataParameter message in src/caffe/proto/caffe.proto.
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: 1
crop_size: 227
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "examples/imagenet/ilsvrc12_train_lmdb"
batch_size: 32
backend: LMDB
}
}
Alternatively, mean centering can use a mean image (data/ilsvrc12/imagenet_mean.binaryproto) instead of the mean_value entries. The binaryproto of an LMDB dataset is computed as:
cd ~/caffe
build/tools/compute_image_mean examples/imagenet/ilsvrc12_train_lmdb \
    data/ilsvrc12/imagenet_mean.binaryproto
Replace examples/imagenet/ilsvrc12_train_lmdb and data/ilsvrc12/imagenet_mean.binaryproto with the lmdb folder and binaryproto file appropriate to your task.
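If a NumPy copy of the mean is needed (the examples below use a convert_binaryproto2npy.py script for this purpose), the binaryproto can be parsed with the Caffe Python API. A minimal sketch, with placeholder file names:
# Sketch: convert a mean.binaryproto into a .npy file (file names are placeholders)
import numpy as np
import caffe
from caffe.proto import caffe_pb2

blob = caffe_pb2.BlobProto()
with open('data/ilsvrc12/imagenet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
# blobproto_to_array returns (num, channels, height, width); take the first entry
mean = caffe.io.blobproto_to_array(blob)[0]
np.save('imagenet_mean.npy', mean)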
Reads images and their labels directly from image files.
image_data_param
Required
- source - the name of a text file that lists the input images and their labels
Optional
- batch_size [default 1] - the number of inputs to process at a time
- new_height [default 0] - resize the image height to this value; ignored if 0
- new_width [default 0] - resize the image width to this value; ignored if 0
- shuffle [default 0] - shuffle the data; ignored if 0
- rand_skip [default 0] - the number of inputs to skip at the beginning; useful for async SGD
For details, see the ImageDataParameter message in src/caffe/proto/caffe.proto.
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_value: 104
mean_value: 117
mean_value: 123
}
image_data_param {
source: "/path/to/file/train.txt"
batch_size: 32
shuffle: 1
}
}
Here the images are shuffled, cropped, mirrored, and mean-centered.
Note that each line of the text file must contain an image name followed by its label; for example, "train.txt" looks like:
/path/to/images/img3423.jpg 2
/path/to/images/img3424.jpg 13
/path/to/images/img3425.jpg 8
...
Uses a zero-valued blob of the specified dimensions as input data.
input_param
Required
- shape - the dimensions of one or more top blobs
layer {
name: "input"
type: "Input"
top: "data"
input_param {
shape {
dim: 32
dim: 3
dim: 227
dim: 227
}
}
}
Equivalent form:
input: "data"
input_dim: 32
input_dim: 3
input_dim: 227
input_dim: 227
Similar to the Input type, except that the filler for the data can be specified. Often used for debugging; see the examples for details.
dummy_data_param
Required
- shape - the dimensions of one or more top blobs
Optional
- data_filler [default: ConstantFiller with value 0] - specifies the values of the top blob
layer {
name: "data"
type: "DummyData"
top: "data"
include {
phase: TRAIN
}
dummy_data_param {
data_filler {
type: "constant"
value: 0.01
}
shape {
dim: 32
dim: 3
dim: 227
dim: 227
}
}
}
layer {
name: "data"
type: "DummyData"
top: "label"
include {
phase: TRAIN
}
dummy_data_param {
data_filler {
type: "constant"
}
shape {
dim: 32
}
}
}
Reads data directly from memory: call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) with contiguous data, typically a 4-D array, one batch_size at a time. Because the data must first be brought into memory this can be slow to set up, but once the data is in memory this layer is very efficient. A minimal Python sketch follows the layer definition below.
memory_data_param
Required
- batch_size, channels, height, width - the dimensions of the data
layer {
name: "data"
type: "MemoryData"
top: "data"
top: "label"
transform_param {
crop_size: 227
mirror: true
mean_file: "mean.binaryproto"
}
memory_data_param {
batch_size: 32
channels: 3
height: 227
width: 227
}
}
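A minimal sketch of feeding in-memory arrays through the Python API (the prototxt file name is a placeholder for a file containing the layer above):
# Sketch: feed one batch of in-memory data to a MemoryData layer
import numpy as np
import caffe

net = caffe.Net('memory_net.prototxt', caffe.TEST)  # placeholder file name
images = np.random.rand(32, 3, 227, 227).astype(np.float32)  # one NCHW batch
labels = np.zeros(32, dtype=np.float32)
net.set_input_arrays(images, labels)
out = net.forward()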
Reads data from HDF5 files. This works for many tasks, but it only supports FP32 and FP64 data, not uint8, so image data can become very large. transform_param is not allowed. Use this layer only when necessary.
hdf5_data_param
Required
- source - the name of a text file listing the paths of the input data and labels
- batch_size
Optional
- shuffle [default false] - shuffle the order of the HDF5 files
layer {
name: "data"
type: "HDF5_DATA"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "examples/hdf5_classification/data/train.txt"
batch_size: 32
}
}
The HDF5 output layer plays the opposite role of the other data layers: it writes its input blobs to disk.
hdf5_output_param
Required
- file_name
layer {
name: "data_output"
type: "HDF5_OUTPUT"
bottom: "data"
bottom: "label"
include {
phase: TRAIN
}
hdf5_output_param {
file_name: "output_file.h5"
}
}
Used for detection: reads windows from image files along with their class labels.
window_data_param
Required
- source - specifies the data source
- mean_file
- batch_size
Optional
- mirror
- crop_size - randomly crop the image
- crop_mode [default "warp"] - how to crop a detection window, e.g., "warp" warps it to a fixed size, "square" crops a tight square around the window
- fg_threshold [default 0.5] - foreground (object) overlap threshold
- bg_threshold [default 0.5] - background (non-object) overlap threshold
- fg_fraction [default 0.25] - fraction of the batch that should be foreground objects
- context_pad [default 10] - amount of contextual padding around a window
For details, see the WindowDataParameter message in src/caffe/proto/caffe.proto.
layer {
name: "data"
type: "WindowData"
top: "data"
top: "label"
window_data_param {
source: "/path/to/file/window_train.txt"
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
batch_size: 128
mirror: true
crop_size: 227
fg_threshold: 0.5
bg_threshold: 0.5
fg_fraction: 0.25
context_pad: 16
}
}
The LMDB format is recommended for 1-of-K classification tasks. Producing an LMDB dataset with the Caffe tools requires:
- the directory containing the data
- an output directory, e.g., mydataset_train_lmdb, which must not already exist
- a text file with the image names and their labels, e.g., "train.txt", with content of the form:
img3423.jpg 2
img3424.jpg 13
img3425.jpg 8
...
If the data is spread across different folders, "train.txt" must contain the absolute paths to the data.
The script create_label_file.py generates the training and validation splits for Kaggle's Dogs vs Cats Competition; it can be adapted to other tasks.
create_label_file.py
#!/usr/bin/env python
import sys
import os
import os.path
def main():
TRAIN_TEXT_FILE = 'train.txt'
VAL_TEXT_FILE = 'val.txt'
IMAGE_FOLDER = 'train'
# Selects 10% of the images (the ones that end in '2') for validation
fr = open(TRAIN_TEXT_FILE, 'w')
fv = open(VAL_TEXT_FILE, 'w')
filenames = os.listdir(IMAGE_FOLDER)
for filename in filenames:
if filename[0:3] == 'cat':
if filename[-5] == '2':# or filename[-5] == '8':
fv.write(filename + ' 0\n')
else:
fr.write(filename + ' 0\n')
if filename[0:3] == 'dog':
if filename[-5] == '2':# or filename[-5] == '8':
fv.write(filename + ' 1\n')
else:
fr.write(filename + ' 1\n')
fr.close()
fv.close()
# Standard boilerplate to call the main() function to begin the program.
if __name__ == '__main__':
main()
In the testing phase the labels are assumed not to exist. If labels are available, a test LMDB can be produced in the same manner.
The following example builds the training LMDB; the working directory is $CAFFE_ROOT:
#!/usr/bin/env sh
# folder containing the training and validation images
TRAIN_DATA_ROOT=/path/to/training/images
# folder containing the file with the name of training images
DATA=/path/to/file
# folder for the lmdb datasets
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools
# Set to resize the images to 256x256
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
echo "Creating train lmdb..."
# Delete the shuffle line if shuffle is not desired
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT/ \
    $DATA/train.txt \
    $OUTPUT/mydataset_train_lmdb
echo "Done."
Compute the image mean of an LMDB dataset:
#!/usr/bin/env sh
# Compute the mean image in lmdb dataset
OUTPUT=/path/to/output/directory
# folder for the lmdb datasets and output for mean image
TOOLS=/path/to/caffe/build/tools
$TOOLS/compute_image_mean $OUTPUT/mydataset_train_lmdb \
    $OUTPUT/train_mean.binaryproto
$TOOLS/compute_image_mean $OUTPUT/mydataset_val_lmdb \
    $OUTPUT/val_mean.binaryproto
Grayscale images (one channel), RADAR images (two channels), videos (four channels), images plus depth (four channels), barometry (one channel), and spectrograms (one channel) require a conversion step to produce LMDB datasets (see the references).
There are two ways to resize images:
- warp the image to the target size
- resize it proportionally so that the shorter side matches the target, then center-crop the longer side to the target size (a Python sketch of this approach appears after the tool list below)
Tools for resizing images:
- OpenCV* - build/tools/convert_imageset --resize_height=256 --resize_width=256 warps the image to the target size; convert_imageset calls ReadImageToDatum, which in turn calls ReadImageToCVMat in caffe/src/util/io.cpp
- ImageMagick - convert -resize 256x256! warps the image to the target size
- OpenCV - the script tools/extra/resize_and_crop_images.py resizes images proportionally in parallel and then center-crops them:
sudo pip install git+https://github.com/Yangqing/mincepie.git
sudo apt-get install -y python-opencv
vi tools/extra/launch_resize_and_crop_images.sh # set the number of clients (use num_of_cores*2); file.txt, input, and output folders
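For reference, the proportional-resize-then-center-crop approach can be sketched with OpenCV in Python (illustrative only, not Caffe's implementation):
# Sketch: resize so the shorter side equals `size`, then center-crop to size x size
import cv2

def resize_and_center_crop(img, size=256):
    h, w = img.shape[:2]
    scale = float(size) / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]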
Alternatively, images can be cropped or resized within the network by setting parameters in the data layer:
layer {
name: "data"
transform_param {
crop_size: 227
...
}
layer {
name: "data"
image_data_param {
new_height: 227
new_width: 227
...
Training a network requires:
- train_val.prototxt - defines the network architecture, the initialization parameters, and the local learning rates
- solver.prototxt - defines the optimization parameters; this is the file passed to the caffe binary to train the network
- deploy.prototxt - used only for testing; identical to train_val.prototxt except that the data and loss layers are removed
Weight initialization is important; the main schemes are:
- gaussian - sample the weights from a Gaussian distribution N(0, std)
- xavier - sample the weights from a uniform distribution U(-a, a), where a = sqrt(3 / fan_in) and fan_in is the number of incoming inputs (sketched in Python below)
- MSRAFiller - sample the weights from a normal distribution N(0, a), where a = sqrt(2 / fan_in)
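For intuition only, here is a NumPy sketch of the xavier formula above (not Caffe's C++ filler):
# NumPy sketch of xavier initialization: U(-a, a) with a = sqrt(3 / fan_in)
import numpy as np

def xavier_init(fan_in, shape):
    a = np.sqrt(3.0 / fan_in)
    return np.random.uniform(-a, a, size=shape)

# e.g., a 5x5 convolution over 20 input channels: fan_in = 20 * 5 * 5
w = xavier_init(20 * 5 * 5, (50, 20, 5, 5))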
Layer parameters related to the learning rate:
- base_lr - the initial learning rate (default 0.01); lower it if NaNs appear during training
- lr_mult - the lr_mult of the biases is usually set to 2x the lr_mult of the non-bias weights
Taking LeNet as an example, define lenet_train_test.prototxt, deploy.prototxt, and solver.prototxt:
solver.prototxt
# network definition
net: "examples/mnist/lenet_train_test.prototxt"
# run a validation test every 500 training iterations
test_interval: 500
# number of validation-test iterations; the recommended value is num_val_imgs / batch_size
test_iter: 100
# base learning rate, momentum, and weight decay of the network
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# learning rate policies
# fixed: always return base_lr.
# step: return base_lr * gamma ^ (floor(iter / step))
# exp: return base_lr * gamma ^ iter
# inv: return base_lr * (1 + gamma * iter) ^ (- power)
# multistep: similar to step but allows non-uniform steps defined by stepvalue
# poly: the effective learning rate follows a polynomial decay, reaching zero at max_iter: return base_lr * (1 - iter/max_iter) ^ power
# sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
lr_policy: "step"
gamma: 0.1
stepsize: 10000 # Drop the learning rate in steps by a factor of gamma every stepsize iterations
# display results every 100 iterations
display: 100
# maximum number of iterations
max_iter: 10000
# snapshot the training state and model parameters every 5000 iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"
# solver mode: CPU or GPU
solver_mode: CPU
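As a sanity check, the effective learning rate under the "step" policy above can be computed with a few lines of Python:
# Effective learning rate of the "step" policy: base_lr * gamma ^ floor(iter / stepsize)
base_lr, gamma, stepsize = 0.01, 0.1, 10000

def step_lr(it):
    return base_lr * gamma ** (it // stepsize)

for it in (0, 5000, 9999, 10000):
    print it, step_lr(it)  # 0.01, 0.01, 0.01, 0.001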
Train the network:
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt
Training produces two types of files, for example:
- lenet_multistep_10000.caffemodel - the network weights, i.e., the model parameters used for testing
- lenet_multistep_10000.solverstate - used to resume training if the run is interrupted
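Training can be resumed from a snapshot with the command-line flag -snapshot, or through the Python API; a minimal sketch with placeholder file names:
# Sketch: resume interrupted training from a solverstate file
import caffe

solver = caffe.get_solver('solver.prototxt')
solver.restore('lenet_multistep_5000.solverstate')
solver.solve()  # continue until max_iter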
To train the network and plot the validation accuracy or loss vs. iteration:
#CHART_TYPE=[0-7]
# 0: Test accuracy vs. Iters
# 1: Test accuracy vs. Seconds
# 2: Test loss vs. Iters
# 3: Test loss vs. Seconds
# 4: Train learning rate vs. Iters
# 5: Train learning rate vs. Seconds
# 6: Train loss vs. Iters
# 7: Train loss vs. Seconds
CHART_TYPE=0
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt 2>&1 | tee logfile.log
python $CAFFE_ROOT/tools/extra/plot_training_log.py.example $CHART_TYPE name_of_plot.png logfile.log
Dropout is used together with fully connected layers; during the forward pass it activates only a subset of the units, which reduces co-adaptation between weights and thereby overfitting. It is ignored during testing.
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
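Conceptually, Caffe's dropout is the "inverted" variant: at training time the retained units are scaled by 1/(1 - dropout_ratio), so at test time the layer is a no-op. A NumPy sketch of the idea:
# Illustrative inverted-dropout sketch (mirrors the train/test behavior described above)
import numpy as np

def dropout(x, ratio=0.5, train=True):
    if not train:
        return x  # dropout does nothing at test time
    mask = np.random.rand(*x.shape) > ratio
    return x * mask / (1.0 - ratio)  # keep the expected activation unchanged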
To estimate the forward and backward propagation times without updating the weights:
# times NUMITER=50 forward and backward passes and reports the total and average times
# may require the training samples and mean.binaryproto
NUMITER=50
/path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER
The Linux tool numactl can be used to control memory placement:
numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER
The Caffe Model Zoo provides network models and trained weights for various tasks, convenient for fine-tuning or testing.
This section is based on Intel's Caffe GitHub wiki. There are two main approaches to distributed multinode training:
- model parallelism
- data parallelism
In model parallelism, the model is split across nodes and every node processes all of the data. In data parallelism, the data is split across nodes and every node holds all of the model parameters. Data parallelism is the better fit when the model has few weights and the data is large.
The two can be combined: layers with few weights, such as convolutional layers, are trained with data parallelism, while layers with many weights, such as fully connected layers, are trained with model parallelism. An Intel paper gives a theoretical analysis of the optimal balance between data and model parallelism in such hybrids.
Given the popularity of deep networks with few weights, such as GoogleNet and ResNet, and the successful track record of data-parallel distributed training, Caffe-Intel supports data parallelism. Multinode distributed training remains an active area of development.
For multinode training, modify Makefile.config:
USE_MPI := 1
# update with the path to binary MPI library
CXX := /usr/bin/mpicxx
Running multinode training is straightforward:
mpirun --hostfile /path/to/hostfile -n <num_processes> /path/to/caffe/build/tools/caffe train --solver=/path/to/solver.prototxt --param_server=mpi
where
- -n - the number of processes (nodes) to use
- hostfile - contains the IP address of each node, one per line
The solver.prototxt of each node points to its own train.prototxt, and each train.prototxt must point to a different portion of the dataset. For more details, see the related materials.
Fine-tuning reuses a network architecture already defined in a prototxt, with two modifications:
- 1. Change the data layer so that it points to the new data:
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.00390625 # 1/255
}
data_param {
source: "newdata_lmdb" # 指定到新的数据集
batch_size: 64
backend: LMDB
}
}
- 2. Rename the last layer and set its number of outputs to the new number of classes (renaming reinitializes the weights):
layer {
name: "ip2-ft" # renamed layer
type: "InnerProduct"
bottom: "ip1"
top: "ip2-ft" # 修改网络输出名
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 2 # the number of classes in the new dataset; here 2
bias_filler {
type: "constant"
}
}
}
To fine-tune in Caffe:
# From the command line in $CAFFE_ROOT
./build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/trained_model.caffemodel
Fine-tuning tips:
- Learn the last layer first; leave the other layers unchanged
- Reduce the initial learning rate, typically by 10x or 100x
- Caffe layers have a local learning rate multiplier, lr_mult
- Freeze all layers except the last one (or the last two) for fast optimization, i.e., set their lr_mult to 0
- Raise the local learning rate of the last layer by 10x and of the second-to-last by 5x
- Stop when the results are good enough, or keep fine-tuning the remaining layers
What fine-tuning does:
- starts from the new network architecture
- copies the previously learned weights as the initialization
- otherwise proceeds like ordinary training; see the example and the Python sketch below.
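The same weight copying can be done from Python; a minimal sketch with placeholder paths:
# Sketch of fine-tuning via the Python API; the -weights flag above does the same copy
import caffe

solver = caffe.get_solver('/path/to/solver.prototxt')
# copies weights for layers whose names match; renamed layers (e.g., "ip2-ft")
# keep their fresh initialization
solver.net.copy_from('/path/to/trained_model.caffemodel')
solver.solve()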
Testing, also called inference, classification, or scoring, can be done with Caffe's Python API or its C++ utilities. The C++ utilities are not very flexible; the Python API is recommended.
Classifying an image or signal, or a set of them, requires:
- the image(s)
- the network architecture
- the network weights
The model prototxt should contain a TEST-phase data layer that points to the testing dataset, so the model's performance can be measured:
/path/to/caffe/build/tools/caffe test -model /path/to/train_val.prototxt \
    -weights /path/to/trained_model.caffemodel -iterations <num_iter>
This example follows the reference material.
First, before classifying images with a trained model, download the model:
./scripts/download_model_binary.py models/bvlc_reference_caffenet
Then download the dataset labels, which map the network's predictions to class names; here for ILSVRC2012:
./data/ilsvrc12/get_ilsvrc_aux.sh
Finally, classify an image:
./build/examples/cpp_classification/classification.bin \
    models/bvlc_reference_caffenet/deploy.prototxt \
    models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    data/ilsvrc12/imagenet_mean.binaryproto \
    data/ilsvrc12/synset_words.txt \
    examples/images/cat.jpg
The output looks like this:
---------- Prediction for examples/images/cat.jpg ----------
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"
The weights of a convolutional layer are stored in the format output_feature_maps x input_feature_maps x height x width (feature maps are also called channels). Caffe offers two ways to extract features: the Python API and the C++ API.
# Download the model weights
scripts/download_model_binary.py models/bvlc_reference_caffenet
# Generate a list of the files to process
# Use the images that ship with caffe
find `pwd`/examples/images -type f -exec echo {} \; > examples/images/test.txt
# Add a 0 to the end of each line
# input data structures expect labels after each image file name
sed -i "s/$/ 0/" examples/images/test.txt
# Get the mean of the training set to subtract it from the images
./data/ilsvrc12/get_ilsvrc_aux.sh
# Copy and modify the data layer to load and resize the images:
cp examples/feature_extraction/imagenet_val.prototxt examples/images
vi examples/images/imagenet_val.prototxt
# Extract the features
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    examples/images/imagenet_val.prototxt fc7 examples/images/features 10 lmdb
This extracts the feature maps of the fc7 layer, which represent the model's highest-level features; other layers, such as conv5 or pool3, can be extracted the same way. The final arguments, 10 lmdb, are the mini-batch size and the output format; the extracted features are stored in the LMDB directory examples/images/features.
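The saved features can be read back in Python; a sketch assuming the lmdb package is installed:
# Read the extracted fc7 features back from the LMDB written above
import lmdb
import caffe
from caffe.proto import caffe_pb2

env = lmdb.open('examples/images/features', readonly=True)
with env.begin() as txn:
    for key, value in txn.cursor():
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        feat = caffe.io.datum_to_array(datum).flatten()  # 4096-D fc7 vector
        print key, feat.shape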
Caffe provides a Python API for testing, classification, feature extraction, network definition, and network training.
After building Caffe, run make pycaffe; once that succeeds, the module can be imported:
import sys
CAFFE_ROOT = '/path/to/caffe/' # make sure this path is correct
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
caffe.set_mode_cpu() # CPU mode
The network architecture is defined in train_val.prototxt or deploy.prototxt:
net = caffe.Net('train_val.prototxt', caffe.TRAIN)
To load the network together with trained weights:
net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TEST)
A net object holds the data blobs (net.blobs) and the weight blobs (net.params). Taking a layer named conv1 as an example:
- net.blobs['conv1'] - the output data of the conv1 layer, also called its feature maps
- net.params['conv1'][0] - the weights of the conv1 layer
- net.params['conv1'][1] - the biases of the conv1 layer
- net.blobs.items() - the data blobs of all layers
Network visualization requires the pydot and graphviz packages:
sudo apt-get install -y graphviz
sudo pip install pydot
Use Caffe's draw_net.py script to draw the network:
python python/draw_net.py examples/net_surgery/deploy.prototxt train_val_net.png
open train_val_net.png
To load an image directly into the data blob as a 4-D tensor:
import numpy as np
from PIL import Image
# get the input image and arrange it as a 4-D tensor
im = np.array(Image.open('/path/to/caffe/examples/images/cat_gray.jpg'))
im = im[np.newaxis, np.newaxis, :, :]
# reshape the blob to the size of the input image
net.blobs['data'].reshape(*im.shape) # only needed if the image size differs
# compute the blobs given the input data
net.blobs['data'].data[...] = im
Alternatively, resize the image to fit the data blob using caffe.io:
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg', color=False)
shape = net.blobs['data'].data.shape
# resize the image to the spatial size (height, width) of the data blob
im = caffe.io.resize_image(im, (shape[2], shape[3]))
# arrange the channels first and assign the blob
net.blobs['data'].data[...] = im.transpose(2, 0, 1)
A more typical approach is to preprocess inputs with caffe.io.Transformer:
net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
ilsvrc_mean = 'python/caffe/imagenet/ilsvrc_2012_mean.npy'
transformer.set_mean('data', np.load(ilsvrc_mean).mean(1).mean(1))
# puts the channel as the first dimension
transformer.set_transpose('data', (2,0,1))
# (2,1,0) maps RGB to BGR for example
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# the batch size can be changed on-the-fly
net.blobs['data'].reshape(1,3,227,227)
# load the image in the data layer
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
# transform the image and store it in the net.blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)
To display the image:
import matplotlib.pyplot as plt
plt.imshow(im)
To get the network's prediction for the loaded image:
# assumes that images are loaded
prediction = net.forward()
print 'predicted class:', prediction['prob'].argmax()
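To inspect more than the argmax, the top-5 entries of the 'prob' blob can be listed (a small sketch):
# print the five highest-probability classes
probs = prediction['prob'][0]
top5 = probs.argsort()[::-1][:5]
for i in top5:
    print i, probs[i]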
The forward propagation time (excluding data preprocessing) can also be measured with IPython's timeit magic:
%timeit net.forward()
Caffe also provides caffe.Classifier, a Python class that transforms and classifies several inputs at once; it can be used in place of caffe.Net plus caffe.io.Transformer.
im1 = caffe.io.load_image('/path/to/caffe/examples/images/cat.jpg')
im2 = caffe.io.load_image('/path/to/caffe/examples/images/fish-bike.jpg')
imgs = [im1, im2]
ilsvrc_mean = '/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy'
net = caffe.Classifier('deploy.prototxt', 'trained_model.caffemodel',
mean=np.load(ilsvrc_mean).mean(1).mean(1),
channel_swap=(2,1,0),
raw_scale=255,
image_dims=(256, 256))
prediction = net.predict(imgs) # predict takes any number of images
print 'predicted classes:', prediction[0].argmax(), prediction[1].argmax()
For a folder with several images, only the imgs part changes:
IMAGES_FOLDER = '/path/to/folder/w/images/'
import os
images = os.listdir(IMAGES_FOLDER)
imgs = [ caffe.io.load_image(IMAGES_FOLDER + im) for im in images ]
plt.plot(prediction[0]) # visualize the class probabilities as a chart
%timeit net.predict([im1]) # timing (IPython)
%timeit net.predict([im1], oversample=0)
Taking the fc7 layer as an example:
# Retrieve details of the network's layers
[(k, v.data.shape) for k, v in net.blobs.items()]
# Retrieve weights of the network's layers
[(k, v[0].data.shape) for k, v in net.params.items()]
# Retrieve the features in the last fully connected layer
# prior to outputting class probabilities
feat = net.blobs['fc7'].data[4]
# Retrieve size/dimensions of the array
feat.shape
# Assumes that the "net = caffe.Classifier" module has been called
# and data has been formatted as in the example above
# Take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) section in a grid
# of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
# values between 0 and 1
data -= data.min()
data /= data.max()
# force the number of filters to be square
n = int(np.ceil(np.sqrt(data.shape[0])))
padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
# tile the filters into an image
data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
plt.imshow(data)
plt.rcParams['figure.figsize'] = (25.0, 20.0)
# visualize the weights after the 1st conv layer
net.params['conv1'][0].data.shape
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
# visualize the feature maps after 1st conv layer
net.blobs['conv1'].data.shape
feat = net.blobs['conv1'].data[0,:96]
vis_square(feat, padval=1)
# visualize the feature maps after the 2nd conv layer
net.blobs['conv2'].data.shape
feat = net.blobs['conv2'].data[0,:96]
vis_square(feat, padval=1)
# visualize the feature maps after the 2nd pooling layer
net.blobs['pool2'].data.shape
feat = net.blobs['pool2'].data[0,:256] # change 256 to the number of pooling outputs
vis_square(feat, padval=1)
# Visualize the neuron activations for the 2nd fully-connected layer
net.blobs['ip2'].data.shape
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat)
plt.legend()
plt.show()
Networks can also be defined programmatically with the NetSpec API:
from caffe import layers as L
from caffe import params as P
def lenet(lmdb, batch_size):
# auto generated LeNet
n = caffe.NetSpec()
n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb, transform_param=dict(scale=1./255), ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
return n.to_proto()
with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))
with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))
The generated prototxt file looks like this:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.00392156862745
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 50
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
To train through the Python API, load a solver:
solver = caffe.get_solver('models/bvlc_reference_caffenet/solver.prototxt')
net = caffe.Net('train_val.prototxt', caffe.TRAIN)
solver.net.forward() # train net
solver.test_nets[0].forward() # test net (there can be more than one)
solver.net.backward() # compute the gradients
# data gradients
net.blobs['conv1'].diff
# weight gradients
net.params['conv1'][0].diff
# biases gradients
net.params['conv1'][1].diff
solver.step(1) # one iteration, i.e., one forward propagation and one backward propagation
solver.solve() # run all the iterations up to the max_iter defined in solver.prototxt
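For finer control, for example to record the training loss at each iteration, step the solver manually; a short sketch:
# Record the training loss while stepping the solver manually
import numpy as np

niter = 200
train_loss = np.zeros(niter)
for it in range(niter):
    solver.step(1)  # one forward/backward pass plus a weight update
    # 'loss' is the name of the loss blob in the LeNet definition above
    train_loss[it] = solver.net.blobs['loss'].data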
Debugging is optional and mainly relevant to Caffe developers. Useful debugging techniques:
- remove randomness
- compare caffemodels
- use Caffe's debug info
Removing randomness makes runs reproducible and their outputs comparable. Randomness enters at several stages:
- random weight initialization, usually sampled from a probability distribution such as a Gaussian
- random horizontal mirroring and random cropping of the input images, and random shuffling of their order
- dropout layers, which randomly train some weights and ignore others
One solution is to use a seed by adding the following to solver.prototxt:
# pick some value for random_seed that is greater than or equal to 1, for example:
random_seed: 42
This guarantees the same 'random' values in every run. However, the same seed produces different values on different machines.
For multiple machines, a more robust approach is to:
- prepare the data with the images already shuffled, so the order is not reshuffled in each experiment
- define transform_param in the ImageData layer of train.prototxt with no cropping or mirroring:
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
# mirror: true
# crop_size: 227
mean_value: 104
mean_value: 117
mean_value: 123
}
image_data_param {
source: "/path/to/file/train.txt"
batch_size: 32
new_height: 224
new_width: 224
}
}
To compare two caffemodels, the script below sums the absolute differences of all weights between the two models:
# Intel Corporation
# Author: Ravi Panchumarthy
import sys, os, argparse, time
import pdb
import numpy as np
def get_args():
parser = argparse.ArgumentParser('Compare weights of two caffe models')
parser.add_argument('-m1', dest='modelFile1', type=str, required=True,
help='Caffe model weights file to compare')
parser.add_argument('-m2', dest='modelFile2', type=str, required=True,
help='Caffe model weights file to compare against')
parser.add_argument('-n', dest='netFile', type=str, required=True,
help='Network prototxt file associated with model')
return parser.parse_args()
if __name__ == "__main__":
import caffe
args = get_args()
net = caffe.Net(args.netFile, args.modelFile1, caffe.TRAIN)
net2compare = caffe.Net(args.netFile, args.modelFile2, caffe.TRAIN)
wt_sumOfAbsDiffByName = dict()
bias_sumOfAbsDiffByName = dict()
for name, blobs in net.params.iteritems():
wt_diffTensor = np.subtract(net.params[name][0].data, net2compare.params[name][0].data)
wt_absDiffTensor = np.absolute(wt_diffTensor)
wt_sumOfAbsDiff = wt_absDiffTensor.sum()
wt_sumOfAbsDiffByName.update({name : wt_sumOfAbsDiff})
# if args.layerDebug == 1:
# print("%s : %s" % (name,wt_sumOfAbsDiff))
bias_diffTensor = np.subtract(net.params[name][1].data, net2compare.params[name][1].data)
bias_absDiffTensor = np.absolute(bias_diffTensor)
bias_sumOfAbsDiff = bias_absDiffTensor.sum()
bias_sumOfAbsDiffByName.update({name : bias_sumOfAbsDiff})
print("\nThe sum of absolute difference of all layer's weight is : %s" % sum(wt_sumOfAbsDiffByName.values()))
print("The sum of absolute difference of all layer's bias is : %s" % sum(bias_sumOfAbsDiffByName.values()))
finalDiffVal = sum(wt_sumOfAbsDiffByName.values())+ sum(bias_sumOfAbsDiffByName.values())
print("The sum of absolute difference of all layers weight's and bias's is : %s" % finalDiffVal )
For deeper debugging, uncomment DEBUG := 1 in Makefile.config, then:
gdb /path/to/caffe/build/caffe
Once gdb starts, run:
run train -solver /path/to/solver.prototxt
# Prepare the dataset
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh # downloads MNIST dataset
./examples/mnist/create_mnist.sh # creates dataset in LMDB format
# Train the model
# Reduce the number of iterations from 10K to 1K to quickly run through this example
sed -i 's/max_iter: 10000/max_iter: 1000/g' examples/mnist/lenet_solver.prototxt
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt
# Estimate the forward and backward propagation time
./build/tools/caffe time --model=examples/mnist/lenet_train_test.prototxt -iterations 50 # runs on CPU
# Test the model
# the file with the model should have a 'phase: TEST'
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_1000.caffemodel -iterations 50
Download the Dogs vs Cats dataset from Kaggle, unzip dogvscat.zip, and run dogvscat.sh:
#!/usr/bin/env sh
CAFFE_ROOT=/path/to/caffe
mkdir dogvscat
DOG_VS_CAT_FOLDER=/path/to/dogvscat
cd $DOG_VS_CAT_FOLDER
## Download datasets (requires first a login)
#https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
#https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip
# Unzip train and test data
sudo apt-get -y install unzip
unzip train.zip -d .
unzip test1.zip -d .
# Format data
python create_label_file.py # creates 2 text files with labels for training and validation
./build_datasets.sh # build lmdbs
# Download ImageNet pretrained weights (takes ~20 min)
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet
# Fine-tune weights in the AlexNet architecture (takes ~100 min)
$CAFFE_ROOT/build/tools/caffe train -solver $DOG_VS_CAT_FOLDER/dogvscat_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
# Classify test dataset
cd $DOG_VS_CAT_FOLDER
python convert_binaryproto2npy.py
python dogvscat_classify.py # Returns prediction.txt (takes ~30 min)
# A better approach is to train five AlexNets w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
Unzip voc2012.zip and run voc2012.sh to fine-tune AlexNet:
#!/usr/bin/env sh
# Copy and unzip voc2012.zip (it contains this file) then run this file. But first
# change paths in: voc2012.sh; build_datasets.sh; solvers/*; nets/*; classify.py
# As you run various files, you can ignore the following error if it shows up:
# libdc1394 error: Failed to initialize libdc1394
# set Caffe root directory
CAFFE_ROOT=$CAFFE_ROOT
VOC=/path/to/voc2012
chmod 700 *.sh
# Download datasets
# Details: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
if [ ! -f VOCtrainval_11-May-2012.tar ]; then
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
fi
# VOCtrainval_11-May-2012.tar contains the VOC folder with:
# JPEGImages: all jpg images
# Annotations: objects and corresponding bounding box/pose/truncated/occluded per jpg
# ImageSets: breaks the images by the type of task they are used for
# SegmentationClass and SegmentationObject: segmented images (duplicate directories)
tar -xvf VOCtrainval_11-May-2012.tar
# Run Python scripts to create labeled text files
python create_labeled_txt_file.py
# Execute shell script to create training and validation lmdbs
# Note that lmdbs directories w/the same name cannot exist prior to creating them
./build_datasets.sh
# Execute following command to download caffenet pre-trained weights (takes ~20 min)
# if weights exist already then the command is ignored
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet
# Fine-tune weights in the AlexNet architecture (takes ~60 min)
# you can also choose one of six solvers: pascal_solver[1-6].prototxt
$CAFFE_ROOT/build/tools/caffe train -solver $VOC/solvers/voc2012_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
# The lines below are not really needed; they served as examples on how to do some tasks
# Test against voc2012_val_lmbd dataset (name of lmdb is the model under PHASE: test)
$CAFFE_ROOT/build/tools/caffe test -model $VOC/nets/voc2012_train_val_ft678.prototxt \
    -weights $VOC/weights_iter_5000.caffemodel -iterations 116
# Classify validation dataset: returns a file w/the labels of the val dataset
# but it doesn't report accuracy (that would require some adjusting of the code)
python convert_binaryproto2npy.py
mkdir results
python cls_confidence.py
python average_precision.py
Facts about the VOC dataset:
- PASCAL VOC datasets
- 20 classes
- Training: 5,717 images, 13,609 objects
- Validation: 5,823 images, 13,841 objects
- Testing: 10,991 images