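This is a hands-on character recognition project. The earlier chapters already touched on each piece; this one pulls them together and walks through a complete project, from building the dataset through training, recognition, and analysis.
We use the lmdb format here; for other formats, follow the methods in 【caffe源码研究】第二章:使用篇(1): 制作数据集.
Our data is laid out as follows. trainData and testData each contain 10 folders named 0-9, one per digit 0-9. Part of the directory structure is shown below.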
/home/fangjin/CAFFE/DATA
│  list.txt
│
├─testData
│  ├─0
│  │      0-3-033OJJ7KZA.jpg
│  │      0-5-CV7UTRECKB.jpg
│  │
│  ├─1
│  │      1-3-01VZAOCIPC.jpg
│  │      1-3-09GBY203S5.jpg
│  ...
│
└─trainData
    │  train.txt
    │
    ├─0
    │      0-3-00DUJ0RVR9.jpg
    │      0-3-0AWLKVU51V.jpg
    │
    ├─1
    │      1-7-E3Y0H6X1TR.jpg
    │      1-7-E5DLYZ289T.jpg
    ...
First create a txt file that lists each image path together with its label, in the following format:
trainData/0/0-3-00DUJ0RVR9.jpg 0
trainData/0/0-3-0AWLKVU51V.jpg 0
trainData/0/0-3-0DS9V90EJ6.jpg 0
trainData/0/0-3-0DUO09DFPD.jpg 0
trainData/0/0-3-0F1UTHN9O9.jpg 0
trainData/0/0-3-0KBIEMMCYC.jpg 0
trainData/0/0-3-0QPBZLGTF7.jpg 0
trainData/0/0-3-0R5LZ0FG2H.jpg 0
trainData/0/0-3-0T1RBO2IMH.jpg 0
trainData/0/0-3-0TTN1FAFZY.jpg 0
A simple Python script does the job:
import os

rootPath = './'

# One line per image: "<relative path> <label>"; folders 0-9 are the labels.
with open(rootPath + 'train.txt', 'w') as f:
    for i in range(10):
        path = 'trainData/' + str(i)
        lists = os.listdir(rootPath + path)
        for listfile in lists:
            if listfile != 'Thumbs.db':  # skip the Windows thumbnail cache
                f.writelines([path, '/', listfile, ' ', str(i), '\n'])

# Same procedure for the test split.
with open(rootPath + 'test.txt', 'w') as f:
    for i in range(10):
        path = 'testData/' + str(i)
        lists = os.listdir(rootPath + path)
        for listfile in lists:
            if listfile != 'Thumbs.db':
                f.writelines([path, '/', listfile, ' ', str(i), '\n'])
This generates train.txt and test.txt.
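Since path mistakes in the list files only surface later during conversion, it can help to check them right away. The following is a minimal sketch, not part of the original workflow; the function name check_list_file is hypothetical. It counts the entries per label and collects any listed path that does not exist under the data root.

```python
import os
from collections import Counter


def check_list_file(list_path, data_root):
    """Return (per-label counts, missing paths) for a 'path label' list file."""
    counts = Counter()
    missing = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The label is the last space-separated field; the rest is the path.
            rel_path, label = line.rsplit(' ', 1)
            counts[int(label)] += 1
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(rel_path)
    return counts, missing
```

For example, check_list_file('train.txt', './') should return one count per digit 0-9 and an empty list of missing paths when the list is consistent with the folders on disk.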
Use the convert_imageset tool to do the conversion.
The shell script is as follows:
TOOLS=/home/users/fangjin/caffe/build/tools
RESIZE_HEIGHT=32
RESIZE_WIDTH=32
TRAIN_DATA_ROOT=/home/users/fangjin/test/number_data/

echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    train.txt \
    number_train_lmdb

echo "Creating test lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    test.txt \
    number_test_lmdb  # output lmdb directory
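Parameter notes
$TRAIN_DATA_ROOT
This is the root directory that the relative paths in the txt file are resolved against. In other words, $TRAIN_DATA_ROOT
+ the path in the txt file must form the full path to the image. When the script fails, the cause is usually a wrong path. Before every rerun, delete the previous lmdb output first.
When the script finishes, it creates two folders that hold the lmdb data.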
drwxr--r-- 2 fangjin fangjin 4096 Dec 21 12:01 number_test_lmdb
drwxr--r-- 2 fangjin fangjin 4096 Dec 21 12:01 number_train_lmdb
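The two rules above (root plus relative path gives the full path, and old lmdb output must be removed before a rerun) can be sketched in Python. This is an illustrative helper under my own naming, not something shipped with Caffe. As far as I can tell, convert_imageset concatenates the root and the relative path as plain strings, so the root should end with a slash; the sketch uses os.path.join, which tolerates either form.

```python
import os
import shutil


def full_image_path(data_root, list_entry):
    """Join the data root with the relative path from one 'path label' line."""
    rel_path = list_entry.rsplit(' ', 1)[0]
    return os.path.join(data_root, rel_path)


def remove_stale_lmdb(lmdb_dir):
    """Delete an existing lmdb directory; return True if something was removed."""
    if os.path.isdir(lmdb_dir):
        shutil.rmtree(lmdb_dir)
        return True
    return False
```

Calling remove_stale_lmdb('number_train_lmdb') and remove_stale_lmdb('number_test_lmdb') before running the conversion script avoids the failure caused by leftover output directories.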
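In Caffe you can first compute the mean image and then subtract it from every image. The following shows how to compute the mean with the tool that is built along with Caffe.
The tool's help text: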
compute_image_mean: Compute the mean_image of a set of images given by a leveldb/lmdb
Usage:
compute_image_mean [FLAGS] INPUT_DB [OUTPUT_FILE]
Flags from tools/compute_image_mean.cpp:
-backend (The backend {leveldb, lmdb} containing the images) type: string
default: "lmdb"
Write the script compute_image_mean.sh:
#!/usr/bin/env sh
set -e
CAFFETOOL=/home/users/fangjin/caffe/build/tools
${CAFFETOOL}/compute_image_mean number_train_lmdb image_mean.binaryproto
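This generates the mean file image_mean.binaryproto.
To use it, add a mean_file entry to the transform_param of the Data layer. It must be added for both the training set and the validation set: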
transform_param {
  scale: 0.00390625
  mean_file: "image_mean.binaryproto"
}
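To make the effect of these two settings concrete, here is a small pure-Python sketch. It reflects my reading of Caffe's data transformer, in which the mean is subtracted first and the result is then multiplied by scale; the function names are hypothetical and the images are represented as flat lists for simplicity.

```python
SCALE = 0.00390625  # 1/256, the value used in transform_param


def transform_pixel(pixel, mean, scale=SCALE):
    """Subtract the mean first, then scale: (pixel - mean) * scale."""
    return (pixel - mean) * scale


def transform_image(pixels, mean_pixels, scale=SCALE):
    """Apply the per-position transform to two flat lists of equal length."""
    return [transform_pixel(p, m, scale) for p, m in zip(pixels, mean_pixels)]
```

With a mean of 128 at some position, raw values 0 and 255 map to -0.5 and roughly 0.496, so the network sees inputs centered near zero.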
Then train in the same way as before.
train_lenet.sh
#!/usr/bin/env sh
set -e
CAFFETOOL=/home/users/fangjin/caffe/build/tools
GLOG_logtostderr=1 $CAFFETOOL/caffe train \
--solver=lenet_solver.prototxt 2>&1 | tee log_1st.txt
lenet_solver.prototxt
# The train/test net protocol buffer definition
net: "lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "lenet"
# solver mode: CPU or GPU
solver_mode: CPU
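The "inv" policy in this solver decays the learning rate as base_lr * (1 + gamma * iter) ^ (-power). The sketch below evaluates that formula with the values from the solver file, and also shows how many test images one test pass covers (test_iter times the TEST batch size). The function names are my own.

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the 'inv' policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


def test_images_covered(test_iter=100, batch_size=100):
    """Number of test images seen in one test pass."""
    return test_iter * batch_size


if __name__ == '__main__':
    for it in (0, 5000, 10000):
        print(it, inv_learning_rate(it))
```

If your own test set is smaller or larger than 10,000 images, test_iter should be adjusted so that test_iter times the batch size matches it.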
lenet_train_test.prototxt
name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }
  data_param {
    source: "number_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }
  data_param {
    source: "number_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
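To see how the 32x32 input shrinks through the network above, the spatial sizes can be worked out layer by layer. This is a rough sketch with hypothetical helper names; it assumes square inputs, no padding, and the usual Caffe conventions (convolution output sizes round down, pooling output sizes round up).

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a convolution layer (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, pad=0):
    """Output side length of a pooling layer (rounds up)."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


def lenet_shapes(input_size=32):
    """Side length after each layer, plus the number of inputs to ip1."""
    conv1 = conv_out(input_size, kernel=5)
    pool1 = pool_out(conv1, kernel=2, stride=2)
    conv2 = conv_out(pool1, kernel=5)
    pool2 = pool_out(conv2, kernel=2, stride=2)
    return {
        'conv1': conv1,
        'pool1': pool1,
        'conv2': conv2,
        'pool2': pool2,
        'ip1_inputs': 50 * pool2 * pool2,  # conv2 has num_output: 50
    }
```

For the 32x32 images produced by the conversion script, this gives 28, 14, 10 and 5 for conv1, pool1, conv2 and pool2, so ip1 receives 50 * 5 * 5 = 1250 values per image.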
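Paste the contents of lenet_train_test.prototxt
into http://ethereon.github.io/netscope/#/editor to visualize the network structure, as shown in the figure below.
You can also draw it with the Python script that ships with Caffe: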
python /home/users/fangjin/caffe/python/draw_net.py lenet_train_test.prototxt lenet_train_test_net.png
To start training, just run the shell script:
sh train_lenet.sh
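This produces the final model file lenet_iter_10000.caffemodel
and the solver state file lenet_iter_10000.solverstate.
Because the solver is configured to save a snapshot every 5000 iterations, lenet_iter_5000.caffemodel
and the state file lenet_iter_5000.solverstate are generated as well.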
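The file names follow directly from snapshot_prefix, snapshot and max_iter in the solver. The sketch below lists the expected files; the function name is hypothetical, and it assumes max_iter is a multiple of snapshot, as it is here.

```python
def snapshot_files(prefix, snapshot, max_iter):
    """Expected snapshot file names, assuming max_iter % snapshot == 0."""
    names = []
    for iteration in range(snapshot, max_iter + 1, snapshot):
        names.append('%s_iter_%d.caffemodel' % (prefix, iteration))
        names.append('%s_iter_%d.solverstate' % (prefix, iteration))
    return names


if __name__ == '__main__':
    for name in snapshot_files('lenet', 5000, 10000):
        print(name)
```

A quick comparison of this list against the files actually on disk is an easy way to confirm that training ran to completion.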
You can measure the accuracy on a test set. Either use the test set above directly, or build a new one and point the TEST-phase data layer in lenet_train_test.prototxt
at the new set.
#!/usr/bin/env sh
set -e
CAFFETOOL=/home/users/fangjin/caffe/build/tools
GLOG_logtostderr=1 $CAFFETOOL/caffe test \
    --model=lenet_train_test.prototxt \
    --weights=lenet_iter_10000.caffemodel \
    --iterations=100
The test output (final lines):
I1222 10:31:07.233053 27436 caffe.cpp:308] Batch 45, accuracy = 0.97
I1222 10:31:07.233103 27436 caffe.cpp:308] Batch 45, loss = 0.165972
I1222 10:31:07.301012 27436 caffe.cpp:308] Batch 46, accuracy = 0.98
I1222 10:31:07.301057 27436 caffe.cpp:308] Batch 46, loss = 0.126905
I1222 10:31:07.369249 27436 caffe.cpp:308] Batch 47, accuracy = 0.97
I1222 10:31:07.369297 27436 caffe.cpp:308] Batch 47, loss = 0.161663
I1222 10:31:07.437602 27436 caffe.cpp:308] Batch 48, accuracy = 0.99
I1222 10:31:07.437649 27436 caffe.cpp:308] Batch 48, loss = 0.0438317
I1222 10:31:07.505509 27436 caffe.cpp:308] Batch 49, accuracy = 0.98
I1222 10:31:07.505556 27436 caffe.cpp:308] Batch 49, loss = 0.0786786
I1222 10:31:07.505565 27436 caffe.cpp:313] Loss: 0.134281
I1222 10:31:07.505623 27436 caffe.cpp:325] accuracy = 0.9752
I1222 10:31:07.505653 27436 caffe.cpp:325] loss = 0.134281 (* 1 = 0.134281 loss)
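The per-batch lines in a log like this can be pulled out with a short script. This is a minimal sketch with my own function names; it assumes the "Batch N, accuracy = X" line format shown above. My understanding is that the final accuracy reported by caffe test is the average over all batches that were run, so averaging only the five batches visible here (about 0.978) will not reproduce 0.9752.

```python
import re

# Matches lines such as "... Batch 45, accuracy = 0.97"
BATCH_ACC = re.compile(r'Batch (\d+), accuracy = ([0-9.]+)')


def batch_accuracies(log_text):
    """Return a list of (batch index, accuracy) pairs found in the log."""
    return [(int(b), float(a)) for b, a in BATCH_ACC.findall(log_text)]


def mean_accuracy(log_text):
    """Average the per-batch accuracies, or None if the log has none."""
    accs = [a for _, a in batch_accuracies(log_text)]
    return sum(accs) / len(accs) if accs else None
```

Feeding the complete log_1st.txt style output into mean_accuracy should give a value close to the summary line, while the list from batch_accuracies makes it easy to spot unusually weak batches.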
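Prediction means taking a new image whose class is unknown and predicting its class.
The C++ and Python interfaces are covered separately.
Note first that the network used for prediction differs slightly from the training network, mainly in its input and output, so start by writing the prediction network definition, lenet_deploy.prototxt: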
name: "LeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 64 dim: 3 dim: 32 dim: 32 } }
  transform_param {
    scale: 0.00390625
    mean_file: "image_mean.binaryproto"
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
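Note how the data layer differs: the two lmdb-backed Data layers are replaced by a single Input layer with a fixed shape. As far as I know, the Input layer does not apply transform_param by itself, so the same scaling and mean subtraction should also be done in the code that feeds the network.
At the output end, the Accuracy and SoftmaxWithLoss layers are removed and replaced with a Softmax layer.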
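What the final Softmax layer contributes can be shown without Caffe at all. The following pure-Python sketch (hypothetical function names) turns the 10 raw ip2 scores into probabilities and picks the most likely digit; it illustrates the math only and is not the Caffe prediction code covered in the interface chapters.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Return (predicted digit, its probability) from 10 raw ip2 scores."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

Because the index of the largest probability equals the index of the largest raw score, the Softmax layer does not change the predicted class; it only makes the output readable as a confidence.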
C++ interface: see 【caffe源码研究】第二章:使用篇(3) : C++接口
Python interface: see 【caffe源码研究】第二章:使用篇(4) : python接口
Model visualization: see 【caffe源码研究】第二章:使用篇(5) : 模型可视化
Training process analysis: see 【caffe源码研究】第二章:使用篇(6) : 训练过程分析工具