Background: I collected one type of CAPTCHA image from the web, preprocessed the CAPTCHAs, and segmented out the Chinese characters on them. On top of that I trained AlexNet to recognize the segmented characters and measured the recognition accuracy. Since I could not collect enough CAPTCHAs, I also generated additional CAPTCHA data in the style of the originals.
(For now I am posting the outline of the whole pipeline and the code; all the screenshots will be added once I finish organizing them over the next couple of days.) ---- 2018.3.19
Structure of this post:
1. Data preparation and building the LMDB dataset
2. Computing the mean
3. Writing the network definition file
4. Writing the solver file
5. Training
6. Writing the deploy file for testing
7. Possible problems
1. Data preparation and building the LMDB dataset
Set aside part of the original images as the test set, and split the remaining images into two folders at a training:validation ratio of 5:1. Then create two label files, train.txt and val.txt; each line of a label file has the format: image_name.jpg class_index
The training and validation sets may mix original images with generated ones, but the test set should contain only original images.
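For reference, here is a minimal sketch (my own addition) of how train.txt and val.txt could be generated for convert_imageset. It assumes a hypothetical layout where every class has its own sub-folder named after its integer label, e.g. train/12/xxx.jpg; adapt it to however your own images are organised.
#!/usr/bin/env python
# Sketch: write "relative/path.jpg label" lines for convert_imageset.
import os

def write_label_file(data_root, out_txt):
    with open(out_txt, 'w') as f:
        for class_dir in sorted(os.listdir(data_root)):
            label = int(class_dir)   # the folder name doubles as the class index
            for img in sorted(os.listdir(os.path.join(data_root, class_dir))):
                if img.endswith('.jpg'):
                    # paths are written relative to TRAIN_DATA_ROOT / VAL_DATA_ROOT
                    f.write('%s/%s %d\n' % (class_dir, img, label))

write_label_file('/path/to/train/', 'train.txt')
write_label_file('/path/to/val/', 'val.txt')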
Copy and modify the script that ships with Caffe (.../caffe-master/examples/imagenet/create_imagenet.sh) to convert the image data into LMDB format:
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e
EXAMPLE=examples/imagenet #set EXAMPLE to the folder where the generated lmdb files will be stored
DATA=data/ilsvrc12 #set DATA to the folder holding the label files train.txt / val.txt
TOOLS=build/tools #the tools directory is best given as an absolute path; mine is /home/hp/software/caffe-master/build/tools
TRAIN_DATA_ROOT=/path/to/imagenet/train/ #change to your training set path
VAL_DATA_ROOT=/path/to/imagenet/val/ #change to your validation set path
#make sure all images end up the same size, otherwise computing the mean will fail
#if the images have already been resized to the same size, set RESIZE=false; otherwise set RESIZE=true
RESIZE=false
if $RESIZE; then
RESIZE_HEIGHT=256 #the resize height and width can be changed
RESIZE_WIDTH=256
else
RESIZE_HEIGHT=0
RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet training data is stored."
exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet validation data is stored."
exit 1
fi
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$TRAIN_DATA_ROOT \
$DATA/train.txt \
$EXAMPLE/ilsvrc12_train_lmdb #the lmdb name can be changed
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$VAL_DATA_ROOT \
$DATA/val.txt \
$EXAMPLE/ilsvrc12_val_lmdb
echo "Done."
cd into the directory containing the modified script, type sh create_imagenet.sh in the Ubuntu terminal, and wait for the format conversion to finish.
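If you want to double-check the conversion, here is a small sketch (my own addition, assuming the lmdb python package is installed; the lmdb path is the one used above and may differ on your machine):
#!/usr/bin/env python
# Optional sanity check: count the records in the generated lmdb and decode the first one.
import sys
sys.path.insert(0, '/home/hp/software/caffe-master/python')   # change to your own caffe path
import lmdb
import caffe

env = lmdb.open('examples/imagenet/ilsvrc12_train_lmdb', readonly=True)   # your lmdb path
with env.begin() as txn:
    print('entries: %d' % txn.stat()['entries'])   # should equal the number of lines in train.txt
    cursor = txn.cursor()
    cursor.first()
    datum = caffe.proto.caffe_pb2.Datum()
    datum.ParseFromString(cursor.value())
    print('first record: %d x %d x %d, label %d' % (datum.channels, datum.height, datum.width, datum.label))
env.close()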
2. Generating the mean file
Copy and modify the script that ships with Caffe (.../caffe-master/examples/imagenet/make_imagenet_mean.sh); mainly just change the paths.
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12
EXAMPLE=examples/imagenet #change to the folder containing the training lmdb
DATA=data/ilsvrc12 #change to the folder where the mean file should be written
TOOLS=build/tools #best given as an absolute path; mine is /home/hp/software/caffe-master/build/tools
$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
$DATA/imagenet_mean.binaryproto
echo "Done."
cd into the directory containing the modified script and type sh make_imagenet_mean.sh in the terminal; once "Done" is printed the mean file has been generated.
The generated mean file is in binary (binaryproto) format. To convert it to a Python (.npy) file, create a new convert_mean.py with the code below and run it as shown in the Usage line.
#!/usr/bin/env python
# Convert a Caffe binaryproto mean file into a numpy .npy file.
import numpy as np
import sys

caffe_root = '/home/data/caffe/'   # change to your own caffe root
sys.path.insert(0, caffe_root + 'python')
import caffe

if len(sys.argv) != 3:
    print "Usage: python convert_mean.py mean.binaryproto mean.npy"
    sys.exit()

blob = caffe.proto.caffe_pb2.BlobProto()
bin_mean = open(sys.argv[1], 'rb').read()
blob.ParseFromString(bin_mean)
arr = np.array(caffe.io.blobproto_to_array(blob))   # shape: (1, K, H, W)
npy_mean = arr[0]                                    # drop the leading batch axis
np.save(sys.argv[2], npy_mean)
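As an optional check (my own addition), you can load the converted mean and look at its shape, which should be (channels, height, width) and match your image size; the mean.npy filename below is just the one used in the Usage line.
import numpy as np

mean = np.load('mean.npy')
print('mean shape: %s' % (mean.shape,))          # expected (K, H, W)
print('per-channel mean: %s' % mean.mean(axis=(1, 2)))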
3. Writing the network definition file
Create a new alexnet_trainval.prototxt file, then copy and modify the following:
name: "AlexNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: ".../mean.binaryproto" #改为均值文件所在路径,均值文件的计算见2
}
data_param {
source: ".../train_lmdb" #改为...
batch_size: 1000
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: ".../mean.binaryproto" #改为...
}
data_param {
source: ".../val_lmdb" #改为...
batch_size: 500
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "re-fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "re-relu6"
type: "ReLU"
bottom: "re-fc6"
top: "re-fc6"
}
layer {
name: "re-drop6"
type: "Dropout"
bottom: "re-fc6"
top: "re-fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "re-fc6"
top: "re-fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 1
}
}
}
layer {
name: "re-relu7"
type: "ReLU"
bottom: "re-fc7"
top: "re-fc7"
}
layer {
name: "re-drop7"
type: "Dropout"
bottom: "re-fc7"
top: "re-fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "re-fc7"
top: "re-fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4099 #set this to however many classes you need to classify!!!
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "re-fc8"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "re-fc8"
bottom: "label"
top: "loss"
}
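If you are unsure how many classes you have, you can count the distinct labels in train.txt and use that number for fc8's num_output. A small helper sketch (my own addition; it assumes train.txt sits in the current directory):
labels = set()
with open('train.txt') as f:
    for line in f:
        if line.strip():
            labels.add(line.split()[-1])   # the last column is the class index
print('number of classes: %d' % len(labels))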
4. Writing the solver file
Copy and modify the file that ships with Caffe (.../caffe-master/examples/mnist/lenet_solver.prototxt), or just copy and modify my version below. Name it alexnet_solver.prototxt. Note that the path after net must point to the alexnet_trainval.prototxt file created in the previous step; see the comments for the other fields to change.
net: "examples/mnist/lenet_train_test.prototxt" #改为自己网络结构文件的绝对路径
# test_iter specifies how many forward passes the test should carry out.
test_iter: 100 #number of test iterations = validation set size / validation batch size
test_interval: 500 #run one validation test after a full pass over the training set: test_interval = training set size / training batch size
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01 #start with 0.01 and adjust it after watching the loss; when fine-tuning from an existing model, 0.001 is usually a good choice
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100 #print training status to the terminal every this many iterations
max_iter: 10000 #max_iter / test_interval = number of full passes (epochs) over the training set
snapshot: 5000 #snapshot interval: save the model and solver state every this many iterations; if training is interrupted you can resume from the snapshot
snapshot_prefix: "examples/mnist/lenet" #path prefix for the saved models and solver states
# solver mode: CPU or GPU
solver_mode: GPU
Epoch, iteration and batch size in deep learning:
batch size: the number of training samples drawn from the training set for each training step.
iteration: one iteration is one training step on batch-size samples.
epoch: one epoch is one pass over every sample in the training set.
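As a worked example (my own addition) of how to pick these solver values from your own dataset sizes; the counts below are placeholders, substitute your own:
num_train   = 30000   # lines in train.txt
num_val     = 6000    # lines in val.txt
train_batch = 1000    # batch_size in the TRAIN data layer
val_batch   = 500     # batch_size in the TEST data layer
num_epochs  = 20      # desired full passes over the training set

test_iter     = num_val // val_batch        # 12 forward passes cover the whole validation set
test_interval = num_train // train_batch    # 30 iterations = one epoch
max_iter      = num_epochs * test_interval  # 600 training iterations in total
print('test_iter=%d test_interval=%d max_iter=%d' % (test_iter, test_interval, max_iter))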
5. Training
Create train_alexnet.sh, paste in the following code and modify it:
#!/bin/bash
set -e
/home/hp/software/caffe-master/build/tools/caffe train \
    --solver=/home/hp/software/caffe-master/data/luosainan/yidun/alexnet_solver.prototxt \
    --gpu=1 --weights=/home/hp/software/caffe-master/data/wp/output1/_iter_18783.caffemodel
echo "Done"
--gpu selects which GPU to train on. --weights initializes training from an existing model; remove it if you have none.
Once edited, type sh train_alexnet.sh in the terminal and training starts. Congratulations!
For a more detailed breakdown of the caffe command and its arguments, see: caffe command-line reference
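If training gets interrupted, it can also be resumed from a saved snapshot through pycaffe instead of the shell command; a rough sketch (my own addition, all paths are placeholders):
import sys
sys.path.insert(0, '/home/hp/software/caffe-master/python')   # change to your own caffe path
import caffe

caffe.set_device(1)        # same GPU id as --gpu above
caffe.set_mode_gpu()
solver = caffe.SGDSolver('/path/to/alexnet_solver.prototxt')
solver.restore('/path/to/snapshots/_iter_5000.solverstate')   # continue from the saved solver state
solver.solve()             # keeps training until max_iter is reached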
6. Writing the deploy file
The deploy file essentially rewrites the network definition, with only the input layer changed; the loss and accuracy layers at the end are replaced by a plain Softmax output.
Create a new alexnet_deploy.prototxt file, then copy and modify the following:
The first dim is the batch dimension: how many copies of the sample (for example, several crops and their flips when oversampling) are fed into the network at once for one prediction; you can choose this value yourself. If you classify the whole image without any data augmentation, set it to 1. Note that the two dim: 108 entries are the height and width of the input image; test images should have the same size as the images used for training.
name: "AlexNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 1 dim: 3 dim: 108 dim: 108 } } #1. batch size 2. number of image channels: 3 for RGB, 1 for grayscale 3. image height 4. image width
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "re-fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
}
}
layer {
name: "re-relu6"
type: "ReLU"
bottom: "re-fc6"
top: "re-fc6"
}
layer {
name: "re-drop6"
type: "Dropout"
bottom: "re-fc6"
top: "re-fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "re-fc6"
top: "re-fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
}
}
layer {
name: "re-relu7"
type: "ReLU"
bottom: "re-fc7"
top: "re-fc7"
}
layer {
name: "re-drop7"
type: "Dropout"
bottom: "re-fc7"
top: "re-fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "re-fc7"
top: "re-fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 3782 #must equal the number of classes, i.e. fc8's num_output in the training prototxt
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "re-fc8"
top: "prob"
}
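Before running the full test, it can help to load the deploy net with the trained weights and print every blob's shape, to confirm a 108x108 input flows through the network. A quick sketch (my own addition, all paths are placeholders):
import sys
sys.path.insert(0, '/home/hp/software/caffe-master/python')   # change to your own caffe path
import caffe

net = caffe.Net('alexnet_deploy.prototxt', '_iter_10000.caffemodel', caffe.TEST)
for name, blob in net.blobs.items():
    print('%-8s %s' % (name, blob.data.shape))
# the last blob should be prob with shape (1, number_of_classes)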
7. Testing and validation
Write classify_alexnet.py; copy and modify the following code:
#coding=utf-8
import numpy as np
import os
import sys
from skimage import io, transform

os.environ['CUDA_VISIBLE_DEVICES'] = '1'
sys.path.insert(0, "/home/hp/software/caffe-master/python")        # change to your own caffe path
sys.path.insert(0, "/home/hp/software/caffe-master/python/caffe")  # change to your own caffe path
import caffe
from chineseMap_yidun2 import label_to_char  # my own mapping between class labels and characters

def resize_test(img_path0, img_path):
    # resize every test image to 108x108, the input size the network was trained with
    filenames = os.listdir(img_path0)
    for filename in filenames:
        im = io.imread(img_path0 + filename)
        im = transform.resize(im, (108, 108))
        io.imsave(img_path + filename, im)

def evaluate(top_k):
    print '---------'
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    # images read in python are H x W x K; caffe wants K x H x W
    transformer.set_transpose('data', (2, 0, 1))
    transformer.set_mean('data', np.load(mean_file).mean(1).mean(1))
    # python loads images as RGB while caffe expects BGR, so swap the channels
    transformer.set_channel_swap('data', (2, 1, 0))
    # python stores pixels in [0, 1] while caffe expects [0, 255]
    transformer.set_raw_scale('data', 255)
    fre = open('.../answer.txt', 'w')   # change to your own output path
    filenames = os.listdir(img_path)
    filenames.sort()
    i = 0
    for filename in filenames:
        fre.write(filename + ' ')
        img_name = filename
        print img_name
        image = caffe.io.load_image(img_path + img_name)
        # in my naming scheme the ground-truth character is the third '_'-separated field
        tmp_char = img_name.split('_')[2].split('.')[0]
        print tmp_char
        # feed the preprocessed image into the network
        net.blobs['data'].data[...] = transformer.preprocess('data', image)
        out = net.forward()
        # 'prob' (the last layer in the deploy file) holds the per-class probabilities.
        # flatten() gives the probability of every class, argsort() sorts them ascending,
        # and the slice [-1:-top_k-1:-1] picks out the top_k most probable classes
        # (here top_k = 1, so just the single most probable one).
        label_index = net.blobs['prob'].data[0].flatten().argsort()[-1:-top_k-1:-1]
        labels = label_index[0]
        fre.write(label_to_char[labels] + '\n')
        if label_to_char[labels] == tmp_char:
            i += 1
    fre.close()
    return i

# change all of the following paths to your own
net_file = '.../alexnet_deploy.prototxt'
caffe_model = '.../output/_iter_26100.caffemodel'
mean_file = '.../mean.npy'
net = caffe.Net(net_file, caffe_model, caffe.TEST)
img_path0 = '.../test_image/'
img_path = '.../test_resize/'
top_k = 1
resize_test(img_path0, img_path)
summ = evaluate(top_k)
print summ, float(summ) / 3572   # accuracy; 3572 is the number of test images
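chineseMap_yidun2.py is my own helper file and is not shown here. For illustration only, label_to_char could look something like the following hypothetical sketch, mapping each integer class index from the label files to its Chinese character (the entries below are made up):
#coding=utf-8
# Hypothetical sketch of label_to_char: one entry per class, keyed by the class index.
label_to_char = {
    0: '一',
    1: '二',
    2: '三',
    # ... continue up to num_output - 1
}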
8. Viewing the results (omitted for now)
Run classify_alexnet.py to carry out the test and see the results on screen.
Results:
Dataset samples:
AlexNet training and recognition results:
Data dictionary: label_to_char