To learn convolutional neural networks, first understand convolution: the definition of 2D discrete convolution, and the intuitive meaning of the result map produced when a convolution kernel is slid over an image.
How convolution is actually computed: a 2D convolution can be written formally as multiplication by a doubly block circulant matrix (a special kind of Toeplitz matrix from matrix theory).
In Caffe, the default convolution implementation is based on matrix multiplication.
That is, sliding the kernel over the image is realized by shifting the positions of elements along the rows of a matrix (the im2col idea).
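To make the matrix-multiplication view concrete, here is a minimal NumPy sketch in the spirit of im2col (illustrative only, not Caffe's actual code; note that CNN "convolution" is computed as cross-correlation, i.e. the kernel is not flipped):

import numpy as np

def conv2d_as_matmul(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) via im2col + one matmul."""
    H, W = image.shape
    kh, kw = kernel.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    # Each column of `cols` holds one unrolled image patch.
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = image[i:i + kh, j:j + kw].ravel()
    # The unrolled kernel (a row vector) times the patch matrix gives the feature map.
    return (kernel.ravel() @ cols).reshape(out_h, out_w)

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0              # 3x3 mean filter
print(conv2d_as_matmul(img, k))        # 3x3 output feature map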
In vision applications, a deep network is computed layer by layer, with each layer operating on 2D matrices (feature maps).
Keep in mind that every pixel of a feature map is a neuron in the ordinary sense, with weights and a bias: the kernel parameters are the weights on the neuron's connections, and the bias is a separate parameter.
Each kernel, applied to one input image (or feature map), produces one output feature map. This relationship is called weight sharing: all neurons in a feature map share one kernel and one bias. These kernels and biases are exactly the parameters to be "learned". For example, conv1 in the LeNet below has 20 kernels of size 5×5 on a single-channel input, so it learns 20×5×5 = 500 shared weights plus 20 biases.
Training a neural network uses the back-propagation (BP) algorithm, which rests on basic mathematical optimization: supervised learning reduces to minimizing the error between the model's predictions and the ground-truth values, i.e. an optimization problem. Understanding general optimization is enough; convex optimization (where the objective is convex, so any local minimum is a global minimum) is a subset of general optimization, and the optimization used in machine learning and deep learning does not require a deep study of convex optimization.
At this stage it is enough to learn and understand gradient-based optimization methods.
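As a concrete instance of a gradient-based method, here is a minimal sketch (the toy data and learning rate are illustrative): plain gradient descent fitting y = w*x + b by minimizing the mean squared error.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                  # ground truth: w = 2, b = 1
w, b, lr = 0.0, 0.0, 0.1           # initial parameters and learning rate

for step in range(200):
    err = w * x + b - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)                        # approaches 2.0 and 1.0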
Table of Contents
Handwritten digit recognition on the MNIST dataset in the Caffe environment
1.1 The LeNet-5 convolutional neural network, shown in the figure below
1.2 Steps
1. Download the data
2. Return to the Caffe root directory
3. The LeNet network structure is defined in examples/mnist/lenet_train_test.prototxt
4. Train the network
5. Test the network
6. Convert the MNIST dataset into bmp images
7. Test handwritten-digit recognition on MNIST images
2. Elements and structure of a Caffe network model
2.1 The LeNet model
1. Data layer
2. Convolution layer
3. Pooling layer
4. Inner product (fully connected) layer
5. ReLU (Rectified Linear Unit) layer
6. Softmax layer
2.2 Parameter configuration file (the solver file that configures training)
LeNet-5 contains convolution layers, pooling layers and fully connected layers; these layer types are the basic building blocks of modern CNNs, so studying LeNet is the foundation for studying more complex network models.
Paper link: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
The figure below was drawn with Caffe's draw_net.py script, taking the LeNet-5 network description file lenet_train_test.prototxt as input:
Left to right:
Bottom to top:
Go to the Caffe root directory; the Caffe implementation of LeNet is under examples/mnist/.
From the Caffe root directory run the script ./data/mnist/get_mnist.sh, whose contents are:
#!/usr/bin/env sh
# This scripts downloads the mnist data and unzips it.
DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"
echo "Downloading..."
for fname in train-images-idx3-ubyte train-labels-idx1-ubyte t10k-images-idx3-ubyte t10k-labels-idx1-ubyte
do
if [ ! -e $fname ]; then
wget --no-check-certificate http://yann.lecun.com/exdb/mnist/${fname}.gz
gunzip ${fname}.gz
fi
done
This leaves four files in data/mnist/: 1. train-images-idx3-ubyte 2. train-labels-idx1-ubyte 3. t10k-images-idx3-ubyte 4. t10k-labels-idx1-ubyte
Run the script ./examples/mnist/create_mnist.sh to convert the dataset and labels from the previous step into LMDB files.
To run it, type the following in a terminal and press Enter (sudo sh ./examples/mnist/create_mnist.sh also works):
sh ./examples/mnist/create_mnist.sh
This generates two LMDB directories, mnist_train_lmdb and mnist_test_lmdb, under examples/mnist/.
Note: if you ran the scripts with sudo, the generated files are owned by root, and using them later will also require root privileges.
The gradient-descent solver used to train the network is defined in examples/mnist/lenet_solver.prototxt.
The contents of lenet_train_test.prototxt are:
name: "LeNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
The lenet_solver.prototxt file is shown below. It defines the path to the network definition file, the number of iterations, the learning rate, how often to display intermediate results, and whether to use the CPU or the GPU.
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
From the Caffe root directory run the script sh ./examples/mnist/train_lenet.sh.
The contents of train_lenet.sh are:
#!/usr/bin/env sh
set -e
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@
After training, the following files are generated under examples/mnist/:
lenet_iter_5000.caffemodel lenet_iter_5000.solverstate lenet_iter_10000.caffemodel lenet_iter_10000.solverstate
These are the .caffemodel model files and .solverstate solver-state files produced after 5000 and 10000 training iterations, respectively.
Return to the Caffe root directory and run sh ./examples/mnist/test_lenet.sh in a terminal.
The contents of test_lenet.sh are:
#!/usr/bin/env sh
set -e
./build/tools/caffe test -model=examples/mnist/lenet_train_test.prototxt -weights=examples/mnist/lenet_iter_10000.caffemodel -gpu=0
This script invokes the caffe binary under build/tools/ with the test command, passing the network definition file (.prototxt), the trained weight file (.caffemodel), and -gpu=0 to run on the first GPU.
mnist2bmp.py, below, converts the MNIST test images into bmp format:
# mnist2bmp.py: convert the MNIST test images (idx3-ubyte format) into bmp files
import os
import struct
import numpy as np
from PIL import Image

filename='t10k-images-idx3-ubyte'
binfile=open(filename,'rb')
buf=binfile.read()
index=0
# Header: magic number, image count, rows, columns (big-endian 4-byte unsigned ints)
magic,numImages,numRows,numColumns=struct.unpack_from('>IIII',buf,index)
index+=struct.calcsize('>IIII')
if not os.path.exists('mnist_test'):
    os.makedirs('mnist_test')          # the output directory must exist before saving
for image in range(0,numImages):
    im=struct.unpack_from('>784B',buf,index)   # 28*28=784 pixels per image
    index+=struct.calcsize('>784B')
    im=np.array(im,dtype='uint8')
    im=im.reshape(28,28)
    im=Image.fromarray(im)
    im.save('mnist_test/test_%s.bmp'%image,'bmp')
If a Python file contains Chinese (non-ASCII) comments, it must be saved with UTF-8 encoding.
Add the following to the first line of the file:
# -*- coding:utf-8 -*-
mnist2bmp.py above uses Python's struct module,
which reads and writes binary data according to a format string.
In '>IIII', the > means big-endian byte order (network byte order) and each capital I is a 4-byte unsigned integer.
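A small illustration of the '>IIII' format, packing and then unpacking the 16-byte MNIST image-file header (the values below are the expected header of t10k-images-idx3-ubyte):

import struct

header = struct.pack('>IIII', 2051, 10000, 28, 28)   # 2051 is the idx3 magic number
print(len(header))                                   # 16 bytes: four 4-byte unsigned ints
print(struct.unpack_from('>IIII', header, 0))        # (2051, 10000, 28, 28)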
The following script loads the network definition and the trained weights, feeds in one 28*28 handwritten-digit image, and prints the recognition result:
# -*- coding:utf-8 -*-
# UTF-8 encoding is required because of non-ASCII comments
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
caffe_root='/home/yang/caffe/'             #root directory of the Caffe installation
sys.path.insert(0,caffe_root+'python')     #make the pycaffe module importable
import caffe
MODEL_FILE='/home/yang/caffe/examples/mnist/lenet.prototxt'              #LeNet network definition file
PRETRAINED='/home/yang/caffe/examples/mnist/lenet_iter_10000.caffemodel' #trained network weights
IMAGE_FILE='/home/yang/caffe/data/mnist/mnist_test/test_0.bmp'           #test image path
caffe.set_mode_cpu()                       #select CPU mode before building the net
input_image=caffe.io.load_image(IMAGE_FILE,color=False)
net=caffe.Classifier(MODEL_FILE,PRETRAINED)
prediction=net.predict([input_image],oversample=False)
print 'predicted class:',prediction[0].argmax()
The output is shown below; the predicted class is the digit 7.
$ python caffeP103LeNet.py
/home/yang/.local/lib/python2.7/site-packages/skimage/io/_io.py:49: UserWarning: `as_grey` has been deprecated in favor of `as_gray`
warn('`as_grey` has been deprecated in favor of `as_gray`')
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1120 19:10:51.107837 6016 net.cpp:53] Initializing net from parameters:
name: "LeNet"
state {
phase: TEST
level: 0
}
layer {
name: "data"
type: "Input"
top: "data"
input_param {
shape {
dim: 64
dim: 1
dim: 28
dim: 28
}
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "ip2"
top: "prob"
}
I1120 19:10:51.107987 6016 layer_factory.hpp:77] Creating layer data
I1120 19:10:51.108000 6016 net.cpp:86] Creating Layer data
I1120 19:10:51.108006 6016 net.cpp:382] data -> data
I1120 19:10:51.108021 6016 net.cpp:124] Setting up data
I1120 19:10:51.108026 6016 net.cpp:131] Top shape: 64 1 28 28 (50176)
I1120 19:10:51.108033 6016 net.cpp:139] Memory required for data: 200704
I1120 19:10:51.108037 6016 layer_factory.hpp:77] Creating layer conv1
I1120 19:10:51.108047 6016 net.cpp:86] Creating Layer conv1
I1120 19:10:51.108052 6016 net.cpp:408] conv1 <- data
I1120 19:10:51.108057 6016 net.cpp:382] conv1 -> conv1
I1120 19:10:51.806463 6016 net.cpp:124] Setting up conv1
I1120 19:10:51.806486 6016 net.cpp:131] Top shape: 64 20 24 24 (737280)
I1120 19:10:51.806499 6016 net.cpp:139] Memory required for data: 3149824
I1120 19:10:51.806514 6016 layer_factory.hpp:77] Creating layer pool1
I1120 19:10:51.806524 6016 net.cpp:86] Creating Layer pool1
I1120 19:10:51.806529 6016 net.cpp:408] pool1 <- conv1
I1120 19:10:51.806535 6016 net.cpp:382] pool1 -> pool1
I1120 19:10:51.806552 6016 net.cpp:124] Setting up pool1
I1120 19:10:51.806556 6016 net.cpp:131] Top shape: 64 20 12 12 (184320)
I1120 19:10:51.806562 6016 net.cpp:139] Memory required for data: 3887104
I1120 19:10:51.806565 6016 layer_factory.hpp:77] Creating layer conv2
I1120 19:10:51.806576 6016 net.cpp:86] Creating Layer conv2
I1120 19:10:51.806581 6016 net.cpp:408] conv2 <- pool1
I1120 19:10:51.806586 6016 net.cpp:382] conv2 -> conv2
I1120 19:10:51.808470 6016 net.cpp:124] Setting up conv2
I1120 19:10:51.808481 6016 net.cpp:131] Top shape: 64 50 8 8 (204800)
I1120 19:10:51.808488 6016 net.cpp:139] Memory required for data: 4706304
I1120 19:10:51.808497 6016 layer_factory.hpp:77] Creating layer pool2
I1120 19:10:51.808506 6016 net.cpp:86] Creating Layer pool2
I1120 19:10:51.808509 6016 net.cpp:408] pool2 <- conv2
I1120 19:10:51.808516 6016 net.cpp:382] pool2 -> pool2
I1120 19:10:51.808526 6016 net.cpp:124] Setting up pool2
I1120 19:10:51.808531 6016 net.cpp:131] Top shape: 64 50 4 4 (51200)
I1120 19:10:51.808537 6016 net.cpp:139] Memory required for data: 4911104
I1120 19:10:51.808542 6016 layer_factory.hpp:77] Creating layer ip1
I1120 19:10:51.808552 6016 net.cpp:86] Creating Layer ip1
I1120 19:10:51.808557 6016 net.cpp:408] ip1 <- pool2
I1120 19:10:51.808563 6016 net.cpp:382] ip1 -> ip1
I1120 19:10:51.811174 6016 net.cpp:124] Setting up ip1
I1120 19:10:51.811182 6016 net.cpp:131] Top shape: 64 500 (32000)
I1120 19:10:51.811187 6016 net.cpp:139] Memory required for data: 5039104
I1120 19:10:51.811197 6016 layer_factory.hpp:77] Creating layer relu1
I1120 19:10:51.811213 6016 net.cpp:86] Creating Layer relu1
I1120 19:10:51.811216 6016 net.cpp:408] relu1 <- ip1
I1120 19:10:51.811221 6016 net.cpp:369] relu1 -> ip1 (in-place)
I1120 19:10:51.825971 6016 net.cpp:124] Setting up relu1
I1120 19:10:51.825984 6016 net.cpp:131] Top shape: 64 500 (32000)
I1120 19:10:51.825992 6016 net.cpp:139] Memory required for data: 5167104
I1120 19:10:51.825999 6016 layer_factory.hpp:77] Creating layer ip2
I1120 19:10:51.826009 6016 net.cpp:86] Creating Layer ip2
I1120 19:10:51.826014 6016 net.cpp:408] ip2 <- ip1
I1120 19:10:51.826022 6016 net.cpp:382] ip2 -> ip2
I1120 19:10:51.826073 6016 net.cpp:124] Setting up ip2
I1120 19:10:51.826079 6016 net.cpp:131] Top shape: 64 10 (640)
I1120 19:10:51.826086 6016 net.cpp:139] Memory required for data: 5169664
I1120 19:10:51.826094 6016 layer_factory.hpp:77] Creating layer prob
I1120 19:10:51.826102 6016 net.cpp:86] Creating Layer prob
I1120 19:10:51.826107 6016 net.cpp:408] prob <- ip2
I1120 19:10:51.826114 6016 net.cpp:382] prob -> prob
I1120 19:10:51.826570 6016 net.cpp:124] Setting up prob
I1120 19:10:51.826580 6016 net.cpp:131] Top shape: 64 10 (640)
I1120 19:10:51.826586 6016 net.cpp:139] Memory required for data: 5172224
I1120 19:10:51.826592 6016 net.cpp:202] prob does not need backward computation.
I1120 19:10:51.826597 6016 net.cpp:202] ip2 does not need backward computation.
I1120 19:10:51.826603 6016 net.cpp:202] relu1 does not need backward computation.
I1120 19:10:51.826608 6016 net.cpp:202] ip1 does not need backward computation.
I1120 19:10:51.826613 6016 net.cpp:202] pool2 does not need backward computation.
I1120 19:10:51.826619 6016 net.cpp:202] conv2 does not need backward computation.
I1120 19:10:51.826625 6016 net.cpp:202] pool1 does not need backward computation.
I1120 19:10:51.826630 6016 net.cpp:202] conv1 does not need backward computation.
I1120 19:10:51.826637 6016 net.cpp:202] data does not need backward computation.
I1120 19:10:51.826642 6016 net.cpp:244] This network produces output prob
I1120 19:10:51.826650 6016 net.cpp:257] Network initialization done.
I1120 19:10:51.843571 6016 net.cpp:746] Ignoring source layer mnist
I1120 19:10:51.844007 6016 net.cpp:746] Ignoring source layer loss
/home/yang/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
predicted class: 7
A Caffe model involves two important definition files: 1. the network model definition file *.prototxt; 2. the solver configuration file *_solver.prototxt.
The solver file configures how the network parameters are solved for during training; the learned parameters themselves are saved to a .caffemodel file once training finishes.
The network model definition file describes the behaviour of every layer in the network.
Following the right-hand branch of the figure: data->conv1->pool1->conv2->pool2->ip1->(relu1)->ip2->loss (SoftmaxWithLoss).
The layers to understand, in turn: the data layers (training and test), the convolution layer, the pooling layer, the inner product (fully connected) layer, the ReLU layer, and the loss layer.
The input layer of the network model is the data layer, i.e. the definition of the network's data input; there are usually two variants, a training data layer and a test data layer.
The source field gives the path to the database (here an LMDB);
batch_size gives the mini-batch size;
scale rescales the input into the [0,1) range: 0.00390625 = 1/256, so raw 8-bit pixel values 0-255 are mapped into [0,1).
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
The two param blocks, lr_mult: 1 and lr_mult: 2 (blobs_lr in older Caffe versions), are the learning-rate multipliers for the weights and the bias respectively; the effective rate is the multiplier times the base learning rate defined in lenet_solver.prototxt.
Giving the bias twice the weight learning rate usually yields better convergence.
num_output is the number of filters; kernel_size is the filter size; stride is the filter's sliding step. With a 28×28 input, a 5×5 kernel and stride 1, conv1 outputs (28-5)/1+1 = 24×24 feature maps, matching the Top shape 64 20 24 24 in the log above.
weight_filler specifies how the filter weights are initialized;
xavier (pronounced ['zeɪvɪr]) initializes the parameters of a convolution or fully connected layer by sampling uniformly from [-scale, scale].
The concrete xavier implementation in Caffe is in include/caffe/filler.hpp.
In the listing below, text after # is a comment.
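A minimal NumPy sketch of what the xavier filler does, assuming Caffe's default setting where scale = sqrt(3 / fan_in); the function name and usage here are illustrative, not Caffe's API:

import numpy as np

def xavier_fill(num_output, channels, kernel_size):
    """Uniform Xavier init for a conv weight blob shaped (num_output, channels, k, k)."""
    fan_in = channels * kernel_size * kernel_size    # inputs feeding each output unit
    scale = np.sqrt(3.0 / fan_in)                    # Caffe default: FAN_IN variance norm
    shape = (num_output, channels, kernel_size, kernel_size)
    return np.random.uniform(-scale, scale, size=shape)

w_conv1 = xavier_fill(20, 1, 5)    # conv1 in LeNet: 20 filters, 1 input channel, 5x5
print(w_conv1.shape, w_conv1.min(), w_conv1.max())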
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1 #learning-rate multiplier for the weights
}
param {
lr_mult: 2 #learning-rate multiplier for the bias, usually twice that of the weights
}
convolution_param {
num_output: 20 #number of convolution kernels (filters)
kernel_size: 5 #kernel size 5*5
stride: 1 #kernel sliding step of 1
weight_filler {
type: "xavier" #滤波器的类型
}
bias_filler {
type: "constant" #偏置的初始化方式
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
Non-linearity layer: computes max(0, x) element-wise.
It typically appears right after a convolution or fully connected layer (here it follows ip1); a small numerical check follows the layer definition below.
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
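A one-line check (plain NumPy, illustrative) of what the ReLU layer computes:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
print(np.maximum(0, x))    # [0.  0.  0.  0.7 3. ] -- negative inputs are clipped to zero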
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
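The SoftmaxWithLoss layer combines a softmax over the 10 class scores produced by ip2 with the multinomial cross-entropy loss against the label. A minimal NumPy sketch of that computation (illustrative scores, not Caffe's code):

import numpy as np

def softmax_with_loss(scores, label):
    """scores: raw class scores from ip2 (length 10); label: ground-truth digit."""
    scores = scores - scores.max()                 # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax probabilities
    return probs, -np.log(probs[label])            # probabilities and the cross-entropy loss

scores = np.array([0.1, 2.0, -1.0, 0.5, 0.0, 0.3, -0.2, 4.0, 1.0, 0.1])
probs, loss = softmax_with_loss(scores, label=7)
print(probs.argmax(), loss)                        # predicted class and its loss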
The *_solver.prototxt file
Caffe's solver configuration file *_solver.prototxt defines the parameters that must be set for training the network model, such as the learning rate, the weight-decay coefficient, the number of iterations, and whether to compute on the GPU or the CPU.
References:
[1] 深度学习:Caffe之经典模型详解与实战, p. 88.