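This post uses the handwritten digit recognition project from the official Caffe tutorial to record how to use caffe and pycaffe.
MNIST is a handwritten digit recognition dataset. It is a subset built from a much larger collection held by the National Institute of Standards and Technology (NIST).
LeNet is a convolutional neural network designed by LeCun for handwritten digit classification.
This post relies on the environment set up in the earlier post "[DeepLearning][Caffe] Compiling caffe and the pycaffe interface in a Python virtual environment".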
Training LeNet on MNIST with Caffe
1. Enter the caffe-master directory. $CAFFE_ROOT is the absolute path of caffe-master.
$ cd $CAFFE_ROOT
2. Download and extract the dataset. This produces train-images-idx3-ubyte, train-labels-idx1-ubyte, t10k-images-idx3-ubyte, and t10k-labels-idx1-ubyte in $CAFFE_ROOT/data/mnist/.
$ ./data/mnist/get_mnist.sh
3. Convert the raw dataset to lmdb format. The training set is written to $CAFFE_ROOT/examples/mnist/mnist_train_lmdb and the test set to $CAFFE_ROOT/examples/mnist/mnist_test_lmdb.
$ ./examples/mnist/create_mnist.sh
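The caffe command-line tool uses Google Protobuf to define the model structure and the optimization method. The available settings are listed in src/caffe/proto/caffe.proto.
1. The LeNet model structure is defined in examples/mnist/lenet_train_test.prototxt.
Define the model name: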
name: "LeNet"
Define the data layer, which reads data from lmdb:
layer {
name: "mnist"
type: "Data"
transform_param {
scale: 0.00390625
}
data_param {
source: "mnist_train_lmdb"
backend: LMDB
batch_size: 64
}
top: "data"
top: "label"
}
This layer's name is mnist and its type is Data. It reads data from source, backend specifies the storage format, and batch_size is 64. scale is 0.00390625 = 1/256, which normalizes pixel values to the range [0, 1). top names the layer's outputs: this layer produces two blobs, a data blob and a label blob.
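As a quick check of the scaling described above, the following sketch applies the factor 0.00390625 to 8-bit pixel values. It is plain Python rather than Caffe code, and the sample pixel values and the helper name scale_pixels are made up for illustration.

```python
# Scale factor from transform_param in the Data layer above.
SCALE = 0.00390625  # exactly 1 / 256

def scale_pixels(pixels, scale=SCALE):
    """Multiply raw 8-bit pixel values by the scale factor."""
    return [p * scale for p in pixels]

# Hypothetical sample values covering the full 8-bit range.
raw = [0, 1, 128, 255]
scaled = scale_pixels(raw)
print(scaled)  # [0.0, 0.00390625, 0.5, 0.99609375]
```

The largest 8-bit value, 255, maps to 255/256, which is why the interval is half-open.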
Define the convolution layer:
layer {
name: "conv1"
type: "Convolution"
param { lr_mult: 1 }
param { lr_mult: 2 }
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "data"
top: "conv1"
}
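This layer takes the data blob as input and produces num_output feature maps (20 here). The kernel size is kernel_size and the stride is stride. weight_filler and bias_filler specify how the weights and biases are initialized. lr_mult is a per-parameter learning rate multiplier: 1 means the weights use the same learning rate as the solver, and 2 means the biases use twice the solver's learning rate. bottom names the layer's input and top names its output.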
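To make the lr_mult multipliers concrete, the sketch below computes the effective learning rates for conv1 at the start of training. It assumes base_lr is 0.01, as in lenet_solver.prototxt, and ignores the learning rate schedule; the dictionary names are illustrative only.

```python
BASE_LR = 0.01  # base_lr from lenet_solver.prototxt

# lr_mult values from the two param blocks of conv1:
# the first applies to the weights, the second to the biases.
lr_mult = {"weights": 1, "biases": 2}

# Effective rate = solver learning rate times the multiplier.
effective_lr = {name: BASE_LR * mult for name, mult in lr_mult.items()}
print(effective_lr)  # {'weights': 0.01, 'biases': 0.02}
```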
Define the pooling layer:
layer {
name: "pool1"
type: "Pooling"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
bottom: "conv1"
top: "pool1"
}
This layer is simple to define: kernel_size is the pooling window size, stride is the step between windows, and pool selects the pooling method.
Define the fully connected layer:
layer {
name: "ip1"
type: "InnerProduct"
param { lr_mult: 1 }
param { lr_mult: 2 }
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
bottom: "pool2"
top: "ip1"
}
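This layer is also simple to define: num_output is the number of neurons, and the remaining fields work as in the earlier layers.
Define the activation layer: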
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
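This layer defines the activation function; type is the kind of activation. If the layer supports in-place operation, bottom and top can share the same name, which saves memory.
Define the loss layer: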
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
}
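Finally, this layer defines the loss function; type is the kind of loss. It takes two blobs, computes the loss value, and declares no output here. When backpropagation starts, it produces the gradient with respect to the previous layer.
Define layer rules:
Each layer can use include to specify when it is part of the model, for example: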
layer {
# ...layer definition...
include: { phase: TRAIN }
}
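In this example, the layer is included only in the training phase. If TRAIN is changed to TEST, the layer is included only in the test phase. By default, a layer without include rules is always part of the network. This is why lenet_train_test.prototxt defines two Data layers (with different batch_size values), one for the training phase and one for the test phase. Likewise, the Accuracy layer is present only in the test phase.
2. The optimization method for training LeNet is defined in examples/mnist/lenet_solver.prototxt.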
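The effect of the include rules can be sketched as a simple filter over a layer list. This is only an illustration of the rule described above, not how Caffe implements it; the layer list is a simplified, hypothetical stand-in for lenet_train_test.prototxt, and layers_for_phase is a made-up helper.

```python
TRAIN, TEST = "TRAIN", "TEST"

# Hypothetical, simplified layer list. A layer without an "include"
# key is assumed to belong to every phase.
layers = [
    {"name": "mnist", "type": "Data", "include": TRAIN},
    {"name": "mnist", "type": "Data", "include": TEST},
    {"name": "conv1", "type": "Convolution"},
    {"name": "accuracy", "type": "Accuracy", "include": TEST},
    {"name": "loss", "type": "SoftmaxWithLoss"},
]

def layers_for_phase(all_layers, phase):
    """Keep layers with no include rule or with a rule matching the phase."""
    return [l for l in all_layers if l.get("include", phase) == phase]

train_types = [l["type"] for l in layers_for_phase(layers, TRAIN)]
test_types = [l["type"] for l in layers_for_phase(layers, TEST)]
print(train_types)  # ['Data', 'Convolution', 'SoftmaxWithLoss']
print(test_types)   # ['Data', 'Convolution', 'Accuracy', 'SoftmaxWithLoss']
```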
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
Run the following script to start training LeNet:
$ ./examples/mnist/train_lenet.sh
The script runs the following command:
#!/usr/bin/env sh
set -e
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@
After the training script starts, the command line prints training progress. It begins with hardware information.
I1107 20:52:36.823984 10432 caffe.cpp:204] Using GPUs 0
I1107 20:52:37.016600 10432 caffe.cpp:209] GPU 0: GeForce GTX 1070
The solver is initialized from examples/mnist/lenet_solver.prototxt:
I1107 20:52:38.889734 10432 solver.cpp:45] Initializing solver from parameters:
The training net is created from examples/mnist/lenet_train_test.prototxt:
I1107 20:52:38.925832 10432 solver.cpp:102] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I1107 20:52:38.926434 10432 net.cpp:296] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I1107 20:52:38.926487 10432 net.cpp:296] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I1107 20:52:38.926750 10432 net.cpp:53] Initializing net from parameters:
Layers are then initialized one by one. For example, the output below comes from creating conv1. This information is very useful for debugging.
I1107 20:52:38.967473 10432 net.cpp:86] Creating Layer conv1
I1107 20:52:38.967494 10432 net.cpp:408] conv1 <- data
I1107 20:52:38.967540 10432 net.cpp:382] conv1 -> conv1
I1107 20:52:43.570137 10432 net.cpp:124] Setting up conv1
I1107 20:52:43.570161 10432 net.cpp:131] Top shape: 64 20 24 24 (737280)
I1107 20:52:43.570166 10432 net.cpp:139] Memory required for data: 3150080
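The "Memory required for data" figure above can be reproduced if it is read as a running total over the top blobs created so far, at 4 bytes per float32 value. The sketch below works under that assumption; the data and label shapes are taken from the blob shapes printed in the notebook section of this post, and the variable names are illustrative.

```python
from math import prod

BYTES_PER_FLOAT = 4  # assuming single-precision blobs

# Top blobs created up to and including conv1 in the training net.
top_shapes = [
    ("data", (64, 1, 28, 28)),
    ("label", (64,)),
    ("conv1", (64, 20, 24, 24)),
]

running_bytes = 0
for name, shape in top_shapes:
    running_bytes += prod(shape) * BYTES_PER_FLOAT
    print(name, prod(shape), running_bytes)

# The final total matches the log line: 3150080
```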
Training net initialization is complete.
I1107 20:52:43.579761 10432 net.cpp:257] Network initialization done.
The test net is created from examples/mnist/lenet_train_test.prototxt:
I1107 20:52:43.579917 10432 solver.cpp:190] Creating test net (#0) specified by net file: examples/mnist/lenet_train_test.prototxt
I1107 20:52:43.579960 10432 net.cpp:296] The NetState phase (1) differed from the phase (0) specified by a rule in layer mnist
I1107 20:52:43.580045 10432 net.cpp:53] Initializing net from parameters:
The test net is initialized as well:
I1107 20:52:43.640388 10432 net.cpp:86] Creating Layer conv1
I1107 20:52:43.640391 10432 net.cpp:408] conv1 <- data
I1107 20:52:43.640398 10432 net.cpp:382] conv1 -> conv1
I1107 20:52:43.642616 10432 net.cpp:124] Setting up conv1
I1107 20:52:43.642632 10432 net.cpp:131] Top shape: 100 20 24 24 (1152000)
I1107 20:52:43.642637 10432 net.cpp:139] Memory required for data: 4922800
Test net initialization is complete.
I1107 20:52:43.649025 10432 net.cpp:257] Network initialization done.
Training starts:
I1107 20:52:43.649063 10432 solver.cpp:57] Solver scaffolding done.
I1107 20:52:43.649327 10432 caffe.cpp:239] Starting Optimization
I1107 20:52:43.649333 10432 solver.cpp:289] Solving LeNet
As set in the solver options, training information is printed every 100 iterations:
I1107 20:52:44.042378 10432 solver.cpp:239] Iteration 100 (561.408 iter/s, 0.178124s/100 iters), loss = 0.210162
I1107 20:52:44.042407 10432 solver.cpp:258] Train net output #0: loss = 0.210162 (* 1 = 0.210162 loss)
I1107 20:52:44.042412 10432 sgd_solver.cpp:112] Iteration 100, lr = 0.00992565
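The learning rate reported above follows from the "inv" policy in the solver file, which computes base_lr * (1 + gamma * iter) ^ (-power). A minimal sketch using the values from lenet_solver.prototxt (the function name inv_lr is made up for illustration):

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

print(inv_lr(0))                # 0.01 at the start of training
print(round(inv_lr(100), 8))    # close to the 0.00992565 shown in the log
```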
The model is tested every 500 iterations:
I1107 20:52:44.659152 10432 solver.cpp:347] Iteration 500, Testing net (#0)
I1107 20:52:44.717660 10443 data_layer.cpp:73] Restarting data prefetching from start.
I1107 20:52:44.719583 10432 solver.cpp:414] Test net output #0: accuracy = 0.9743
I1107 20:52:44.719601 10432 solver.cpp:414] Test net output #1: loss = 0.083975 (* 1 = 0.083975 loss)
The model parameters and solver state are saved every 5000 iterations:
I1107 20:52:52.069522 10432 solver.cpp:464] Snapshotting to binary proto file examples/mnist/lenet_iter_5000.caffemodel
I1107 20:52:52.153425 10432 sgd_solver.cpp:284] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_5000.solverstate
The model is tested once training finishes:
I1107 20:53:00.412792 10432 solver.cpp:347] Iteration 10000, Testing net (#0)
I1107 20:53:00.473321 10443 data_layer.cpp:73] Restarting data prefetching from start.
I1107 20:53:00.474253 10432 solver.cpp:414] Test net output #0: accuracy = 0.9911
I1107 20:53:00.474272 10432 solver.cpp:414] Test net output #1: loss = 0.029594 (* 1 = 0.029594 loss)
I1107 20:53:00.474277 10432 solver.cpp:332] Optimization Done.
I1107 20:53:00.474280 10432 caffe.cpp:250] Optimization Done.
Learning LeNet
01-learning-lenet.ipynb is located in the $CAFFE_ROOT/examples/ directory.
$ workon caffe-master
(caffe-master)$ pip install ipykernel
(caffe-master)$ python -m ipykernel install --user --name caffe-master --display-name "caffe-master"
(caffe-master)$ cd $CAFFE_ROOT/examples/
(caffe-master)$ jupyter notebook
Create a new notebook named Solving in Python with LeNet.
from pylab import *
%matplotlib inline
In [2]:
caffe_root = '../' # this file should be run from {caffe_root}/examples (otherwise change this line)
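PYTHONPATH was already set in "3.3 Configuring the pycaffe interface dependencies", so caffe can be imported directly. If it has not been set, add the pycaffe path through sys.path before importing.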
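If PYTHONPATH is not set, one way to make pycaffe importable is to prepend its directory to sys.path. The sketch below assumes the Python bindings live in the python subdirectory of caffe_root, which is the usual layout of a Caffe checkout; the variable pycaffe_dir is illustrative.

```python
import os
import sys

# Same assumption as the notebook: it is run from {caffe_root}/examples.
caffe_root = '../'
pycaffe_dir = os.path.abspath(os.path.join(caffe_root, 'python'))

# Put the pycaffe directory first so it takes precedence over other installs.
if pycaffe_dir not in sys.path:
    sys.path.insert(0, pycaffe_dir)

# import caffe  # uncomment once pycaffe_dir points at a built pycaffe
```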
In [3]:
import caffe
Define the LeNet model in Python.
In [4]:
from caffe import layers as L, params as P
def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()
Write the model out as protobuf text so that Caffe can read it.
In [5]:
with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))
with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))
View the training model.
In [6]:
!cat mnist/lenet_auto_train.prototxt
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.003921568859368563
}
data_param {
source: "mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 50
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "score"
type: "InnerProduct"
bottom: "fc1"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
}
View the test model.
In [7]:
!cat mnist/lenet_auto_test.prototxt
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
transform_param {
scale: 0.003921568859368563
}
data_param {
source: "mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 20
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 50
kernel_size: 5
weight_filler {
type: "xavier"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "pool2"
top: "fc1"
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "score"
type: "InnerProduct"
bottom: "fc1"
top: "score"
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
}
View the solver settings.
In [8]:
!cat mnist/lenet_auto_solver.prototxt
# The train/test net protocol buffer definition
train_net: "mnist/lenet_auto_train.prototxt"
test_net: "mnist/lenet_auto_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mnist/lenet"
Select the hardware device.
In [9]:
caffe.set_device(0)
caffe.set_mode_gpu()
Select the optimization algorithm and load the solver settings.
In [10]:
solver = None # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')
View the tensor shapes of each layer's intermediate results.
In [11]:
# each output is (batch size, feature dim, spatial dim)
[(k, v.data.shape) for k, v in solver.net.blobs.items()]
Out[11]:
[('data', (64, 1, 28, 28)),
('label', (64,)),
('conv1', (64, 20, 24, 24)),
('pool1', (64, 20, 12, 12)),
('conv2', (64, 50, 8, 8)),
('pool2', (64, 50, 4, 4)),
('fc1', (64, 500)),
('score', (64, 10)),
('loss', ())]
View the tensor shapes of the weights.
In [12]:
# just print the weight sizes (we'll omit the biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]
Out[12]:
[('conv1', (20, 1, 5, 5)),
('conv2', (50, 20, 5, 5)),
('fc1', (500, 800)),
('score', (10, 500))]
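The fc1 weight shape (500, 800) follows from the spatial sizes of the earlier blobs. The sketch below traces them with the usual no-padding formula (size - kernel) // stride + 1, which is a simplification that happens to be exact for the sizes in this network; the helper name out_size is made up for illustration.

```python
def out_size(size, kernel, stride=1):
    """Spatial output size of a conv or pooling window without padding."""
    return (size - kernel) // stride + 1

size = 28                               # MNIST images are 28x28
size = out_size(size, kernel=5)         # conv1 -> 24
size = out_size(size, kernel=2, stride=2)  # pool1 -> 12
size = out_size(size, kernel=5)         # conv2 -> 8
size = out_size(size, kernel=2, stride=2)  # pool2 -> 4

# fc1 sees the flattened pool2 blob: 50 channels of size x size values.
fc1_inputs = 50 * size * size
print(size, fc1_inputs)  # 4 800
```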
Run a forward pass on the training net.
In [13]:
solver.net.forward() # train net
Out[13]:
{'loss': array(2.3712316, dtype=float32)}
Run a forward pass on the test net.
In [14]:
solver.test_nets[0].forward() # test net (there can be more than one)
Out[14]:
{'loss': array(2.4383156, dtype=float32)}
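Both initial losses are close to what an untrained 10-class classifier would give. If the softmax assigned equal probability to every class, the cross-entropy loss would be ln(10), as this small check shows (a sanity check under that idealized assumption, not part of the original notebook):

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -log(1 / num_classes).
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 4))  # 2.3026
```

Values around 2.3 to 2.4 before any training step are therefore expected with random initial weights.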
View the first 8 training images and labels.
In [15]:
# we use a little trick to tile the first eight images
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('train labels:', solver.net.blobs['label'].data[:8])
train labels: [5. 0. 4. 1. 9. 2. 1. 3.]
View the first 8 test images and labels.
In [16]:
imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print('test labels:', solver.test_nets[0].blobs['label'].data[:8])
test labels: [7. 2. 1. 0. 4. 1. 4. 9.]
Run one training step, which includes a backward pass.
In [17]:
solver.step(1)
View the gradients of the first layer. Each cell in the 4x5 grid is a 5x5 kernel.
In [18]:
imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5).transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')
Out[18]:
(-0.5, 24.5, 19.5, -0.5)
Write a custom training loop.
In [19]:
%%time
niter = 200
test_interval = 25
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
output = zeros((niter, 8, 10))
# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe
    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data
    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['score'].data[:8]
    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    # how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print('Iteration', it, 'testing...')
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
CPU times: user 1.91 s, sys: 551 ms, total: 2.46 s
Wall time: 1.41 s
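The accuracy bookkeeping in the loop above compares the argmax of each score row with the label and counts the matches. The same idea in plain Python, on a made-up toy batch (the helper names and the numbers are hypothetical, chosen only to illustrate the counting):

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

def batch_correct(scores, labels):
    """Count samples whose highest-scoring class equals the label."""
    return sum(1 for row, label in zip(scores, labels) if argmax(row) == int(label))

# Toy batch: 3 samples, 4 classes. Labels are floats, as in the label blob.
scores = [[0.1, 2.0, -1.0, 0.3],
          [1.5, 0.2, 0.1, 0.0],
          [0.0, 0.1, 0.2, 0.9]]
labels = [1.0, 0.0, 2.0]

correct = batch_correct(scores, labels)
print(correct, 'of', len(labels), 'correct')  # 2 of 3 correct
```

In the notebook the count is accumulated over 100 batches of 100 test images, hence the division by 1e4.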
Plot the training loss and the test accuracy against the iteration count.
In [20]:
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))
Out[20]:
Text(0.5, 1.0, 'Test Accuracy: 0.94')
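View the first 8 test images and LeNet's predictions for them. Each prediction is a length-10 vector holding the confidence scores for the digits 0 to 9. These are the raw outputs of the fully connected layer, before any softmax. The first panel is the first test image, a handwritten 7. The second panel shows LeNet's scores for that image over the first 50 iterations. The horizontal axis is the iteration at which the model was evaluated, and the vertical axis is the predicted label, 0 to 9. Pixel brightness indicates the confidence score: the brighter, the higher the score. As the iterations progress, the row for label 7 becomes brighter, and in the final score vector 7 is the brightest.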
In [21]:
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
From top to bottom, these 8 test images get progressively harder. The eighth test image, a 9, looks a lot like a 4, and LeNet gives both 4 and 9 high confidence scores for it.
The results after applying softmax are shown below. The predictions stand out more clearly.
In [22]:
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(exp(output[:50, i].T) / exp(output[:50, i].T).sum(0), interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
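The cell above applies softmax by dividing exp(scores) by its sum directly. That is fine for the small scores seen here, but very large scores can overflow. A common precaution, sketched below in plain Python, is to subtract the maximum score first, which leaves the result unchanged. This is a generic illustration with made-up inputs, not part of the original notebook.

```python
import math

def softmax(scores):
    """Softmax that subtracts the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Shifting every score by a constant does not change the probabilities,
# so these two calls agree, and the second one does not overflow.
probs = softmax([1.0, 2.0, 3.0])
shifted = softmax([1001.0, 1002.0, 1003.0])
print([round(p, 4) for p in probs])
print([round(p, 4) for p in shifted])
```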
Customize the network architecture.
In [23]:
train_net_path = 'mnist/custom_auto_train.prototxt'
test_net_path = 'mnist/custom_auto_test.prototxt'
solver_config_path = 'mnist/custom_auto_solver.prototxt'
### define net
def custom_net(lmdb, batch_size):
    # define your own net!
    n = caffe.NetSpec()
    # keep this data layer for all networks
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    # EDIT HERE to try different networks
    # this single layer defines a simple linear classifier
    # (in particular this defines a multiway logistic regression)
    n.score = L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))
    # EDIT HERE this is the LeNet variant we have already tried
    # n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    # n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    # n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    # EDIT HERE consider L.ELU or L.Sigmoid for the nonlinearity
    # n.relu1 = L.ReLU(n.fc1, in_place=True)
    # n.score = L.InnerProduct(n.fc1, num_output=10, weight_filler=dict(type='xavier'))
    # keep this loss layer for all networks
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()
with open(train_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_train_lmdb', 64)))
with open(test_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_test_lmdb', 100)))
### define solver
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()
# Set a seed for reproducible experiments:
# this controls for randomization in training.
s.random_seed = 0xCAFFE
# Specify locations of the train and (maybe) test networks.
s.train_net = train_net_path
s.test_net.append(test_net_path)
s.test_interval = 500 # Test after every 500 training iterations.
s.test_iter.append(100) # Test on 100 batches each time we test.
s.max_iter = 10000 # no. of times to update the net (training iterations)
# EDIT HERE to try different solvers
# solver types include "SGD", "Adam", and "Nesterov" among others.
s.type = "SGD"
# Set the initial learning rate for SGD.
s.base_lr = 0.01 # EDIT HERE to try different learning rates
# Set momentum to accelerate learning by
# taking weighted average of current and previous updates.
s.momentum = 0.9
# Set weight decay to regularize and prevent overfitting
s.weight_decay = 5e-4
# Set `lr_policy` to define how the learning rate changes during training.
# This is the same policy as our default LeNet.
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
# EDIT HERE to try the fixed rate (and compare with adaptive solvers)
# `fixed` is the simplest policy that keeps the learning rate constant.
# s.lr_policy = 'fixed'
# Display the current training loss and accuracy every 1000 iterations.
s.display = 1000
# Snapshots are files used to store networks we've trained.
# We'll snapshot every 5K iterations -- twice during training.
s.snapshot = 5000
s.snapshot_prefix = 'mnist/custom_net'
# Train on the GPU
s.solver_mode = caffe_pb2.SolverParameter.GPU
# Write the solver to a temporary file and return its filename.
with open(solver_config_path, 'w') as f:
    f.write(str(s))
### load the solver and create train and test nets
solver = None # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.get_solver(solver_config_path)
### solve
niter = 250 # EDIT HERE increase to train for longer
test_interval = niter / 10
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe
    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data
    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    # how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print('Iteration', it, 'testing...')
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[int(it // test_interval)] = correct / 1e4
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Custom Test Accuracy: {:.2f}'.format(test_acc[-1]))
Iteration 0 testing...
Iteration 25 testing...
Iteration 50 testing...
Iteration 75 testing...
Iteration 100 testing...
Iteration 125 testing...
Iteration 150 testing...
Iteration 175 testing...
Iteration 200 testing...
Iteration 225 testing...
Out[23]:
Text(0.5, 1.0, 'Custom Test Accuracy: 0.88')
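To get a sense of how much smaller the single-layer classifier is than LeNet, the sketch below counts learnable parameters from the weight shapes. The LeNet shapes are the ones printed earlier in the notebook; the (10, 784) shape for the linear classifier is an assumption derived from flattening a 1x28x28 input, and one bias per output is assumed for every layer.

```python
from math import prod

def param_count(weight_shape):
    """Weights plus one bias per output (the first dimension)."""
    return prod(weight_shape) + weight_shape[0]

# Weight shapes printed by solver.net.params for LeNet.
lenet_weights = {
    "conv1": (20, 1, 5, 5),
    "conv2": (50, 20, 5, 5),
    "fc1": (500, 800),
    "score": (10, 500),
}
# Assumed shape for the single InnerProduct layer on 28x28 inputs.
linear_weights = {"score": (10, 28 * 28)}

lenet_total = sum(param_count(s) for s in lenet_weights.values())
linear_total = sum(param_count(s) for s in linear_weights.values())
print(lenet_total, linear_total)  # 431080 7850
```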