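Note 1: This article is an annotated walkthrough of the source of an ipynb notebook under caffe_root/examples/, intended to speed up beginners' learning through source comments.
Note 2: The English comments in each part are deliberately left untranslated. Beginners should get used to reading English tutorials in the original; it improves your ability to read technical literature and is an essential skill for any senior programmer.
Note 3: It is recommended to run the program in a Jupyter Notebook environment while following the annotated source.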
In this example, we’ll explore learning with Caffe in Python, using the fully-exposed Solver
interface.
We use the pylab import for numpy and plot inline.
from pylab import *  # makes imshow, subplots, etc. directly available
%matplotlib inline
Import caffe, adding it to sys.path if needed. Make sure you’ve built pycaffe.
# Set caffe_root to your actual path, and watch out for stray spaces in it.
import sys
caffe_root = '/home/slb103/softwares/caffe/'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')  # sys.path.append() also works
import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.
# run scripts from caffe root
import os
os.chdir(caffe_root)  # change the working directory: the scripts below must be run from the Caffe root
# Download data
!data/mnist/get_mnist.sh
# Prepare data
!examples/mnist/create_mnist.sh
# back to examples
os.chdir('examples')
Now let’s make a variant of LeNet, the classic 1989 convnet architecture.
We’ll need two external files to help out:
* the net prototxt, defining the architecture and pointing to the train/test data
* the solver prototxt, defining the learning parameters and the path to the net prototxt
We start by creating the net. We’ll write the net in a succinct and natural way as Python code that serializes to Caffe’s protobuf model format.
This network expects to read from pregenerated LMDBs, but reading directly from ndarrays is also possible using MemoryDataLayer.
# Use Python and Caffe's protobuf interface to generate lenet_auto_train.prototxt and lenet_auto_test.prototxt
from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()  # the net specification we build up layer by layer
    # names such as data and label become the keys of the net's blob dictionary;
    # ntop=2 means the layer has two outputs (tops): n.data and n.label
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    # The loss layer is used in both the training and the validation phase, but the latter
    # does not compute or propagate gradients. For deployment, a deploy.prototxt can use
    # L.Softmax in place of L.SoftmaxWithLoss.
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()

# The train and test prototxt files are written separately below; they could also go into a
# single train_val.prototxt. Once instantiated by the solver, the two nets share the same
# structure and learned weights in memory; only their data sources differ.
with open('mnist/lenet_auto_train.prototxt', 'w') as f:
    # writes lenet_auto_train.prototxt; the arguments are the train data source and the train batch size
    f.write(str(lenet('mnist/mnist_train_lmdb', 64)))
with open('mnist/lenet_auto_test.prototxt', 'w') as f:
    # writes lenet_auto_test.prototxt; the arguments are the test data source and the test batch size
    f.write(str(lenet('mnist/mnist_test_lmdb', 100)))
The net has been written to disk in a more verbose but human-readable serialization format using Google’s protobuf library. You can read, write, and modify this description directly. Let’s take a look at the train net.
!cat mnist/lenet_auto_train.prototxt
Now let’s see the learning parameters, which are also written as a prototxt file (already provided on disk). We’re using SGD with momentum, weight decay, and a specific learning rate schedule.
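# The solver prototxt is usually not worth generating from Python: its format is simple and its parameters are easy to set by hand.
!cat mnist/lenet_auto_solver.prototxt
# SGD is the default solver type; this file configures SGD with momentum and is loaded below with caffe.SGDSolver('mnist/lenet_auto_solver.prototxt').
# If the solver prototxt does not set a device, choose one as follows:
caffe.set_device(0)
caffe.set_mode_gpu()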
### load the solver and create train and test nets
solver = None  # ignore this workaround for lmdb data (can't instantiate two solvers on the same data); the train and test nets are separate
# If the solver prototxt sets a solver type ("SGD", "Adam", "Nesterov", ...), use solver = caffe.get_solver(path).
# If lenet_auto_solver.prototxt sets no solver type, use the form below.
solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')  # creates solver.net and solver.test_nets
# each output is (batch size, feature dim, spatial dim)
# This differs from iterating over net.blobs of a net = caffe.Net(...), which at test time is usually built from a deploy.prototxt.
# Here the solver was created with caffe.SGDSolver('mnist/lenet_auto_solver.prototxt'), so solver.net is the training-phase net from the train prototxt (the test prototxt defines solver.test_nets).
[(k, v.data.shape) for k, v in solver.net.blobs.items()]  # each element is (blob name, v.data.shape)
# just print the weight sizes (we'll omit the biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]  # each element is (layer name, weight shape); v[1] would be the bias
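The shapes printed by the two expressions above can be checked by hand. The following is a minimal sketch, assuming Caffe's usual output-size rules with no padding (convolution rounds down, pooling rounds up). The helpers conv_out and pool_out are made up for this illustration and are not part of pycaffe.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output size: rounds down.
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride=1, pad=0):
    # Pooling output size: rounds up.
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

batch = 64                              # train batch size passed to lenet()
s0 = 28                                 # MNIST images are 28x28 with 1 channel
s1 = conv_out(s0, kernel=5)             # conv1
s2 = pool_out(s1, kernel=2, stride=2)   # pool1
s3 = conv_out(s2, kernel=5)             # conv2
s4 = pool_out(s3, kernel=2, stride=2)   # pool2

# Main blobs of the train net (the label and loss blobs are left out).
blob_shapes = [
    ('data',  (batch, 1, s0, s0)),
    ('conv1', (batch, 20, s1, s1)),
    ('pool1', (batch, 20, s2, s2)),
    ('conv2', (batch, 50, s3, s3)),
    ('pool2', (batch, 50, s4, s4)),
    ('fc1',   (batch, 500)),
    ('score', (batch, 10)),
]

# Weight blobs: (outputs, inputs, kernel h, kernel w) for convolutions,
# (outputs, flattened inputs) for fully connected layers.
weight_shapes = [
    ('conv1', (20, 1, 5, 5)),
    ('conv2', (50, 20, 5, 5)),
    ('fc1',   (500, 50 * s4 * s4)),
    ('score', (10, 500)),
]

for name, shape in blob_shapes + weight_shapes:
    print(name, shape)
```

The fc1 weight has 50 * 4 * 4 = 800 input columns because the fully connected layer flattens the pool2 output of each image.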
# After one forward pass, the net holds one batch of data.
solver.net.forward()  # train net
# The index 0 selects the first test net: solver.prototxt may list several test prototxt paths, in which case caffe.SGDSolver('mnist/lenet_auto_solver.prototxt') creates several test net objects.
solver.test_nets[0].forward()  # test net (there can be more than one)
# we use a little trick to tile the first eight images
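# data[:8, 0] takes the first 8 images of the batch; the blob is (batch size, C, H, W) and the data layer has a single channel, hence index 0.
# transpose(1, 0, 2) turns (N, H, W) into (H, N, W), so the reshape lays the 8 images side by side in one 28 x (8*28) image.
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')  # cmap='gray' maps the values onto a grayscale colormap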
print 'train labels:', solver.net.blobs['label'].data[:8]
imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')
print 'test labels:', solver.test_nets[0].blobs['label'].data[:8]
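The tiling trick used in the two imshow calls can be tried without Caffe. This is a small sketch with a synthetic array standing in for blobs['data'].data[:8, 0]; the variable names images and tiled are only for this example.

```python
import numpy as np

# Stand-in for the first 8 single-channel images of a batch: shape (8, 28, 28).
# Image i is filled with the value i so that it is easy to track.
images = np.stack([np.full((28, 28), i, dtype=np.float32) for i in range(8)])

# (N, H, W) -> (H, N, W) -> (H, N*W): each row of the result holds the
# corresponding row of image 0, then of image 1, and so on.
tiled = images.transpose(1, 0, 2).reshape(28, 8 * 28)

print(tiled.shape)
print(tiled[0, ::28])   # first pixel of each 28-column block
```

Without the transpose, reshape would cut the flat buffer image by image and the result would not be the 8 images laid side by side.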
Both train and test nets seem to be loading data, and to have correct labels.
# One step is one training iteration: a forward pass, a backward pass, and a weight update.
solver.step(1)
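What one such step does to a single weight can be sketched in plain Python. This assumes the standard momentum update with L2 weight decay, which is the scheme Caffe's SGD solver documents. The function sgd_momentum_step, its default values, and the gradients fed to it are illustrative only and are not pycaffe calls.

```python
def sgd_momentum_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=5e-4):
    # L2 weight decay adds weight_decay * w to the gradient.
    g = grad + weight_decay * w
    # The velocity is a decaying sum of past updates.
    v = momentum * v - lr * g
    # The weight moves by the velocity.
    return w + v, v

w, v = 1.0, 0.0
for grad in [0.5, 0.5, 0.5]:   # made-up gradients for three iterations
    w, v = sgd_momentum_step(w, grad, v)
    print(w, v)
```

With a constant gradient the velocity grows from step to step, which is how momentum speeds up progress along a consistent direction.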
Do we have gradients propagating through our filters? Let’s see the updates to the first layer, shown here as a 4×5 grid of 5×5 filters.
# params['conv1'][0] is the filter weight blob (index 1 would be the bias); diff holds the gradient (update) data, not the weight values!
# diff[:, 0] takes all filters from the (num_filters, C, k, k) blob at channel 0, since the input has a single channel.
imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5)
       .transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')  # the reshape/transpose tiles the 20 filters into a 4x5 grid
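The reshape/transpose chain that builds the 4×5 grid can be checked on synthetic data. In this sketch the array filters stands in for params['conv1'][0].diff[:, 0]; it is filled with made-up values and is not read from a net.

```python
import numpy as np

# Stand-in for 20 single-channel 5x5 filters: shape (20, 5, 5).
# Filter i is filled with the value i.
filters = np.stack([np.full((5, 5), i, dtype=np.float32) for i in range(20)])

grid = (filters.reshape(4, 5, 5, 5)      # (grid row, grid col, h, w)
               .transpose(0, 2, 1, 3)    # (grid row, h, grid col, w)
               .reshape(4 * 5, 5 * 5))   # (grid row * h, grid col * w)

print(grid.shape)
print(grid[::5, ::5])   # top-left pixel of every 5x5 cell
```

Cell (r, c) of the grid holds filter number r * 5 + c, so the filters read left to right, top to bottom.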
Something is happening. Let’s run the net for a while, keeping track of a few things as it goes.
Note that this process will be the same as if training through the caffe binary. In particular:
* logging will continue to happen as normal
* snapshots will be taken at the interval specified in the solver prototxt (here, every 5000 iterations); one iteration processes one batch, and these settings live in solver.prototxt
* testing will happen at the interval specified (here, every 500 iterations)
Since we have control of the loop in Python, we’re free to compute additional things as we go, as we show below. We can do many other things as well, for example:
* write a custom stopping criterion
* change the solving process by updating the net in the loop
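As an illustration of the first bullet, here is a sketch of a custom stopping criterion. The helper should_stop, its window and tol parameters, and the synthetic loss curve are all assumptions made for this example. In a real loop the losses would be read from solver.net.blobs['loss'].data after each solver.step(1).

```python
def should_stop(losses, window=20, tol=1e-3):
    """Return True once the mean loss of the last `window` steps has improved
    by less than `tol` over the mean of the `window` steps before that."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / float(window)
    previous = sum(losses[-2 * window:-window]) / float(window)
    return previous - recent < tol

# Synthetic, steadily flattening loss curve standing in for real training losses.
losses = []
stopped_at = None
for it in range(1000):
    losses.append(2.3 * 0.9 ** it)   # in practice: run one solver step, then read the loss
    if should_stop(losses):
        stopped_at = it
        break

print(stopped_at)
```

Comparing two window averages rather than two single losses keeps the batch-to-batch noise of SGD from triggering the stop too early.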
# Note: we did not start training with the caffe binary (for example ./build/tools/caffe train --solver=...,
# as examples/mnist/train_lenet.sh does), which runs the solver loop with the settings in the solver prototxt.
# We drive the training from Python instead.
%%time
# a custom training loop controlled from Python
niter = 200  # plays the role of max_iter (10000) in solver.prototxt
test_interval = 25  # plays the role of test_interval (500) in solver.prototxt
# losses will also be stored in the log
train_loss = zeros(niter)
# one test accuracy per test_interval training iterations, 200 // 25 = 8 in total;
# each one is a scalar computed over the whole test set
test_acc = zeros(int(np.ceil(niter / test_interval)))
# scores of the first 8 test images at every iteration: 10 values per image,
# the output of the last fully connected layer
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    # one full training iteration on the train net: load a new batch, forward, backward,
    # and weight update; this is independent of the test net and its forward passes
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data  # scalar loss of the current batch

    # store the output on the first test batch: after the weight update, run the
    # test net forward and record the scores of the first 8 images
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    # Because the pass starts at conv1, the data blob is unchanged: it still holds the
    # first test-LMDB batch loaded by the earlier test-net forward().
    # (The full test below calls forward() without start, so it does load new batches.)
    # score is the raw output of the last fully connected layer, not softmax probabilities.
    output[it] = solver.test_nets[0].blobs['score'].data[:8]

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    #  how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        # every test_interval iterations, run 100 test batches of 100 images each,
        # i.e. one pass over the whole test set
        print 'Iteration', it, 'testing...'
        correct = 0
        for test_it in range(100):  # 100 plays the role of test_iter: 10000 test images / batch size 100
            # Loading resumes at the second test batch; after 100 forward passes the whole
            # test set has been visited and the data blob holds the first batch again.
            solver.test_nets[0].forward()
            # argmax(1) picks the highest-scoring class in each row, one index per image
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4  # 100 batches * 100 images = 1e4 test images
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))
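The bookkeeping behind the test accuracy curve can be reproduced in plain Python. This is a sketch with made-up scores and labels; count_correct is a hypothetical helper that mirrors the argmax(1) comparison in the loop above without using numpy or Caffe.

```python
import math

niter, test_interval = 200, 25

# Number of full tests and the iterations at which they run.
n_tests = int(math.ceil(niter / float(test_interval)))
test_iterations = [it for it in range(niter) if it % test_interval == 0]

def count_correct(scores, labels):
    # For each row of scores, the predicted class is the index of the largest
    # entry; count how many predictions match the labels.
    predictions = [row.index(max(row)) for row in scores]
    return sum(int(p == y) for p, y in zip(predictions, labels))

# Made-up scores for 3 images and 3 classes.
scores = [[0.1, 2.0, -1.0],
          [3.0, 0.2, 0.1],
          [0.0, 0.5, 0.4]]
labels = [1, 0, 2]

correct = count_correct(scores, labels)
accuracy = correct / float(len(labels))
print(n_tests, test_iterations, correct, accuracy)
```

In the notebook the same count is accumulated over 100 batches of 100 images and divided by 1e4.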
The loss seems to have dropped quickly and converged (except for stochasticity), while the accuracy rose correspondingly. Hooray!
for i in range(8):
    figure(figsize=(2, 2))
    # data[i, 0] is the i-th image of the batch; the blob is (batch size, C, H, W) and the data layer has a single channel, hence index 0
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    # output[:50, i] holds the scores of the i-th image over the first 50 training iterations;
    # after the transpose the image is 10 x 50 (H: 10 labels, W: 50 iterations).
    # These are raw scores, not softmax probabilities; imshow maps the array's value range onto the
    # gray colormap, so a few large scores can make the remaining entries hard to tell apart.
    imshow(output[:50, i].T, interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
We started with little idea about any of these digits, and ended up with correct classifications for each. If you’ve been following along, you’ll see the last digit is the most difficult, a slanted “9” that’s (understandably) most confused with “4”.
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    # After the transpose the image is 10 x 50 (H: 10 labels, W: 50 iterations). A softmax normalizes each
    # column to values between 0 and 1 that sum to 1, which shows the net's confidence more clearly
    # but makes it harder to see the scores for less likely digits.
    imshow(exp(output[:50, i].T) / exp(output[:50, i].T).sum(0), interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
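The expression exp(x) / exp(x).sum(0) in the cell above is a softmax over the ten scores of each iteration. Here is a sketch of the same computation for one score vector in plain Python. Subtracting the maximum first is an addition of this example, not something the notebook line does; it leaves the result unchanged and avoids overflow for large scores. The scores in raw are made up.

```python
import math

def softmax(scores):
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing on large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw = [1.0, 5.0, 3.0, -2.0]   # made-up scores for four classes
probs = softmax(raw)
print(probs)
print(sum(probs))
```

The largest raw score still gets the largest probability, but the smaller scores are squeezed towards zero, which is why less likely digits become harder to see in the second set of plots.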
Now that we’ve defined, trained, and tested LeNet there are many possible next steps:
* tuning optimization by setting base_lr and the like, or simply training longer
* switching the solver type from SGD to an adaptive method like AdaDelta or Adam
Feel free to explore these directions by editing the all-in-one example that follows. Look for “EDIT HERE” comments for suggested choice points.
By default this defines a simple linear classifier as a baseline.
In case your coffee hasn’t kicked in and you’d like inspiration, try out
* switching the nonlinearity from ReLU to ELU or a saturating nonlinearity like Sigmoid
* other learning rates (such as 0.1 and 0.001)
* switching the solver type to Adam (this adaptive solver type should be less sensitive to hyperparameters, but no guarantees…)
* setting niter higher (to 500 or 1,000 for instance) to better show training differences
train_net_path = 'mnist/custom_auto_train.prototxt'
test_net_path = 'mnist/custom_auto_test.prototxt'
solver_config_path = 'mnist/custom_auto_solver.prototxt'
### define net: this writes custom_auto_train.prototxt and custom_auto_test.prototxt under mnist/
def custom_net(lmdb, batch_size):
    # define your own net!
    n = caffe.NetSpec()

    # keep this data layer for all networks
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    # EDIT HERE to try different networks
    # example: a linear classifier made of a single fully connected layer
    # this following single layer defines a simple linear classifier
    # (in particular this defines a multiway logistic regression)
    n.score = L.InnerProduct(n.data, num_output=10, weight_filler=dict(type='xavier'))

    # EDIT HERE this is the LeNet variant we have already tried
    # n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    # n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    # n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    # n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    # EDIT HERE consider L.ELU or L.Sigmoid for the nonlinearity
    # n.relu1 = L.ReLU(n.fc1, in_place=True)
    # n.score = L.InnerProduct(n.fc1, num_output=10, weight_filler=dict(type='xavier'))

    # keep this loss layer for all networks
    # The loss layer is used in both the training and the validation phase, but the latter
    # does not compute or propagate gradients. For deployment, a deploy.prototxt can use
    # L.Softmax in place of L.SoftmaxWithLoss.
    n.loss = L.SoftmaxWithLoss(n.score, n.label)

    return n.to_proto()

with open(train_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_train_lmdb', 64)))
with open(test_net_path, 'w') as f:
    f.write(str(custom_net('mnist/mnist_test_lmdb', 100)))
### define solver: this writes custom_auto_solver.prototxt under mnist/. You can also edit the parameters in that file directly, since generating it from Python is not especially readable either.
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()
# Set a seed for reproducible experiments:
# this controls for randomization in training.
s.random_seed = 0xCAFFE
# Specify locations of the train and (maybe) test networks.
s.train_net = train_net_path
s.test_net.append(test_net_path)
# Every 500 iterations, run one full pass over the test set and report its accuracy; with max_iter = 10000 that gives about 20 accuracy reports.
s.test_interval = 500  # Test after every 500 training iterations.
s.test_iter.append(100)  # Test on 100 batches each time we test: 10000 test images / test batch size 100 = 100 batches.
s.max_iter = 10000 # no. of times to update the net (training iterations)
# EDIT HERE to try different solvers
# solver types include "SGD", "Adam", and "Nesterov" among others.
s.type = "SGD"
# Set the initial learning rate for SGD.
s.base_lr = 0.01 # EDIT HERE to try different learning rates
# Set momentum to accelerate learning by
# taking weighted average of current and previous updates.
s.momentum = 0.9
# Set weight decay to regularize and prevent overfitting
s.weight_decay = 5e-4
# Set `lr_policy` to define how the learning rate changes during training.
# This is the same policy as our default LeNet.
s.lr_policy = 'inv'  # the learning rate decay policy
s.gamma = 0.0001
s.power = 0.75
# EDIT HERE to try the fixed rate (and compare with adaptive solvers)
# `fixed` is the simplest policy that keeps the learning rate constant.
# s.lr_policy = 'fixed'
# Display the current training loss and accuracy every 1000 iterations (that is, every 1000 training batches).
s.display = 1000
# Snapshots are files used to store networks we've trained.
# We'll snapshot every 5K iterations -- twice during training.
s.snapshot = 5000  # with max_iter = 10000 this produces two weight snapshot files
s.snapshot_prefix = 'mnist/custom_net'
# Train on the GPU
s.solver_mode = caffe_pb2.SolverParameter.GPU
# For training and validation, the CPU/GPU choice is usually written in the solver prototxt; when deploying a model, use instead:
#caffe.set_device(0) #if we have multiple GPUs, pick the first one
#caffe.set_mode_gpu()
# Write the solver to solver_config_path.
with open(solver_config_path, 'w') as f:
    f.write(str(s))
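To see what the 'inv' policy with these settings does to the learning rate, here is a small sketch. It assumes the formula given in Caffe's solver parameter comments, base_lr * (1 + gamma * iter) ^ (-power). The function name inv_lr is made up for this example.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    # 'inv' policy: base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1.0 + gamma * iteration) ** (-power)

for iteration in (0, 250, 5000, 10000):
    print(iteration, inv_lr(iteration))
```

The rate starts at base_lr and decays smoothly. Over the first few hundred iterations it barely moves, so a short run with a small niter effectively trains at a fixed learning rate.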
### load the solver and create train and test nets
solver = None # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
#solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt')  # would create solver.net and solver.test_nets
solver = caffe.get_solver(solver_config_path)  # works because custom_auto_solver.prototxt sets the solver type (SGD here)
### solve
niter = 250  # EDIT HERE increase to train for longer
test_interval = niter // 10  # run a full pass over the test set 10 times
# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe

    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data

    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    #  how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print 'Iteration', it, 'testing...'
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
_, ax1 = subplots()  # _ is the Figure object, which we do not need
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Custom Test Accuracy: {:.2f}'.format(test_acc[-1]))