Caffe Examples, Part 12: Fine-tuning CaffeNet for Style Recognition

This example is the IPython-notebook version of Part 6 of this series: fine-tuning a pretrained network on your own data, which is very useful in practice. Because the pretrained network was learned on a very large image set, its intermediate layers capture the "semantics" of general visual appearance. These features are powerful enough that you can treat the network as a black box, and good performance can be had by retraining only a few layers on top.
First, prepare the data: (1) run the shell script that fetches the model pretrained on ImageNet (ILSVRC); (2) download a subset of the Flickr Style dataset; (3) assemble the downloaded images into the image-list format that Caffe's ImageData layer reads.

# Run from the Caffe root so the relative paths below resolve,
# and make pycaffe importable.
import os
os.chdir('..')
import sys
sys.path.insert(0, './python')

import caffe
import numpy as np
from pylab import *
%matplotlib inline
# This downloads the ilsvrc auxiliary data (mean file, etc),
# and a subset of 2000 images for the style recognition task.
!data/ilsvrc12/get_ilsvrc_aux.sh
!scripts/download_model_binary.py models/bvlc_reference_caffenet
!python examples/finetune_flickr_style/assemble_data.py \
    --workers=-1 --images=2000 --seed=1701 --label=5
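
Once this finishes, assemble_data.py should have written data/flickr_style/train.txt and test.txt, where each line pairs an image path with an integer label, which is the format the ImageData layer reads. A quick sanity check (a minimal sketch, not part of the original notebook):

# Peek at the first few entries of the assembled training list.
with open('data/flickr_style/train.txt') as f:
    for line in [next(f) for _ in range(5)]:
        path, label = line.rsplit(' ', 1)
        print('%s -> class %s' % (path, label.strip()))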

Let's look at how the fine-tuning network differs from the original CaffeNet model:

!diff models/bvlc_reference_caffenet/train_val.prototxt models/finetune_flickr_style/train_val.prototxt

The output is:

1c1
< name: "CaffeNet"
---
> name: "FlickrStyleCaffeNet"
4c4
<   type: "Data"
---
>   type: "ImageData"
15,26c15,19
< # mean pixel / channel-wise mean instead of mean image
< # transform_param {
< # crop_size: 227
< # mean_value: 104
< # mean_value: 117
< # mean_value: 123
< # mirror: true
< # }
<   data_param {
<     source: "examples/imagenet/ilsvrc12_train_lmdb"
<     batch_size: 256
<     backend: LMDB
---
>   image_data_param {
>     source: "data/flickr_style/train.txt"
>     batch_size: 50
>     new_height: 256
>     new_width: 256
31c24
<   type: "Data"
---
>   type: "ImageData"
42,51c35,36
< # mean pixel / channel-wise mean instead of mean image
< # transform_param {
< # crop_size: 227
< # mean_value: 104
< # mean_value: 117
< # mean_value: 123
< # mirror: true
< # }
<   data_param {
<     source: "examples/imagenet/ilsvrc12_val_lmdb"
---
>   image_data_param {
>     source: "data/flickr_style/test.txt"
53c38,39
<     backend: LMDB
---
>     new_height: 256
>     new_width: 256
323a310
>   # Note that lr_mult can be set to 0 to disable any fine-tuning of this, and any other, layer
360c347
<   name: "fc8"
---
>   name: "fc8_flickr"
363c350,351
<   top: "fc8"
---
>   top: "fc8_flickr"
>   # lr_mult is set to higher than for other layers, because this layer is starting from random while the others are already trained
365c353
<     lr_mult: 1
---
>     lr_mult: 10
369c357
<     lr_mult: 2
---
>     lr_mult: 20
373c361
<     num_output: 1000
---
>     num_output: 20
384a373,379
>   name: "loss"
>   type: "SoftmaxWithLoss"
>   bottom: "fc8_flickr"
>   bottom: "label"
>   top: "loss"
> }
> layer {
387c382
<   bottom: "fc8"
---
>   bottom: "fc8_flickr"
393,399d387
< }
< layer {
<   name: "loss"
<   type: "SoftmaxWithLoss"
<   bottom: "fc8"
<   bottom: "label"
<   top: "loss"
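
One comment in the diff is worth highlighting: setting lr_mult to 0 freezes a layer during fine-tuning. As a sketch of that prototxt pattern (not taken verbatim from the shipped model files), freezing conv1 would look like this:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # The first param block governs the weights, the second the biases;
  # lr_mult: 0 (and decay_mult: 0) leaves both untouched by SGD.
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}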

Now let's train in Python, comparing a model fine-tuned from the pretrained weights against one trained from scratch. Note that copy_from() matches layers by name: every layer except the renamed fc8_flickr is initialized from the ImageNet weights, while fc8_flickr starts from random initialization, which is also why its lr_mult is set 10x higher than the other layers'.

niter = 200
# losses will also be stored in the log
train_loss = np.zeros(niter)
scratch_train_loss = np.zeros(niter)

caffe.set_device(0)
caffe.set_mode_gpu()
# We create a solver that fine-tunes from a previously trained network.
solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')
solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
# For reference, we also create a solver that does no finetuning.
scratch_solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')

# We run the solver for niter times, and record the training loss.
for it in range(niter):
    solver.step(1)  # SGD by Caffe
    scratch_solver.step(1)
    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data
    scratch_train_loss[it] = scratch_solver.net.blobs['loss'].data
    if it % 10 == 0:
        print 'iter %d, finetune_loss=%f, scratch_loss=%f' % (it, train_loss[it], scratch_train_loss[it])
print 'done'

The output is:

iter 0, finetune_loss=3.360094, scratch_loss=3.136188
iter 10, finetune_loss=2.672608, scratch_loss=9.736364
iter 20, finetune_loss=2.071996, scratch_loss=2.250404
iter 30, finetune_loss=1.758295, scratch_loss=2.049553
iter 40, finetune_loss=1.533391, scratch_loss=1.941318
iter 50, finetune_loss=1.561658, scratch_loss=1.839706
iter 60, finetune_loss=1.461696, scratch_loss=1.880035
iter 70, finetune_loss=1.267941, scratch_loss=1.719161
iter 80, finetune_loss=1.192778, scratch_loss=1.627453
iter 90, finetune_loss=1.541176, scratch_loss=1.822061
iter 100, finetune_loss=1.029039, scratch_loss=1.654087
iter 110, finetune_loss=1.138547, scratch_loss=1.735837
iter 120, finetune_loss=0.917412, scratch_loss=1.851918
iter 130, finetune_loss=0.971519, scratch_loss=1.801927
iter 140, finetune_loss=0.868252, scratch_loss=1.745545
iter 150, finetune_loss=0.790020, scratch_loss=1.844925
iter 160, finetune_loss=1.092668, scratch_loss=1.695591
iter 170, finetune_loss=1.055344, scratch_loss=1.661715
iter 180, finetune_loss=0.969769, scratch_loss=1.823639
iter 190, finetune_loss=0.780566, scratch_loss=1.820862
done
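
As a side check (a minimal sketch, not in the original post), you can confirm that copy_from() really matches layers by name: a freshly copied net shares conv1 exactly with the pretrained model loaded through deploy.prototxt, while fc8_flickr has no counterpart in the .caffemodel and so keeps its random initialization:

# Load the pretrained weights through the deploy definition (no data layers needed).
ref_net = caffe.Net('models/bvlc_reference_caffenet/deploy.prototxt',
                    'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                    caffe.TEST)
# A fresh solver whose net has been copied from the .caffemodel but not yet stepped.
check_solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')
check_solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
print('conv1 copied exactly: %s' % np.allclose(
    ref_net.params['conv1'][0].data, check_solver.net.params['conv1'][0].data))
print('fc8_flickr in pretrained net: %s' % ('fc8_flickr' in ref_net.params))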

Let's plot the training loss for the two training approaches:

plot(np.vstack([train_loss, scratch_train_loss]).T)

The result is:

[<matplotlib.lines.Line2D at 0x7fbb36f0ad50>,
 <matplotlib.lines.Line2D at 0x7fbb36f0afd0>]
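
The curves come out unlabeled; a couple of optional lines (not in the original notebook) add axis labels and a legend:

plot(train_loss, label='finetuning')
plot(scratch_train_loss, label='from scratch')
xlabel('iteration')
ylabel('train loss')
legend()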

The fine-tuned model's loss curve is smoother and ends lower. Let's clip the losses to [0, 4] to zoom in on the smaller values:

plot(np.vstack([train_loss, scratch_train_loss]).clip(0, 4).T)

The output is:

[<matplotlib.lines.Line2D at 0x7fbb347a8310>,
 <matplotlib.lines.Line2D at 0x7fbb347a8590>]

(Figure: training-loss curves for fine-tuning vs. training from scratch, clipped to [0, 4].)

Now check the test accuracy after the 200 iterations above. This classification task has 5 classes, so chance accuracy is 20%. We expect the fine-tuned model to do much better than the one trained from scratch.

test_iters = 10
accuracy = 0
scratch_accuracy = 0
for it in arange(test_iters):
    solver.test_nets[0].forward()
    accuracy += solver.test_nets[0].blobs['accuracy'].data
    scratch_solver.test_nets[0].forward()
    scratch_accuracy += scratch_solver.test_nets[0].blobs['accuracy'].data
accuracy /= test_iters
scratch_accuracy /= test_iters
print 'Accuracy for fine-tuning:', accuracy
print 'Accuracy for training from scratch:', scratch_accuracy

The results are:

Accuracy for fine-tuning: 0.570000001788
Accuracy for training from scratch: 0.224000000954

Clearly fine-tuning wins by a large margin. As a next step, you can raise the iteration count and run the training schedule to completion to see the final results.
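
For the full run, the Python loop is unnecessary; the standard command-line invocation (as in Caffe's own fine-tuning example, assuming you are in the Caffe root) is:

!./build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    -gpu 0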
