Fully Convolutional Networks
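In a classic CNN architecture, the convolution and pooling layers are followed by a few fully connected layers at the end of the network (three in AlexNet); Caffe calls them InnerProduct layers. The classic AlexNet architecture, for example, is shown below:
Here, fully connected means the previous layer's output is flattened and fed to every unit of the layer. In essence, a fully connected layer multiplies the input vector by the layer's weights, adds a bias, and applies an activation function. The final layers can therefore be understood as turning the image into a vector representation used for classification. The information in the image is reduced to numbers that can be classified, but the spatial picture is no longer directly visible at the output, which is a bit of a pity. Put plainly, what we would like to see is the kind of information in the figure below, rather than just a vector used for classification; after all, CNNs are not limited to classification.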
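As a minimal sketch of the fully connected computation described above (flatten, multiply by the weights, add the bias, apply an activation), here is a NumPy version. The function name `fully_connected`, the toy sizes, and the choice of ReLU as the activation are assumptions made for illustration; this is not Caffe's implementation.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Flatten x, apply the affine map weights.dot(x) + bias, then ReLU."""
    flat = x.reshape(-1)
    return np.maximum(weights.dot(flat) + bias, 0.0)

# Toy sizes (hypothetical): a 2x2x2 feature map feeding 3 output units.
feature_map = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
# Each row of the weight matrix is constant: all 1.0, all -1.0, all 0.5.
weights = np.ones((3, 8)) * np.array([[1.0], [-1.0], [0.5]])
bias = np.array([0.0, 1.0, -4.0])

fc_out = fully_connected(feature_map, weights, bias)
print(fc_out)  # one number per output unit; spatial layout is gone
```

Note that the output has no height or width left: whatever spatial structure the feature map had is collapsed into one number per unit.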
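The figure above shows the final result of multi-view face detection. That method replaced all of the final fc layers with fc-conv layers, which is how it obtains this spatial information at the output. To turn the output into concrete face detections, one can apply non-maximum suppression, the same idea that Canny edge detection uses as one of its steps. In my opinion, CNN feature extraction is very powerful, but it sometimes still needs simple classical image-processing steps to assist it, as in the original R-CNN and Fast R-CNN.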
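To make the non-maximum suppression idea concrete, here is a small sketch that keeps only the local maxima of a 2D score map. It is an assumption about how such a map could be post-processed, not the method of the face-detection work mentioned above; the function name `heatmap_peaks`, the 3x3 neighbourhood, and the threshold are all hypothetical.

```python
import numpy as np

def heatmap_peaks(scores, threshold):
    """Return (row, col) cells that reach the threshold and are the
    maximum of their 3x3 neighbourhood (tied cells are all kept)."""
    padded = np.pad(scores, 1, mode='constant', constant_values=-np.inf)
    height, width = scores.shape
    peaks = []
    for r in range(height):
        for c in range(width):
            window = padded[r:r + 3, c:c + 3]  # 3x3 window centred on (r, c)
            if scores[r, c] >= threshold and scores[r, c] == window.max():
                peaks.append((r, c))
    return peaks

# Toy score map with two strong responses and some weaker neighbours.
scores = np.array([[0.1, 0.2, 0.1, 0.0],
                   [0.2, 0.9, 0.3, 0.0],
                   [0.1, 0.3, 0.2, 0.6],
                   [0.0, 0.0, 0.5, 0.8]])
peaks = heatmap_peaks(scores, threshold=0.5)
print(peaks)
```

The cells with scores 0.6 and 0.5 pass the threshold but are suppressed because a stronger neighbour (0.8) sits inside their window.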
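Now let's look at how the conversion is actually done. In the Caffe source tree, find the following two network definitions for CaffeNet (a variant of AlexNet, not VGG): net_surgery/bvlc_caffenet_full_conv.prototxt and models/bvlc_reference_caffenet/deploy.prototxt. The overall architecture is the same in both; the only difference is that the last three layers are changed to fc-conv, that is, the fully connected InnerProduct layers become Convolution layers. The diff command shows the differences clearly: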
1,2c1,2    // lines 1-2 of the first file differ from lines 1-2 of the second file
< # Fully convolutional network version of CaffeNet.
< name: "CaffeNetConv"
---    // separator: lines from the first file are above, lines from the second file are below
> name: "CaffeNet"
> input: "data"
7,11c7
< input_param {
< # initial shape for a fully convolutional network:
< # the shape can be set for each input by reshape.
< shape: { dim: 1 dim: 3 dim: 451 dim: 451 }
< }
---
> input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
157,158c153,154
< name: "fc6-conv"
< type: "Convolution"
---    // comparing above and below, the layer type has changed
> name: "fc6"
> type: "InnerProduct"
160,161c156,157
< top: "fc6-conv"
< convolution_param {
---
> top: "fc6"
> inner_product_param {
163d158
< kernel_size: 6
169,170c164,165
< bottom: "fc6-conv"
< top: "fc6-conv"
---
> bottom: "fc6"
> top: "fc6"
175,176c170,171
< bottom: "fc6-conv"
< top: "fc6-conv"
---
> bottom: "fc6"
> top: "fc6"
182,186c177,181
< name: "fc7-conv"
< type: "Convolution"
< bottom: "fc6-conv"
< top: "fc7-conv"
< convolution_param {
---
> name: "fc7"
> type: "InnerProduct"
> bottom: "fc6"
> top: "fc7"
> inner_product_param {
188d182
< kernel_size: 1
194,195c188,189
< bottom: "fc7-conv"
< top: "fc7-conv"
---
> bottom: "fc7"
> top: "fc7"
200,201c194,195
< bottom: "fc7-conv"
< top: "fc7-conv"
---
> bottom: "fc7"
> top: "fc7"
207,211c201,205
< name: "fc8-conv"
< type: "Convolution"
< bottom: "fc7-conv"
< top: "fc8-conv"
< convolution_param {
---
> name: "fc8"
> type: "InnerProduct"
> bottom: "fc7"
> top: "fc8"
> inner_product_param {
213d206
< kernel_size: 1
219c212
< bottom: "fc8-conv"
---
> bottom: "fc8"
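Normally the CNN is pre-trained first, with InnerProduct layers at the end of the network; once the parameters are trained, those fc layers are changed to fc-conv layers. This raises a question: how can the weight matrices trained for the fc layers be reused in the fc-conv layers? The fc weights have shape (num_layer_top_output, num_layer_bottom_input), while the fc-conv weights have shape (output, input, height, width), as shown below: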
# Make sure that caffe is on the python path:
caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt',
                '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8']
# fc_params = {name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

for fc in params:
    # print(...) with a single argument works in both Python 2 and Python 3
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))
fc6 weights are (4096, 9216) dimensional and biases are (4096,) dimensional
fc7 weights are (4096, 4096) dimensional and biases are (4096,) dimensional
fc8 weights are (1000, 4096) dimensional and biases are (1000,) dimensional
The parameters of the fc-conv layers:
# Load the fully convolutional network to transplant the parameters.
net_full_conv = caffe.Net('net_surgery/bvlc_caffenet_full_conv.prototxt',
                          '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                          caffe.TEST)
params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}

for conv in params_full_conv:
    # print(...) with a single argument works in both Python 2 and Python 3
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))
fc6-conv weights are (4096, 256, 6, 6) dimensional and biases are (4096,) dimensional
fc7-conv weights are (4096, 4096, 1, 1) dimensional and biases are (4096,) dimensional
fc8-conv weights are (1000, 4096, 1, 1) dimensional and biases are (1000,) dimensional
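The 6x6 kernel of fc6-conv and the two input sizes in the prototxt files (227 and 451) can be related by tracing the spatial size through the network. The sketch below does that arithmetic. The kernel, stride, and padding values in `LAYERS` are the usual CaffeNet settings and are not shown in the diff above, so treat them as an assumption and check them against deploy.prototxt; `conv_out` and `trace` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer.
    Caffe rounds pooling sizes up rather than down; the two agree here
    because every division in this trace is exact."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad) for each layer that changes or preserves
# the spatial size, up to fc6-conv. Assumed CaffeNet hyperparameters.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
    ("fc6-conv", 6, 1, 0),
]

def trace(size):
    """Return {layer name: spatial size after that layer}."""
    sizes = {}
    for name, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        sizes[name] = size
    return sizes

for input_size in (227, 451):
    sizes = trace(input_size)
    print(input_size, "-> pool5:", sizes["pool5"], "fc6-conv:", sizes["fc6-conv"])
```

Under these assumptions a 227x227 input gives a 6x6 pool5 map, which the 6x6 kernel reduces to a single 1x1 output, while a 451x451 input gives a 13x13 pool5 map and therefore an 8x8 grid of outputs.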
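Look closely and you will see that the product of the last three numbers in the fc6-conv weight shape equals the second dimension of the fc6 weights: 256*6*6 = 9216. This is because the pool5 output that feeds fc6 has 256 channels of 6*6 = 36 elements each (for a 227*227 input), so a 6*6 kernel with stride 1 covers the entire map. That is how the conversion works; the code is as follows: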
for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat  # flat unrolls the arrays
    conv_params[pr_conv][1][...] = fc_params[pr][1]
The code above transplants the parameters from one network to the other. That covers how the parameters are converted from fc to fc-conv.
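To check that copying the weights through `.flat` really preserves the computation, the following sketch repeats the transplant on toy sizes using NumPy only (no Caffe required). The sizes, the random weights, and the naive `conv_valid` helper are assumptions made for illustration; the helper computes a stride-1, unpadded cross-correlation, which is what a Caffe Convolution layer computes.

```python
import numpy as np

rng = np.random.RandomState(0)
channels, k, outputs = 3, 2, 4  # toy stand-ins for 256, 6, 4096

# Stand-in for trained fully connected parameters: (outputs, channels * k * k).
fc_w = rng.randn(outputs, channels * k * k)
fc_b = rng.randn(outputs)

# Transplant: same memory order, viewed as (outputs, channels, k, k).
conv_w = np.zeros((outputs, channels, k, k))
conv_w.flat = fc_w.flat
conv_b = fc_b.copy()

def conv_valid(x, w, b):
    """Naive stride-1, no-padding convolution over a (channels, H, W) input."""
    out_c, _, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((out_c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = (w * patch).sum(axis=(1, 2, 3)) + b
    return out

# On a k x k input the convolution reproduces the fully connected output.
small = rng.randn(channels, k, k)
fc_out = fc_w.dot(small.reshape(-1)) + fc_b
conv_result = conv_valid(small, conv_w, conv_b)
print(np.allclose(conv_result[:, 0, 0], fc_out))

# On a larger input every output cell is the fc layer applied to one window.
large = rng.randn(channels, 5, 5)
score_map = conv_valid(large, conv_w, conv_b)
window = large[:, 1:1 + k, 2:2 + k]
window_fc = fc_w.dot(window.reshape(-1)) + fc_b
print(score_map.shape, np.allclose(score_map[:, 1, 2], window_fc))
```

The second check is the point of the conversion: the converted network evaluates the original classifier at every window position in a single forward pass, producing a map of scores instead of a single vector.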