Paper Notes: Very Deep Convolutional Networks for Large-Scale Image Recognition

Preface

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition

1. INTRODUCTION

  • In this paper, the authors address an important aspect of ConvNet architecture design: its depth. They fix the other architectural parameters and steadily increase the depth by adding more convolutional layers, which is feasible because all convolutional layers use very small filters

In this paper, we address another important aspect of ConvNet architecture design – its depth. To this end, we fix other parameters of the architecture, and steadily increase the depth of the network by adding more convolutional layers, which is feasible due to the use of very small (3 × 3) convolution filters in all layers

  • The resulting models perform very well, achieving state-of-the-art accuracy not only on the ILSVRC classification task but also excellent results on the localisation task

As a result, we come up with significantly more accurate ConvNet architectures, which not only achieve the state-of-the-art accuracy on ILSVRC classification and localisation tasks, but are also applicable to other image recognition datasets, where they achieve excellent performance even when used as a part of relatively simple pipelines

2. CONVNET CONFIGURATIONS

  • During training, the input to the network is fixed to 224×224 RGB (three-channel) images

During training, the input to our ConvNets is a fixed-size 224 × 224 RGB image

  • Very small 3×3 convolution filters are used throughout, which is one of the defining features of the model

we use filters with a very small receptive field: 3 × 3

  • In one configuration the authors also use 1×1 convolution filters; a 1×1 convolution can be viewed as a linear transformation of the input channels (followed by a non-linearity)

In one of the configurations we also utilise 1 × 1 convolution filters, which can be seen as a linear transformation of the input channels
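A minimal sketch (my own, not from the paper; PyTorch and a channel count of 64 are arbitrary assumptions) showing that a 1×1 convolution applies the same linear map to the channel vector at every spatial position:

```python
import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at each spatial position independently:
# every pixel's channel vector goes through the same linear map (plus bias).
conv1x1 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=1)

x = torch.randn(1, 64, 56, 56)  # (batch, channels, height, width)
y = conv1x1(x)

# Equivalent view: move channels last and apply an ordinary Linear layer per pixel.
linear = nn.Linear(64, 64)
linear.weight.data = conv1x1.weight.data.view(64, 64)
linear.bias.data = conv1x1.bias.data
y_ref = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y, y_ref, atol=1e-6))  # True
```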

  • For the 3×3 convolutional layers, the padding is set to 1 pixel, so the spatial resolution is preserved after convolution

the padding is 1 pixel for 3 × 3 conv. layers

  • Max-pooling uses a 2×2 window with stride 2 (a quick check of the resulting feature-map sizes follows below)

Max-pooling is performed over a 2 × 2 pixel window, with stride 2
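A short sanity check (my own, not from the paper) of how these choices affect the spatial size, using the standard output-size formula:

```python
def out_size(size, kernel, pad, stride):
    # standard formula: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

print(out_size(224, kernel=3, pad=1, stride=1))  # 224: a 3x3 conv with padding 1 preserves resolution
print(out_size(224, kernel=2, pad=0, stride=2))  # 112: a 2x2 max-pool with stride 2 halves it

# Across the five pooling stages of the paper's configurations: 224 -> 112 -> 56 -> 28 -> 14 -> 7
```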

  • The convolutional layers are followed by three fully-connected layers: the first two have 4096 units each, and the third has 1000 units for the 1000-way ILSVRC classification

A stack of convolutional layers (which has a different depth in different architectures) is followed by three Fully-Connected (FC) layers: the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class)

  • All hidden layers use the ReLU activation function

All hidden layers are equipped with the rectification (ReLU) non-linearity

3. CONFIGURATIONS

  • The paper defines five configurations (A–E) that share essentially the same design and differ only in depth, ranging from model A with 11 weight layers (8 convolutional + 3 fully connected) to model E with 19 weight layers (16 convolutional + 3 fully connected), as shown below (a minimal code sketch of configuration A follows the figure)
    [Figure: ConvNet configurations A–E, layer-by-layer setup of each model]
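Below is a minimal, hypothetical PyTorch sketch of configuration A (11 weight layers), following the 64-128-256-512-512 channel progression shown in the figure; it is only an illustrative reconstruction, not the authors' implementation:

```python
import torch.nn as nn

def make_vgg_a(num_classes=1000):
    # 8 convolutional layers (all 3x3, padding 1) + 3 fully-connected layers = 11 weight layers.
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1),    nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
        nn.Conv2d(64, 128, 3, padding=1),  nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
        nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
        nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
        nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2, 2),
        nn.Flatten(),
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
        nn.Linear(4096, 4096),        nn.ReLU(inplace=True), nn.Dropout(0.5),
        nn.Linear(4096, num_classes),  # 1000-way classification, followed by soft-max
    )
```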

  • A distinctive feature of the model is its very small filters, and it is precisely because the filters are small that the network can go deeper: a stack of two 3×3 convolutional layers has the same effective receptive field as a single 5×5 layer, and a stack of three 3×3 layers matches a single 7×7 layer

  • Why use three 3×3 layers instead of a single 7×7 layer? First, incorporating three non-linearities instead of one makes the decision function more discriminative. Second, it reduces the number of parameters: three 3×3 layers need 27C² weights versus 49C² for one 7×7 layer, a reduction of roughly 45% (equivalently, the 7×7 layer needs 81% more parameters), as the short calculation below confirms.

First, we incorporate three non-linear rectification layers instead of a single one, which makes the decision function more discriminative. Second, we decrease the number of parameters: assuming that both the input and the output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by 3(3²C²) = 27C² weights; at the same time, a single 7 × 7 conv. layer would require 7²C² = 49C² parameters, i.e. 81% more
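A short, self-contained check (my own, not from the paper) of both claims above, the receptive-field equivalence and the parameter counts:

```python
def receptive_field(num_layers, kernel=3):
    # For a stack of stride-1 convolutions, each layer adds (kernel - 1) to the receptive field.
    return 1 + num_layers * (kernel - 1)

print(receptive_field(2))  # 5 -> two stacked 3x3 layers see a 5x5 region
print(receptive_field(3))  # 7 -> three stacked 3x3 layers see a 7x7 region

C = 256  # example channel count; the ratio does not depend on C
params_three_3x3 = 3 * (3 * 3 * C * C)   # 27 C^2
params_single_7x7 = 7 * 7 * C * C        # 49 C^2
print(params_single_7x7 / params_three_3x3)  # ~1.81, i.e. the 7x7 layer has 81% more parameters
```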

  • This can also be seen as a form of regularisation, since the network is forced to express one large filter as a decomposition into three small ones

This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).

  • Networks with small filters had been tried before, but those networks were not very deep; it was Goodfellow et al. (2014), using an 11-layer network for street number recognition, who showed that increasing depth does improve performance

Goodfellow et al. (2014) applied deep ConvNets (11 weight layers) to the task of street number recognition, and showed that the increased depth led to better performance

4. CLASSIFICATION FRAMEWORK

  • During training, the batch size is 256 and the momentum is 0.9

The batch size was set to 256, momentum to 0.9.

  • Training is regularised with weight decay (an L2 penalty with multiplier 5·10⁻⁴) and with dropout (ratio 0.5) applied to the first two fully-connected layers

The training was regularised by weight decay (the L2 penalty multiplier set to 5·10⁻⁴) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5)

  • The learning rate is initialised to 0.01 and decreased by a factor of 10 whenever the validation accuracy stops improving

The learning rate was initially set to 10⁻², and then decreased by a factor of 10 when the validation set accuracy stopped improving
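The hyper-parameters above can be expressed compactly with modern PyTorch APIs. This is only a sketch of an equivalent setup (the paper predates PyTorch, so this is not the authors' implementation); the placeholder `model` stands for the actual ConvNet:

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder; substitute the actual ConvNet here

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-2,            # initial learning rate 10^-2
    momentum=0.9,       # momentum 0.9
    weight_decay=5e-4,  # L2 penalty multiplier 5*10^-4
)

# Drop the learning rate by a factor of 10 when the monitored validation accuracy stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.1)

# ... after each validation pass:
# scheduler.step(val_accuracy)
```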

  • Despite being deeper, the networks converged in fewer epochs than the shallower nets proposed earlier; the authors conjecture this is because decomposing large filters into stacks of small ones (together with the greater depth) acts as an implicit regulariser, and because certain layers are pre-initialised

the nets required less epochs to converge due to (a) implicit regularisation imposed by greater depth and smaller conv. filter sizes; (b) pre-initialisation of certain layers

  • Weight initialisation matters a lot: a bad initialisation can stall learning in a deep network. To get around this, the authors first train the shallower configuration A, which is shallow enough to train from random initialisation, and then use its trained weights to initialise the deeper configurations such as model E. They also note that this pre-training step can be replaced by the random (Xavier) initialisation procedure of Glorot & Bengio.
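A minimal sketch (my own, assuming PyTorch; not the authors' procedure) of the Xavier/Glorot initialisation mentioned as an alternative to pre-training:

```python
import torch.nn as nn

def init_weights(module):
    # Xavier (Glorot & Bengio, 2010) initialisation for conv and linear layers, zero biases.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# model.apply(init_weights)  # apply recursively to every sub-module of the network
```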

5. CLASSIFICATION EXPERIMENTS

[Figure: classification results for configurations A–E]

  • The figure shows that adding LRN makes essentially no difference to performance
  • As depth increases, the models perform better and better. Comparing C with D shows that 1×1 filters work less well than 3×3 filters; comparing B with C, however, shows that the extra 1×1 layers (additional non-linearity) do still help
  • At 19 layers the error rate saturates, possibly because of the size of the dataset; the authors suggest that with more data, even deeper networks could be beneficial
  • The authors also compared the 13-layer model B with a shallower variant in which each pair of 3×3 layers is replaced by a single 5×5 layer; the shallow net's top-1 error turned out to be about 7% higher, confirming that decomposing large filters into stacks of small ones does improve performance

[Figure: additional classification results]

  • The authors then combined the predicted class probabilities of an ensemble of 7 models, bringing the test error down to about 7.3%, and an ensemble of the two best models reduced it further to 6.8% (a minimal sketch of this averaging follows below).
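A minimal sketch (my own, assuming PyTorch; `models` is a list of trained networks and `images` a batch of preprocessed inputs) of ensembling by averaging soft-max class posteriors:

```python
import torch

def ensemble_predict(models, images):
    # Average the soft-max class posteriors of several trained models.
    probs = [torch.softmax(m(images), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)

# predicted_class = ensemble_predict(models, images).argmax(dim=1)
```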

[Figure: comparison with the state of the art on ILSVRC classification]

  • Finally, the authors compare their models with other published results: the ensemble is slightly behind GoogLeNet (the ILSVRC-2014 classification winner), but better than the winning entries of previous ILSVRC competitions; in terms of single-model performance, VGG is slightly better than a single GoogLeNet
