Deep Residual Learning for Image Recognition -- ResNet Paper Reading Notes


“ease the training of networks that are substantially deeper than those used previously”




1.    vanishing/exploding gradients

Largely addressed by normalized initialization and intermediate normalization layers.


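2.    degradation: as depth increases, accuracy saturates and then degrades rapidly. This is not caused by overfitting, because the training error rises as well.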



I.  Basic Ideas

1.    residual mapping

Denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping of F(x) := H(x) − x. The original mapping is recast into F(x) + x.


2.    shortcut connections

The formulation of F(x)+x can be realized by feed forward neural networks with “shortcut connections” (Fig. 2). Shortcut connections are those skipping one or more layers.
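A minimal sketch of the F(x) + x formulation, written in plain Python on a single vector. The helper names (relu, matvec, residual_block) and the weight matrices are illustrative assumptions, not the paper's code; the ReLU that the paper applies after the addition, as well as biases and BN, are left out to keep the identity case easy to see.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """y = F(x) + x with a two-layer residual branch F(x) = W2 * relu(W1 * x)."""
    f = matvec(W2, relu(matvec(W1, x)))
    # Shortcut connection: add the unchanged input to the branch output.
    return [fi + xi for fi, xi in zip(f, x)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With all-zero weights the branch outputs F(x) = 0, so the block is the identity.
y_zero = residual_block(x, zeros, zeros)

# With identity weights, F(x) = relu(x), which is added on top of x.
y_eye = residual_block(x, identity, identity)
```

Here y_zero equals x, which illustrates why pushing the residual toward zero is enough to represent an identity mapping.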


II.  Notice

1.    The dimensions of x and F must be equal. If this is not the case (e.g., when changing the input/output channels), we can perform a linear projection Ws by the shortcut connections to match the dimensions.


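2.    The experiments use a function F that has two or three layers, although more layers are possible. If F has only a single layer, the block is similar to a linear layer y = W1x + x, for which no advantages were observed.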
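The two notes above can be written out as equations. The first two lines follow the paper's building-block definitions, with σ denoting ReLU and biases omitted; the third line is the single-layer case, and the factored form on its right assumes W_1 is square.

```latex
\begin{align}
y &= \mathcal{F}(x, \{W_i\}) + x, \qquad \mathcal{F} = W_2\,\sigma(W_1 x) \\
y &= \mathcal{F}(x, \{W_i\}) + W_s x \\
y &= W_1 x + x = (W_1 + I)\,x
\end{align}
```

In the single-layer case the shortcut only shifts the weight matrix by the identity, so the block stays a plain linear layer.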


III.  Network Architectures

1.    The plain network is designed based on VGG; the residual network adds shortcut connections on top of the plain network.

2.    When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options:

(A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter;

(B) The projection shortcut is used to match dimensions (done by 1×1 convolutions).
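A small sketch of options (A) and (B), applied to the channel vector at a single spatial position. The function names, the example weights Ws, and the branch output f are hypothetical values chosen for illustration; the stride-2 subsampling that accompanies a change in feature-map size is not modelled here.

```python
def zero_pad_shortcut(x, out_dim):
    """Option A: identity shortcut, extra channels filled with zeros (no parameters)."""
    if out_dim < len(x):
        raise ValueError("out_dim must be >= len(x)")
    return list(x) + [0.0] * (out_dim - len(x))


def projection_shortcut(x, Ws):
    """Option B: linear projection Ws; a 1x1 convolution does this at every pixel."""
    return [sum(w * a for w, a in zip(row, x)) for row in Ws]


x = [1.0, 2.0]            # 2 input channels at one spatial position
f = [0.5, 0.5, 0.5, 0.5]  # hypothetical output of the residual branch F (4 channels)

# Option A: pad x to 4 channels, then add.
y_a = [fi + si for fi, si in zip(f, zero_pad_shortcut(x, 4))]

# Option B: project x to 4 channels with a hypothetical 4x2 weight matrix, then add.
Ws = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y_b = [fi + si for fi, si in zip(f, projection_shortcut(x, Ws))]
```

Option A leaves the new channels to the residual branch alone, while option B spends 4 x 2 extra weights to mix the input into every output channel.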



IV.  Implementation

multi-scale, standard color augmentation, BN, SGD with a mini-batch size of 256

The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60×10^4 iterations.

weight decay: 0.0001, momentum: 0.9, no dropout
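The "divide by 10 when the error plateaus" rule can be sketched as a tiny scheduler. The hyperparameters in CONFIG are the ones listed above; the plateau criterion (patience and min_delta) is an assumption, since these notes do not say how a plateau was detected, and the error values are made up for the demonstration.

```python
# Hyperparameters taken from the notes above.
CONFIG = {
    "batch_size": 256,
    "base_lr": 0.1,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "max_iters": 60 * 10**4,
}


class PlateauStepLR:
    """Divide the learning rate by 10 when the monitored error stops improving."""

    def __init__(self, lr, factor=0.1, patience=2, min_delta=1e-3):
        self.lr = lr
        self.factor = factor
        self.patience = patience    # assumed: evaluations without improvement
        self.min_delta = min_delta  # assumed: smallest change counted as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, error):
        """Record one evaluation and return the learning rate to use next."""
        if error < self.best - self.min_delta:
            self.best = error
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr *= self.factor
                self.bad_evals = 0
        return self.lr


sched = PlateauStepLR(CONFIG["base_lr"])
errors = [0.50, 0.40, 0.40, 0.40, 0.39]  # made-up validation errors
lrs = [sched.step(e) for e in errors]
```

In this run the rate stays at 0.1 for the first three evaluations and drops to about 0.01 once the error has failed to improve twice in a row.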


V.  Experiments

Going from 18 layers to 34 layers, the plain network shows the degradation problem, whereas ResNet-34 performs better than ResNet-18.

(A) zero-padding shortcuts are used for increasing dimensions, and all shortcuts are parameter free;

(B) projection shortcuts are used for increasing dimensions, and other shortcuts are identity;

(C) all shortcuts are projections.

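ResNet-34 with options A, B, and C all outperform plain-34. B is slightly better than A, and C is marginally better than B, so projection shortcuts give somewhat better accuracy. The differences are small, however, which indicates that projection shortcuts are not essential for addressing the degradation problem.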


VI.  Deeper Bottleneck Architectures

The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.


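Identity shortcuts lead to more efficient models for the bottleneck designs. For bottleneck blocks, the identity shortcut is preferable to the projection shortcut: the shortcut connects the two high-dimensional ends, so replacing it with a projection roughly doubles the time complexity and model size.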
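The cost argument for the bottleneck design can be checked with some back-of-the-envelope arithmetic. The widths used below (a 64-d two-layer block and a 256-d bottleneck with a 64-channel middle layer) follow the example in the paper's Fig. 5; the helper conv_params is a hypothetical name, and biases and BN parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution, ignoring biases and BN."""
    return k * k * c_in * c_out


# Two-layer block on 64 channels: 3x3, 64 -> 3x3, 64.
basic_64 = 2 * conv_params(3, 64, 64)

# Bottleneck on 256 channels: 1x1 reduce to 64, 3x3 at 64, 1x1 restore to 256.
bottleneck_256 = (
    conv_params(1, 256, 64)
    + conv_params(3, 64, 64)
    + conv_params(1, 64, 256)
)

# For comparison: a two-layer block applied directly to 256 channels.
basic_256 = 2 * conv_params(3, 256, 256)

# Cost of swapping the identity shortcut for a 1x1 projection at the 256-d ends.
projection_256 = conv_params(1, 256, 256)
ratio = (bottleneck_256 + projection_256) / bottleneck_256

print(basic_64, bottleneck_256, basic_256, projection_256)
```

The 256-d bottleneck (69,632 weights) costs about as much as the 64-d two-layer block (73,728) and far less than two 3×3 layers at 256 channels (1,179,648). A projection shortcut alone adds 65,536 weights, which is nearly the size of the whole bottleneck branch, consistent with the "doubled" estimate.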
