ResNet Summary

Key Points

1 Vanishing/exploding gradients are largely addressed by normalized initialization and intermediate normalization layers (batch normalization, BN).
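
A minimal sketch of what point 1 refers to, assuming PyTorch (my own illustration, not code from the paper): the hypothetical helper `conv_bn_relu` applies Kaiming/He initialization (the "normalized initialization") and inserts a BatchNorm2d layer after the convolution (the "intermediate normalization layer").

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    """3x3 conv -> BN -> ReLU, with He-initialized weights."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                     padding=1, bias=False)
    # "normalized initialization": Kaiming/He init suited to ReLU networks
    nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")
    # "intermediate normalization layer": BatchNorm after the convolution
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
```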

2 As depth increases, the accuracy of a plain network first saturates and then degrades rapidly; this degradation is not caused by overfitting (the degradation problem).

[Image 1]

3 ResNet hypothesizes that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping.
Denote the desired underlying mapping as H(x) and the residual mapping as F(x):
F(x) := H(x) - x, so the original mapping becomes F(x) + x.
[Image 2]

Reasoning: The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers. With the residual learning reformulation, if identity mappings are optimal, the solvers may simply drive the weights of the multiple nonlinear layers toward zero to approach identity mappings.
If the optimal function is closer to an identity mapping than to a zero mapping, it should be easier for the solver to find the perturbations with reference to an identity mapping than to learn the function as a new one.
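
To make the reformulation concrete, here is a minimal sketch of a basic residual block, assuming PyTorch (class and attribute names are my own): the learned branch is F(x), the output is F(x) + x, so driving the weights of F toward zero turns the block into an identity mapping.

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): the "multiple nonlinear layers" (two 3x3 convs with BN)
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # H(x) = F(x) + x; the shortcut is a parameter-free identity
        return self.relu(self.f(x) + x)
```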

4 The Design of ResNet:

  • Bottleneck

    [Image 3]

    Reducing the number of channels with 1 x 1 convolutions cuts parameters and computation, which makes it practical to build deeper networks (see the code sketch after this list).
    The parameter-free identity shortcuts are particularly important for the bottleneck architectures.

  • Identity Mapping by Shortcuts (see Figure 2)
    When the dimensions increase, there are two options:
    A) identity mapping with zero padding for the extra dimensions
    B) projection shortcut by 1 x 1 convolutions

  • Network Framework

    [Image 4]
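
A minimal sketch combining the bottleneck design with the option-B projection shortcut, assuming PyTorch (the class name and layout are my own, loosely following the common torchvision-style implementation): the 1 x 1 convolutions first reduce and then restore the channel count, and a 1 x 1 projection is used on the shortcut only when the stride or channel count changes; otherwise the shortcut stays parameter-free.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    expansion = 4  # output channels = expansion * mid_ch

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),   # 1x1: reduce channels
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),   # 1x1: restore channels
            nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:
            # option B: projection shortcut by 1x1 convolution
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            # parameter-free identity shortcut
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + self.shortcut(x))
```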

5 Experiments on ImageNet show:
- ResNets are easier to optimize; the plain counterparts exhibit higher training error as depth increases
- ResNets readily gain accuracy from increased depth, producing results substantially better than previous networks

[Image 5]

[Image 6]

6 Experiment Results:

[Image 7]

[Image 8]

[Image 9]
