WDSR Paper Reading Notes

[Figure 1]

  1. Wider channels before the ReLU help improve accuracy. Under the same parameter budget, the number of channels in the layers carrying the residual identity mapping is reduced, while the number of channels before the ReLU is increased.
  2. Use weight normalization rather than batch normalization: it gives better accuracy and faster convergence; batch normalization has been abandoned in SR.
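    As a minimal sketch (assuming PyTorch; the tensor sizes are illustrative and not from the official WDSR code), weight normalization can be attached to a convolution like this; unlike BN it keeps the same formulation at train and test time:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Weight-normalized 3x3 convolution. weight_norm reparameterizes the weight as
# g * v / ||v||, decoupling magnitude from direction; there are no batch
# statistics, so the layer behaves identically during training and inference.
conv = weight_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

x = torch.randn(1, 64, 48, 48)  # e.g. a 48x48 training patch
print(conv(x).shape)            # torch.Size([1, 64, 48, 48])
```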
  3. Linear low-rank convolution (a 1x1 kernel) helps provide wider channels (better accuracy under the same parameter budget); see the WDSR-B sketch under item 6 below.
  4. The paper mentions the following interesting convolution variants, worth a closer look later:
    1. Flattened convolution: flattened convolutions [13] consist of a consecutive sequence of one-dimensional filters across all directions in 3D space (lateral, vertical and horizontal) to approximate conventional convolutions.
    2. Group convolution: group convolutions [38] divide features into groups channel-wise and perform convolutions inside each group individually, followed by a concatenation to form the final output.
    3. Depthwise separable convolution: a depthwise separable convolution is a stack of a depthwise convolution (i.e., a spatial convolution performed independently over each channel of the input) followed by a pointwise convolution (i.e., a 1x1 convolution), without non-linearities in between.
    4. Inverted residuals
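    Of these, depthwise separable convolution is easy to sketch (PyTorch assumed; the class name and channel numbers are illustrative): a per-channel spatial convolution followed by a 1x1 pointwise convolution, with no non-linearity in between. Group convolution is the same idea with `groups` set to a value smaller than the channel count.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (groups = in_channels) followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 48, 48)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 48, 48])
```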
  5. WDSR-A: the channel expansion ratio should not exceed 4, otherwise the layers carrying the residual become too slim, possibly even narrower than the final 3-channel output; a ratio of 2-4 is generally appropriate.
    Assume the width of the identity mapping pathway (Fig. 2) is w1 and the width before the activation inside the residual block is w2.
    Introduce the expansion factor r before the activation, so w2 = r × w1. In vanilla residual networks (e.g., as used in EDSR and MDSR) we have w2 = w1, and each residual block has 2 × w1^2 × k^2 parameters. The computational complexity (Mult-Add operations) is a constant scaling of the parameter count once the input patch size is fixed. To keep the same complexity, w1^2 = w1' × w2' = r × w1'^2, so the residual identity mapping pathway needs to be slimmed by a factor of √r while the activation is widened by √r.
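    A minimal sketch of a WDSR-A style residual block (PyTorch assumed; w1 = 32 and r = 4 are illustrative values, not the paper's exact settings): the identity pathway stays at w1 channels while the features are expanded to w2 = r × w1 just before the ReLU:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class WDSRABlock(nn.Module):
    """Wide-activation residual block (WDSR-A style): expand -> ReLU -> project."""
    def __init__(self, w1=32, r=4):
        super().__init__()
        w2 = r * w1  # wide channel count before the activation
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(w1, w2, 3, padding=1)),
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv2d(w2, w1, 3, padding=1)),
        )

    def forward(self, x):
        return x + self.body(x)  # identity mapping stays at the slim width w1

x = torch.randn(1, 32, 48, 48)
print(WDSRABlock()(x).shape)  # torch.Size([1, 32, 48, 48])
```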
     
  6. WDSR-B: adding a ReLU between the 1x1 conv and the 3x3 conv significantly lowers accuracy; the paper takes this as further evidence in favor of wide activation.
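    A minimal sketch of a WDSR-B style block (PyTorch assumed; the expansion factor 6 and the low-rank width are illustrative): the only ReLU follows the 1x1 expansion, and no activation is placed between the linear 1x1 (low-rank) conv and the 3x3 conv:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

class WDSRBBlock(nn.Module):
    """Wide activation plus linear low-rank convolution (WDSR-B style)."""
    def __init__(self, w1=32, expand=6, low_rank=24):
        super().__init__()
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(w1, w1 * expand, 1)),        # 1x1 expansion
            nn.ReLU(inplace=True),                             # the only non-linearity
            weight_norm(nn.Conv2d(w1 * expand, low_rank, 1)),  # linear 1x1, no ReLU after it
            weight_norm(nn.Conv2d(low_rank, w1, 3, padding=1)),
        )

    def forward(self, x):
        return x + self.body(x)

x = torch.randn(1, 32, 48, 48)
print(WDSRBBlock()(x).shape)  # torch.Size([1, 32, 48, 48])
```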
  7. SR models hardly ever overfit, so regularization is unnecessary, and the regularization effect that BN provides is redundant;
  8. Moreover, SR needs the same formulation at train and test time, otherwise accuracy drops, whereas BN behaves differently during training and testing (batch statistics vs. running statistics);
  9. The batch sizes and patch sizes used in SR are both too small, so BN is not suitable.
  10. [Figure 2]
  11. [Figure 3]
    Removing the conv before the global residual pathway, the conv after the global residual pathway, and the conv after the pixel shuffle layer does not hurt accuracy and speeds up computation; a sketch of the resulting simplified tail follows below.
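    A minimal sketch of the simplified tail this implies (PyTorch assumed; kernel sizes and channel counts are my own illustrative choices, not the official configuration): each branch ends in a conv that directly outputs 3 × scale^2 channels followed by pixel shuffle, with no extra convs around the global residual pathway or after the shuffle:

```python
import torch
import torch.nn as nn

class SimplifiedTail(nn.Module):
    """Body branch and global-residual skip branch each end in conv -> PixelShuffle,
    with no extra convolutions before/after the skip or after the shuffle."""
    def __init__(self, n_feats=32, scale=2):
        super().__init__()
        self.body_tail = nn.Conv2d(n_feats, 3 * scale ** 2, 3, padding=1)
        self.skip = nn.Conv2d(3, 3 * scale ** 2, 5, padding=2)  # straight from the LR input
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, lr, body_features):
        return self.shuffle(self.body_tail(body_features)) + self.shuffle(self.skip(lr))

lr = torch.randn(1, 3, 48, 48)
feats = torch.randn(1, 32, 48, 48)
print(SimplifiedTail()(lr, feats).shape)  # torch.Size([1, 3, 96, 96])
```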
  12. In summary, the paper proposes three key ideas, which mainly improve efficiency; accuracy is not greatly improved:
    1. wider activation
    2. linear low-rank convolution
    3. weight normalization (WN)
