[Caffe] Layer notes: BatchNorm/Scale

  • Layer type: BatchNorm
  • Header file: ./include/caffe/layers/batch_norm_layer.hpp
  • CPU implementation: ./src/caffe/layers/batch_norm_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/batch_norm_layer.cu
  • Purpose of the BatchNorm layer: normalizes the activations of each mini-batch.

  • Layer type: Scale
  • Header file: ./include/caffe/layers/scale_layer.hpp
  • CPU implementation: ./src/caffe/layers/scale_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/scale_layer.cu
  • Purpose of the Scale layer: applies a learned per-channel scale and (optional) bias to its input, i.e. y = alpha * x + beta.

Why BatchNorm is used together with Scale

The batch normalization paper describes the operation it implements in two steps:
1) Input normalization: x_norm = (x - u)/std, where u and std are the accumulated mean and standard deviation (note the moving-average factor).
2) y = alpha * x_norm + beta, which scales and shifts the normalized x; alpha and beta are learned during training.
In Caffe, the BatchNorm layer implements step 1 and the Scale layer implements step 2, as the sketch after the formulas below illustrates.
The formulas for BN and Scale in Caffe

BatchNorm:  x_norm = (x - mean) / sqrt(var + eps)
Scale:      y = alpha * x_norm + beta

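The two steps can be written as a minimal NumPy sketch (an illustration of the math only, not Caffe's actual implementation; the function name and the per-channel reshaping are mine):

import numpy as np

def batchnorm_then_scale(x, mean, var, alpha, beta, eps=1e-5):
    # x: (N, C, H, W); mean, var, alpha, beta: per-channel, shape (C,)
    c = x.shape[1]
    # Step 1 -- BatchNorm layer: normalize with the accumulated statistics.
    x_norm = (x - mean.reshape(1, c, 1, 1)) / np.sqrt(var.reshape(1, c, 1, 1) + eps)
    # Step 2 -- Scale layer: learned per-channel scale and shift.
    return alpha.reshape(1, c, 1, 1) * x_norm + beta.reshape(1, c, 1, 1)
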
Parameter explanation

layer {
    bottom: "conv1"
    top: "conv1"
    name: "bn_conv1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    # The three parameter blobs hold the accumulated mean, the accumulated
    # variance, and the moving-average scale factor. They are computed,
    # not learned, so their learning rates are pinned to 0.
    param {
        name: "bn_conv1_0"
        lr_mult: 0
    }
    param {
        name: "bn_conv1_1"
        lr_mult: 0
    }
    param {
        name: "bn_conv1_2"
        lr_mult: 0
    }
}

layer {
    bottom: "conv1"
    top: "conv1"
    name: "scale_conv1"
    type: "Scale"
    # The two parameter blobs are alpha (scale) and beta (bias);
    # lr_mult: 0 freezes them in this particular configuration.
    param {
        name: "scale_conv1_0"
        lr_mult: 0
    }
    param {
        name: "scale_conv1_1"
        lr_mult: 0
    }
    scale_param {
        filler {
            value: 1
        }
        bias_term: true
        bias_filler {
            value: 0
        }
    }
}

So we see that BatchNorm carries three parameter blobs: the mean, the variance, and the moving-average scale factor. Scale carries two: alpha and beta.
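A detail worth knowing: Caffe stores the accumulated statistics unnormalized and divides by the scale factor in the third blob at test time (see BatchNormLayer::Forward_cpu). Here is a pycaffe sketch of reading the effective statistics back, assuming a net loaded from hypothetical files and the bn_conv1 layer above:

import caffe

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)  # hypothetical files
blobs = net.params['bn_conv1']      # the three BatchNorm parameter blobs
s = blobs[2].data[0]                # moving-average scale factor
s = 0 if s == 0 else 1.0 / s        # same zero guard as batch_norm_layer.cpp
mean = blobs[0].data * s            # effective accumulated mean
var  = blobs[1].data * s            # effective accumulated variance
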

Parameter definitions

Parameters (BatchNormParameter batch_norm_param)
From ./src/caffe/proto/caffe.proto:

message BatchNormParameter {
  // If use_global_stats = 0, normalize over the statistics of the current
  // mini-batch, while the global statistics are accumulated via a moving
  // average (training phase).
  // If use_global_stats = 1, normalize with the accumulated mean and
  // variance (test phase).
  // By default, use_global_stats = 0 during training and 1 during testing.
  optional bool use_global_stats = 1;
  // What fraction of the moving average remains each iteration?
  // Smaller values make the moving average decay faster, giving more
  // weight to the recent values.
  // Each iteration updates the moving average @f$S_{t-1}@f$ with the
  // current mean @f$ Y_t @f$ by
  // @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$
  // is the moving_average_fraction parameter.
  optional float moving_average_fraction = 2 [default = .999];
  // Small value to add to the variance estimate so that we don't divide by
  // zero.
  optional float eps = 3 [default = 1e-5];
}
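The moving-average rule in the comment above can be checked with a toy NumPy loop (illustrative only; the real accumulation in batch_norm_layer.cpp stores unnormalized sums together with the scale factor discussed earlier):

import numpy as np

beta = 0.999                        # moving_average_fraction
S = 0.0                             # moving average S_{t-1}
for t in range(10000):
    Y = np.random.randn() + 5.0     # current mini-batch mean Y_t
    S = (1 - beta) * Y + beta * S   # S_t = (1-beta) * Y_t + beta * S_{t-1}
print(S)                            # approaches the true mean, about 5.0
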

Parameters (ScaleParameter scale_param)
From ./src/caffe/proto/caffe.proto:

message ScaleParameter {
  // The first axis of bottom[0] (the first input Blob) along which to apply
  // bottom[1] (the second input Blob).  May be negative to index from the end
  // (e.g., -1 for the last axis).
  //
  // For example, if bottom[0] is 4D with shape 100x3x40x60, the output
  // top[0] will have the same shape, and bottom[1] may have any of the
  // following shapes (for the given value of axis):
  //    (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
  //    (axis == 1 == -3)          3;     3x40;     3x40x60
  //    (axis == 2 == -2)                   40;       40x60
  //    (axis == 3 == -1)                                60
  // Furthermore, bottom[1] may have the empty shape (regardless of the value of
  // "axis") -- a scalar multiplier.
  optional int32 axis = 1 [default = 1];

  // (num_axes is ignored unless just one bottom is given and the scale is
  // a learned parameter of the layer.  Otherwise, num_axes is determined by the
  // number of axes by the second bottom.)
  // The number of axes of the input (bottom[0]) covered by the scale
  // parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
  // Set num_axes := 0, to multiply with a zero-axis Blob: a scalar.
  optional int32 num_axes = 2 [default = 1];

  // (filler is ignored unless just one bottom is given and the scale is
  // a learned parameter of the layer.)
  // The initialization for the learned scale parameter.
  // Default is the unit (1) initialization, resulting in the ScaleLayer
  // initially performing the identity operation.
  optional FillerParameter filler = 3;

  // Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but
  // may be more efficient).  Initialized with bias_filler (defaults to 0).
  optional bool bias_term = 4 [default = false];
  optional FillerParameter bias_filler = 5;
}
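To make the axis semantics concrete, here is what Scale computes for the 100x3x40x60 example with axis == 1 and a (3,)-shaped scale, expressed as NumPy broadcasting (an illustration of the semantics, not Caffe code):

import numpy as np

x = np.random.randn(100, 3, 40, 60)    # bottom[0]
alpha = np.array([1.0, 2.0, 3.0])      # the scale, shape (3,), i.e. axis == 1
beta  = np.array([0.0, 0.1, 0.2])      # the bias, used when bias_term: true

# Scale broadcasts alpha/beta over every axis after `axis`:
y = alpha.reshape(1, 3, 1, 1) * x + beta.reshape(1, 3, 1, 1)
assert y.shape == x.shape
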

///////////////////////////////////////////
// Simplified summary
optional int32 axis [default = 1];           // the axis along which the scale is applied
optional int32 num_axes [default = 1];       // can be ignored for BN; mainly governs the shape expected of the second bottom
optional FillerParameter filler;             // how alpha is initialized
optional FillerParameter bias_filler;        // how beta is initialized
optional bool bias_term [default = false];   // whether to learn a bias; if not, this reduces to y = alpha * x
