Batch Normalization Explained

  • 1 Theoretical basis
  • 2 Understanding the momentum parameter in deep learning frameworks
    • 2.1 Keras
    • 2.2 PyTorch

1 Theoretical basis

Reference: Batch Normalization 学习笔记
Reference: Batch Normalization详解和momentum参数理解

From the article "Batch Normalization详解和momentum参数理解":

Keras's BatchNormalization layer has a momentum parameter that acts on the computation of the mean and variance. The layer keeps the mean and variance accumulated over past batches as moving_mean and moving_variance, and, borrowing the momentum idea from optimization algorithms, carries the influence of those historical statistics forward into the current batch. Typical momentum values are 0.9 or 0.99. After many batches, i.e. after many factors of 0.9 have been multiplied together, the influence of the earliest batches fades away.
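A minimal NumPy sketch of this exponential moving average (the helper name update_running_stats and the toy data are illustrative, not from the referenced article):

import numpy as np

def update_running_stats(moving_mean, moving_var, batch, momentum=0.9):
    # One EMA step: keep `momentum` of the history, blend in the current batch.
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1 - momentum) * batch_var
    return moving_mean, moving_var

rng = np.random.default_rng(0)
moving_mean, moving_var = np.zeros(3), np.ones(3)
for _ in range(20):
    batch = rng.normal(loc=5.0, scale=2.0, size=(32, 3))
    moving_mean, moving_var = update_running_stats(moving_mean, moving_var, batch)

# After k batches the first batch's weight has decayed to momentum**k,
# e.g. 0.9**20 ≈ 0.12, so early batches fade out.
print(moving_mean)  # drifts toward the true mean of 5.0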

2 Understanding the momentum parameter in deep learning frameworks

2.1 Keras

From: Keras API reference / Layers API / Normalization layers / BatchNormalization layer

tf.keras.layers.BatchNormalization(
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer="zeros",
    gamma_initializer="ones",
    moving_mean_initializer="zeros",
    moving_variance_initializer="ones",
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)

self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as such:
moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum)
moving_var = moving_var * momentum + var(batch) * (1 - momentum)

That is: moving_mean = moving_mean * 0.99 + mean(batch) * (1 - 0.99)
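As a quick sanity check (a minimal sketch; the input values are made up), calling the layer in training mode applies exactly this update to the moving statistics:

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization(momentum=0.99)
x = np.random.normal(loc=3.0, scale=1.5, size=(64, 4)).astype("float32")
bn(x, training=True)  # normalize with batch stats and update the moving stats

# One step of size (1 - 0.99) = 0.01 toward the batch statistics:
print(bn.moving_mean.numpy())      # ≈ 0.99 * 0 + 0.01 * mean(batch)
print(bn.moving_variance.numpy())  # ≈ 0.99 * 1 + 0.01 * var(batch)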

2.2 PyTorch

From: Docs > torch.nn > BatchNorm2d

class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.

The running estimates are updated as x̂_new = (1 - momentum) * x̂ + momentum * x_t, where x̂ is the estimated statistic and x_t is the new observed batch value.
That is: moving_mean = (1 - 0.1) * moving_mean + 0.1 * mean(batch)
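Note that PyTorch's convention is the mirror image of Keras's: momentum weights the new batch statistic rather than the history, so momentum=0.1 in PyTorch plays the role of momentum=0.9 in the Keras formulation. A minimal sketch (toy shapes and values, for illustration only):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3, momentum=0.1)
x = torch.randn(8, 3, 16, 16) * 2.0 + 5.0  # batch of 8 three-channel feature maps

bn.train()
bn(x)  # running_mean = (1 - 0.1) * running_mean + 0.1 * mean(batch)
print(bn.running_mean)  # one step of size 0.1 toward the per-channel batch mean

# With track_running_stats=False, no running estimates are kept and batch
# statistics are used even in eval() mode:
bn_no_stats = nn.BatchNorm2d(3, track_running_stats=False)
print(bn_no_stats.running_mean)  # None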
