torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1,
affine=True, track_running_stats=True, device=None, dtype=None)
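num_features
: The value of C in an input feature map of shape (B, C, H, W), i.e. the number of channels.
eps
: The value of $\epsilon$, a small constant added to the variance so that the denominator is never zero.
momentum
: Weight of the running-statistics update $\hat{x}_{new}=(1-momentum)\times\hat{x}+momentum\times x_t$, where $\hat{x}$ is the running statistic and $x_t$ is the newly observed batch value.
affine
: Whether $\gamma$ and $\beta$ are learnable. If not, no affine transform is applied, which is equivalent to the fixed values (1, 0).
track_running_stats
: Whether the running mean and variance are tracked. When True, the estimates accumulated during training approximate the statistics of the whole dataset and are used for normalization in evaluation mode; it is usually left at True.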
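The momentum rule can be checked with a single forward pass. This is a minimal sketch, assuming a random input and a freshly constructed module (whose running mean starts at 0 and running variance at 1); the names `old_mean`, `expected_mean`, and so on are only illustrative. Note that PyTorch updates the running variance with the unbiased batch variance.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm2d(3, momentum=0.1)
x = torch.randn(8, 3, 4, 4) * 2.0 + 5.0  # shape (B, C, H, W)

# Snapshot the running statistics before the update.
old_mean = bn.running_mean.clone()
old_var = bn.running_var.clone()

bn.train()
bn(x)  # one forward pass in training mode updates the running statistics

# Per-channel statistics of this batch (reduce over B, H, W).
batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3), unbiased=True)

# new = (1 - momentum) * old + momentum * observed
expected_mean = (1 - bn.momentum) * old_mean + bn.momentum * batch_mean
expected_var = (1 - bn.momentum) * old_var + bn.momentum * batch_var

print(torch.allclose(bn.running_mean, expected_mean, atol=1e-5))
print(torch.allclose(bn.running_var, expected_var, atol=1e-5))
```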
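Applied to a 4-D input (batch, channel, height, width). Here $x$ denotes the set of all elements that share one channel, across every batch index, height, and width position; the mean and variance of this set are written E[x] and Var[x]. The output is computed as:
$$y=\frac{x-\mathrm{E}[x]}{\sqrt{\mathrm{Var}[x]+\epsilon}}*\gamma+\beta$$
where $\gamma$ and $\beta$ are the learnable parameters, updated through backpropagation and initialized to (1, 0).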
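As a sanity check, the normalization can be reproduced by hand. This is a minimal sketch, assuming training mode and a small made-up input: it subtracts the per-channel mean, divides by the square root of the biased per-channel variance plus eps, then applies the per-channel scale and shift. The variable `manual` is only an illustrative name.

```python
import torch

x = torch.arange(2 * 2 * 3 * 3, dtype=torch.float32).reshape(2, 2, 3, 3)
bn = torch.nn.BatchNorm2d(2)  # a new module is in training mode

# Statistics per channel: reduce over batch, height and width.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance

# gamma and beta have shape (C,); reshape them to broadcast over (B, C, H, W).
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)

manual = (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta
print(torch.allclose(bn(x), manual, atol=1e-5))
```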
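import torch

# Input of shape (B=2, C=2, H=3, W=3)
x = torch.tensor([[[[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]],
                   [[11, 12, 13],
                    [14, 15, 16],
                    [17, 18, 19]]],
                  [[[11, 13, 15],
                    [17, 19, 21],
                    [23, 25, 27]],
                   [[21, 23, 25],
                    [27, 29, 31],
                    [33, 35, 37]]]], dtype=torch.float32)
m = torch.nn.BatchNorm2d(2)  # num_features = C = 2
output = m(x)  # training mode by default, so the batch statistics are used
print(output)
# Output: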
tensor([[[[-1.3574, -1.2340, -1.1106],
[-0.9872, -0.8638, -0.7404],
[-0.6170, -0.4936, -0.3702]],
[[-1.3574, -1.2340, -1.1106],
[-0.9872, -0.8638, -0.7404],
[-0.6170, -0.4936, -0.3702]]],
[[[-0.1234, 0.1234, 0.3702],
[ 0.6170, 0.8638, 1.1106],
[ 1.3574, 1.6042, 1.8511]],
[[-0.1234, 0.1234, 0.3702],
[ 0.6170, 0.8638, 1.1106],
[ 1.3574, 1.6042, 1.8511]]]], grad_fn=<NativeBatchNormBackward0>)
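The output above comes from a module in training mode, which normalizes with the statistics of the current batch. The following minimal sketch, assuming a random input and the default initialization (γ = 1, β = 0), contrasts this with evaluation mode, where the tracked running statistics are used instead.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm2d(2)
x = torch.randn(4, 2, 3, 3) + 3.0

bn.train()
train_out = bn(x)  # normalized with the statistics of this batch

bn.eval()
with torch.no_grad():
    eval_out = bn(x)  # normalized with running_mean / running_var

# Reproduce the eval-mode result by hand (gamma = 1 and beta = 0 at init,
# so the affine step is omitted here).
run_mean = bn.running_mean.view(1, -1, 1, 1)
run_var = bn.running_var.view(1, -1, 1, 1)
manual_eval = (x - run_mean) / torch.sqrt(run_var + bn.eps)

print(torch.allclose(eval_out, manual_eval, atol=1e-5))
print(torch.allclose(train_out, eval_out))  # the two modes differ
```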
torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
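normalized_shape (int or list or torch.Size)
: The shape of the trailing input dimensions to normalize over; a single int means the last dimension, which must have that size.
eps (float)
: The value of $\epsilon$, added to the denominator for numerical stability.
elementwise_affine (bool)
: Whether $\gamma$ and $\beta$ are learnable. If not, no affine transform is applied, which is equivalent to the fixed values (1, 0).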
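Applied to an input whose trailing dimensions match normalized_shape. Here $x$ denotes, within a single sample, the set of all elements spanning those trailing dimensions; the mean and variance are computed over that set only, never across the batch. For example, for sentences of shape (2, 3, 4) with normalized_shape=4, each of the 2 × 3 token vectors of 4 values is normalized on its own; for images of shape (2, 2, 3, 3) with normalized_shape=[2, 3, 3], each of the 2 samples is normalized over all 2 × 3 × 3 = 18 of its elements, across both channels.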
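Which elements are grouped together can be verified by hand. This is a minimal sketch, assuming random inputs; `manual_layer_norm` is a hypothetical helper written for this illustration, not a PyTorch function, and it omits the affine step because γ = 1 and β = 0 at initialization.

```python
import torch
from torch import nn

def manual_layer_norm(x, normalized_shape, eps=1e-5):
    """Normalize over the last len(normalized_shape) dimensions of x."""
    dims = tuple(range(-len(normalized_shape), 0))
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)  # biased variance
    return (x - mean) / torch.sqrt(var + eps)

torch.manual_seed(0)
tokens = torch.randn(2, 3, 4)     # (batch, sequence, features)
images = torch.randn(2, 2, 3, 3)  # (batch, channel, height, width)

ok_tokens = torch.allclose(nn.LayerNorm(4)(tokens),
                           manual_layer_norm(tokens, [4]), atol=1e-5)
ok_images = torch.allclose(nn.LayerNorm([2, 3, 3])(images),
                           manual_layer_norm(images, [2, 3, 3]), atol=1e-5)
print(ok_tokens, ok_images)
```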
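import torch
from torch import nn

# Usage in NLP: input of shape (batch=2, sequence=3, features=4)
x = torch.tensor([[[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]],
                  [[21, 22, 23, 24],
                   [25, 26, 27, 28],
                   [29, 30, 31, 32]]], dtype=torch.float32)
layer_norm = nn.LayerNorm(4)  # normalize each 4-value token vector
print(layer_norm(x))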
'''
tensor([[[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416]],
[[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416]]],
grad_fn=<NativeLayerNormBackward0>)
'''
# Usage in image processing: input of shape (B=2, C=2, H=3, W=3)
x = torch.tensor([[[[1, 2, 3],
                    [11, 12, 13],
                    [21, 22, 23]],
                   [[51, 52, 53],
                    [61, 62, 63],
                    [71, 72, 73]]],
                  [[[81, 82, 83],
                    [91, 92, 93],
                    [101, 102, 103]],
                   [[111, 112, 113],
                    [121, 122, 123],
                    [131, 132, 133]]]], dtype=torch.float32)
layer_norm = nn.LayerNorm([2, 3, 3])  # normalize over all C*H*W values of a sample
print(layer_norm(x))
'''
tensor([[[[-1.3682, -1.3302, -1.2922],
[-0.9881, -0.9501, -0.9121],
[-0.6081, -0.5701, -0.5321]],
[[ 0.5321, 0.5701, 0.6081],
[ 0.9121, 0.9501, 0.9881],
[ 1.2922, 1.3302, 1.3682]]],
[[[-1.5207, -1.4622, -1.4037],
[-0.9358, -0.8773, -0.8188],
[-0.3509, -0.2924, -0.2339]],
[[ 0.2339, 0.2924, 0.3509],
[ 0.8188, 0.8773, 0.9358],
[ 1.4037, 1.4622, 1.5207]]]], grad_fn=<NativeLayerNormBackward0>)
'''
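The two layers also differ in the shape of their learnable parameters. The following minimal sketch, reusing the sizes from the examples above, shows that BatchNorm2d keeps one (γ, β) pair per channel while LayerNorm keeps one pair per element of normalized_shape, and that switching off the affine option leaves no parameters at all.

```python
import torch
from torch import nn

bn = nn.BatchNorm2d(2)
ln = nn.LayerNorm([2, 3, 3])

# gamma is stored as .weight and beta as .bias.
print(bn.weight.shape, bn.bias.shape)  # one value per channel
print(ln.weight.shape, ln.bias.shape)  # one value per normalized element

# With the affine option disabled, weight and bias are not created.
bn_plain = nn.BatchNorm2d(2, affine=False)
ln_plain = nn.LayerNorm([2, 3, 3], elementwise_affine=False)
print(bn_plain.weight is None, ln_plain.weight is None)
```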