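Applies Batch Normalization to mini-batch input: BatchNorm1d takes a 2D or 3D input, and BatchNorm2d takes a 4D input formed from a mini-batch of 3D samples.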
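Within each mini-batch, the mean and standard deviation are computed separately for each feature (channel) over the batch dimension. gamma and beta are learnable parameter vectors of size C, where C is the number of features (channels) of the input.
During training, the layer computes the mean and variance of every input batch and keeps a moving average of them. The default momentum of the moving average is 0.1.
During evaluation, the mean and variance accumulated during training are used to normalize the evaluation data.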
Shape (BatchNorm1d):
- Input: (N, C) or (N, C, L)
- Output: (N, C) or (N, C, L) (same shape as the input)
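The (N, C, L) case can be checked by hand. The following is a minimal sketch, assuming random input and the illustrative names `x3` and `bn3` (neither is from the original notes): for a 3D input the statistics are taken over both the batch dimension N and the length dimension L, leaving one mean and one variance per channel.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x3 = torch.randn(4, 3, 5)              # (N, C, L) with C = 3 channels

bn3 = nn.BatchNorm1d(3, affine=False)  # training mode by default
y3 = bn3(x3)

# Per-channel statistics: reduce over N (dim 0) and L (dim 2).
mean = x3.mean(dim=(0, 2), keepdim=True)
var = x3.var(dim=(0, 2), unbiased=False, keepdim=True)  # biased variance, divides by N*L
y3_manual = (x3 - mean) / torch.sqrt(var + bn3.eps)

same_3d = torch.allclose(y3, y3_manual, atol=1e-5)
print(y3.shape, same_3d)
```

The output keeps the input shape, and the manual result should agree with the module up to floating-point tolerance.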
import torch
import torch.nn as nn
x=torch.Tensor([[1,2,3,4],
                [2,3,4,5],
                [4,5,6,7]])
# affine=False means the layer has no gamma and beta
m=nn.BatchNorm1d(4,momentum=0.1,affine=False)
y1=m(x)
y2=(x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2)
# y1 and y2 are identical
print(y1)
print(y2)
out:
tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
[-0.2673, -0.2673, -0.2673, -0.2673],
[ 1.3363, 1.3363, 1.3363, 1.3363]])
tensor([[-1.0690, -1.0690, -1.0690, -1.0690],
[-0.2673, -0.2673, -0.2673, -0.2673],
[ 1.3363, 1.3363, 1.3363, 1.3363]])
print(m)
out:
BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
print(m.weight,m.bias)
out:
None None
m=nn.BatchNorm1d(4,momentum=0.1,affine=True)
print(m.weight,m.bias)
# the number of parameters equals the number of features
out:
Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
m.reset_parameters()
print(m.running_mean,m.running_var)
out:
tensor([0., 0., 0., 0.]) tensor([1., 1., 1., 1.])
m(x)
print(m.running_mean,m.running_var)
out:
tensor([0.2333, 0.3333, 0.4333, 0.5333]) tensor([1.1333, 1.1333, 1.1333, 1.1333])
m(x)
print(m.running_mean,m.running_var)
out:
tensor([0.4433, 0.6333, 0.8233, 1.0133]) tensor([1.2533, 1.2533, 1.2533, 1.2533])
m(x)
print(m.running_mean,m.running_var)
out:
tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])
m.eval()
m(x)
print(m.running_mean,m.running_var)
out:
tensor([0.6323, 0.9033, 1.1743, 1.4453]) tensor([1.3613, 1.3613, 1.3613, 1.3613])
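BatchNorm1d computes the mean and variance along N, the batch dimension. The BatchNorm1d module and the expression (x-x.mean(0))/(x.std(0)**2*2/3+1e-5)**(1/2) give the same result. The factor 2/3 in x.std(0)**2*2/3 is (N-1)/N with N=3: torch.std divides by N-1 by default, whereas batch normalization normalizes with the biased variance, which divides by N.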
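The rescaling by (N-1)/N can be avoided by asking for the biased variance directly. This is a small sketch using the same 3x4 input as above; the names `var_biased`, `rescaled`, and `bn` are illustrative choices, not part of the original notes.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])
n = x.shape[0]

var_unbiased = x.var(0, unbiased=True)   # divides by N-1 (what x.std(0)**2 gives)
var_biased = x.var(0, unbiased=False)    # divides by N (what BatchNorm normalizes with)
rescaled = x.std(0) ** 2 * (n - 1) / n   # the x.std(0)**2*2/3 trick, written for general N

bn = nn.BatchNorm1d(4, affine=False)
y_module = bn(x)
y_manual = (x - x.mean(0)) / torch.sqrt(var_biased + bn.eps)

print(torch.allclose(rescaled, var_biased))
print(torch.allclose(y_module, y_manual, atol=1e-5))
```

Reading `bn.eps` from the module instead of hard-coding 1e-5 keeps the manual formula in step with whatever eps the layer was built with.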
When affine is set to False, the layer has no gamma and beta parameters. The default is True.
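To see what gamma and beta do, they can be set by hand and compared against the plain normalized output. This is only a sketch: the values 2.0 and 0.5 are arbitrary illustrative choices, and `bn_affine` and `bn_plain` are hypothetical names.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])

bn_affine = nn.BatchNorm1d(4, affine=True)
with torch.no_grad():
    bn_affine.weight.fill_(2.0)   # gamma (arbitrary value for illustration)
    bn_affine.bias.fill_(0.5)     # beta (arbitrary value for illustration)

bn_plain = nn.BatchNorm1d(4, affine=False)

y_affine = bn_affine(x)
# With affine=True the normalized value is scaled by gamma and shifted by beta.
y_expected = 2.0 * bn_plain(x) + 0.5

print(torch.allclose(y_affine, y_expected, atol=1e-5))
```

With the default initialization (gamma all ones, beta all zeros), the affine layer produces the same output as the affine=False layer.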
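Exponential weighted average: with momentum m, initial value r_0, and batch statistics v_1, v_2, v_3, the running estimate is r_1=(1-m)*r_0+m*v_1, r_2=(1-m)*r_1+m*v_2, r_3=(1-m)*r_2+m*v_3. running_mean starts at r_0=0, so its first update reduces to r_1=m*v_1; running_var starts at r_0=1.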
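momentum=0.1 means an exponential weighted average is used: the statistics of each new batch are averaged with the running_mean and running_var kept from the previous batches, and running_mean and running_var hold the current estimates of the mean and variance. For the first feature, whose batch mean is 2.3333: 0.2333=0.1*2.3333, 0.4433=(1-0.1)*0.2333+0.1*2.3333, 0.6323=(1-0.1)*0.4433+0.1*2.3333. Calling m.reset_parameters() resets the running statistics and, when affine=True, restores gamma and beta to their initial values.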
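The update rule can be replayed by hand and compared against the module. The sketch below assumes the same input is fed three times, as in the transcript above; `running_mean` and `running_var` here are plain local tensors, not the module's buffers. Note that the running variance is updated with the unbiased (divide by N-1) batch variance, which is why the first value is 0.9*1+0.1*2.3333=1.1333.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])
momentum = 0.1
bn = nn.BatchNorm1d(4, momentum=momentum, affine=False)

# Same initial values the module uses for its buffers.
running_mean = torch.zeros(4)
running_var = torch.ones(4)

batch_mean = x.mean(0)
batch_var = x.var(0, unbiased=True)  # running_var tracks the unbiased batch variance

for step in range(3):
    bn(x)  # forward pass in training mode updates bn.running_mean / bn.running_var
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var

print(running_mean, bn.running_mean)
print(running_var, bn.running_var)
```

After three passes both pairs should match the last training transcript above (0.6323, 0.9033, 1.1743, 1.4453 for the mean and 1.3613 for the variance).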
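After m.eval() or m.train(mode=False), the BatchNorm layer stops updating its running statistics (running_mean and running_var) and uses them, instead of the batch statistics, to normalize the input. The forward pass does not change gamma and beta in either mode; they change only when an optimizer updates them.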
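A short check of the evaluation-mode behaviour, as a sketch under the assumption that one training pass has already populated the running statistics (`mean_before`, `var_before`, and `y_manual` are illustrative names):

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4.],
                  [2., 3., 4., 5.],
                  [4., 5., 6., 7.]])

bn = nn.BatchNorm1d(4, momentum=0.1, affine=True)
bn(x)       # one forward pass in training mode updates the running statistics
bn.eval()   # same effect as bn.train(mode=False)

mean_before = bn.running_mean.clone()
var_before = bn.running_var.clone()
y_eval = bn(x)

# In eval mode the stored running statistics replace the batch statistics.
y_manual = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
y_manual = y_manual * bn.weight + bn.bias

stats_unchanged = (torch.equal(mean_before, bn.running_mean)
                   and torch.equal(var_before, bn.running_var))
print(stats_unchanged)
print(torch.allclose(y_eval, y_manual, atol=1e-5))
print(bn.weight.requires_grad)  # eval() does not freeze gamma and beta
```

Because eval() leaves requires_grad untouched, freezing gamma and beta would need a separate step such as setting requires_grad to False or excluding them from the optimizer.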
Shape (BatchNorm2d):
- Input: (N, C, H, W)
- Output: (N, C, H, W) (same shape as the input)
x=torch.Tensor(range(1,17))
x=x.reshape(2,2,2,2)
print(x)
print(x.shape)
out:
tensor([[[[ 1., 2.],
[ 3., 4.]],
[[ 5., 6.],
[ 7., 8.]]],
[[[ 9., 10.],
[11., 12.]],
[[13., 14.],
[15., 16.]]]])
torch.Size([2, 2, 2, 2])
m=nn.BatchNorm2d(2,affine=False)
y1=m(x)
print(y1)
out:
tensor([[[[-1.3242, -1.0835],
[-0.8427, -0.6019]],
[[-1.3242, -1.0835],
[-0.8427, -0.6019]]],
[[[ 0.6019, 0.8427],
[ 1.0835, 1.3242]],
[[ 0.6019, 0.8427],
[ 1.0835, 1.3242]]]])
Here the mean of the first channel is (1+2+3+4+9+10+11+12)/8=6.5, and the biased (divide-by-N) standard deviation of [1,2,3,4,9,10,11,12] is 4.1533, so (1-6.5)/4.1533=-1.3242, (2-6.5)/4.1533=-1.0835, and (9-6.5)/4.1533=0.6019.
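The same arithmetic can be written for all channels at once. This is a minimal sketch assuming the 2x2x2x2 input from the example above (built here with torch.arange rather than torch.Tensor(range(...))); the statistics are reduced over N, H, and W, leaving one mean and one variance per channel.

```python
import torch
import torch.nn as nn

x = torch.arange(1., 17.).reshape(2, 2, 2, 2)   # (N, C, H, W), values 1..16

bn2 = nn.BatchNorm2d(2, affine=False)
y = bn2(x)

# Per-channel statistics: reduce over N (dim 0), H (dim 2) and W (dim 3).
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance, divides by 8
y_manual = (x - mean) / torch.sqrt(var + bn2.eps)

print(mean.flatten())        # per-channel means
print(var.sqrt().flatten())  # per-channel biased standard deviations
print(torch.allclose(y, y_manual, atol=1e-5))
```

Both channels share the same standard deviation because the second channel's values are the first channel's values shifted by 4, which is why the two channels of y1 above are identical.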