LayerNorm has two main advantages over BatchNorm: its statistics are computed within each sample rather than across the batch, so it does not depend on the batch size, and it behaves the same at training and inference time (no running statistics are needed). For a more detailed introduction, see 模型优化之Layer Normalization.
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\operatorname{Var}[x] + \epsilon}} * \gamma + \beta$$
The formula looks identical to BatchNorm's, but here the mean and variance are computed over the different features of a single sample, not over the batch.
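As a quick worked example of the formula (using the first row of the data below, with the default γ = 1 and β = 0): for x = [1, 2, 3, 4], E[x] = 2.5 and Var[x] = 1.25, so (x − 2.5) / √1.25 ≈ [−1.3416, −0.4472, 0.4472, 1.3416], which matches the PyTorch output further down (ε is small enough to ignore here).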
import torch
import torch.nn as nn
import numpy as np
def validation(x):
    """
    Manually reproduce LayerNorm (with gamma=1, beta=0): normalize each row
    of x using the mean and standard deviation of that row's own features.
    :param x: nested list of shape (3, 4)
    :return: normalized array of shape (3, 4)
    """
    x = np.array(x)
    avg = np.mean(x, axis=1, keepdims=True)          # per-row mean, shape (3, 1)
    std = np.sqrt(np.var(x, axis=1, keepdims=True))  # per-row std, shape (3, 1)
    # nn.LayerNorm also adds a small eps (default 1e-5) inside the sqrt;
    # it is omitted here because it barely changes the result.
    return (x - avg) / std
x = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
# shape: (3, 4)
input = torch.tensor(x, dtype=torch.float)
m = nn.LayerNorm(4)
output = m(input)
print(output)
val = validation(x)
print(val)
Result:
tensor([[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416],
[-1.3416, -0.4472, 0.4472, 1.3416]],
grad_fn=<NativeLayerNormBackward>)
[[-1.34164079 -0.4472136 0.4472136 1.34164079]
[-1.34164079 -0.4472136 0.4472136 1.34164079]
[-1.34164079 -0.4472136 0.4472136 1.34164079]]
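By default nn.LayerNorm also learns the affine parameters γ and β from the formula (elementwise_affine=True); in the run above they are still at their initial values 1 and 0, which is why the module and the manual computation agree. A minimal sketch for inspecting or disabling them (variable names are just for illustration):
m = nn.LayerNorm(4)                        # elementwise_affine=True by default
print(m.weight)                            # gamma, initialized to ones, shape [4]
print(m.bias)                              # beta, initialized to zeros, shape [4]
m_no_affine = nn.LayerNorm(4, elementwise_affine=False)  # pure normalization, no gamma/beta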
Passing a torch.Size to normalized_shape handles higher-dimensional inputs:
x = [[[1, 1], [2, 2], [3, 3], [4, 4]], [[2, 2], [3, 3], [4, 4], [5, 5]], [[3, 3], [4, 4], [5, 5], [6, 6]]]
input = torch.tensor(x, dtype=torch.float)
normalized_shape = input.size()[1:]
print(normalized_shape)
m = nn.LayerNorm(normalized_shape)
output = m(input)
print(output)
Output:
torch.Size([4, 2])
tensor([[[-1.3416, -1.3416],
[-0.4472, -0.4472],
[ 0.4472, 0.4472],
[ 1.3416, 1.3416]],
[[-1.3416, -1.3416],
[-0.4472, -0.4472],
[ 0.4472, 0.4472],
[ 1.3416, 1.3416]],
[[-1.3416, -1.3416],
[-0.4472, -0.4472],
[ 0.4472, 0.4472],
[ 1.3416, 1.3416]]], grad_fn=<NativeLayerNormBackward>)
Since normalized_shape covers the last two dimensions, this is equivalent to normalizing each of the three 4*2 matrices (one per sample) independently.
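In practice (e.g., in Transformer blocks) the input is usually (batch, seq_len, hidden) and normalized_shape is set to just the last dimension, so each position's hidden vector gets its own statistics. A minimal sketch, with the shapes chosen purely for illustration:
batch, seq_len, hidden = 2, 5, 8
x3 = torch.randn(batch, seq_len, hidden)
ln = nn.LayerNorm(hidden)              # statistics over the last dimension only
y3 = ln(x3)
print(y3.shape)                        # torch.Size([2, 5, 8])
print(y3.mean(dim=-1))                 # ≈ 0 for every (sample, position) pair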