The code below is taken from the PyTorch source.
PyTorch's parameterized layers (`Linear`, `Conv2d`, `BatchNorm`, etc.) can be used right after being defined in `__init__`, with no manual initialization, because PyTorch applies a default initialization to each of them. This post walks through the source code to see how the different layers are initialized by default.
```python
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
```
`kaiming_uniform_` initializes a tensor from a uniform distribution, sampling from $U(-\text{bound}, \text{bound})$, where

$$\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}}$$

In the two-dimensional case, `fan_in` is `tensor.size(1)`, i.e., the dimension of the input vector.
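As a quick sanity check (this snippet is mine, not from the PyTorch source), we can sample a 2-D tensor and compare against the formula above:

```python
import math
import torch
import torch.nn.init as init

w = torch.empty(256, 128)              # 2-D tensor, so fan_in = tensor.size(1) = 128
init.kaiming_uniform_(w)               # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'
bound = math.sqrt(6.0 / ((1 + 0 ** 2) * 128))
print(bound)                           # ~0.2165
print(w.min().item(), w.max().item())  # both fall inside (-bound, bound)
```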
```python
def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        return tensor.normal_(0, std)
```
`kaiming_normal_` initializes a tensor by sampling from a normal distribution with mean 0 and standard deviation $\text{std}$, where

$$\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}}$$

As before, `fan_in` is `tensor.size(1)` when the tensor is two-dimensional. Note that both formulas above hold only for the default values of `mode` and `nonlinearity`.
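Similarly (again my own check, not from the source), the empirical standard deviation should match $\sqrt{2/\text{fan\_in}}$:

```python
import math
import torch
import torch.nn.init as init

w = torch.empty(4096, 512)   # fan_in = 512
init.kaiming_normal_(w)      # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'
print(math.sqrt(2.0 / 512))  # expected std: 0.0625
print(w.std().item())        # empirical std, close to 0.0625
```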
`Linear`'s built-in initialization function is:
```python
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```
W is sampled from $U(-\text{bound}, \text{bound})$, where

$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$

(substituting $a = \sqrt{5}$ into the `kaiming_uniform_` bound gives $\sqrt{6/((1+5) \times \text{fan\_in})} = \sqrt{1/\text{fan\_in}}$). Here `fan_in` is the size of W's second dimension, i.e., the dimension of the input vector that the Linear layer acts on. The bias is also sampled from $U(-\text{bound}, \text{bound})$, with the same bound as W.
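A minimal sketch (mine, not from the post) confirming both bounds on a concrete layer:

```python
import math
import torch.nn as nn

fc = nn.Linear(in_features=100, out_features=10)
bound = 1 / math.sqrt(100)                    # fan_in = 100, so bound = 0.1
print(fc.weight.abs().max().item() <= bound)  # True: weights lie in U(-0.1, 0.1)
print(fc.bias.abs().max().item() <= bound)    # True: bias uses the same bound
```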
Taking the two-dimensional case as an example, a convolution layer's parameters are actually a four-dimensional tensor:
```python
if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)
```
For example, a convolution layer with 3 input channels, 64 output channels, and a kernel size of 3 has a 64×3×3×3 weight tensor, which is initialized as follows:
```python
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```
It likewise defaults to `kaiming_uniform_`, sampling from $U(-\text{bound}, \text{bound})$, where

$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$

As for how `fan_in` is computed:
```python
num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size
```
That is,

$$\text{fan\_in} = \text{in\_channels} \times \text{kernel\_size}^2$$
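A short check (my own sketch; `_calculate_fan_in_and_fan_out` is the private helper the source itself uses) for the 3-input, 64-output, kernel-size-3 layer from the example above:

```python
import math
import torch.nn as nn
import torch.nn.init as init

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
print(conv.weight.shape)                        # torch.Size([64, 3, 3, 3])
fan_in, fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(fan_in, fan_out)                          # 27 (= 3 * 3 * 3), 576
bound = math.sqrt(1.0 / fan_in)                 # default bound with a = sqrt(5)
print(conv.weight.abs().max().item() <= bound)  # True
```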
BatchNorm's own `reset_parameters` is:

```python
def reset_parameters(self):
    self.reset_running_stats()
    if self.affine:
        init.uniform_(self.weight)
        init.zeros_(self.bias)
```
The weight is initialized from $U(0, 1)$ and the bias is initialized to 0.
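For inspection (my own snippet; note that newer PyTorch versions initialize the affine weight to all ones instead of $U(0,1)$, so what you see depends on the version):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=64)
print(bn.weight.min().item(), bn.weight.max().item())  # within [0, 1] (all 1.0 on newer versions)
print(bn.bias.abs().max().item())                      # 0.0: bias starts at zero
print(bn.running_mean.sum().item(), bn.running_var.mean().item())  # 0.0 and 1.0 from reset_running_stats
```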
The initialization schemes also differ across the built-in network models. In the official PyTorch code, after defining its layers, ResNet's `__init__` manually initializes the different layer types:
```python
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
```
First, for all convolution layers: unlike the convolution default of $U(-\text{bound}, \text{bound})$ with $\text{bound} = \sqrt{\dfrac{1}{\text{in\_channels} \times \text{kernel\_size}^2}}$, here `mode` is `fan_out`, `nonlinearity` is `relu`, and the initialization function is `kaiming_normal_`, so the parameters are sampled from a normal distribution with mean 0 and standard deviation $\text{std}$, where

$$\text{std} = \sqrt{\frac{2}{\text{fan\_out}}}, \qquad \text{fan\_out} = \text{out\_channels} \times \text{kernel\_size}^2$$
The convolution bias is not mentioned here, so it keeps the default initialization. The BatchNorm and GroupNorm layers have their weight initialized to 1 and bias to 0, unlike the default (weight sampled uniformly from 0 to 1, bias 0). The remaining Linear layers are not mentioned either, so they also keep the default initialization.
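The same pattern can be reused in one's own model; here is a sketch (`SmallNet` is a made-up example, not part of torchvision):

```python
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.fc = nn.Linear(16, 10)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # the fan_out / relu variant used by the official ResNet code
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            # conv biases and nn.Linear are left at their defaults, as in ResNet

net = SmallNet()
```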
VGG's official PyTorch initialization method is as follows:
```python
def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, 0, 0.01)
            nn.init.constant_(m.bias, 0)
```
The convolution layers are initialized as in ResNet, except that the bias is set to 0; the BatchNorm layers are initialized as in ResNet; and the Linear layers have their `weight` drawn from a normal distribution with mean 0 and standard deviation 0.01, with `bias` initialized to 0.
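As a final empirical check (assuming torchvision is installed), the Linear weights of a freshly constructed VGG should have a standard deviation close to 0.01:

```python
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg16()  # random init, so _initialize_weights has been applied
for m in vgg.modules():
    if isinstance(m, nn.Linear):
        print(m.weight.std().item())  # each value close to 0.01
```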