PyTorch Default Parameter Initialization


The code in this article is taken from the PyTorch source.

PyTorch's parameter-carrying layers (Linear, Conv2d, BatchNorm, and so on) can be used right after they are defined in __init__, without any manual initialization, because PyTorch applies a default initialization to each of them. This article walks through the default initialization of the different layer types based on the source code.

Initialization functions

def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)

kaiming_uniform_ initializes the tensor from a uniform distribution, sampling from $U(-\text{bound}, \text{bound})$, where
$$\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}}$$
In the two-dimensional case, fan_in is tensor.size(1), i.e. the dimension of the input vector.
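
As a quick sanity check (a small sketch of my own, not part of the PyTorch source; the tensor shape is arbitrary), the bound can be computed by hand and compared with what kaiming_uniform_ actually produces:

import math
import torch
import torch.nn.init as init

# 2D weight, so fan_in = tensor.size(1) = 128
w = torch.empty(256, 128)
init.kaiming_uniform_(w)  # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'

bound = math.sqrt(6.0 / ((1 + 0 ** 2) * 128))  # sqrt(6 / ((1 + a^2) * fan_in))
print(bound)                 # theoretical bound
print(w.abs().max().item())  # empirical max, should stay just below the bound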

def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        return tensor.normal_(0, std)

kaiming_normal_ initializes the tensor by sampling from $\mathcal{N}(0, \text{std}^2)$, where
$$\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}}$$
Likewise, fan_in is tensor.size(1) when the tensor is two-dimensional. Note that the formulas above assume the default values of mode and nonlinearity.
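
The same kind of check works for kaiming_normal_ (again just an illustrative sketch with an arbitrary shape):

import math
import torch
import torch.nn.init as init

w = torch.empty(1024, 512)  # fan_in = 512
init.kaiming_normal_(w)     # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'

std = math.sqrt(2.0 / ((1 + 0 ** 2) * 512))  # sqrt(2 / ((1 + a^2) * fan_in))
print(std)             # theoretical standard deviation
print(w.std().item())  # empirical standard deviation, should be close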

Linear initialization

Linear's built-in initialization function is:

def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

W is sampled from $U(-\text{bound}, \text{bound})$, where
$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$
(Plugging $a = \sqrt{5}$ into the kaiming_uniform_ formula above gives exactly this bound.)
fan_in is the size of W's second dimension, i.e. the dimension of the input vector the Linear layer acts on.
The bias is also sampled from $U(-\text{bound}, \text{bound})$, with the same bound as W.
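
To see this on an actual layer (an illustrative sketch; the layer sizes are arbitrary), compare the bound against a freshly constructed nn.Linear:

import math
import torch.nn as nn

layer = nn.Linear(in_features=300, out_features=100)
bound = 1 / math.sqrt(300)  # fan_in = in_features = 300

print(bound)
print(layer.weight.abs().max().item())  # weight lies within U(-bound, bound)
print(layer.bias.abs().max().item())    # bias uses the same bound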

Conv initialization

Taking the 2D case as an example, a convolutional layer's weight is actually a four-dimensional tensor:

if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)
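
These shapes can be checked directly on a constructed layer (a small sketch, not taken from the source above):

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
print(conv.weight.shape)  # torch.Size([64, 3, 3, 3]) -> (out_channels, in_channels, kH, kW)
print(conv.bias.shape)    # torch.Size([64])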

For example, for a convolution with 3 input channels, 64 output channels, and kernel_size=3, the weight is a 64×3×3×3 tensor (out_channels comes first for a regular, non-transposed convolution), and it is initialized as follows:

def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

It likewise uses kaiming_uniform_ by default, sampling from $U(-\text{bound}, \text{bound})$, where
$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$
fan_in is computed as:

num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size

In other words,
$$\text{fan\_in} = \text{in\_channels} \times \text{kernel\_size}^2$$
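
The same numbers fall out of PyTorch's own helper (a quick check using the private _calculate_fan_in_and_fan_out already referenced above; the layer sizes are those of the earlier example):

import torch.nn as nn
import torch.nn.init as init

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
fan_in, fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(fan_in)   # 3 * 3 * 3 = 27,   i.e. in_channels * kernel_size^2
print(fan_out)  # 64 * 3 * 3 = 576, i.e. out_channels * kernel_size^2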

BatchNorm initialization

def reset_parameters(self):
    self.reset_running_stats()
    if self.affine:
        init.uniform_(self.weight)
        init.zeros_(self.bias)

The weight is initialized from $U(0, 1)$ and the bias is initialized to 0.
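
A quick way to confirm this (an illustrative sketch; note that the exact default may differ across PyTorch versions):

import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)
print(bn.weight.min().item(), bn.weight.max().item())    # spread over (0, 1) in the version quoted above
print(bn.bias.unique())                                   # all zeros
print(bn.running_mean.unique(), bn.running_var.unique())  # running stats reset to 0 and 1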

Network initialization

The built-in network models also differ in how they initialize their layers.

ResNet

In the official PyTorch ResNet code, after all the layers have been defined, the __init__ method manually initializes the different layer types:

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

First, for all convolutional layers, the initialization differs from the default conv initialization $U(-\text{bound}, \text{bound})$ with $\text{bound} = \sqrt{\dfrac{1}{\text{in\_channels} \times \text{kernel\_size}^2}}$: here mode is 'fan_out', nonlinearity is 'relu', and the function used is kaiming_normal_, so the weights are sampled from $\mathcal{N}(0, \text{std}^2)$, where
$$\text{std} = \sqrt{\frac{2}{\text{fan\_out}}}, \qquad \text{fan\_out} = \text{out\_channels} \times \text{kernel\_size}^2$$
The conv bias is not mentioned here, so it keeps the default initialization. The weights of BatchNorm and GroupNorm layers are set to 1 and their biases to 0, unlike the default where the weight is sampled uniformly from [0, 1] and the bias is 0. The Linear layers are not mentioned either and keep the default initialization.
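
To relate the formula to the code (an illustrative sketch with arbitrary channel counts, not taken from the ResNet source), apply the same call to a single conv layer and compare the empirical standard deviation:

import math
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')

fan_out = 128 * 3 * 3            # out_channels * kernel_size^2
print(math.sqrt(2.0 / fan_out))  # theoretical std
print(conv.weight.std().item())  # empirical std, should be close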

VGG

The official PyTorch initialization for VGG is as follows:

def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, 0, 0.01)
            nn.init.constant_(m.bias, 0)

The convolutional layers are initialized the same way as in ResNet, except that the bias is explicitly set to 0; the BatchNorm layers are initialized the same way as in ResNet; the Linear weights are initialized from $\mathcal{N}(0, 0.01^2)$ (standard deviation 0.01) and the biases are set to 0.
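
As with the previous examples, this can be checked on a single Linear layer (illustrative sizes only):

import torch.nn as nn

fc = nn.Linear(4096, 1000)
nn.init.normal_(fc.weight, 0, 0.01)  # weight ~ N(0, 0.01^2)
nn.init.constant_(fc.bias, 0)

print(fc.weight.std().item())  # close to 0.01
print(fc.bias.unique())        # all zeros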
