PyTorch Study Notes (15): Weight Initialization

Weight Initialization

Vanishing and Exploding Gradients

The expectation of the product of two independent random variables equals the product of their expectations:

$$\text{1. } E(X \cdot Y) = E(X) \cdot E(Y)$$
The variance formula:

$$\text{2. } D(X) = E\left(X^{2}\right) - [E(X)]^{2}$$

The variance of the sum of two independent random variables equals the sum of their variances:

$$\text{3. } D(X + Y) = D(X) + D(Y)$$
From 1, 2 and 3, the variance of the product of two independent random variables is:

$$D(X \cdot Y) = D(X) \cdot D(Y) + D(X) \cdot [E(Y)]^{2} + D(Y) \cdot [E(X)]^{2}$$

When both variables have zero mean, this reduces to:

$$E(X) = 0,\ E(Y) = 0 \ \Rightarrow\ D(X \cdot Y) = D(X) \cdot D(Y)$$
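The product rule above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the function names, sample sizes and standard deviations are illustrative assumptions. It also shows the consequence that matters for initialization: a neuron that sums n products of zero-mean, unit-variance inputs and weights has an output variance of about n.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def product_variance_check(n_samples=100_000, std_x=2.0, std_y=3.0, seed=0):
    """Compare the sample variance of X*Y with D(X)*D(Y) for independent zero-mean X, Y."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, std_x) for _ in range(n_samples)]
    ys = [rng.gauss(0.0, std_y) for _ in range(n_samples)]
    products = [x * y for x, y in zip(xs, ys)]
    return variance(products), variance(xs) * variance(ys)


def neuron_output_variance(n_inputs=64, n_samples=5_000, std_w=1.0, seed=0):
    """Sample variance of h = sum_i x_i * w_i with x_i ~ N(0, 1) and w_i ~ N(0, std_w^2)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        h = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, std_w) for _ in range(n_inputs))
        outputs.append(h)
    return variance(outputs)


empirical, predicted = product_variance_check()
print("D(X*Y) sampled:", empirical, "D(X)*D(Y):", predicted)
print("variance of a 64-input neuron:", neuron_output_variance())
```

With standard-normal weights the variance grows by roughly a factor of n at every layer, which is why a deep stack initialized this way overflows after a few dozen layers.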

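Xavier Initialization

Variance consistency: keep the scale of the data within a suitable range, usually a variance of 1.
Activation functions: saturating functions such as Sigmoid and Tanh.
Here n_i is the number of inputs to the layer and n_{i+1} the number of outputs. The first condition keeps the variance unchanged in the forward pass, the second in the backward pass. Both can only hold exactly when n_i = n_{i+1}, so Xavier initialization takes the compromise on the right:

$$\begin{aligned} &n_{i} \cdot D(W) = 1 \\ &n_{i+1} \cdot D(W) = 1 \end{aligned} \ \Rightarrow\ D(W) = \frac{2}{n_{i} + n_{i+1}}$$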
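To turn the Xavier variance into sampling bounds for a uniform distribution, one can use the fact that U(-a, a) has variance a^2 / 3, which gives a = sqrt(6 / (n_i + n_{i+1})). The sketch below is plain Python with hypothetical helper names (`xavier_variance`, `xavier_uniform_bound`); the optional `gain` factor mirrors the way the script in these notes multiplies the bound by the tanh gain.

```python
import math
import random


def xavier_variance(fan_in, fan_out):
    """Target weight variance D(W) = 2 / (n_i + n_{i+1})."""
    return 2.0 / (fan_in + fan_out)


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of U(-a, a): Var(U(-a, a)) = a^2 / 3, so a = sqrt(3 * D(W)), times an optional gain."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
bound = xavier_uniform_bound(256, 256)
weights = [rng.uniform(-bound, bound) for _ in range(100_000)]

print("bound a:", bound)
print("sampled variance:", variance(weights))
print("target variance:", xavier_variance(256, 256))
```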

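Kaiming Initialization
Variance consistency: keep the scale of the data within a suitable range, usually a variance of 1.
Activation functions: ReLU and its variants.

For ReLU:

$$D(W) = \frac{2}{n_{i}}$$

For ReLU variants with slope a on the negative half-axis (for example Leaky ReLU):

$$D(W) = \frac{2}{\left(1 + a^{2}\right) \cdot n_{i}}$$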
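The factor 2 / (1 + a^2) can be motivated numerically: for a zero-mean, unit-variance Gaussian input, a ReLU with negative slope a keeps only (1 + a^2) / 2 of the second moment, so the weights have to compensate by the inverse of that factor. The sketch below is plain Python under that Gaussian assumption; the function names and sample size are illustrative, not taken from the notes.

```python
import random


def leaky_relu(y, negative_slope=0.0):
    """ReLU when negative_slope is 0, Leaky ReLU otherwise."""
    return y if y > 0 else negative_slope * y


def second_moment_after_activation(negative_slope=0.0, n_samples=100_000, seed=0):
    """Estimate E[f(y)^2] for y ~ N(0, 1); the exact value is (1 + a^2) / 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        out = leaky_relu(rng.gauss(0.0, 1.0), negative_slope)
        total += out * out
    return total / n_samples


def kaiming_variance(fan_in, negative_slope=0.0):
    """Target weight variance D(W) = 2 / ((1 + a^2) * n_i)."""
    return 2.0 / ((1.0 + negative_slope ** 2) * fan_in)


for slope in (0.0, 0.1):
    moment = second_moment_after_activation(slope)
    # n_i * D(W) * E[f(y)^2] should be close to 1, i.e. the scale is preserved.
    preserved = 256 * kaiming_variance(256, slope) * moment
    print(f"slope={slope}: E[f(y)^2]={moment}, n*D(W)*E[f(y)^2]={preserved}")
```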

# -*- coding: utf-8 -*-

import os
import torch
import random
import numpy as np
import torch.nn as nn
from tools.common_tools import set_seed  # project-local helper, not part of PyTorch

set_seed(1)  # fix the random seed for reproducibility


class MLP(nn.Module):
    # Stack of `layers` linear layers (100 in the experiment below), without bias
    def __init__(self, neural_num, layers):
        super(MLP, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])
        self.neural_num = neural_num

    def forward(self, x):
        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)

            print("layer:{}, std:{}".format(i, x.std()))
            if torch.isnan(x.std()):
                print("output is nan in {} layers".format(i))
                break

        return x

    def initialize(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
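                # For each linear layer, initialize the weights.
                # Active choice: standard normal distribution (mean 0, std 1)
                nn.init.normal_(m.weight.data)

                # Alternative: normal distribution with mean=0, std=sqrt(1/n)
                # nn.init.normal_(m.weight.data, std=np.sqrt(1/self.neural_num))

                # Alternative: Xavier uniform computed by hand, scaled by the tanh gain
                # a = np.sqrt(6 / (self.neural_num + self.neural_num))
                # tanh_gain = nn.init.calculate_gain('tanh')
                # a *= tanh_gain
                # nn.init.uniform_(m.weight.data, -a, a)

                # Alternative: the same via the built-in function (needs tanh_gain from above)
                # nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)

                # Alternative: Kaiming normal by hand, then via the built-in function
                # nn.init.normal_(m.weight.data, std=np.sqrt(2 / self.neural_num))
                # nn.init.kaiming_normal_(m.weight.data)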

# flag = 0
flag = 1

if flag:
    layer_nums = 100
    neural_nums = 256
    batch_size = 16

    net = MLP(neural_nums, layer_nums)
    net.initialize()

    inputs = torch.randn((batch_size, neural_nums))  # normal: mean=0, std=1

    output = net(inputs)
    print(output)

# ======================================= calculate gain =======================================

flag = 0
# flag = 1

if flag:

    x = torch.randn(10000)
    out = torch.tanh(x)

    gain = x.std() / out.std()
    print('gain:{}'.format(gain))

    tanh_gain = nn.init.calculate_gain('tanh')
    print('tanh_gain in PyTorch:', tanh_gain)
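The effect of the different weight scales in the script can be summarized in one compact comparison. This is a rough sketch rather than the author's experiment: `final_std` is a hypothetical helper, the depth of 50 and batch size of 64 are arbitrary, and the computation runs in float64 so that the exploding case stays finite instead of overflowing to nan as it can in float32.

```python
import math
import torch


def final_std(init_std, activation, layers=50, width=256, batch=64, seed=0):
    """Return the std of the activations after `layers` bias-free linear layers."""
    gen = torch.Generator().manual_seed(seed)
    # Inputs: mean 0, std 1
    x = torch.randn(batch, width, generator=gen, dtype=torch.float64)
    for _ in range(layers):
        # Weights: mean 0, std init_std
        w = torch.randn(width, width, generator=gen, dtype=torch.float64) * init_std
        x = activation(x @ w.T)
    return x.std().item()


def identity(t):
    return t


width = 256
results = {
    "std=1, ReLU": final_std(1.0, torch.relu),
    "std=sqrt(1/n), no activation": final_std(math.sqrt(1.0 / width), identity),
    "std=sqrt(1/n), ReLU": final_std(math.sqrt(1.0 / width), torch.relu),
    "std=sqrt(2/n), ReLU": final_std(math.sqrt(2.0 / width), torch.relu),
}
for name, value in results.items():
    print(f"{name}: {value}")
```

Under these assumptions the first setting explodes, std=sqrt(1/n) is stable only without an activation and shrinks towards zero once ReLU is added, and std=sqrt(2/n) keeps the ReLU stack at a scale of order 1.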

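Ten Initialization Methods

Xavier uniform distribution (nn.init.xavier_uniform_)
Xavier normal distribution (nn.init.xavier_normal_)
Kaiming uniform distribution (nn.init.kaiming_uniform_)
Kaiming normal distribution (nn.init.kaiming_normal_)
Uniform distribution (nn.init.uniform_)
Normal distribution (nn.init.normal_)
Constant (nn.init.constant_)
Orthogonal matrix initialization (nn.init.orthogonal_)
Identity matrix initialization (nn.init.eye_)
Sparse matrix initialization (nn.init.sparse_)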
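As a quick reference, the ten methods can each be applied to a fresh tensor through the corresponding `torch.nn.init` function. The sketch below is illustrative only: the 64x64 shape and the argument values (bounds, std, constant, sparsity) are arbitrary assumptions, not recommendations from the notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def fresh():
    """Uninitialized 64x64 weight tensor (rows = outputs, columns = inputs)."""
    return torch.empty(64, 64)


initializers = {
    "xavier_uniform_": lambda w: nn.init.xavier_uniform_(w, gain=1.0),
    "xavier_normal_": lambda w: nn.init.xavier_normal_(w, gain=1.0),
    "kaiming_uniform_": lambda w: nn.init.kaiming_uniform_(w, a=0, mode="fan_in", nonlinearity="relu"),
    "kaiming_normal_": lambda w: nn.init.kaiming_normal_(w, a=0, mode="fan_in", nonlinearity="relu"),
    "uniform_": lambda w: nn.init.uniform_(w, a=-0.1, b=0.1),
    "normal_": lambda w: nn.init.normal_(w, mean=0.0, std=0.01),
    "constant_": lambda w: nn.init.constant_(w, 0.5),
    "orthogonal_": lambda w: nn.init.orthogonal_(w, gain=1.0),
    "eye_": lambda w: nn.init.eye_(w),
    "sparse_": lambda w: nn.init.sparse_(w, sparsity=0.9, std=0.01),
}

# Every init function fills the tensor in place and returns it.
results = {name: init(fresh()) for name, init in initializers.items()}

for name, w in results.items():
    print(f"{name:18s} mean={w.mean().item():+.4f} std={w.std().item():.4f}")
```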

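nn.init.calculate_gain

Purpose: computes the gain of an activation function, that is, the scale by which it changes the spread of the data (estimated in the script above as the input std divided by the output std)
Main parameters
nonlinearity: name of the activation function
param: parameter of the activation function, e.g. the negative_slope of Leaky ReLU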
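A short sketch of how the two parameters are passed. The dictionary keys are just labels chosen for this example, and the comments give the values listed in the PyTorch documentation for each nonlinearity.

```python
import math
import torch.nn as nn

gains = {
    "linear": nn.init.calculate_gain("linear"),      # 1
    "sigmoid": nn.init.calculate_gain("sigmoid"),    # 1
    "tanh": nn.init.calculate_gain("tanh"),          # 5/3
    "relu": nn.init.calculate_gain("relu"),          # sqrt(2)
    # `param` is the negative slope here: sqrt(2 / (1 + negative_slope^2))
    "leaky_relu, negative_slope=0.2": nn.init.calculate_gain("leaky_relu", 0.2),
}

for name, gain in gains.items():
    print(f"{name}: {gain}")
```

The returned value is what the `gain` argument of functions such as `nn.init.xavier_uniform_` expects.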
