1. Introduction
During neural network training, the parameters are learned by gradient-descent optimization. Gradient descent requires assigning an initial value to every parameter before training begins, and the choice of this initial value is critical. In general, we want both the data and the parameters to have zero mean, and the variance of a layer's inputs and outputs to stay consistent. In practice, drawing the parameters from a Gaussian distribution or from a uniform distribution are both fairly effective initialization schemes.
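As a quick illustration of the "consistent variance" goal, the sketch below (an illustrative check, not from the original post; the width 512 is arbitrary) compares the output variance of a single linear transform when the weights are drawn from N(0, 1) versus from a distribution with variance 1/n:

import torch

n = 512
x = torch.randn(1024, n)                          # inputs with mean 0, variance 1

w_std1 = torch.randn(n, n)                        # weights ~ N(0, 1)
w_scaled = torch.randn(n, n) * (1.0 / n) ** 0.5   # weights with variance 1/n

print((x @ w_std1.t()).var().item())              # ~ n: the variance blows up layer by layer
print((x @ w_scaled.t()).var().item())            # ~ 1: the variance is preserved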
Methods for initializing deep learning model parameters
(1) Gaussian
A Gaussian distribution with mean = 0 and std = 1: x ∼ N(mean, std²)
(2) Xavier
A uniform distribution x ∼ U(−a, +a), where a = sqrt(3/n)
(3) MSRA
A Gaussian distribution x ∼ N(0, σ²), where σ = sqrt(2/n)
(4) Uniform
A uniform distribution with min = 0 and max = 1: x ∼ U(min, max)
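Each of these four schemes has a direct counterpart in PyTorch's torch.nn.init module. The snippet below is a minimal sketch (the tensor shape is an arbitrary example); note that PyTorch's xavier_uniform_ uses a = sqrt(6 / (fan_in + fan_out)), a slight variant of the a = sqrt(3/n) form above:

import torch
import torch.nn as nn

w = torch.empty(256, 128)   # e.g. a Linear weight: (out_features, in_features)

nn.init.normal_(w, mean=0.0, std=1.0)                             # (1) Gaussian: N(0, 1)
nn.init.xavier_uniform_(w)                                        # (2) Xavier: U(-a, +a)
nn.init.kaiming_normal_(w, mode='fan_in', nonlinearity='relu')    # (3) MSRA: N(0, 2/n)
nn.init.uniform_(w, a=0.0, b=1.0)                                 # (4) Uniform: U(0, 1)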
Example: the VGG16 model
VGG16 model code
import math
import torch
import torch.nn as nn
from torch.autograd import Variable

__all__ = ['vgg']

defaultcfg = {
    11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
    13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512],
    16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512],
    19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512],
}

class vgg(nn.Module):
    def __init__(self, dataset='cifar10', depth=19, init_weights=True, cfg=None):
        super(vgg, self).__init__()
        if cfg is None:
            cfg = defaultcfg[depth]
        self.cfg = cfg
        self.feature = self.make_layers(cfg, True)

        if dataset == 'cifar10':
            num_classes = 10
        elif dataset == 'cifar100':
            num_classes = 100
        self.classifier = nn.Sequential(
            nn.Linear(cfg[-1], 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def make_layers(self, cfg, batch_norm=False):
        layers = []
        in_channels = 3
        for v in cfg:
            if v == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1, bias=False)
                if batch_norm:
                    layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
                else:
                    layers += [conv2d, nn.ReLU(inplace=True)]
                in_channels = v
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.feature(x)
        x = nn.AvgPool2d(2)(x)
        x = x.view(x.size(0), -1)
        y = self.classifier(x)
        return y

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(0.5)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

if __name__ == '__main__':
    net = vgg()
    x = Variable(torch.FloatTensor(16, 3, 40, 40))
    y = net(x)
    print(y.data.shape)
Initialization code
This is the initialization scheme used in the VGG source code provided with PyTorch (torchvision). Generally speaking, any CNN can be initialized in this way.
def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            # MSRA/He initialization: n is the fan-out of the conv layer
            # (kernel_h * kernel_w * out_channels), std = sqrt(2 / n)
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.BatchNorm2d):
            # BatchNorm: constant scale, zero shift
            m.weight.data.fill_(0.5)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            # Fully connected layers: small Gaussian weights, zero bias
            m.weight.data.normal_(0, 0.01)
            m.bias.data.zero_()
The initialization here uses a normal (Gaussian) distribution:
m.weight.data.normal_(0, math.sqrt(2. / n))
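For comparison, the same He/MSRA initialization of a convolution weight can also be written with the torch.nn.init API. The sketch below (a standalone Conv2d, purely for illustration) shows that kaiming_normal_ with mode='fan_out' corresponds to the n = kernel_h * kernel_w * out_channels used above:

import math
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)

# Manual form, as in the code above: std = sqrt(2 / n) with n the fan-out
n = conv.kernel_size[0] * conv.kernel_size[1] * conv.out_channels
conv.weight.data.normal_(0, math.sqrt(2. / n))

# Equivalent form using the torch.nn.init API
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')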
(原)模型的参数初始化 - darkknightzh - 博客园
https://www.cnblogs.com/darkknightzh/p/8297793.html
Pytorch 细节记录 - 三年一梦 - 博客园
https://www.cnblogs.com/king-lps/p/8570021.html
深度学习模型参数初始化的方法 - 下路派出所 - 博客园
https://www.cnblogs.com/callyblog/p/9714656.html
PyTorch学习系列(九)——参数_初始化 - CodeTutor - CSDN博客
https://blog.csdn.net/VictoriaW/article/details/72872036
PyTorch常用的初始化和正则 - 简书
https://www.jianshu.com/p/902bb29209ed
Pytorch模型训练(2) - 模型初始化 - Mingx9527 - CSDN博客
https://blog.csdn.net/u011681952/article/details/86579998#11__20
深度学习中的参数初始化 - Man - CSDN博客
https://blog.csdn.net/mzpmzk/article/details/79839047
Other articles still to read
pytorch加载模型和初始化权重_码神岛
https://msd.misuland.com/pd/2884250137616453910
pytorch中的参数初始化方法总结 - ys1305的博客 - CSDN博客
https://blog.csdn.net/ys1305/article/details/94332007
[PyTorch]PyTorch中模型的参数初始化的几种方法(转) - 向前奔跑的少年 - 博客园
https://www.cnblogs.com/kk17/p/10088301.html#_labelTop
pytorch权重初始化(2) - CS_lcylmh的CSDN博客 - CSDN博客
https://blog.csdn.net/qq_19598705/article/details/80935786
Search results for ‘data.normal_ topic:157’ - PyTorch Forums
https://discuss.pytorch.org/search?q=data.normal_%20topic%3A157
Weight initilzation - PyTorch Forums
https://discuss.pytorch.org/t/weight-initilzation/157