In PyTorch, a layer (a subclass of Module) initializes its parameters automatically when it is instantiated. Sometimes, however, we already know a rough range for the parameter values; in that case we can initialize the parameters to the values we want, which makes convergence faster.
PyTorch provides a number of initialization functions:
torch.nn.init.constant(tensor, val)
torch.nn.init.constant_(tensor, val)
The former is the older, deprecated form, typically applied to a plain tensor with no gradient tracking, e.g. constant(tensor.data, 0); the latter modifies a gradient-tracking tensor in place, e.g. constant_(weight, 0). Both set every element of the tensor to val.
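A minimal sketch of that element-wise fill (the tensor shape and the value 0.5 are arbitrary, chosen only for illustration):
import torch

t = torch.empty(2, 3)
torch.nn.init.constant_(t, 0.5)  # fills t in place
print(t)                         # every element is now 0.5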
torch.nn.init.normal(tensor, mean=0, std=1)
torch.nn.init.normal_(tensor, mean=0, std=1)
Fills the tensor with values drawn from a normal distribution, so the tensor's values as a whole follow that normal distribution.
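A quick sketch of normal_ in action; the shape, mean, and std below are arbitrary illustration values:
import torch

t = torch.empty(1000)
torch.nn.init.normal_(t, mean=2.0, std=0.5)
print(t.mean().item(), t.std().item())  # roughly 2.0 and 0.5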
torch.nn.init.xavier_uniform(tensor, gain=1)
torch.nn.init.xavier_uniform_(tensor, gain=1)
Fills the tensor with values drawn from a uniform distribution whose bounds are chosen by the Xavier (Glorot) method, so the tensor as a whole follows that uniform distribution.
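A minimal sketch of xavier_uniform_ applied to a weight-shaped tensor; the dimensions are arbitrary, but a 2-D shape is required because Xavier needs fan-in and fan-out:
import torch

w = torch.empty(5, 3)  # (fan_out, fan_in)
torch.nn.init.xavier_uniform_(w, gain=1.0)
print(w.abs().max().item())  # bounded by gain * sqrt(6 / (fan_in + fan_out))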
For more, see the official docs: https://pytorch.org/docs/stable/nn.init.html#torch-nn-init
Initializing a layer directly
import torch

linear = torch.nn.Linear(10, 10)
torch.nn.init.constant(linear.weight.data, 0)  # deprecated form, operating on .data
# or
torch.nn.init.constant_(linear.weight, 0)      # preferred in-place form
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer1 = nn.Linear(3, 5)
        self.layer2 = nn.Linear(5, 2)

    def forward(self, x):
        return self.layer2(self.layer1(x))


def weight_init(model):
    # skip the root node of the model tree: Model itself has no weight/bias
    if model.__class__.__name__ != "Model":
        nn.init.constant_(model.weight, 1)
        nn.init.constant_(model.bias, 0)


model = Model()
print("Before:")
for layer in model.children():
    print(layer.weight, layer.bias)
model.apply(weight_init)
print("After:")
for layer in model.children():
    print(layer.weight, layer.bias)
The model is structured as a tree, with Model as the root and its children as child nodes. apply traverses this tree recursively (depth-first) and calls weight_init on every module, so sooner or later it reaches the root node. Since my custom root module Model has no weight of its own, it has to be filtered out: if model.__class__.__name__ != "Model". Layers such as Sigmoid would also have to be filtered out, because they have no weight either.
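One way to avoid listing every parameter-free layer by name is to filter by type instead. A hedged sketch of such an alternative weight_init (not the original code above):
import torch.nn as nn

def weight_init(m):
    # only touch modules that actually carry weight/bias parameters
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        nn.init.constant_(m.weight, 1)
        if m.bias is not None:  # such layers can be built with bias=False
            nn.init.constant_(m.bias, 0)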
Result:
Before:
Parameter containing:
tensor([[-0.5532, -0.2544, 0.4763],
[ 0.1327, -0.3355, 0.1664],
[-0.3940, -0.1186, -0.1992],
[-0.0346, -0.3184, -0.4851],
[-0.1112, 0.0715, -0.0815]], requires_grad=True) Parameter containing:
tensor([ 0.1434, 0.5613, 0.0764, 0.4066, -0.1172], requires_grad=True)
Parameter containing:
tensor([[ 0.4156, 0.4196, -0.1659, 0.3462, 0.2828],
[ 0.3337, 0.1536, -0.4318, -0.1299, -0.4243]], requires_grad=True) Parameter containing:
tensor([ 0.3845, -0.0840], requires_grad=True)
After:
Parameter containing:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True) Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True)
Parameter containing:
tensor([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]], requires_grad=True) Parameter containing:
tensor([0., 0.], requires_grad=True)
The model initialization above is recursive; if the model is extremely deep, this can trigger a maximum-recursion-depth error.
So instead we initialize the model inside __init__, at the moment it is constructed. This is like turning submodels into components that initialize themselves when instantiated, or initializing an entire submodel at once (including the submodels of that submodel); either way the recursion never gets too deep.
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, in_dim, n_hidden_1, n_hidden_2, out_dim):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Linear(in_dim, n_hidden_1),
            nn.ReLU(True),
            nn.Linear(n_hidden_1, n_hidden_2),
            nn.ReLU(True),
            nn.Linear(n_hidden_2, out_dim)
        )
        # initialize every submodule of this Model right here;
        # once each component initializes itself locally, the whole model is initialized
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, -100)
            # we can also check for Conv2d and use an initialization suited to it
            elif isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.layer(x)
        return x
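A quick check (the dimensions below are arbitrary, chosen only for illustration) that the parameters already have the intended constants right after instantiation:
model = Model(in_dim=4, n_hidden_1=8, n_hidden_2=8, out_dim=2)
print(model.layer[0].weight)  # all ones
print(model.layer[0].bias)    # all -100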
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, input_dim):
        super(Residual, self).__init__()
        # a residual block built from fully connected layers, as an example
        self.linear1 = nn.Linear(input_dim, 64)
        self.linear2 = nn.Linear(64, input_dim)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()

    def forward(self, x):
        x_hat = self.relu1(self.linear1(x))
        y_hat = self.relu2(self.linear2(x_hat))
        # the residual connection is the addition
        y = y_hat + x
        return y


class Model(nn.Module):
    def __init__(self, input_dim):
        super(Model, self).__init__()
        self.linear = nn.Linear(input_dim, 128)
        self.residual = Residual(128)

    def forward(self, x):
        y = self.linear(x)
        print("Residual block input:", y)
        y = self.residual(y)
        return y
# Initialize all of the model's parameters: the fully connected layer inside Model
# gets weight 1 and bias 0; the linear layers inside the residual block get weight 0 and bias 0.
def init_weight(model):
    for layer in model.children():
        if layer.__class__.__name__ != "Residual":
            nn.init.constant_(layer.weight, 1)
            nn.init.constant_(layer.bias, 0)
        else:
            for res_layer in layer.children():
                if res_layer.__class__.__name__ == "Linear":
                    nn.init.constant_(res_layer.weight, 0)
                    nn.init.constant_(res_layer.bias, 0)
features = torch.ones(10, dtype=torch.float)
print(features)
model = Model(10)
init_weight(model)
print("================================= Parameters =================================")
print(model.linear.weight, model.linear.bias)
print(model.residual.linear1.weight, model.residual.linear1.bias)
print(model.residual.linear2.weight, model.residual.linear2.bias)
print("==============================================================================")