A summary of the various methods on a PyTorch model:
First, define the network model Net:
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_class=10):
        super().__init__()
        # Feature extractor: two conv -> ReLU -> BatchNorm -> max-pool stages
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(6),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(12),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Classification head
        self.classifier = nn.Sequential(
            nn.Linear(9*8*8, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(128, num_class)
        )

    def forward(self, x):
        output = self.backbone(x)
        # Flatten everything except the batch dimension
        output = output.view(output.size()[0], -1)
        output = self.classifier(output)
        return output

model = Net()
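Net is itself a subclass of nn.Module. It contains two submodules, backbone and classifier, each built from an nn.Sequential container, which is also an nn.Module subclass. backbone and classifier in turn contain individual layers, and those layers are nn.Module subclasses as well. From the outside in there are therefore three levels: Net, then backbone/classifier, then the individual layers.
Return values of the model's methods: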
model.modules()
>>> <generator object Module.modules at 0x7fb381953740>
model.named_modules()
>>> <generator object Module.named_modules at 0x7fb3819537b0>
model.children()
>>> <generator object Module.children at 0x7fb381953ac0>
model.named_children()
>>> <generator object Module.named_children at 0x7fb3819539e0>
model.parameters()
>>> <generator object Module.parameters at 0x7fb381953f90>
model.named_parameters()
>>> <generator object Module.named_parameters at 0x7fb3818b95f0>
model.state_dict()
>>>
OrderedDict([('backbone.0.weight', tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
……
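Only model.state_dict() returns an ordered dictionary; every other method returns a generator. To inspect the contents, collect each generator into a list with a for loop (written here as a list comprehension):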
model_modules = [m for m in model.modules()]
model_named_modules = [m for m in model.named_modules()]
model_children = [m for m in model.children()]
model_named_children = [m for m in model.named_children()]
model_parameters = [m for m in model.parameters()]
model_named_parameters = [m for m in model.named_parameters()]
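One practical consequence of these methods returning generators is that each generator can be consumed only once. The sketch below illustrates this with a hypothetical stand-in module, `demo` (a single nn.Linear, not the Net defined above):

```python
import torch.nn as nn

demo = nn.Linear(4, 2)  # hypothetical stand-in; has a weight and a bias

params = demo.parameters()   # a generator; nothing is collected yet
first_pass = list(params)    # consumes the generator
second_pass = list(params)   # the same generator is now exhausted

print(len(first_pass), len(second_pass))  # 2 0
```

Calling demo.parameters() again returns a fresh generator, which is why the list comprehensions above call the method each time rather than reusing a stored generator.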
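model.modules() iterates over the model itself and all of its submodules, where a submodule is any layer that inherits from nn.Module.
In the Net defined above, Net itself, backbone, classifier, and every layer inside them inherit from nn.Module, so all of them are visited, and the traversal is depth-first. Calling .modules() on Net visits the modules in this order: Net --> backbone --> backbone layers --> classifier --> classifier layers.
model.named_modules() is model.modules() with layer names: in addition to the layers, it returns their names. Each returned element is a tuple whose first element is the layer name and whose second element is the layer itself. Apart from backbone and classifier, which are named explicitly in the model definition, the layers are named automatically: nn.Sequential names its children by index, and nested names are joined with dots (for example backbone.0). The top-level Net has the empty string as its name.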
>>> model_modules
>>> len(model_modules) # 15
>>> model_named_modules
>>> len(model_named_modules) # 15
##########################
## output model_modules ##
##########################
[Net(
(backbone): Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
)
),
Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
),
Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1)),
ReLU(inplace=True),
BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1)),
ReLU(inplace=True),
BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
),
Linear(in_features=576, out_features=128, bias=True),
ReLU(inplace=True),
Dropout(p=0.5, inplace=False),
Linear(in_features=128, out_features=10, bias=True)]
################################
## output model_named_modules ##
################################
[('',
Net(
(backbone): Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
)
)),
('backbone',
Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)),
('backbone.0', Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))),
('backbone.1', ReLU(inplace=True)),
('backbone.2', BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)),
('backbone.3', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)),
('backbone.4', Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))),
('backbone.5', ReLU(inplace=True)),
('backbone.6', BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)),
('backbone.7', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)),
('classifier',
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
)),
('classifier.0', Linear(in_features=576, out_features=128, bias=True)),
('classifier.1', ReLU(inplace=True)),
('classifier.2', Dropout(p=0.5, inplace=False)),
('classifier.3', Linear(in_features=128, out_features=10, bias=True))]
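The depth-first order is easier to see on a smaller model. The following is a minimal sketch using a hypothetical `TinyNet` with the same three-level shape as Net; only the names are printed:

```python
import torch.nn as nn

# Hypothetical, reduced version of Net used only for illustration.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(6, 2))

tiny = TinyNet()

# named_modules() yields (name, module) pairs, parent first, then its children.
names = [name for name, _ in tiny.named_modules()]
print(names)
# ['', 'backbone', 'backbone.0', 'backbone.1', 'classifier', 'classifier.0']
```

Every child of backbone is listed before classifier is reached, which is the depth-first behavior described above.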
Both model.modules() and model.named_modules() can be used to modify specific layers.
1) With model.modules(), use isinstance() to pick out layers of a particular type:
for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        pass  # process the layer here
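2) With model.named_modules(), if every layer was given a name when the model was defined (for example, the convolution layers are named conv1, conv2, ...), layers can be selected by name:
for name, layer in model.named_modules():
    if 'conv' in name:
        pass  # process the layer here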
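As a concrete illustration of both selection styles, here is a minimal sketch. The model `demo`, its layer names (conv1, relu1, conv2), and the choice of Kaiming initialization and freezing as the "processing" step are all assumptions made for the example, not part of the Net above:

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical model whose layers are named explicitly via an OrderedDict.
demo = nn.Sequential(OrderedDict([
    ("conv1", nn.Conv2d(3, 6, kernel_size=3)),
    ("relu1", nn.ReLU()),
    ("conv2", nn.Conv2d(6, 12, kernel_size=3)),
]))

# 1) Select by type: re-initialize every Conv2d layer in place.
for layer in demo.modules():
    if isinstance(layer, nn.Conv2d):
        nn.init.kaiming_normal_(layer.weight, mode="fan_out", nonlinearity="relu")
        nn.init.zeros_(layer.bias)

# 2) Select by name: freeze every layer whose name contains "conv".
frozen = []
for name, layer in demo.named_modules():
    if "conv" in name:
        for p in layer.parameters():
            p.requires_grad = False
        frozen.append(name)

print(frozen)  # ['conv1', 'conv2']
```

Selecting by type works regardless of how layers are named, while selecting by name depends on a consistent naming scheme in the model definition.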
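As noted earlier, Net has three levels: 1) Net itself, 2) Net's submodules backbone and classifier, and 3) the layers inside backbone and classifier, such as conv, relu, and batchnorm.
model.modules() visits all submodules of the model, including the submodules of submodules. Loosely speaking, it visits every node of the tree from the root to the leaves. In the example above, it visits every element of all three levels.
model.children() returns only the second level of the model. In the example above it returns just backbone and classifier: neither Net itself nor the layers inside backbone and classifier. As before, model.named_children() is model.children() with layer names.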
>>> model_children
>>> len(model_children) # 2
>>> model_named_children
>>> len(model_named_children) # 2
###########################
## output model_children ##
###########################
[Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
),
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
)]
#################################
## output model_named_children ##
#################################
[('backbone',
Sequential(
(0): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
(1): ReLU(inplace=True)
(2): BatchNorm2d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1))
(5): ReLU(inplace=True)
(6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)),
('classifier',
Sequential(
(0): Linear(in_features=576, out_features=128, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=128, out_features=10, bias=True)
))]
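Because children() stops at the direct submodules, it is a convenient handle for treating whole blocks differently. A minimal sketch, assuming a hypothetical `TinyNet` and assuming the goal is to freeze the backbone while leaving the classifier trainable:

```python
import torch.nn as nn

# Hypothetical, reduced version of Net used only for illustration.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(6, 2))

tiny = TinyNet()

# named_children() yields only the direct submodules: backbone and classifier.
for name, child in tiny.named_children():
    if name == "backbone":
        for p in child.parameters():
            p.requires_grad = False  # freeze the whole block

trainable = [n for n, p in tiny.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.0.weight', 'classifier.0.bias']
```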
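model.parameters() iterates over all learnable parameters of the model. Layers with no learnable parameters (such as ReLU and MaxPool) contribute nothing, so they do not appear in its output.
Correspondingly, model.named_parameters() is model.parameters() with names: each tuple packs two elements, the parameter name and the parameter itself. The name is the layer name plus a suffix, .weight or .bias, which distinguishes the weight from the bias.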
>>> model_parameters
>>> len(model_parameters) # 12
>>> model_named_parameters
>>> len(model_named_parameters) # 12
#############################
## output model_parameters ##
#############################
[Parameter containing:
tensor([[[[ 0.1871, 0.0998, -0.1136],
[ 0.0104, 0.1804, 0.0761],
[ 0.0331, 0.0311, 0.0843]],
...
[[ 0.0990, 0.0997, 0.0398],
[ 0.1182, -0.0016, 0.1722],
[-0.1830, 0.0451, 0.0737]]]], requires_grad=True),
Parameter containing:
tensor([ 0.0617, 0.1688, -0.0237, -0.1017, 0.0201, 0.0849],
requires_grad=True),
Parameter containing:
tensor([1., 1., 1., 1., 1., 1.], requires_grad=True),
Parameter containing:
tensor([0., 0., 0., 0., 0., 0.], requires_grad=True),
Parameter containing:
tensor([[[[ 0.1346, 0.0129, -0.1315],
[-0.0621, -0.1044, 0.0091],
[-0.0638, -0.0477, -0.0327]],
...
[[-0.1067, -0.1073, 0.1203],
[-0.1091, -0.0542, -0.0008],
[ 0.0517, 0.0297, 0.1107]]]], requires_grad=True),
Parameter containing:
tensor([-0.0344, 0.1320, 0.0165, 0.0100, 0.0784, -0.0792, 0.0044, 0.0419,
0.0234, -0.0159, -0.0053, -0.1342], requires_grad=True),
Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True),
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True),
Parameter containing:
tensor([[ 0.0234, 0.0177, 0.0184, ..., -0.0057, -0.0067, 0.0385],
[-0.0355, 0.0340, -0.0090, ..., 0.0243, 0.0241, -0.0264],
[-0.0203, 0.0172, -0.0239, ..., -0.0279, -0.0294, -0.0038],
...,
[ 0.0090, -0.0009, 0.0363, ..., 0.0019, -0.0086, -0.0304],
[ 0.0032, 0.0007, 0.0056, ..., -0.0060, 0.0083, -0.0253],
[-0.0102, 0.0276, 0.0365, ..., 0.0016, 0.0248, 0.0273]],
requires_grad=True),
Parameter containing:
tensor([-0.0288, -0.0141, 0.0232, -0.0006, -0.0001, 0.0311, -0.0135, -0.0219,
...
-0.0285, -0.0271, -0.0366, 0.0029, -0.0206, 0.0245, -0.0203, -0.0208],
requires_grad=True),
Parameter containing:
tensor([[ 0.0330, -0.0665, -0.0036, ..., -0.0092, 0.0171, 0.0699],
[ 0.0871, -0.0311, 0.0330, ..., 0.0013, -0.0871, 0.0667],
[-0.0146, -0.0383, -0.0370, ..., 0.0261, 0.0599, 0.0240],
...,
[ 0.0058, -0.0125, -0.0157, ..., -0.0055, -0.0823, -0.0664],
[-0.0488, 0.0545, -0.0859, ..., -0.0786, -0.0524, -0.0451],
[ 0.0201, -0.0197, -0.0538, ..., -0.0369, -0.0202, -0.0865]],
requires_grad=True),
Parameter containing:
tensor([ 0.0654, 0.0766, -0.0597, -0.0595, -0.0724, -0.0484, 0.0121, -0.0212,
0.0234, -0.0146], requires_grad=True)]
###################################
## output model_named_parameters ##
###################################
# The values are the same kind of data as above; for brevity, print only the names
for k, v in model.named_parameters():
    print(k)
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias
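Two common uses of parameters() are counting the learnable parameters and handing them to an optimizer. The sketch below uses a hypothetical small model `demo`; the learning rate is an arbitrary placeholder:

```python
import torch
import torch.nn as nn

# Hypothetical small model: conv -> ReLU -> batch norm.
demo = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3), nn.ReLU(), nn.BatchNorm2d(6))

# Conv2d: 6*3*3*3 weights + 6 biases; BatchNorm2d: 6 weights + 6 biases.
num_trainable = sum(p.numel() for p in demo.parameters() if p.requires_grad)
print(num_trainable)  # 180

# The optimizer only needs the learnable tensors, so parameters() is what it takes.
optimizer = torch.optim.SGD(demo.parameters(), lr=0.01)
```

ReLU adds nothing to the count, matching the observation that layers without learnable parameters do not appear in parameters().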
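model.state_dict() returns all of the model's state, both learnable parameters and non-learnable buffers. Its return value is an ordered dictionary (OrderedDict).
As the example shows, model.state_dict() contains every learnable parameter of the model (weight, bias) and also the non-learnable state (the BN layers' running_mean, running_var, and num_batches_tracked). model.state_dict() can be thought of as model.parameters() plus all of the non-learnable state, keyed by name.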
OrderedDict([('backbone.0.weight',
tensor([[[[ 0.1796, 0.0621, 0.1027],
[-0.0723, -0.0971, 0.0218],
[-0.0835, -0.0479, 0.0305]],
...
[[-0.0544, -0.1858, 0.1559],
[-0.0589, 0.0146, -0.1285],
[-0.1033, 0.0743, 0.1137]]]])),
('backbone.0.bias',
tensor([ 0.0202, 0.1326, 0.0124, -0.1895, -0.1094, -0.1045])),
('backbone.2.weight', tensor([1., 1., 1., 1., 1., 1.])),
('backbone.2.bias', tensor([0., 0., 0., 0., 0., 0.])),
('backbone.2.running_mean', tensor([0., 0., 0., 0., 0., 0.])),
('backbone.2.running_var', tensor([1., 1., 1., 1., 1., 1.])),
('backbone.2.num_batches_tracked', tensor(0)),
('backbone.4.weight',
tensor([[[[ 1.3451e-01, -7.3591e-02, -1.0690e-01],
[-5.4909e-02, -3.3993e-02, 3.3203e-02],
[-6.4427e-02, 1.2523e-01, -3.7897e-02]],
...
[[-1.0125e-01, 1.7249e-02, -6.3623e-02],
[ 4.0353e-02, -7.0894e-02, 6.0606e-03],
[ 6.2089e-02, 8.5485e-02, 1.0689e-01]]]])),
('backbone.4.bias',
tensor([ 0.0999, -0.1271, 0.0010, 0.1151, -0.1221, 0.0144, 0.1088, 0.1214,
-0.0175, -0.1071, 0.0937, -0.0058])),
('backbone.6.weight',
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
('backbone.6.bias',
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
('backbone.6.running_mean',
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])),
('backbone.6.running_var',
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])),
('backbone.6.num_batches_tracked', tensor(0)),
('classifier.0.weight',
tensor([[ 0.0359, 0.0245, 0.0020, ..., 0.0282, -0.0255, -0.0319],
[ 0.0020, 0.0196, 0.0011, ..., -0.0412, 0.0179, 0.0288],
[ 0.0251, -0.0245, 0.0152, ..., 0.0136, 0.0084, -0.0052],
...,
[ 0.0235, -0.0100, -0.0348, ..., 0.0160, -0.0249, -0.0007],
[-0.0385, 0.0202, -0.0359, ..., 0.0367, 0.0155, -0.0367],
[ 0.0092, 0.0375, -0.0229, ..., -0.0322, -0.0065, 0.0008]])),
('classifier.0.bias',
tensor([ 3.7528e-02, -2.4906e-02, -3.0417e-02, -2.9277e-02, 3.8544e-02,
...
-1.4599e-02, 3.6207e-02, 1.8414e-02])),
('classifier.3.weight',
tensor([[-0.0793, -0.0080, 0.0755, ..., 0.0225, 0.0632, 0.0223],
[-0.0861, -0.0295, 0.0301, ..., -0.0664, -0.0458, 0.0044],
[-0.0646, 0.0225, -0.0640, ..., -0.0004, 0.0289, -0.0165],
...,
[-0.0760, -0.0517, -0.0625, ..., 0.0393, -0.0475, -0.0070],
[ 0.0558, -0.0860, -0.0813, ..., -0.0578, -0.0843, -0.0303],
[-0.0077, 0.0227, 0.0247, ..., -0.0424, 0.0134, -0.0196]])),
('classifier.3.bias',
tensor([-0.0307, 0.0848, 0.0686, 0.0819, 0.0455, 0.0711, 0.0073, 0.0117,
0.0293, 0.0431]))])
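Since state_dict() captures both parameters and buffers, it is what is normally saved and restored. The following is a minimal round-trip sketch; `make_model` is a hypothetical helper, and an in-memory buffer stands in for a checkpoint file:

```python
import io

import torch
import torch.nn as nn

def make_model():
    # Hypothetical helper: two independently initialized copies of the same architecture.
    return nn.Sequential(nn.Conv2d(3, 6, kernel_size=3), nn.BatchNorm2d(6))

src = make_model()
dst = make_model()

# Serialize the state dict to an in-memory buffer, then load it into dst.
buffer = io.BytesIO()
torch.save(src.state_dict(), buffer)
buffer.seek(0)
dst.load_state_dict(torch.load(buffer))

same = all(
    torch.equal(a, b)
    for a, b in zip(src.state_dict().values(), dst.state_dict().values())
)
print(same)  # True
```

After loading, dst matches src in its BN running statistics as well as its weights, which parameters() alone would not have carried over.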
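Differences between model.parameters() and model.state_dict():
1. Different return types
model.parameters() returns a generator object, whereas model.state_dict() returns an ordered dictionary (OrderedDict).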
model.parameters()
>>> <generator object Module.parameters at 0x7fb381953f90>
model.state_dict()
>>>
OrderedDict([('backbone.0.weight', tensor([[[[ 0.1200, -0.1627, -0.0841],
[-0.1369, -0.1525, 0.0541],
[ 0.1203, 0.0564, 0.0908]],
...
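2. Different kinds of model state stored
To make the difference easy to see, model.named_parameters() is compared with model.state_dict() here:
model.parameters() returns all learnable parameters of the model, whereas model.state_dict() returns those plus all of the non-learnable state (the BN layers' running_mean, running_var, and so on).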
model_state_dict = model.state_dict()
model_named_parameters = model.named_parameters()
for k, v in model_named_parameters:
    print(k)
for k in model_state_dict:
    print(k)
###################################
## output model_named_parameters ##
###################################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias
#############################
## output model_state_dict ##
#############################
backbone.0.weight
backbone.0.bias
backbone.2.weight
backbone.2.bias
backbone.2.running_mean
backbone.2.running_var
backbone.2.num_batches_tracked
backbone.4.weight
backbone.4.bias
backbone.6.weight
backbone.6.bias
backbone.6.running_mean
backbone.6.running_var
backbone.6.num_batches_tracked
classifier.0.weight
classifier.0.bias
classifier.3.weight
classifier.3.bias
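The extra keys in the state dict can also be computed directly rather than compared by eye. A minimal sketch on a hypothetical small model `demo`; it relies on named_buffers(), which lists the non-learnable tensors registered on the module:

```python
import torch.nn as nn

# Hypothetical small model with one BatchNorm layer at index 1.
demo = nn.Sequential(nn.Conv2d(3, 6, kernel_size=3), nn.BatchNorm2d(6))

param_names = {name for name, _ in demo.named_parameters()}
buffer_names = {name for name, _ in demo.named_buffers()}
state_names = set(demo.state_dict().keys())

# Keys present in the state dict but absent from the learnable parameters.
extra = sorted(state_names - param_names)
print(extra)
# ['1.num_batches_tracked', '1.running_mean', '1.running_var']

# For this model, the state dict is exactly parameters plus buffers.
print(state_names == param_names | buffer_names)  # True
```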