Saving and Loading Models in PyTorch

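In this part, I'll show you how to save and load models with PyTorch. This matters because you will often want to load a previously trained model, either to make predictions or to continue training on new data.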

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms

import helper
import fc_model
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)

Here we take a look at one image from the dataset.

image, label = next(iter(trainloader))
helper.imshow(image[0,:]);

[Figure: a sample image from the Fashion-MNIST training set]

Training the neural network

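To keep this part concise, I moved the model architecture and training code from the end of the previous part into a file named fc_model.py. After importing this file, we can easily create a fully-connected network with fc_model.Network and train it with fc_model.train. I'll use this model, once trained, to demonstrate how to save and load.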

# Create the network, define the criterion and optimizer

model = fc_model.Network(784, 10, [512, 256, 128])
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
fc_model.train(model, trainloader, testloader, criterion, optimizer, epochs=2)
Epoch: 1/2..  Training Loss: 1.714..  Test Loss: 0.977..  Test Accuracy: 0.647
Epoch: 1/2..  Training Loss: 1.025..  Test Loss: 0.737..  Test Accuracy: 0.729
'''output omitted'''
Epoch: 2/2..  Training Loss: 0.516..  Test Loss: 0.439..  Test Accuracy: 0.841
Epoch: 2/2..  Training Loss: 0.497..  Test Loss: 0.456..  Test Accuracy: 0.836
Epoch: 2/2..  Training Loss: 0.470..  Test Loss: 0.435..  Test Accuracy: 0.845
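
fc_model.py itself is not shown in this post. As a rough guide, here is a minimal sketch of what a class like fc_model.Network might look like, assuming it matches the structure printed later (a ModuleList of hidden Linear layers, an output layer, and dropout with p=0.5) and returns log-probabilities to pair with nn.NLLLoss. The ReLU activations, the drop_p argument, and the forward pass are my assumptions, not the author's code.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Network(nn.Module):
    """Hypothetical stand-in for fc_model.Network (the real file is not shown)."""

    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        super().__init__()
        # Chain the layer sizes: input -> hidden[0] -> hidden[1] -> ...
        sizes = [input_size] + list(hidden_layers)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )
        self.output = nn.Linear(sizes[-1], output_size)
        self.dropout = nn.Dropout(p=drop_p)

    def forward(self, x):
        # Assumed: ReLU followed by dropout after every hidden layer
        for layer in self.hidden_layers:
            x = self.dropout(F.relu(layer(x)))
        # Log-probabilities over the classes, as nn.NLLLoss expects
        return F.log_softmax(self.output(x), dim=1)


model = Network(784, 10, [512, 256, 128])
batch = torch.randn(4, 784)   # four flattened 28x28 images (random data)
log_probs = model(batch)
print(log_probs.shape)
```

The input here is already flattened to 784 values per image; whether the real fc_model.train flattens the batches itself is not stated in the post.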

Saving and loading the model

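As you can imagine, retraining a network every time you need to use it is impractical. Instead, we can save a trained network and load it later, either to train it further or to use it for predictions.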

The parameters of a PyTorch network are stored in the model's state_dict. We can see that the state_dict contains the weight and bias tensors for each layer.

print("Our model: \n\n", model, '\n')
print("The state dict keys: \n\n", model.state_dict().keys())
Our model: 

 Network(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (output): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
) 

The state dict keys: 

 odict_keys(['hidden_layers.0.weight', 'hidden_layers.0.bias', 'hidden_layers.1.weight', 'hidden_layers.1.bias', 'hidden_layers.2.weight', 'hidden_layers.2.bias', 'output.weight', 'output.bias'])
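
To make the contents of a state_dict more concrete, the sketch below lists each entry together with its tensor shape. It uses a small nn.Sequential model as a stand-in (the layer sizes are arbitrary and not taken from fc_model). Layers without learnable parameters, such as ReLU or Dropout, contribute no entries, which is why dropout does not appear in the keys above.

```python
from torch import nn

# A small stand-in model; the layer sizes are arbitrary
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Map every state_dict entry to the shape of the tensor it holds
shapes = {name: tuple(tensor.shape) for name, tensor in net.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)
```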

The simplest thing to do is to save the state_dict with torch.save. For example, we can save it to a file named checkpoint.pth:

torch.save(model.state_dict(), 'checkpoint.pth')

Then use torch.load to load the state_dict:

state_dict = torch.load('checkpoint.pth')
print(state_dict.keys())
odict_keys(['hidden_layers.0.weight', 'hidden_layers.0.bias', 'hidden_layers.1.weight', 'hidden_layers.1.bias', 'hidden_layers.2.weight', 'hidden_layers.2.bias', 'output.weight', 'output.bias'])

To load the state_dict into the network, call model.load_state_dict(state_dict):

model.load_state_dict(state_dict)
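
The save, load, and restore steps above can be checked end to end. The following is a minimal sketch that uses two small nn.Linear layers as stand-ins for the real model and a temporary directory instead of the working directory; the names source, target, and path are illustrative only. The map_location="cpu" argument is optional and is included on the assumption that the checkpoint might have been written on a GPU machine.

```python
import os
import tempfile

import torch
from torch import nn

source = nn.Linear(4, 2)
target = nn.Linear(4, 2)   # same architecture, different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    torch.save(source.state_dict(), path)
    # map_location="cpu" places the loaded tensors on the CPU
    state_dict = torch.load(path, map_location="cpu")

# Copy the saved parameters into the second model
result = target.load_state_dict(state_dict)

# After loading, both models should hold identical parameters
same = all(torch.equal(p, q)
           for p, q in zip(source.parameters(), target.parameters()))
print(result, same)
```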

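This looks straightforward, but there is a catch. A state_dict can only be loaded when the model's architecture exactly matches the architecture stored in the checkpoint. If I create a model with a different architecture, loading fails.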

# Try this
model = fc_model.Network(784, 10, [400, 200, 100])
# This will throw an error because the tensor sizes are wrong!
model.load_state_dict(state_dict)
---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

 in 
      2 model = fc_model.Network(784, 10, [400, 200, 100])
      3 # This will throw an error because the tensor sizes are wrong!
----> 4 model.load_state_dict(state_dict)
D:\Anaconda3\envs\AI_study\lib\site-packages\torch\nn\modules\module.py in load_state_dict(self, state_dict, strict)
    843         if len(error_msgs) > 0:
    844             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 845                                self.__class__.__name__, "\n\t".join(error_msgs)))
    846         return _IncompatibleKeys(missing_keys, unexpected_keys)
    847 
RuntimeError: Error(s) in loading state_dict for Network:
	size mismatch for hidden_layers.0.weight: copying a param with shape torch.Size([512, 784]) from checkpoint, the shape in current model is torch.Size([400, 784]).
	size mismatch for hidden_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([400]).
	size mismatch for hidden_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([200, 400]).
	size mismatch for hidden_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([200]).
	size mismatch for hidden_layers.2.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([100, 200]).
	size mismatch for hidden_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([100]).
	size mismatch for output.weight: copying a param with shape torch.Size([10, 128]) from checkpoint, the shape in current model is torch.Size([10, 100]).
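
In a script, you may prefer to catch this error rather than let it stop the program. The sketch below reproduces the same kind of size mismatch with two small stand-in layers (not the fc_model network) and handles the RuntimeError. As far as I know, passing strict=False only relaxes missing or unexpected keys, so it would not get around a shape mismatch like this one.

```python
from torch import nn

trained = nn.Linear(784, 512)     # stands in for the saved model
different = nn.Linear(784, 400)   # same keys, different tensor shapes

try:
    different.load_state_dict(trained.state_dict())
    loaded = True
    message = ""
except RuntimeError as err:
    # The message lists every parameter whose shape does not match
    loaded = False
    message = str(err)

print(loaded)
print(message)
```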

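This means we need to rebuild the model exactly as it was when it was trained. Information about the model architecture has to be saved in the checkpoint together with the state_dict. To do this, you build a dictionary containing everything needed to fully rebuild the model. Note that if the cells are run in order, model at this point refers to the untrained [400, 200, 100] network created for the failed load above, so that is the network saved here; recreate or reload the trained [512, 256, 128] model first if you want to checkpoint the trained weights.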

checkpoint = {'input_size': 784,
              'output_size': 10,
              'hidden_layers': [each.out_features for each in model.hidden_layers],
              'state_dict': model.state_dict()}

torch.save(checkpoint, 'checkpoint.pth')

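The checkpoint now has all the information needed to rebuild the trained model. You can easily wrap this in a function if you like. Similarly, we can write a function to load checkpoints: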

def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    model = fc_model.Network(checkpoint['input_size'],
                             checkpoint['output_size'],
                             checkpoint['hidden_layers'])
    model.load_state_dict(checkpoint['state_dict'])
    
    return model
model = load_checkpoint('checkpoint.pth')
print(model)
Network(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=400, bias=True)
    (1): Linear(in_features=400, out_features=200, bias=True)
    (2): Linear(in_features=200, out_features=100, bias=True)
  )
  (output): Linear(in_features=100, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)
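
Since one reason to save a model is to continue training later, it can help to store the optimizer state alongside the weights. The following is a sketch under several assumptions: Network is a hypothetical stand-in for fc_model.Network (without dropout, to keep it short), save_checkpoint and the "optimizer_state" key are names I made up, and load_checkpoint is extended here to return the optimizer as well.

```python
import os
import tempfile

import torch
from torch import nn, optim


class Network(nn.Module):
    """Hypothetical stand-in for fc_model.Network, which is not shown in this post."""

    def __init__(self, input_size, output_size, hidden_layers):
        super().__init__()
        sizes = [input_size] + list(hidden_layers)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        )
        self.output = nn.Linear(sizes[-1], output_size)

    def forward(self, x):
        for layer in self.hidden_layers:
            x = torch.relu(layer(x))
        return torch.log_softmax(self.output(x), dim=1)


def save_checkpoint(model, optimizer, filepath):
    """Store the architecture, the weights, and the optimizer state together."""
    checkpoint = {
        "input_size": model.hidden_layers[0].in_features,
        "output_size": model.output.out_features,
        "hidden_layers": [layer.out_features for layer in model.hidden_layers],
        "state_dict": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }
    torch.save(checkpoint, filepath)


def load_checkpoint(filepath, lr=0.001):
    """Rebuild the model and optimizer from a file written by save_checkpoint."""
    checkpoint = torch.load(filepath, map_location="cpu")
    model = Network(checkpoint["input_size"],
                    checkpoint["output_size"],
                    checkpoint["hidden_layers"])
    model.load_state_dict(checkpoint["state_dict"])
    optimizer = optim.Adam(model.parameters(), lr=lr)
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return model, optimizer


original = Network(784, 10, [512, 256, 128])
opt = optim.Adam(original.parameters(), lr=0.001)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    save_checkpoint(original, opt, path)
    restored, restored_opt = load_checkpoint(path)

restored.eval()  # put the model in evaluation mode before making predictions
hidden = [layer.out_features for layer in restored.hidden_layers]
print(hidden)
```

Call restored.train() again before resuming training, so that layers such as dropout behave as they do during training.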
