[Beginner Tutorial] Training and Predicting with a Pretrained Model (VGG16 as an Example)

This article is based on the CSDN blog post "Pytorch学习笔记(I)——预训练模型(一):加载与使用" from lockonlxf's blog, with a few small problems fixed.

Environment: Windows 10, torch>=1.6

All code for this article: 阿里云盘 (Aliyun Drive)

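1. Background

VGG16 is a comparatively simple convolutional neural network for image classification. Its architecture ships with torchvision, in torchvision.models, and printing the model shows the following structure: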

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
	......
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

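The input is an RGB image in (C, H, W) layout, with a leading batch dimension, (N, C, H, W), when it is passed to the model. The (features) and (avgpool) stages turn it into a (512, 7, 7) feature map, which is flattened into 512 × 7 × 7 = 25088 values and fed to the classifier. The classifier's fully connected layers output a 1000-dimensional vector, one score per class. Each score can loosely be read as how likely the image is to belong to that class, and the class with the largest score is taken as the prediction.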
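The 25088 input features of the first classifier layer follow from the shapes in the printout. Here is a quick arithmetic check, assuming a 224×224 input (the input size and the helper name vgg16_feature_shape are assumptions for illustration, not part of the original code):

```python
# Shape bookkeeping for VGG16, assuming a 224x224 RGB input.
# The 3x3 convolutions use stride 1 and padding 1, so they keep height and width.
# Only the five 2x2 max-pooling layers (stride 2) shrink the feature map.

def vgg16_feature_shape(height=224, width=224, num_pools=5, out_channels=512):
    """Return (channels, height, width) of the feature map before the classifier."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return out_channels, height, width


c, h, w = vgg16_feature_shape()
flattened = c * h * w  # number of values fed to the first Linear layer
print((c, h, w), flattened)
```

The flattened size matches in_features=25088 of classifier layer (0).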

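2. Implementation

1 Loading and modifying the model

The example project is cat-vs-dog classification: given an image, decide whether it shows a cat or a dog. There are only 2 classes, so the output size of the last classifier layer must be changed to 2: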

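import torch
import torchvision

num_cls = 2  # number of classes (cat, dog)

# Load torchvision's VGG16; pretrained=True loads the pretrained weights
model = torchvision.models.vgg16(pretrained=True)
num_fc = model.classifier[6].in_features  # input size of the last layer
model.classifier[6] = torch.nn.Linear(num_fc, num_cls)  # new last layer with num_cls outputs
# Freeze every parameter so that no gradients are computed for it
for param in model.parameters():
    param.requires_grad = False
# The replaced last layer has to be learned from scratch, so unfreeze it
for param in model.classifier[6].parameters():
    param.requires_grad = True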

After the change, print(model) shows the new structure:

VGG(
  ......
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=2, bias=True)
  )
)

The last classifier layer now has out_features=2.
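Besides printing the structure, it can help to count how many parameters are still trainable after freezing. The sketch below applies the same replace, freeze and unfreeze steps to a small stand-in network instead of the real VGG16 (the stand-in and its layer sizes are assumptions, chosen so that nothing has to be downloaded):

```python
import torch

num_cls = 2

# Hypothetical stand-in for the VGG16 classifier; far smaller than the real model.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 4096),
    torch.nn.ReLU(inplace=True),
    torch.nn.Linear(4096, 1000),
)

# Replace the last layer, freeze everything, then unfreeze only the new layer.
num_fc = model[2].in_features
model[2] = torch.nn.Linear(num_fc, num_cls)
for param in model.parameters():
    param.requires_grad = False
for param in model[2].parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print('trainable:', trainable, 'frozen:', frozen)
```

For a 4096-to-2 linear layer the trainable count is 4096 × 2 weights plus 2 biases, i.e. 8194. The same count should apply to the modified VGG16, since its new last layer has the same size.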

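2 Model training

1 Data preparation

The data is read with torchvision's ImageFolder. Note that the images must be sorted into one subfolder per class under a common root folder. The example dataset is organized as shown in the figure:

[Figure 1: folder layout of the example dataset]

cat and dog are the class names. The training set and the test set must use the same folder names. The data loading code is as follows: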

import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms


def dataload(trainData, testData):
    # Training data: resize to 256, crop the central 224x224, convert to a tensor
    train_data = torchvision.datasets.ImageFolder(trainData, transform=transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor()
        ]))
    train_loader = DataLoader(train_data, batch_size=20, shuffle=True)

    # Test data: same preprocessing as the training data
    test_data = torchvision.datasets.ImageFolder(testData, transform=transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor()
        ]))
    test_loader = DataLoader(test_data, batch_size=20, shuffle=True)
    return train_data, test_data, train_loader, test_loader
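ImageFolder derives the labels from the subfolder names: the class folders are sorted by name and numbered from 0, and the resulting mapping is exposed as the dataset's class_to_idx attribute. The sketch below reproduces that rule with the standard library only, so it runs without any images. The train/test folder names and the helper class_to_index are assumptions for illustration.

```python
import os
import tempfile


def class_to_index(root):
    """Map each class subfolder of root to an index, in sorted name order."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Build an empty copy of the expected layout: <root>/<split>/<class>/
    for split in ('train', 'test'):
        for cls in ('dog', 'cat'):
            os.makedirs(os.path.join(root, split, cls))
    train_map = class_to_index(os.path.join(root, 'train'))
    test_map = class_to_index(os.path.join(root, 'test'))

print(train_map)
```

Because the mapping depends only on the folder names, identical names in the training and test sets give identical label indices, and cat comes before dog.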

2 Model training

The code is as follows:

import os
import time

import torch


def train(model, trainData, testData):
    # The model is expected to be on the GPU already
    criterion = torch.nn.CrossEntropyLoss()  # loss function
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer
    train_data, test_data, train_loader, test_loader = dataload(trainData, testData)

    log = []
    # Start training
    epoches = 10
    for epoch in range(epoches):
        train_loss = 0.
        train_acc = 0.
        for step, data in enumerate(train_loader):
            batch_x, batch_y = data
            batch_x, batch_y = batch_x.cuda(), batch_y.cuda()  # GPU
            out = model(batch_x)
            loss = criterion(out, batch_y)
            # loss is the batch mean; weight it by the batch size to sum per-sample losses
            train_loss += loss.item() * batch_x.size(0)
            # pred is the predicted class
            # batch_y is the true label
            pred = torch.max(out, 1)[1]
            train_correct = (pred == batch_y).sum()
            train_acc += train_correct.item()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step % 100 == 0:
                print('Epoch: ', epoch, 'Step', step,
                      'Train_loss: ', train_loss / ((step + 1) * 20), 'Train acc: ', train_acc / ((step + 1) * 20))

        print('Epoch: ', epoch, 'Train_loss: ', train_loss / len(train_data), 'Train acc: ',
              train_acc / len(train_data))

        # Record the metrics of this epoch
        info = dict()
        info['Epoch'] = epoch
        info['Train_loss'] = train_loss / len(train_data)
        info['Train_acc'] = train_acc / len(train_data)
        log.append(info)

    # Save the model weights to log/<timestamp>/model.pth
    dir_name = os.path.join('log', time.strftime('%m-%d-%Hh%Mm'))
    os.makedirs(dir_name, exist_ok=True)
    torch.save({'model': model.state_dict()}, os.path.join(dir_name, 'model.pth'))
    draw(log)
    model.eval()  # switch to evaluation mode (disables dropout)

    eval_loss = 0
    eval_acc = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for step, data in enumerate(test_loader):
            batch_x, batch_y = data
            batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
            out = model(batch_x)
            loss = criterion(out, batch_y)
            # loss is the batch mean; weight it by the batch size to sum per-sample losses
            eval_loss += loss.item() * batch_x.size(0)
            # pred is the predicted class
            # batch_y is the true label
            pred = torch.max(out, 1)[1]
            test_correct = (pred == batch_y).sum()
            eval_acc += test_correct.item()
    print('Test_loss: ', eval_loss / len(test_data), 'Test acc: ', eval_acc / len(test_data))
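The training function above hands model.parameters() to Adam, frozen parameters included. That works because parameters that never receive a gradient are skipped by the optimizer, but passing only the trainable parameters makes the intent explicit. Below is a minimal sketch on a tiny stand-in model with random data. The model, the data and all sizes are assumptions for illustration, and it runs on the CPU.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in: a frozen "backbone" layer followed by a trainable head.
backbone = torch.nn.Linear(8, 4)
head = torch.nn.Linear(4, 2)
model = torch.nn.Sequential(backbone, torch.nn.ReLU(), head)
for param in backbone.parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=0.001)
criterion = torch.nn.CrossEntropyLoss()

backbone_before = backbone.weight.detach().clone()
head_before = [p.detach().clone() for p in head.parameters()]

# One training step on a random batch of 20 samples.
batch_x = torch.randn(20, 8)
batch_y = torch.randint(0, 2, (20,))
optimizer.zero_grad()
loss = criterion(model(batch_x), batch_y)
loss.backward()
optimizer.step()

backbone_changed = not torch.equal(backbone_before, backbone.weight)
head_changed = any(
    not torch.equal(before, after)
    for before, after in zip(head_before, head.parameters())
)
print('backbone changed:', backbone_changed, 'head changed:', head_changed)
```

After the step only the head has moved, which is the behavior the freezing code in the model-loading section is meant to produce.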

3 Plotting the training metrics

import matplotlib.pyplot as plt


def draw(logs: list):
    # Plot the per-epoch training loss and accuracy recorded in logs
    plt.figure()
    epoch = []
    loss = []
    acc = []
    for log_ in logs:
        epoch.append(log_['Epoch'])
        loss.append(log_['Train_loss'])
        acc.append(log_['Train_acc'])
    plt.plot(epoch, loss, 'r-', label='loss')
    plt.plot(epoch, acc, 'b-', label='accuracy')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()

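The result looks like this:

[Figure 2: training loss and accuracy per epoch]

3 Using the trained model

The complete code for classifying an image with the trained model is as follows: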

import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Class names, indexed by the predicted label
classes = ['cat', 'dog']


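def predict_class(img_path, model):
    img = Image.open(img_path).convert('RGB')  # force 3 channels
    # Use the same preprocessing as during training
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor()
    ])
    img = transform(img).cuda()
    img = torch.unsqueeze(img, dim=0)  # add the batch dimension
    with torch.no_grad():  # no gradients are needed for prediction
        out = model(img)
    # print('out = ', out)
    pre = torch.max(out, 1)[1]
    cls = classes[pre.item()]
    print('This is {}!'.format(cls))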


def model_struct(num_cls):
    # Rebuild the same structure that was used for training
    model_vgg16 = torchvision.models.vgg16(pretrained=True)
    num_fc = model_vgg16.classifier[6].in_features
    model_vgg16.classifier[6] = torch.nn.Linear(num_fc, num_cls)
    for param in model_vgg16.parameters():
        param.requires_grad = False
    for param in model_vgg16.classifier[6].parameters():
        param.requires_grad = True
    model_vgg16.to('cuda')
    return model_vgg16


def main():
    device = torch.device('cuda')
    model = model_struct(2)
    model.to(device)
    model.eval()
    save = torch.load('./log/11-17-20h15m/model.pth', map_location=device)  # checkpoint to load
    model.load_state_dict(save['model'])
    img = 'cat.jpg'
    predict_class(img, model)


if __name__ == '__main__':
    main()
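The script above hard-codes 'cuda', so it fails on a machine without a GPU. One possible variation is to choose the device at run time and pass it to torch.load through map_location. The sketch below shows the save and load round trip on a small stand-in layer, using the same {'model': state_dict} checkpoint format as this article. The stand-in layer and the temporary file are assumptions for illustration.

```python
import os
import tempfile

import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2)  # hypothetical stand-in for the trained VGG16

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pth')
    torch.save({'model': model.state_dict()}, path)

    # Rebuild the structure first, then load the saved weights into it.
    restored = torch.nn.Linear(4, 2)
    save = torch.load(path, map_location=device)
    restored.load_state_dict(save['model'])
    restored.to(device).eval()

same = torch.equal(model.weight.detach().cpu(), restored.weight.detach().cpu())
print('device:', device, 'weights restored:', same)
```

In the prediction script the same device object would also replace the .cuda() call on the input image, so that model and input end up on the same device.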

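Output

This example simply prints the class to the console. A threshold could be added so that a class is printed only when its score is high enough. Because the project is a 2-class classifier, any input image, whatever it shows, is labeled either cat or dog. The change is small, and readers can modify the code in predict.py themselves.

This is cat!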
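One way to add such a threshold is to convert the raw scores to probabilities with softmax and print a class only when the largest probability is high enough. This is a sketch, not the code from predict.py: the function name classify_with_threshold, the 0.9 threshold and the 'unknown' label are assumptions.

```python
import torch

classes = ['cat', 'dog']


def classify_with_threshold(logits, threshold=0.9):
    """logits: tensor of shape (1, num_classes). Returns a class name or 'unknown'."""
    probs = torch.softmax(logits, dim=1)  # scores -> probabilities that sum to 1
    confidence, index = torch.max(probs, dim=1)
    if confidence.item() < threshold:
        return 'unknown'
    return classes[index.item()]


confident = classify_with_threshold(torch.tensor([[4.0, 0.0]]))  # clear margin
uncertain = classify_with_threshold(torch.tensor([[0.1, 0.0]]))  # nearly tied scores
print(confident, uncertain)
```

In predict_class, the model output out would be passed in place of the hand-written tensors. The threshold only filters out ambiguous predictions. With two classes the probabilities still sum to 1, so an image that shows neither a cat nor a dog can still receive a high score.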
