PyTorch Transfer Learning in Practice: Classifying MNIST with VGG11

When classifying a new dataset, a common approach is to start from a pretrained model such as AlexNet, VGG, or ResNet and modify its last few fully connected (fc) layers to match the number of classes in the new dataset.

This works because deep neural networks learn hierarchical features during pretraining. The shallower layers capture low-level features such as edges, colors, and textures, which are largely shared across classification tasks; the last few layers capture high-level, task-specific features, which is where datasets differ. It is therefore common to modify the last few layers of VGG or ResNet and fine-tune the pretrained model to classify the new dataset.

This post uses the VGG11 model that ships with torchvision and transfer-learns it into an MNIST classifier. First, print the VGG11 model to inspect its structure, summarized below:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

As shown, VGG11's final output is a 1000-dimensional vector; for MNIST it must become a 10-dimensional output. I know of two ways to modify the model. The first is to edit an existing VGG layer, e.g. change a Linear layer's dimensions or a Conv2d layer's channel count. The second is to add layers on top of VGG, e.g. insert a Conv+ReLU+MaxPool block before the fully connected layers.

1. Modifying existing layers

This approach is relatively simple. If you only reshape existing layers, for example resizing the final fully connected layers so the output is 10-dimensional, the training code is as follows (the code below resizes the three Linear layers (0), (3), (6) of the classifier and also changes the input convolution):

import torch
from torchvision import models
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam

epoch = 1
lr = 0.0001
batch_size = 64

def build_model():
    vgg11 = models.vgg11(pretrained=True)
    # MNIST is single-channel while the original model takes 3 channels; only the input conv is changed
    vgg11.features[0] = nn.Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    in_channel = 25088
    out_channel = [4096, 1024, 10]
    # resize the last three fc layers: classifier (0), (3), (6)
    for i in range(7):
        if i % 3 == 0:
            vgg11.classifier[i] = nn.Linear(in_channel, out_channel[int(i/3)])
            in_channel = out_channel[int(i/3)]

    return vgg11

model = build_model()

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(
    dataset=train_data, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_data, batch_size=1, shuffle=True
)
optim = Adam(model.parameters(), lr=lr, betas=[0.5, 0.99])
criterion = nn.CrossEntropyLoss()

for e in range(epoch):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        if i % 100 == 0:
            print('epoch:[{}/{}] step:{} loss:{:.4f}'.format(e, epoch, i, loss.item()))

torch.save(model.state_dict(), './vgg_mnist.pth')

Training ran for just one epoch, and the resulting model reached 97.55% classification accuracy on the test set.
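The accuracy figure was produced by an evaluation pass along these lines (a sketch; `evaluate` is a helper name introduced here, applied to the `model` and `test_loader` defined above):

```python
import torch

def evaluate(model, loader, n_samples):
    """Top-1 accuracy of `model` over `loader`, which covers n_samples examples."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for img, y in loader:
            pred = model(img).argmax(dim=1)
            correct += (pred == y).sum().item()
    return correct / n_samples

# e.g. print('accuracy: {:.2%}'.format(evaluate(model, test_loader, len(test_data))))
```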

In the training above, every parameter is updated by gradient descent, both in the pretrained layers and in the newly replaced fc layers and input conv. If you only want to train the newly modified layers and freeze the pretrained ones, insert the following right after `models.vgg11(pretrained=True)` in the code above (i.e. before the layers are replaced):

    for param in vgg11.parameters():
        param.requires_grad = False

The unmodified layers then keep their pretrained parameters and receive no updates; only the replaced layers, i.e. the three fc layers and the input conv, which are created after the freeze and are therefore trainable by default, are updated during training.

2. Changing the model structure

This approach is more involved. First define a class identical to the VGG class in the torchvision source, then modify the model's structure (its forward method) inside that class.

Moreover, this time the pretrained parameters cannot be loaded directly. In the first approach, the model created is identical to the torchvision vgg11, and the pretrained parameters are loaded when vgg11 is instantiated; the layers are modified after loading, so there is no dimension mismatch (from edited layers) or missing layer (from deleted layers). Here, by contrast, a new model class is defined, so the structure may already differ before the parameters are loaded, and the parameters to load must be filtered.

How do we tell which pretrained parameters are usable? First obtain the original vgg11 state dict from the model file, call it pretrained_dict; after instantiating the newly defined model, get its state_dict, call it model_dict. Then check whether each key in pretrained_dict appears in model_dict: if it does not, that layer no longer exists in the new model, so its data is skipped; if the key exists but the value's shape differs from the corresponding layer in the new model, that layer has been modified, so its pretrained data cannot be loaded either.
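The rule can be illustrated on a toy pair of models (hypothetical two-layer stand-ins, not VGG itself): only entries whose key still exists in the new model with an identical shape survive the filter:

```python
import torch.nn as nn

old = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 10))  # "pretrained" model
new = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))   # last layer resized

pretrained_dict = old.state_dict()
model_dict = new.state_dict()

# Keep a pretrained tensor only if its key survives with the same shape
kept = {k: v for k, v in pretrained_dict.items()
        if k in model_dict and v.shape == model_dict[k].shape}
print(sorted(kept))  # ['0.bias', '0.weight'] - the resized layer is dropped
```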

The training code follows (it's a bit messy, so go easy on me~):

import torch
from torchvision import models
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam
from torch.hub import load_state_dict_from_url

epoch = 1
lr = 0.0001
batch_size = 64


model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
}

class VGG(nn.Module):

    def __init__(self, features, extra, num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.extra = extra
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # propagate through the extra layers added to the graph
        x = self.extra(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 1
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


# insert two Conv+ReLU pairs and one MaxPool before the original VGG11 fc layers
def extra_layers(cfg):
    layer = []
    in_channel = 512
    extra1 = nn.Conv2d(in_channel, cfg[0], kernel_size=3, padding=1)
    extra2 = nn.Conv2d(cfg[0], cfg[1], kernel_size=3, padding=1)
    extra3 = nn.MaxPool2d(kernel_size=2, stride=2)
    layer += [extra1, nn.ReLU(inplace=True)]
    layer += [extra2, nn.ReLU(inplace=True)]
    layer += [extra3]
    return nn.Sequential(*layer)


cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
}

cfgs_extra = {
    'A': [512, 512],
}


def _vgg(arch, cfg, pretrained, progress, **kwargs):
    if pretrained:
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg]), extra_layers(cfgs_extra[cfg]), **kwargs)

    return model


# load parameters for unchanged layers, filtering out pretrained params of modified layers
def transfer_model(model, arch, progress=True):
    pretrained_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
    model_dict = model.state_dict()
    pretrained_dict = transfer_state_dict(pretrained_dict, model_dict)
    model_dict.update(pretrained_dict)
    model.load_state_dict(model_dict)
    return model


# drop pretrained entries for changed layers; keep only layers that still exist in the model with matching shapes
def transfer_state_dict(pretrained_dict, model_dict):
    state_dict = {}
    for k, v in pretrained_dict.items():
        if k in model_dict.keys() and v.shape == model_dict[k].shape:
            # print(k, v.shape)
            state_dict[k] = v
        else:
            print('Dropping pretrained key (missing or reshaped in new model): {}'.format(k))

    return state_dict


def vgg11(pretrained=False, progress=True, **kwargs):
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/abs/1409.1556>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', pretrained, progress, **kwargs)


def build_model():
    vgg = vgg11(pretrained=True)
    vgg = transfer_model(vgg, 'vgg11')

    return vgg


model = build_model()

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(
    dataset=train_data, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_data, batch_size=1, shuffle=True
)

optim = Adam(model.parameters(), lr=lr, betas=[0.5, 0.99])
criterion = nn.CrossEntropyLoss()

for e in range(epoch):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        if i % 100 == 0:
            print('epoch:[{}/{}] step:{} loss:{:.4f}'.format(e, epoch, i, loss.item()))

torch.save(model.state_dict(), './vgg_mnist.pth')

The results on the test set were poor. This perhaps illustrates why transfer learning usually only modifies the final fc layers, but it still serves as a method for changing the structure of a pretrained model.
