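When classifying a new dataset, it is common to start from a pretrained model such as AlexNet, VGG, or ResNet, and to modify the last few fully connected (fc) layers so that the output matches the number of classes in the new dataset.
This works because a deep neural network learns a hierarchy of features during pretraining. The shallower layers learn low-level features such as edges, colors, and textures, which stay largely the same across classification tasks. The last few layers learn high-level features, and these are where datasets differ. That is why the last few layers of VGG or ResNet are usually the ones modified: the pretrained model is then fine-tuned to classify the new dataset.
This post uses the VGG11 model that ships with PyTorch (torchvision) for transfer learning, so that the model can classify MNIST. First, print the vgg11 model to look at its structure. The summary is as follows: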
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
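As shown, the final output of vgg11 is a 1000-dimensional vector. To fit MNIST, it has to be changed to a 10-dimensional output. I know of two ways to modify the model so far: the first is to modify existing layers of VGG, for example changing the dimensions of a Linear layer or the number of channels of a Conv2d; the second is to add layers on top of VGG, for example inserting another Conv+ReLU+MaxPool block before the fully connected layers.
The first approach is relatively simple. If you only rework existing layers, for example changing the number of neurons in the last fully connected layers so that the final output is 10-dimensional, the training code is as follows (the code below changes the number of neurons in the three Linear layers classifier (0), (3), and (6), and also changes the input channels of the first convolutional layer):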
import torch
from torchvision import models
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam
epoch = 1
lr = 0.0001
batch_size = 64
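def build_model():
    # pretrained=True is the older torchvision API; newer releases use the weights argument instead.
    vgg11 = models.vgg11(pretrained=True)
    # MNIST is single-channel while the original model expects 3 input channels,
    # so only the input channels of the first convolutional layer are changed.
    vgg11.features[0] = nn.Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    in_channel = 25088
    out_channel = [4096, 1024, 10]
    # Replace the three fc layers at classifier indices 0, 3, and 6.
    for i in range(7):
        if i % 3 == 0:
            vgg11.classifier[i] = nn.Linear(in_channel, out_channel[i // 3])
            in_channel = out_channel[i // 3]
    return vgg11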
model = build_model()
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(
    dataset=train_data, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_data, batch_size=1, shuffle=True
)
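optim = Adam(model.parameters(), lr=lr, betas=(0.5, 0.99))
criterion = nn.CrossEntropyLoss()
# Use `ep` as the loop variable so the hyperparameter `epoch` is not overwritten.
for ep in range(epoch):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        # Log the loss at every iteration; loss.item() extracts the Python float.
        print('iter:{} epoch:[{}/{}] loss:{:.4f}'.format(i, ep + 1, epoch, loss.item()))
# Save the weights once training has finished.
torch.save(model.state_dict(), './vgg_mnist.pth')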
Training ran for only one epoch, and the resulting model reached 97.55% classification accuracy on the test set.
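The post does not show how the test accuracy was measured. The following is a minimal sketch of one common way to compute top-1 accuracy, not necessarily the procedure used here. The function name `evaluate` and the toy data are hypothetical; the toy loader only exists so the snippet runs without downloading MNIST or any weights.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device="cpu"):
    """Return the top-1 accuracy of `model` over every batch in `loader`."""
    model.eval()  # switch Dropout (and BatchNorm, if any) to inference behavior
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for img, y in loader:
            img, y = img.to(device), y.to(device)
            pred = model(img).argmax(dim=1)  # index of the largest logit per sample
            correct += (pred == y).sum().item()
            total += y.size(0)
    return correct / total


# Toy check (hypothetical data): the "images" are already one-hot logits and the
# model is the identity, so every prediction matches its label.
labels = torch.tensor([0, 1, 2, 1])
logits = torch.eye(3)[labels]
toy_loader = DataLoader(TensorDataset(logits, labels), batch_size=2)
toy_acc = evaluate(nn.Identity(), toy_loader)
print(toy_acc)  # 1.0
```

With the objects from the training script, the call would presumably look like `evaluate(model, test_loader)`. A larger test batch size than 1 makes this noticeably faster without changing the result.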
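In the training code above, both the pretrained layers and the newly changed fully connected layers and input convolutional layer have their parameters updated by gradient descent. If you only want to train the newly modified layers and freeze the pretrained ones, add the following code right after vgg11 is defined in build_model, before any layer is replaced. Note that the attribute is requires_grad; assigning to a misspelled name such as require_grad raises no error and freezes nothing:
for param in vgg11.parameters():
    param.requires_grad = False
This way the parameters of the unchanged layers are not updated and keep their pretrained values. Only the modified layers, that is the three fc layers and the input convolutional layer, are updated during training, because freshly constructed layers have requires_grad=True by default.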
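A quick way to confirm that freezing took effect is to list the parameters that still require gradients. The sketch below uses a tiny stand-in network instead of vgg11 so it runs instantly; the name `net` and the layer sizes are hypothetical. Passing only the trainable parameters to the optimizer is optional, but it makes the intent explicit.

```python
import torch.nn as nn
from torch.optim import Adam

# Stand-in for a pretrained network (hypothetical sizes, random weights).
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1000))

# Freeze everything that exists at this point.
for param in net.parameters():
    param.requires_grad = False

# Replace the last layer afterwards; a freshly constructed layer has
# requires_grad=True by default, so it stays trainable.
net[2] = nn.Linear(4, 10)

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']

# Hand only the trainable parameters to the optimizer.
optim = Adam([p for p in net.parameters() if p.requires_grad], lr=1e-4)
```

If the list also contains the frozen layers, the freeze loop either ran after the layers were replaced or used a misspelled attribute name.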
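The second approach is more involved. First define a class that matches the VGG class in the torchvision source, then modify the model structure (the forward method) inside this class.
In this case the pretrained parameters cannot be loaded directly. In the first approach, a model identical to the source vgg11 is created first, and the pretrained parameters are already loaded into the original model when vgg11 is called. The layers are modified after the parameters are loaded, so there is no dimension mismatch (from a changed layer) or missing layer (from a deleted layer). In this approach, because a new model class is defined, the structure may already differ before the pretrained parameters are loaded, so the parameters to load have to be filtered.
How do you decide which pretrained parameters are usable? First, obtain the state dictionary of the original vgg11 from the model file and call it pretrained_dict. After instantiating the newly defined model, take its state_dict and call it model_dict. Then check whether each key in pretrained_dict appears in model_dict. If it does not, the layer for that key no longer exists in the new model, so its data is not loaded. If the key exists but the shape of its value differs from the shape of that layer in the new model, the layer was modified in the new model, so its pretrained data cannot be loaded either.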
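The filtering rule just described can be tried on a toy pair of models before applying it to VGG. This is a minimal sketch: the module names (`body`, `head`, `old_only`, `extra`) and sizes are hypothetical, and `nn.ModuleDict` is used only as a convenient container for named layers. As far as I know, `load_state_dict(strict=False)` tolerates missing and unexpected keys but still raises on a shape mismatch, which is why the explicit shape check is kept.

```python
import torch
import torch.nn as nn

# "Pretrained" model: a shared layer, a 1000-way head, and a layer the new model drops.
pretrained = nn.ModuleDict({
    "body": nn.Linear(8, 4),
    "head": nn.Linear(4, 1000),
    "old_only": nn.Linear(4, 2),
})
# New model: same body, an added layer, and a 10-way head.
new_model = nn.ModuleDict({
    "body": nn.Linear(8, 4),
    "extra": nn.Linear(4, 4),
    "head": nn.Linear(4, 10),
})

pretrained_dict = pretrained.state_dict()
model_dict = new_model.state_dict()

# Keep a pretrained tensor only if the key exists in the new model AND the shapes agree.
filtered = {
    k: v for k, v in pretrained_dict.items()
    if k in model_dict and v.shape == model_dict[k].shape
}
skipped = [k for k in pretrained_dict if k not in filtered]
print(list(filtered))  # ['body.weight', 'body.bias']
print(skipped)         # ['head.weight', 'head.bias', 'old_only.weight', 'old_only.bias']

# Overwrite the matching entries and load the merged dictionary.
model_dict.update(filtered)
new_model.load_state_dict(model_dict)
```

After loading, `body` carries the pretrained values, while `extra` and `head` keep their fresh initialization.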
The training code is as follows (it is a bit messy, please go easy on me):
import torch
from torchvision import models
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam
# Import the download helper from torch.hub directly instead of editing sys.path.
from torch.hub import load_state_dict_from_url
epoch = 1
lr = 0.0001
batch_size = 64
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
}
class VGG(nn.Module):
    def __init__(self, features, extra, num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.extra = extra
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # Route the extra layers into the computation graph.
        x = self.extra(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
def make_layers(cfg, batch_norm=False):
    layers = []
    # MNIST images have a single channel, so the first conv takes 1 input channel.
    in_channels = 1
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
# Add two Conv layers, two ReLU layers, and one MaxPool layer before the fully connected layers of the original VGG11.
def extra_layers(cfg):
    layer = []
    in_channel = 512
    extra1 = nn.Conv2d(in_channel, cfg[0], kernel_size=3, padding=1)
    extra2 = nn.Conv2d(cfg[0], cfg[1], kernel_size=3, padding=1)
    extra3 = nn.MaxPool2d(kernel_size=2, stride=2)
    layer += [extra1, nn.ReLU(inplace=True)]
    layer += [extra2, nn.ReLU(inplace=True)]
    layer += [extra3]
    return nn.Sequential(*layer)
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
}
cfgs_extra = {
    'A': [512, 512],
}
def _vgg(arch, cfg, pretrained, progress, **kwargs):
    if pretrained:
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg]), extra_layers(cfgs_extra[cfg]), **kwargs)
    return model
# Load the parameters of the unchanged layers and filter out the pretrained parameters of layers that were modified.
def transfer_model(model, arch, progress=True):
    pretrained_dict = load_state_dict_from_url(model_urls[arch],
                                               progress=progress)
    model_dict = model.state_dict()
    pretrained_dict = transfer_state_dict(pretrained_dict, model_dict)
    model_dict.update(pretrained_dict)
    model.load_state_dict(model_dict)
    return model
# Drop pretrained parameters whose layer changed; keep only those whose key exists
# in the current model with a matching shape.
def transfer_state_dict(pretrained_dict, model_dict):
    state_dict = {}
    for k, v in pretrained_dict.items():
        if k in model_dict.keys() and v.shape == model_dict[k].shape:
            # print(k, v.shape)
            state_dict[k] = v
        else:
            print("Skipping pretrained parameter (missing key or shape mismatch): {}".format(k))
    return state_dict
def vgg11(pretrained=False, progress=True, **kwargs):
    r"""VGG 11-layer model (configuration "A") from
    "Very Deep Convolutional Networks For Large-Scale Image Recognition".
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', pretrained, progress, **kwargs)
def build_model():
    # pretrained=True only skips the custom weight initialization here;
    # the pretrained weights themselves are loaded by transfer_model.
    vgg = vgg11(pretrained=True)
    vgg = transfer_model(vgg, 'vgg11')
    return vgg
model = build_model()
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(
    dataset=train_data, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_data, batch_size=1, shuffle=True
)
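optim = Adam(model.parameters(), lr=lr, betas=(0.5, 0.99))
criterion = nn.CrossEntropyLoss()
# Use `ep` as the loop variable so the hyperparameter `epoch` is not overwritten.
for ep in range(epoch):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        # Log the loss at every iteration; loss.item() extracts the Python float.
        print('iter:{} epoch:[{}/{}] loss:{:.4f}'.format(i, ep + 1, epoch, loss.item()))
# Save the weights once training has finished.
torch.save(model.state_dict(), './vgg_mnist.pth')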
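The results on the test set were poor. This may illustrate why transfer learning usually only changes the final fc layers, but the approach still serves as one way to change the structure of a pretrained model.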
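One thing that may be worth checking in the second model is how tensor shapes change through the added block. The sketch below is only a shape trace under stated assumptions: it feeds a random tensor with the shape the VGG11 features produce for a 224x224 input (512x7x7, since five 2x2 poolings reduce 224 to 7) through a copy of the extra block with random weights. Whether this contributes to the poor result is not established here.

```python
import torch
import torch.nn as nn

# Random stand-in for the output of the VGG11 feature extractor on a 224x224 image.
feat = torch.randn(1, 512, 7, 7)

# Same layout as the extra block in the post: Conv, ReLU, Conv, ReLU, MaxPool.
extra = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
avgpool = nn.AdaptiveAvgPool2d((7, 7))

with torch.no_grad():
    after_extra = extra(feat)          # the extra MaxPool shrinks 7x7 to 3x3
    after_pool = avgpool(after_extra)  # adaptive pooling maps 3x3 back to 7x7
    flat = torch.flatten(after_pool, 1)

print(after_extra.shape)  # torch.Size([1, 512, 3, 3])
print(after_pool.shape)   # torch.Size([1, 512, 7, 7])
print(flat.shape)         # torch.Size([1, 25088])
```

The classifier still receives 25088 inputs, so the pretrained fc weights load without a shape mismatch, but the 7x7 map they see is now built from a 3x3 map produced by untrained convolutions.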