All images are from the official website.
Contents
What is transfer learning
Why transfer learning
Transfer learning approaches
Case study
Importing packages
Defining and initializing the model
Fine-tuning the model
Notes
Someone else trains a CNN on a very large dataset; we then reuse that model's structure and parameters (in whole or in part) directly for our own target task.
The most important deciding factors are the size of the new dataset and how similar it is to the original dataset.
Treat the earlier layers of the network as a fixed feature extractor, and replace the final fully connected layer with a new one (the last layer usually differs anyway, because the number of classes differs). Randomly initialize the parameters of this new layer and train only this layer on the target dataset. This works because the early convolutional layers extract general features from images (such as edges or colors) that are useful for many recognition tasks, while the later convolutional layers learn features that are more specific to the original dataset.
The output layer is trained from scratch, while the parameters of the other layers are fine-tuned starting from the source model's parameters. First you can freeze the feature extractor and train the head. After that, you can unfreeze the feature extractor (or part of it), set the learning rate to something smaller, and continue training.
All layers of the network can be updated; the pretrained parameters serve only as initialization, and training proceeds as usual. It is common in this method to set the learning rate to a smaller number. This is done because the network is already trained, and only minor changes are required to "finetune" it to a new dataset.
Fine-tune a ResNet model on a small dataset for hot dog recognition. Taken from 14.2. Fine-Tuning — Dive into Deep Learning 1.0.0-alpha1.post0 documentation (d2l.ai)
%matplotlib inline
import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l

# Download and extract the hot dog dataset; data_dir points to its root
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                          'fba480ffa8aa7e0febbb511d181409f899b9baa5')
data_dir = d2l.download_extract('hotdog')

train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))
# Standardize each channel using the means and standard deviations of the RGB channels
normalize = torchvision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])
test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])
In the code below, the parameters of the target model finetune_net before the output layer are initialized with the corresponding layer parameters of the source model. Since these parameters were pretrained on ImageNet and are already good, a small learning rate usually suffices to fine-tune them. The parameters of the output layer are randomly initialized and generally require a larger learning rate. Assuming a base learning rate of η, we set the learning rate of the output-layer parameters to 10η.
finetune_net = torchvision.models.resnet18(pretrained=True)
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2)
nn.init.xavier_uniform_(finetune_net.fc.weight);
# If param_group=True, the model parameters in the output layer use a 10x learning rate
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    devices = d2l.try_all_gpus()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        params_1x = [param for name, param in net.named_parameters()
                     if name not in ["fc.weight", "fc.bias"]]
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                  lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)
    train(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
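The parameter grouping can be verified in isolation with a tiny stand-in model. The `Toy` module and the learning rate below are hypothetical; the grouping logic is the same as in train_fine_tuning.

```python
import torch
from torch import nn

class Toy(nn.Module):
    """Tiny stand-in: a 'backbone' plus a head named fc, like ResNet."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.fc = nn.Linear(4, 2)

net = Toy()
lr = 5e-5
params_1x = [p for name, p in net.named_parameters()
             if name not in ["fc.weight", "fc.bias"]]
trainer = torch.optim.SGD([{'params': params_1x},
                           {'params': net.fc.parameters(), 'lr': lr * 10}],
                          lr=lr, weight_decay=0.001)

# The first group (backbone) uses the base rate; the head uses 10x.
print(trainer.param_groups[0]['lr'], trainer.param_groups[1]['lr'])
```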
The training function used here:
def train(net, train_iter, test_iter, loss, trainer, num_epochs,
          devices=d2l.try_all_gpus()):
    """Train a model with multiple GPUs (defined in Chapter 13)."""
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            l, acc = train_batch(net, features, labels, loss, trainer,
                                 devices)
            metric.add(l, acc, labels.shape[0], labels.numel())
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(
                    epoch + (i + 1) / num_batches,
                    (metric[0] / metric[2], metric[1] / metric[3], None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {metric[0] / metric[2]:.3f}, train acc '
          f'{metric[1] / metric[3]:.3f}, test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec on '
          f'{str(devices)}')
def train_batch(net, X, y, loss, trainer, devices):
    """Train for a minibatch with multiple GPUs (defined in Chapter 13)."""
    if isinstance(X, list):
        # Required for BERT fine-tuning (to be covered later)
        X = [x.to(devices[0]) for x in X]
    else:
        X = X.to(devices[0])
    y = y.to(devices[0])
    net.train()
    trainer.zero_grad()
    pred = net(X)
    l = loss(pred, y)
    l.sum().backward()
    trainer.step()
    train_loss_sum = l.sum()
    train_acc_sum = d2l.accuracy(pred, y)
    return train_loss_sum, train_acc_sum
Fine-tune the pretrained model parameters using a small learning rate:
train_fine_tuning(finetune_net, 5e-5)
TensorFlow: https://github.com/tensorflow/models
PyTorch: https://github.com/pytorch/vision
References:
CS231n Convolutional Neural Networks for Visual Recognition
chapter4/4.1-fine-tuning.ipynb · 基础简单工具代码/pytorch-handbook - Gitee.com
Officially recommended resources