Transfer learning means drawing on knowledge already acquired from related tasks when solving a target task, so as to improve performance on that target task. You can picture it as grafting onto existing bone: results that have already been learned are carried over into a new setting, knowledge keeps accumulating and expanding, and new tasks can therefore be solved more quickly; it is a way of learning through shared experience.
(1) Same Task, Different Domain
Characteristics: the source domain and the target domain share the same task, but the data distributions of the two domains differ.
(2) Different Task, Same Domain
Characteristics: the source task and the target task differ, but the two tasks use the same feature representation.
(3) Different Task, Different Domain
Characteristics: the target task differs from the source in both the task and the domain.
(1) Select the pre-trained model, i.e. the source-domain model to transfer from. (Pre-trained Model Selection)
(2) Adapt it, i.e. re-adjust the model parameters on the target domain. (Adaptation)
(3) Fine-tune the model so that it performs as well as possible on the target domain. (Fine-tuning) A minimal sketch of these three steps follows below.
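The sketch below illustrates the three steps in PyTorch with the same VGG16 model used later in this post. It is only an outline (the full walkthrough follows); replacing just the last layer here is a minimal variant of the full classifier replacement done later.
import torch
from torchvision import models

# (1) Pre-trained model selection: load a model trained on ImageNet
model = models.vgg16(pretrained=True)

# (2) Adaptation: freeze the backbone, replace the head with one sized for the new task
for p in model.parameters():
    p.requires_grad = False
model.classifier[6] = torch.nn.Linear(4096, 2)   # 2 classes: cat / dog

# (3) Fine-tuning: optimize only the new head on the target data
optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-5)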
We use the Kaggle dogs-vs-cats binary classification dataset, which contains 12500 labelled cat images and 12500 labelled dog images.
We rearrange the labels so that the training set contains 10000 cats and 10000 dogs, and the validation set (valid) contains 2500 of each.
Dataset link
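For the ImageFolder calls below to work, the images are assumed to be laid out by class in subfolders, following the data_dir and phase names used in the code:
dogs-vs-cats/
    train/
        cat/    (10000 images)
        dog/    (10000 images)
    valid/
        cat/    (2500 images)
        dog/    (2500 images)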
import torch
import torchvision
from torchvision import datasets,transforms,models
import os
import matplotlib.pyplot as plt
import time
from torch.autograd import Variable # the Variable wrapper in torch (no longer needed in recent PyTorch versions)
%matplotlib inline
data_dir = "dogs-vs-cats"
data_tansform = { x:transforms.Compose([transforms.Resize([64,64]), # resize every image to a fixed size
transforms.ToTensor()])
for x in ["train","valid"]}
image_datasets = {x:datasets.ImageFolder(root=os.path.join(data_dir,x),
transform = data_tansform[x])
for x in ["train","valid"]}
dataloader = {x:torch.utils.data.DataLoader(dataset=image_datasets[x],
batch_size=16,
shuffle=True)
for x in ["train","valid"]}
# fetch one batch for a quick preview of what the data looks like
x_example,y_example = next(iter(dataloader["train"]))
print("Number of x_example: {}".format(len(x_example)))
print("Number of y_example: {}".format(len(y_example)))
## out: Number of x_example: 16, Number of y_example: 16
index_classes = image_datasets["train"].class_to_idx
print(index_classes)
# out {'cat': 0, 'dog': 1}
print(y_example)
# out tensor([0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
example_clasees = image_datasets["train"].classes
print(example_clasees)
# out ['cat', 'dog']
print([example_clasees[i] for i in y_example])
# out ['cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'cat']
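Since matplotlib has already been imported, one way to eyeball a batch is to stitch it into an image grid; a small sketch using the x_example, y_example and example_clasees defined above:
img = torchvision.utils.make_grid(x_example)    # combine the 16 images into one [3, H, W] grid
img = img.numpy().transpose([1, 2, 0])          # CHW -> HWC so matplotlib can display it
plt.imshow(img)
plt.title(" ".join(example_clasees[i] for i in y_example))
plt.show()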
class Models(torch.nn.Module):
def __init__(self):
super(Models,self).__init__()
self.Conv = torch.nn.Sequential(
torch.nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(64,64,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.MaxPool2d(kernel_size=2,stride=2),
torch.nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(128,128,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.MaxPool2d(kernel_size=2,stride=2),
torch.nn.Conv2d(128,256,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(256,256,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(256,256,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.MaxPool2d(kernel_size=2,stride=2),
torch.nn.Conv2d(256,512,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.Conv2d(512,512,kernel_size=3,stride=1,padding=1),
torch.nn.ReLU(),
torch.nn.MaxPool2d(kernel_size=2,stride=2)
)
self.Classes = torch.nn.Sequential(
torch.nn.Linear(4*4*512,1024),
torch.nn.ReLU(),
torch.nn.Dropout(p=.5),
torch.nn.Linear(1024,1024),
torch.nn.ReLU(),
torch.nn.Dropout(p=.5),
torch.nn.Linear(1024,2)
)
def forward(self,input):
x = self.Conv(input)
x = x.view(-1,4*4*512)
x = self.Classes(x)
return x
Instantiate the model and print its internal structure:
model = Models()
print(model)
Models(
(Conv): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU()
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU()
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU()
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU()
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU()
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU()
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU()
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU()
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(Classes): Sequential(
(0): Linear(in_features=8192, out_features=1024, bias=True)
(1): ReLU()
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=1024, out_features=1024, bias=True)
(4): ReLU()
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=1024, out_features=2, bias=True)
)
)
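Before training, a quick forward pass with a dummy input confirms the flattened size: a 64×64 input goes through four 2×2 max-pooling layers and comes out as 4×4×512 = 8192 features, which matches the first Linear layer. A small sanity check, assuming the model instantiated above:
dummy = torch.randn(1, 3, 64, 64)   # one fake RGB image of the expected input size
with torch.no_grad():
    out = model(dummy)
print(out.shape)                    # expected: torch.Size([1, 2])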
Start training. First define the loss function (cross-entropy) and the optimizer (Adam):
loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=.00001)
# check whether a GPU is available: if Use_gpu is True run on the GPU, otherwise on the CPU
print(torch.cuda.is_available())
Use_gpu = torch.cuda.is_available()
if Use_gpu:
model = model.cuda()
loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),lr=.00001)
epoch_n = 10
time_open = time.time()
for epoch in range(epoch_n):
print("Epoch{}/{}".format(epoch,epoch_n-1))
print("-"*10)
for phase in ["train","valid"]:
if phase == "train":
print("Training...")
model.train(True)
else:
print("Validing...")
model.train(False)
running_loss = .0
running_corrects = 0
for batch,data in enumerate(dataloader[phase],1):
x,y=data
if Use_gpu:
x,y = Variable(x.cuda()),Variable(y.cuda())
else:
x,y = Variable(x),Variable(y)
# print(x.shape)
y_pred = model(x)
_,pred = torch.max(y_pred.data,1)
optimizer.zero_grad()
loss = loss_f(y_pred,y)
if phase == "train":
loss.backward()
optimizer.step()
running_loss += loss.data.item()
running_corrects += torch.sum(pred==y.data)
if batch%500 == 0 and phase == "train":
print("Batch{},Train Loss:{:.4f},Train ACC:{:.4f}".format(batch,running_loss/batch,100*running_corrects/(16*batch)))
epoch_loss = running_loss*16/len(image_datasets[phase])
epoch_acc = 100*running_corrects/len(image_datasets[phase])
print("{} Loss:{:.4f} Acc:{:.4f}%".format(phase,epoch_loss,epoch_acc))
time_end = time.time() - time_open
print(time_end)
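One possible refinement that the run below does not use: the validation phase above still builds a computation graph. Wrapping the forward pass in torch.no_grad() during validation saves memory without changing the results; a hedged sketch of just that part of the batch loop:
if phase == "train":
    y_pred = model(x)
else:
    with torch.no_grad():   # gradients are not needed for validation
        y_pred = model(x)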
Epoch0/9
----------
Training…
Batch500,Train Loss:0.6931,Train ACC:50.2125
Batch1000,Train Loss:0.6932,Train ACC:50.2063
train Loss:0.6931 Acc:50.3450%
Validing…
valid Loss:0.6942 Acc:50.0000%
Epoch1/9
----------
Training…
Batch500,Train Loss:0.6887,Train ACC:53.6750
Batch1000,Train Loss:0.6852,Train ACC:54.9438
train Loss:0.6842 Acc:55.2200%
Validing…
valid Loss:0.6797 Acc:56.2600%
Epoch2/9
----------
Training…
Batch500,Train Loss:0.6721,Train ACC:58.2125
Batch1000,Train Loss:0.6676,Train ACC:58.6313
train Loss:0.6655 Acc:59.2450%
Validing…
valid Loss:0.6432 Acc:63.1600%
Epoch3/9
----------
Training…
Batch500,Train Loss:0.6511,Train ACC:62.0375
Batch1000,Train Loss:0.6439,Train ACC:63.2125
train Loss:0.6425 Acc:63.5550%
Validing…
valid Loss:0.6273 Acc:65.3200%
Epoch4/9
----------
Training…
Batch500,Train Loss:0.6288,Train ACC:65.1375
Batch1000,Train Loss:0.6266,Train ACC:65.1750
train Loss:0.6243 Acc:65.3000%
Validing…
valid Loss:0.6296 Acc:65.3400%
Epoch5/9
----------
Training…
Batch500,Train Loss:0.6136,Train ACC:66.6125
Batch1000,Train Loss:0.6117,Train ACC:66.8188
train Loss:0.6092 Acc:67.1200%
Validing…
valid Loss:0.6161 Acc:65.5800%
Epoch6/9
----------
Training…
Batch500,Train Loss:0.6008,Train ACC:67.7500
Batch1000,Train Loss:0.5919,Train ACC:68.6438
train Loss:0.5911 Acc:68.7850%
Validing…
valid Loss:0.5672 Acc:70.7800%
Epoch7/9
----------
Training…
Batch500,Train Loss:0.5758,Train ACC:69.8875
Batch1000,Train Loss:0.5722,Train ACC:70.4188
train Loss:0.5702 Acc:70.4900%
Validing…
valid Loss:0.5513 Acc:71.5200%
Epoch8/9
----------
Training…
Batch500,Train Loss:0.5564,Train ACC:71.0000
Batch1000,Train Loss:0.5519,Train ACC:71.4188
train Loss:0.5518 Acc:71.5250%
Validing…
valid Loss:0.5683 Acc:69.6800%
Epoch9/9
----------
Training…
Batch500,Train Loss:0.5524,Train ACC:71.5500
Batch1000,Train Loss:0.5425,Train ACC:72.2750
train Loss:0.5428 Acc:72.2800%
Validing…
valid Loss:0.5262 Acc:73.5400%
Elapsed time: 1598.806438446045
Training for all 10 epochs took about 1598 s, and the final accuracy (ACC) reached roughly 73%.
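If you want to keep this baseline model for later comparison, the usual way is to save its state dict (the file name below is just an example):
torch.save(model.state_dict(), "cnn_baseline.pth")   # save the trained weights

# later: rebuild the architecture and load the weights back
model = Models()
model.load_state_dict(torch.load("cnn_baseline.pth"))
model.eval()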
The packages needed for the transfer-learning version are the same as above; no changes are required.
import torch
import torchvision
from torchvision import datasets,transforms,models
import os
import matplotlib.pyplot as plt
import time
from torch.autograd import Variable # the Variable wrapper in torch (no longer needed in recent PyTorch versions)
%matplotlib inline
There are two changes here:
transforms.Resize([224,224]): we use the pretrained VGG16, whose input is a 224×224 image, so the images must be resized accordingly; and transforms.Normalize(mean=[.5,.5,.5],std=[.5,.5,.5]) standardizes the pixel values channel by channel.
data_dir = "dogs-vs-cats"
data_tansform = { x:transforms.Compose([transforms.Resize([224,224]), # resize to the 224×224 input VGG16 expects
transforms.ToTensor(),
transforms.Normalize(mean=[.5,.5,.5],std=[.5,.5,.5])])
for x in ["train","valid"]}
image_datasets = {x:datasets.ImageFolder(root=os.path.join(data_dir,x),
transform = data_tansform[x])
for x in ["train","valid"]}
dataloader = {x:torch.utils.data.DataLoader(dataset=image_datasets[x],
batch_size=16,
shuffle=True)
for x in ["train","valid"]}
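Normalizing with mean = std = 0.5 works, but the pretrained VGG16 weights were fitted on ImageNet with different per-channel statistics; using those values is a common alternative (a sketch only, not what the run below used):
imagenet_transform = transforms.Compose([transforms.Resize([224,224]),
                                         transforms.ToTensor(),
                                         # per-channel mean/std of ImageNet, the dataset VGG16 was pretrained on
                                         transforms.Normalize(mean=[0.485,0.456,0.406],
                                                              std=[0.229,0.224,0.225])])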
Load the VGG16 model that has already been trained by others (the argument pretrained=True does this) and print it:
model = models.vgg16(pretrained=True)
print(model)
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
Note that the classifier module maps in_features=25088 (the flattened 512×7×7 output of the avgpool layer) down to out_features=1000 through six layers of Linear, ReLU and Dropout, because VGG16's original task is 1000-class ImageNet classification.
Since our cats-vs-dogs problem only needs two classes, we change the classifier module so that its final output is out_features=2.
Also note these two lines:
cost = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(),lr=.00001)
The loss function for training is cross-entropy, the optimizer is Adam with a learning rate of .00001, and only the classifier parameters are optimized; the parameters of the convolutional module are left unchanged.
for parma in model.parameters():
parma.requires_grad = False
model.classifier = torch.nn.Sequential(torch.nn.Linear(25088,4096),
torch.nn.ReLU(),
torch.nn.Dropout(p=.5),
torch.nn.Linear(4096,4096),
torch.nn.ReLU(),
torch.nn.Dropout(p=.5),
torch.nn.Linear(4096,2))
if Use_gpu:
model = model.cuda()
cost = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(),lr=.00001)
print(model) # inspect the structure of the new model
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU()
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU()
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=2, bias=True)
)
)
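To confirm that only the new classifier will be updated, you can count trainable parameters; a quick check, assuming the model defined above:
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print("trainable: {} / total: {}".format(trainable, total))
# only the three Linear layers of the new classifier should require gradients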
Start training
print(torch.cuda.is_available())
Use_gpu = torch.cuda.is_available()
if Use_gpu:
model = model.cuda()
loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(),lr=.00001) # only the classifier parameters are optimized
epoch_n = 10
time_open = time.time()
for epoch in range(epoch_n):
print("Epoch{}/{}".format(epoch,epoch_n-1))
print("-"*10)
for phase in ["train","valid"]:
if phase == "train":
print("Training...")
model.train(True)
else:
print("Validing...")
model.train(False)
running_loss = .0
running_corrects = 0
for batch,data in enumerate(dataloader[phase],1):
x,y=data
if Use_gpu:
x,y = Variable(x.cuda()),Variable(y.cuda())
else:
x,y = Variable(x),Variable(y)
# print(x.shape)
y_pred = model(x)
_,pred = torch.max(y_pred.data,1)
optimizer.zero_grad()
loss = loss_f(y_pred,y)
if phase == "train":
loss.backward()
optimizer.step()
running_loss += loss.data.item()
running_corrects += torch.sum(pred==y.data)
if batch%500 == 0 and phase == "train":
print("Batch{},Train Loss:{:.4f},Train ACC:{:.4f}".format(batch,running_loss/batch,100*running_corrects/(16*batch)))
epoch_loss = running_loss*16/len(image_datasets[phase])
epoch_acc = 100*running_corrects/len(image_datasets[phase])
print("{} Loss:{:.4f} Acc:{:.4f}%".format(phase,epoch_loss,epoch_acc))
time_end = time.time() - time_open
print(time_end)
Epoch0/9
----------
Training…
Batch500,Train Loss:0.0987,Train ACC:96.6250
Batch1000,Train Loss:0.0744,Train ACC:97.3000
train Loss:0.0693 Acc:97.4600%
Validing…
valid Loss:0.0509 Acc:98.1600%
Epoch1/9
----------
Training…
Batch500,Train Loss:0.0181,Train ACC:99.4375
Batch1000,Train Loss:0.0226,Train ACC:99.2750
train Loss:0.0219 Acc:99.2800%
Validing…
valid Loss:0.0638 Acc:97.8400%
Epoch2/9
----------
Training…
Batch500,Train Loss:0.0087,Train ACC:99.7625
Batch1000,Train Loss:0.0100,Train ACC:99.7063
train Loss:0.0100 Acc:99.7000%
Validing…
valid Loss:0.0690 Acc:98.1400%
Epoch3/9
----------
Training…
Batch500,Train Loss:0.0040,Train ACC:99.9000
Batch1000,Train Loss:0.0048,Train ACC:99.9063
train Loss:0.0045 Acc:99.9050%
Validing…
valid Loss:0.0743 Acc:98.3600%
Epoch4/9
----------
Training…
Batch500,Train Loss:0.0020,Train ACC:99.9500
Batch1000,Train Loss:0.0019,Train ACC:99.9563
train Loss:0.0022 Acc:99.9400%
Validing…
valid Loss:0.0904 Acc:98.3200%
Epoch5/9
----------
Training…
Batch500,Train Loss:0.0006,Train ACC:100.0000
Batch1000,Train Loss:0.0011,Train ACC:99.9625
train Loss:0.0016 Acc:99.9450%
Validing…
valid Loss:0.1041 Acc:98.4800%
Epoch6/9
----------
Training…
Batch500,Train Loss:0.0032,Train ACC:99.8250
Batch1000,Train Loss:0.0024,Train ACC:99.8813
train Loss:0.0022 Acc:99.8950%
Validing…
valid Loss:0.1268 Acc:98.0600%
Epoch7/9
----------
Training…
Batch500,Train Loss:0.0055,Train ACC:99.8625
Batch1000,Train Loss:0.0042,Train ACC:99.8813
train Loss:0.0048 Acc:99.8600%
Validing…
valid Loss:0.1435 Acc:97.8400%
Epoch8/9
----------
Training…
Batch500,Train Loss:0.0031,Train ACC:99.9000
Batch1000,Train Loss:0.0025,Train ACC:99.9125
train Loss:0.0028 Acc:99.9000%
Validing…
valid Loss:0.1417 Acc:98.2400%
Epoch9/9
----------
Training…
Batch500,Train Loss:0.0015,Train ACC:99.9500
Batch1000,Train Loss:0.0021,Train ACC:99.9313
train Loss:0.0037 Acc:99.8950%
Validing…
valid Loss:0.1521 Acc:98.0800%
4309.86524772644
Again we trained for 10 epochs; this time it took about 4310 s, and the accuracy reached about 98%.
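With the fine-tuned model, predicting a single image could look like the sketch below; the file name is hypothetical, and the transform is the same one used for validation:
from PIL import Image

model.eval()
img = Image.open("some_cat_or_dog.jpg").convert("RGB")   # hypothetical path
x = data_tansform["valid"](img).unsqueeze(0)             # add a batch dimension
if Use_gpu:
    x = x.cuda()
with torch.no_grad():
    probs = torch.nn.functional.softmax(model(x), dim=1)[0]
classes = image_datasets["valid"].classes                 # ['cat', 'dog']
print(classes[int(probs.argmax())], float(probs.max()))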
(1) Data volume and quality. When training on a small dataset, transfer learning is needed to reduce overfitting; also pay attention to data quality and make sure instances from the different tasks are sufficiently similar.
(2) The domains and tasks involved in the transfer should match as closely as possible; exclude tasks with low similarity to reduce noise.
(3) The pretrained model should be chosen to be relevant to the target task; the dimensionality, granularity and extensibility of its feature vectors should fit the characteristics of the target task.
(4) Avoid "semantic drift", i.e. situations where the concepts of the source task and the target task differ.