- Competition link
For the competition details and the baseline, see 《如何打一个CV比赛V2.0》. I ran this competition on Colab, using the Datawhale sampled dataset.
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/CV/华为车道检测')
Download the competition dataset:
!wget https://mirror.coggle.club/digix-2022-cv-sample-0829.zip
# Unzip the archive and rename the folder to dataset
!unzip digix-2022-cv-sample-0829.zip
!mv digix-2022-cv-sample-0829 dataset  # rename the folder to dataset
import os
import glob
from PIL import Image
import csv, time
import numpy as np
# PyTorch-related imports
import torch
import torchvision
import torch.optim as optim
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.utils.data as data
from torch.utils.tensorboard import SummaryWriter
import pprint
# Set random seeds to make results reproducible
def set_seeds(seed):
    torch.manual_seed(seed)  # fix the CPU random seed
    if torch.cuda.is_available():  # fix the GPU random seeds
        torch.cuda.manual_seed(seed)  # for the current GPU
        torch.cuda.manual_seed_all(seed)  # for all GPUs
    np.random.seed(seed)  # make subsequent numpy random calls deterministic
    torch.backends.cudnn.deterministic = True  # make cuDNN ops deterministic
# Custom dataset
class ImageSet(data.Dataset):
    def __init__(self, images, labels, transform):
        self.transform = transform
        self.images = images
        self.labels = labels

    def __getitem__(self, item):
        # Guard against corrupt files: fall back to a blank image
        try:
            image = Image.open(self.images[item]).convert('RGB')
        except Exception:
            image = Image.fromarray(np.zeros((448, 448), dtype=np.uint8))  # uint8, since PIL cannot handle int8
            image = image.convert('RGB')
        image = self.transform(image)
        return image, torch.tensor(self.labels[item])

    def __len__(self):
        return len(self.images)
import pandas as pd
import codecs
# Training set annotations
lines = codecs.open('dataset/train_label.csv').readlines()
train_label = pd.DataFrame({
'image': ['dataset/train_image/' + x.strip().split(',')[0] for x in lines],
'label': [x.strip().split(',')[1:] for x in lines],
})
# Binarize the labels (the original dataset has 7 labels: 6 kinds of defective images plus normal ones; this competition only distinguishes defective vs. normal)
train_label['new_label'] = train_label['label'].apply(lambda x: int('0' not in x))
train_label
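To make the binarization rule concrete: int('0' not in x) maps any label list containing '0' (the normal class, as the rule implies) to 0 and everything else to 1. A tiny check with made-up label lists:
for labels in (['0'], ['1'], ['2', '3']):
    print(labels, '->', int('0' not in labels))
# ['0'] -> 0  (normal)
# ['1'] -> 1  (defective)
# ['2', '3'] -> 1  (defective)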
- For data augmentation, see 《数据增强 - AutoAugment 系列论文(1)》, the Cutout / Random Erasing / Mixup / Cutmix write-ups, and the AugMix introduction 《AUGMIX》.
- A PIL Image's size attribute returns (w, h), while the size argument of Resize is ordered (h, w); see the check after this list.
- The transforms for the training, validation, and test sets should be the same; otherwise the results are much worse.
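A quick sketch of the (w, h) vs. (h, w) pitfall from the list above; the image path is hypothetical:
from PIL import Image
import torchvision.transforms as transforms
img = Image.open('dataset/train_image/example.jpg')  # hypothetical path
print(img.size)  # PIL reports (width, height), e.g. (1080, 2400)
resized = transforms.Resize((320, 160))(img)  # Resize takes (height, width)
print(resized.size)  # (160, 320): width 160, height 320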
def check_image(path):
    try:
        return os.path.exists(path)
    except Exception:
        return False

# Keep only the training samples whose image file actually exists
train_is_valid = train_label['image'].apply(lambda x: check_image(x))
train_label = train_label[train_is_valid]
print(len(train_label))
1969
# Data augmentation; trfs is the baseline augmentation pipeline.
trfs = transforms.Compose([
transforms.Resize((224,224)),
transforms.RandomHorizontalFlip(),
transforms.RandomAdjustSharpness(sharpness_factor=2,p=0.5),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
# trfs_sharp is the improved augmentation pipeline so far.
trfs_sharp = transforms.Compose([
transforms.Resize((352,176)),
transforms.CenterCrop([320,160]),
transforms.RandomAdjustSharpness(sharpness_factor=2,p=1),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
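The augmentation references above also cover Mixup. Since Mixup blends pairs of samples and their labels, it lives in the training loop rather than in transforms.Compose; here is a minimal sketch (not part of this notebook's pipeline, and alpha=0.2 is an arbitrary choice):
import numpy as np
import torch

def mixup_batch(X, y, alpha=0.2):
    # Sample a mixing coefficient and a random pairing within the batch
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(X.size(0))
    X_mixed = lam * X + (1 - lam) * X[idx]
    # The loss is mixed the same way:
    # loss = lam * criterion(pred, y) + (1 - lam) * criterion(pred, y[idx])
    return X_mixed, y, y[idx], lam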
from sklearn.model_selection import train_test_split
# Split the features into x and the labels into y
x = train_label['image'].values  # without .values the result is not a list, and neither are the split outputs; you would have to convert them again, which is a hassle
y = train_label['new_label'].values
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.15, random_state=0)  # still not lists
- Setting num_workers=4 greatly speeds up inference on the test set later.
- Define a get_dataloader function taking bs, the augmentation pipeline, etc.; while experimenting with models I kept changing parameters and sometimes couldn't remember which ones I had used.
# Build the train / validation / full-train / test dataloaders.
# Note: predict() below reads test_images at module scope, so define it here.
test_images = glob.glob('dataset/test_images/*')

def get_dataloader(bs, transforms):
    train_dataset = ImageSet(x_train, y_train, transform=transforms)
    train_all_dataset = ImageSet(x, y, transform=transforms)
    valid_dataset = ImageSet(x_valid, y_valid, transform=transforms)
    # Test set dataset (dummy labels, only used for prediction)
    test_dataset = ImageSet(test_images, [0] * len(test_images), transforms)
    train_loader = DataLoader(train_dataset, batch_size=bs, shuffle=True, num_workers=4, pin_memory=True)
    valid_loader = DataLoader(valid_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)
    train_all_loader = DataLoader(train_all_dataset, batch_size=bs, num_workers=4, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)
    # Sanity check (run in a separate cell; sample output below):
    # train_dataset[0][0].shape, len(train_loader), len(train_all_loader), len(test_loader)
    return train_loader, valid_loader, train_all_loader, test_loader
(torch.Size([3, 320, 160]), 27, 31, 157)
# Inspect the image dimensions
from PIL import Image
image_test = x_train[0]
image = Image.open(image_test)
array_img = np.asarray(image)
image.size, array_img.shape, type(image)
((1080, 2400), (2400, 1080, 4), PIL.PngImagePlugin.PngImageFile)
- If y_true contains only one class, roc_auc_score raises "Only one class present in y_true. ROC AUC score is not defined in that case"; that is why the AUC below is accumulated over the whole epoch rather than per batch.
- Calling roc_auc_score(pred_all, label_all) with the arguments swapped raises "continuous format is not supported": the first argument must be the true labels.
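A minimal sketch of the correct argument order, with toy arrays:
from sklearn.metrics import roc_auc_score
label_all = [0, 0, 1, 1]           # true labels come first
pred_all = [0.1, 0.4, 0.35, 0.8]   # positive-class probabilities come second
print(roc_auc_score(label_all, pred_all))  # 0.75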
# Training and validation loop
import time
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score, roc_auc_score
# Progress bar
from tqdm.auto import tqdm

def train_and_eval(train_loader, valid_loader=None, epoch=None, scheduler=None, save_name=None):
    best_auc = 0.0  # unused in the end, since the model is saved every epoch anyway
    num_training_steps = len(train_loader) * epoch
    progress_bar = tqdm(range(num_training_steps))
    writer = SummaryWriter(log_dir='runs/' + save_name)
    for i in range(epoch):
        """Train the model"""
        start = time.time()
        model.train()
        print("***** Running training epoch {} *****".format(i + 1))
        train_loss_sum, total_train_acc = 0.0, 0
        pred_all, label_all = [], []
        for idx, (X, labels) in enumerate(train_loader):
            if isinstance(X, list):
                # Required for BERT fine-tuning (to be covered later)
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            labels = labels.to(device)
            # Compute the output and the loss
            pred = model(X)
            loss = criterion(pred, labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()
            optimizer.zero_grad()
            progress_bar.update(1)
            train_loss_sum += loss.item()
            pred = pred.clone().detach().cpu().numpy()  # detach copies without grad; the original tensor is unchanged and still differentiable
            # acc needs argmax labels; AUC needs probabilities, not predicted labels
            predictions = np.argmax(pred, axis=-1)  # predicted labels, used for acc
            labels = labels.to('cpu').numpy()
            label_all.extend(labels)
            pred_all.extend(pred[:, 1])
            total_train_acc += accuracy_score(labels, predictions)
        avg_train_acc = total_train_acc / len(train_loader)
        train_auc = roc_auc_score(label_all, pred_all)
        # Log the training metrics to TensorBoard
        writer.add_scalar(tag="loss/train", scalar_value=train_loss_sum,
                          global_step=i * len(train_loader) + idx)
        writer.add_scalar(tag="acc/train", scalar_value=avg_train_acc,
                          global_step=i * len(train_loader) + idx)
        writer.add_scalar(tag="auc/train", scalar_value=train_auc,
                          global_step=i * len(train_loader) + idx)
        if i % 1 == 0:  # print the results once per epoch
            print("Epoch {:03d} | Step {:03d}/{:03d} | Loss {:.4f} | train_acc {:.4f} | train_auc {:.4f} | Time {:.4f} | lr = {} \n".format(
                i + 1, idx + 1, len(train_loader), train_loss_sum / (idx + 1),
                avg_train_acc, train_auc, time.time() - start,
                optimizer.state_dict()['param_groups'][0]['lr']))
        if valid_loader is not None:  # validate only if a validation loader was given
            model.eval()
            pred_all, label_all = [], []
            total_eval_loss, total_eval_accuracy = 0, 0
            for (X, labels) in valid_loader:
                with torch.no_grad():  # only this block needs no gradients
                    if isinstance(X, list):
                        # Required for BERT fine-tuning (to be covered later)
                        X = [x.to(device) for x in X]
                    else:
                        X = X.to(device)
                    labels = labels.to(device)
                    pred = model(X)
                    loss = criterion(pred, labels)  # compute the loss and accuracy
                    total_eval_loss += loss.item()
                    pred = pred.clone().detach().cpu().numpy()
                    predictions = np.argmax(pred, axis=-1)
                    labels = labels.to('cpu').numpy()
                    label_all.extend(labels)
                    pred_all.extend(pred[:, 1])
                    total_eval_accuracy += accuracy_score(labels, predictions)
            avg_val_acc = total_eval_accuracy / len(valid_loader)
            val_auc = roc_auc_score(label_all, pred_all)
            writer.add_scalar(tag="loss/valid", scalar_value=total_eval_loss,
                              global_step=i * len(valid_loader) + idx)
            writer.add_scalar(tag="acc/valid", scalar_value=avg_val_acc,
                              global_step=i * len(valid_loader) + idx)
            writer.add_scalar(tag="auc/valid", scalar_value=val_auc,
                              global_step=i * len(valid_loader) + idx)
            torch.save(model.state_dict(), 'model/%s_%d' % (save_name, i))
            print("val_accuracy:%.4f" % (avg_val_acc), '\t', "val_auc:%.4f" % (val_auc))
            print("val_loss: %.4f" % (total_eval_loss / len(valid_loader)), ' \t', "time costed={}s \n".format(round(time.time() - start, 5)))
            print("-------------------------------")
        else:
            # Save the model even without a validation set
            best_auc = train_auc
            torch.save(model.state_dict(), 'model/%s_%d' % (save_name, i))
Submissions are a CSV file, encoded as UTF-8 without BOM. The format is as follows:
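The sample rows were not included here; judging from the header that predict() below writes, the file presumably looks like this (the values are made up):
imagename,defect_prob
0001.jpg,0.0960
0002.jpg,0.8731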
- At first I forgot to set the dataloader's num_workers, so swin_s hadn't finished predicting the test set after 50 minutes, which was maddening. I downloaded the model to Kaggle to run prediction there, but Kaggle had torch 1.12 and torchvision 1.10, which has no Swin model, and no amount of installing or upgrading would fix it. Uploading it to Kaggle as a dataset and then pulling it into Colab without mounting Drive also refused to download.
def predict(model, model_path=None):
    if model_path is not None:
        model.load_state_dict(torch.load(model_path))
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    model.eval()
    to_prob = nn.Softmax(dim=1)
    with torch.no_grad():
        imagenames, probs = list(), list()
        for batch_idx, batch in enumerate(test_loader):
            image, _ = batch
            image = image.to(device)
            pred = model(image)
            prob = to_prob(pred)
            prob = list(prob.data.cpu().numpy())
            probs += prob
    print(probs[0], len(probs))
    with open('dataset/submission.csv', 'w', newline='', encoding='utf8') as fp:
        writer = csv.writer(fp)
        writer.writerow(['imagename', 'defect_prob'])
        for imagename, prob in zip(test_images, probs):
            imagename = os.path.basename(imagename)
            writer.writerow([imagename, str(prob[1])])
# Load a pretrained resnet18
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=32, transforms=trfs)
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=3, save_name='resnet')
0%| | 0/159 [00:00, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.3689 | train_acc 0.8782 | train_auc 0.5587 | Time 106.9236 | lr = 0.001
val_accuracy:0.8625 val_auc:0.7425
val_loss: 0.3398 time costed=124.82111s
-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 053/053 | Loss 0.3087 | train_acc 0.8850 | train_auc 0.7146 | Time 105.2105 | lr = 0.001
val_accuracy:0.8656 val_auc:0.8386
val_loss: 0.2950 time costed=123.19321s
-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 053/053 | Loss 0.2766 | train_acc 0.9027 | train_auc 0.7803 | Time 104.9202 | lr = 0.001
val_accuracy:0.9031 val_auc:0.8385
val_loss: 0.2590 time costed=123.11665s
-------------------------------
# Load the model from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('model/resnet_2'))  # last checkpoint of the 3-epoch run above
model = model.to(device)
# Retrain once on the "full" data. Note: this cell rebuilds the loader from the
# training split only (hence the 53 steps in the log below); the true full-data
# retrain would reuse the train_all_loader returned by get_dataloader.
train_all_loader = DataLoader(ImageSet(x_train, y_train, transform=trfs), batch_size=32, shuffle=True)
optimizer = optim.SGD(model.parameters(), lr=0.0002)
train_and_eval(train_all_loader, epoch=1, save_name='resnet_all')
0%| | 0/53 [00:00, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.2586 | train_acc 0.9136 | train_auc 0.8179 | Time 109.0911 | lr = 0.0002
predict(model,model_path=None)
# Load a pretrained resnet18
set_seeds(2022)
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=32, transforms=trfs_sharp)
model = torchvision.models.resnet18(pretrained=True)  # newer torchvision prefers weights='DEFAULT'
model.fc = torch.nn.Linear(512, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)  # not passed to train_and_eval below, so the lr stays at 0.001 (see the log)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=6, save_name='resnet18')
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.3774 | train_acc 0.8617 | train_auc 0.4651 | Time 144.9953 | lr = 0.001
val_accuracy:0.8625 val_auc:0.6024
val_loss: 0.3489 time costed=169.10428s
-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 053/053 | Loss 0.3024 | train_acc 0.8844 | train_auc 0.6599 | Time 124.3967 | lr = 0.001
val_accuracy:0.8688 val_auc:0.7516
val_loss: 0.3118 time costed=145.55964s
-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 053/053 | Loss 0.2708 | train_acc 0.8956 | train_auc 0.7791 | Time 124.2991 | lr = 0.001
val_accuracy:0.8812 val_auc:0.8182
val_loss: 0.2830 time costed=146.05037s
-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 053/053 | Loss 0.2555 | train_acc 0.9026 | train_auc 0.8296 | Time 123.9550 | lr = 0.001
val_accuracy:0.8969 val_auc:0.8488
val_loss: 0.2573 time costed=145.67704s
-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 053/053 | Loss 0.2417 | train_acc 0.9171 | train_auc 0.8541 | Time 122.8209 | lr = 0.001
val_accuracy:0.9250 val_auc:0.8596
val_loss: 0.2418 time costed=144.79443s
-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 053/053 | Loss 0.2208 | train_acc 0.9325 | train_auc 0.8818 | Time 123.2759 | lr = 0.001
val_accuracy:0.9219 val_auc:0.8661
val_loss: 0.2326 time costed=144.71817s
-------------------------------
# Load the model from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('model/resnet18_5'))
model = model.to(device)
# Retrain once on the full data
optimizer = optim.SGD(model.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.001)  # not passed below, so the lr stays at 0.0001
train_and_eval(train_all_loader, epoch=3, save_name='resnet18_all')
***** Running training epoch 1 *****
Epoch 001 | Step 062/062 | Loss 0.2153 | train_acc 0.9311 | train_auc 0.8909 | Time 148.2866 | lr = 0.0001
***** Running training epoch 2 *****
Epoch 002 | Step 062/062 | Loss 0.2130 | train_acc 0.9325 | train_auc 0.8965 | Time 147.7210 | lr = 0.0001
***** Running training epoch 3 *****
Epoch 003 | Step 062/062 | Loss 0.2126 | train_acc 0.9311 | train_auc 0.8991 | Time 146.8041 | lr = 0.0001
predict(model,model_path=None)
torchvision's SwinTransformer comes in three sizes: T (Tiny), S (Small), and B (Base). Their pretrained weights are listed in the 《Table of all available classification weights》.
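For reference, a sketch of loading the three variants (assuming torchvision 0.13+; the head in_features are 768, 768, and 1024 respectively):
swin_t = torchvision.models.swin_transformer.swin_t(weights='DEFAULT')  # Tiny, head in_features=768
swin_s = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')  # Small, head in_features=768
swin_b = torchvision.models.swin_transformer.swin_b(weights='DEFAULT')  # Base, head in_features=1024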
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs)
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003  # weight_decay is defined but never passed to the optimizer
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='swin_s')
0%| | 0/270 [00:00<?, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5020 | train_acc 0.8003 | train_auc 0.5437 | Time 756.2816 | lr = 0.00034549150281252655
val_accuracy:0.8588 val_auc:0.6714
val_loss: 0.3851 time costed=888.98904s
-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3553 | train_acc 0.8825 | train_auc 0.6178 | Time 120.7405 | lr = 9.549150281252699e-05
val_accuracy:0.8588 val_auc:0.7586
val_loss: 0.3732 time costed=140.11592s
-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3430 | train_acc 0.8825 | train_auc 0.6562 | Time 120.7618 | lr = 0.0009045084971874806
val_accuracy:0.8588 val_auc:0.8103
val_loss: 0.3599 time costed=140.08969s
-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3325 | train_acc 0.8825 | train_auc 0.6738 | Time 120.9380 | lr = 0.0006545084971874633
val_accuracy:0.8588 val_auc:0.7986
val_loss: 0.3569 time costed=140.3202s
-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3288 | train_acc 0.8830 | train_auc 0.6897 | Time 120.4576 | lr = 0.0
val_accuracy:0.8588 val_auc:0.8407
val_loss: 0.3447 time costed=139.84589s
-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3240 | train_acc 0.8789 | train_auc 0.7275 | Time 120.7112 | lr = 0.0006545084971874866
val_accuracy:0.8588 val_auc:0.8521
val_loss: 0.3297 time costed=140.0113s
-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3157 | train_acc 0.8836 | train_auc 0.7355 | Time 120.6098 | lr = 0.0009045084971875055
val_accuracy:0.8588 val_auc:0.8899
val_loss: 0.3175 time costed=139.8843s
-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3179 | train_acc 0.8765 | train_auc 0.7576 | Time 120.4713 | lr = 9.549150281252627e-05
val_accuracy:0.8588 val_auc:0.8959
val_loss: 0.3099 time costed=139.96295s
-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3025 | train_acc 0.8848 | train_auc 0.7701 | Time 120.4666 | lr = 0.00034549150281254536
val_accuracy:0.8637 val_auc:0.8945
val_loss: 0.2947 time costed=139.81461s
-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3001 | train_acc 0.8824 | train_auc 0.7777 | Time 120.6441 | lr = 0.000999999999999998
val_accuracy:0.8669 val_auc:0.8997
val_loss: 0.2868 time costed=139.93732s
-------------------------------
Launch TensorBoard to inspect the training curves:
%load_ext tensorboard
%tensorboard --logdir runs/swin_s
# Finally train 2 epochs on the full data
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
model.load_state_dict(torch.load('model/swin_s_9'))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
# A fresh optimizer for the reloaded model; the original cell reused the optimizer
# from the previous run, which still pointed at the old model's parameters.
# The base lr of 0.001 matches the lr shown in the log below.
optimizer = optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.0002)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_all_loader, scheduler=scheduler, epoch=2, save_name='swin_s_all')
0%| | 0/62 [00:00<?, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.2886 | train_acc 0.8860 | train_auc 0.8052 | Time 142.2524 | lr = 0.0009999999999999974
***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.2897 | train_acc 0.8865 | train_auc 0.7930 | Time 141.9387 | lr = 0.0009999999999999974
predict(model,model_path=None)
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003  # with lr=0.001 the validation acc does not change
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='swins_sharp')
Downloading: "https://download.pytorch.org/models/swin_s-5e29d889.pth" to /root/.cache/torch/hub/checkpoints/swin_s-5e29d889.pth
0%| | 0.00/190M [00:00<?, ?B/s]
0%| | 0/270 [00:00<?, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5272 | train_acc 0.7702 | train_auc 0.4723 | Time 149.8585 | lr = 0.00034549150281252655
val_accuracy:0.8588 val_auc:0.5290
val_loss: 0.4006 time costed=187.8727s
-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3652 | train_acc 0.8825 | train_auc 0.5693 | Time 56.3001 | lr = 9.549150281252699e-05
val_accuracy:0.8588 val_auc:0.5995
val_loss: 0.3884 time costed=68.73339s
-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3348 | train_acc 0.8860 | train_auc 0.6288 | Time 56.2766 | lr = 0.0009045084971874806
val_accuracy:0.8588 val_auc:0.6558
val_loss: 0.3918 time costed=68.7855s
-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3279 | train_acc 0.8860 | train_auc 0.6313 | Time 56.7157 | lr = 0.0006545084971874633
val_accuracy:0.8588 val_auc:0.6815
val_loss: 0.3914 time costed=69.05862s
-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3363 | train_acc 0.8825 | train_auc 0.6533 | Time 56.2564 | lr = 0.0
val_accuracy:0.8588 val_auc:0.7486
val_loss: 0.3646 time costed=68.72446s
-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3415 | train_acc 0.8754 | train_auc 0.7024 | Time 56.3478 | lr = 0.0006545084971874866
val_accuracy:0.8588 val_auc:0.7658
val_loss: 0.3583 time costed=69.10426s
-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3127 | train_acc 0.8860 | train_auc 0.7012 | Time 56.2839 | lr = 0.0009045084971875055
val_accuracy:0.8588 val_auc:0.7843
val_loss: 0.3557 time costed=68.70668s
-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3155 | train_acc 0.8836 | train_auc 0.7328 | Time 56.3986 | lr = 9.549150281252627e-05
val_accuracy:0.8588 val_auc:0.8015
val_loss: 0.3494 time costed=68.8231s
-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3045 | train_acc 0.8866 | train_auc 0.7426 | Time 55.9236 | lr = 0.00034549150281254536
val_accuracy:0.8588 val_auc:0.8112
val_loss: 0.3492 time costed=68.71339s
-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3055 | train_acc 0.8830 | train_auc 0.7635 | Time 55.7483 | lr = 0.000999999999999998
val_accuracy:0.8588 val_auc:0.8031
val_loss: 0.3388 time costed=68.4414s
-------------------------------
%load_ext tensorboard
%tensorboard --logdir runs/swins_sharp
Finally train 3 epochs on the full data:
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
model.load_state_dict(torch.load('model/swins_sharp_9'))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.0003)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.0003)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_all_loader, scheduler=scheduler, epoch=3, save_name='swins_sharp_all')
0%| | 0/93 [00:00<?, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.3156 | train_acc 0.8787 | train_auc 0.7499 | Time 120.0241 | lr = 0.00030000000000000003
***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.3086 | train_acc 0.8803 | train_auc 0.7477 | Time 63.1228 | lr = 0.00030000000000000003
***** Running training epoch 3 *****
Epoch 003 | Step 031/031 | Loss 0.2997 | train_acc 0.8816 | train_auc 0.7771 | Time 62.7066 | lr = 0.00030000000000000003
Predict on the test set and generate the submission:
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
predict(model, model_path='model/swins_sharp_all_2')  # 14min
[0.90404904 0.09595092] 10000
The results were not good and it ran out of GPU memory easily; probably the hyperparameters weren't tuned well.
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
model = torchvision.models.efficientnet_v2_s(weights='DEFAULT')
model.classifier[1] = torch.nn.Linear(1280, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
lr, weight_decay = 0.001, 0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='effnet2')
Downloading: "https://download.pytorch.org/models/efficientnet_v2_s-dd5fe13b.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_v2_s-dd5fe13b.pth
0%| | 0.00/82.7M [00:00<?, ?B/s]
0%| | 0/270 [00:00<?, ?it/s]
***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5934 | train_acc 0.7841 | train_auc 0.5029 | Time 152.3739 | lr = 0.00034549150281252655
val_accuracy:0.8019 val_auc:0.5505
val_loss: 0.5903 time costed=185.25558s
-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.5329 | train_acc 0.8437 | train_auc 0.5208 | Time 54.4941 | lr = 9.549150281252699e-05
val_accuracy:0.8300 val_auc:0.5115
val_loss: 0.5477 time costed=64.94097s
-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.5050 | train_acc 0.8656 | train_auc 0.4964 | Time 49.6710 | lr = 0.0009045084971874806
val_accuracy:0.8400 val_auc:0.6024
val_loss: 0.5248 time costed=60.15345s
-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.4718 | train_acc 0.8773 | train_auc 0.5031 | Time 49.8339 | lr = 0.0006545084971874633
val_accuracy:0.8588 val_auc:0.5236
val_loss: 0.5017 time costed=60.28123s
-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.4481 | train_acc 0.8790 | train_auc 0.5428 | Time 49.7957 | lr = 0.0
val_accuracy:0.8588 val_auc:0.4952
val_loss: 0.4818 time costed=60.31998s
-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.4234 | train_acc 0.8848 | train_auc 0.5676 | Time 50.4792 | lr = 0.0006545084971874866
val_accuracy:0.8619 val_auc:0.5764
val_loss: 0.4550 time costed=61.06234s
-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.4178 | train_acc 0.8796 | train_auc 0.5479 | Time 49.9712 | lr = 0.0009045084971875055
val_accuracy:0.8588 val_auc:0.5897
val_loss: 0.4512 time costed=60.4179s
-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.4050 | train_acc 0.8789 | train_auc 0.5802 | Time 49.6602 | lr = 9.549150281252627e-05
val_accuracy:0.8538 val_auc:0.5747
val_loss: 0.4505 time costed=60.17702s
-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.4062 | train_acc 0.8783 | train_auc 0.5692 | Time 49.7288 | lr = 0.00034549150281254536
val_accuracy:0.8588 val_auc:0.6225
val_loss: 0.4247 time costed=60.10822s
-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3873 | train_acc 0.8807 | train_auc 0.5912 | Time 49.3028 | lr = 0.000999999999999998
val_accuracy:0.8588 val_auc:0.5934
val_loss: 0.4345 time costed=59.68535s
-------------------------------
#%load_ext tensorboard
%tensorboard --logdir runs/effnet2
I couldn't get it tuned well, so the training results are omitted.
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
# Load a pretrained convnext model
model = torchvision.models.convnext.convnext_small(pretrained=True)
model.classifier[2] = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='convnext')  # 21min
ViT uses absolute position embeddings, so for transfer learning the input size must match the size used in pretraining. At first I didn't notice this input-size requirement, and training kept failing with AssertionError: Wrong image height!
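The constraint is easy to check: torchvision's VisionTransformer stores its expected input size and asserts on it in the forward pass (a minimal check, assuming torchvision 0.13+):
vit = torchvision.models.vit_b_16()
print(vit.image_size)  # 224: inputs must be 224x224, otherwise forward raises "Wrong image height!"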
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs)  # ViT needs 224x224 input, so use the 224x224 pipeline rather than trfs_sharp
model = torchvision.models.vit_b_16()  # no pretrained weights; train from scratch
model.heads.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003  # with lr=0.001 the validation acc does not change
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='VITB')
- Data augmentation: random crop, random noise, flips.
- More powerful models: the input image size and the model determine the final accuracy.
- Use the unlabeled data: label it with pseudo-labels (a sketch follows this list).
- timm and torchvision weights are different, so the hyperparameters need to be adjusted accordingly.
- Training on the full dataset strengthens the model further → longer training time, AUC 0.8+.
- Pretrained models vs. training from scratch: there is still an accuracy gap, and pretrained is better.
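A minimal sketch of the pseudo-labeling idea from the list above (not from the original notebook; the 0.95 confidence threshold is an arbitrary assumption, and it assumes predict() is adapted to return its probs list):
probs = np.array(probs)                        # (N, 2) softmax outputs from predict()
confident = probs.max(axis=1) > 0.95           # keep only high-confidence predictions
pseudo_images = [p for p, keep in zip(test_images, confident) if keep]
pseudo_labels = probs.argmax(axis=1)[confident].tolist()
# Extend the labeled set and rebuild the dataloaders with get_dataloader
x_pseudo = np.concatenate([x, pseudo_images])
y_pseudo = np.concatenate([y, pseudo_labels])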