Huawei 2022 Campus Competition: Lane Rendering

Table of Contents

    • 1. Data Preprocessing
      • 1.1 Custom Dataset
      • 1.2 Splitting the Dataset
      • 1.3 Data Augmentation
      • 1.4 Loading the DataLoaders
    • 2. Model Training and Prediction
      • 2.1 Training and Validation Function
      • 2.2 The predict Function (setting num_workers matters)
    • 3. Experimenting with Models and Optimizers
      • 3.1 resnet18
        • 3.1.1 Baseline score 0.676 (resnet18 + trfs, bs=32)
        • 3.1.2 With trfs_sharp augmentation, score 0.71
      • 3.2 SwinTransformer
        • 3.2.1 With trfs augmentation, score 0.6813
        • 3.2.2 With trfs_sharp augmentation, score 0.731
      • 3.3 EfficientNetV2
      • 3.4 convnext
      • 3.5 ViT
    • 4. Ideas for Improving the Score

  • Competition page

  For competition details and the baseline, see 《如何打一个CV比赛V2.0》 (How to Enter a CV Competition, V2.0). I ran this competition on Colab, using the Datawhale sampled dataset.

from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/CV/华为车道检测')

Download the competition dataset:

!wget https://mirror.coggle.club/digix-2022-cv-sample-0829.zip
# Unzip the archive
!unzip digix-2022-cv-sample-0829.zip
!mv digix-2022-cv-sample-0829 dataset # rename the folder to dataset
import os
import glob
from PIL import Image
import csv, time
import numpy as np

# PyTorch-related imports
import torch
import torchvision
import torch.optim as optim
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.utils.data as data

from torch.utils.tensorboard import SummaryWriter
import pprint
# Set random seeds so results are reproducible
def set_seeds(seed):
  torch.manual_seed(seed)  # fix the CPU seed
  if torch.cuda.is_available():  # fix the GPU seeds
    torch.cuda.manual_seed(seed)  # current GPU
    torch.cuda.manual_seed_all(seed)  # all GPUs
  np.random.seed(seed)  # make subsequent numpy random calls deterministic
  torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels

1. Data Preprocessing

1.1 Custom Dataset

# Custom dataset class
class ImageSet(data.Dataset):
    def __init__(
            self,
            images,
            labels,
            transform):
        self.transform = transform
        self.images = images
        self.labels = labels

    def __getitem__(self, item):
        # Guard against unreadable files by substituting a blank image
        try:
            image = Image.open(self.images[item]).convert('RGB')
        except Exception:
            image = Image.fromarray(np.zeros((448, 448), dtype=np.uint8))
            image = image.convert('RGB')

        image = self.transform(image)
        return image, torch.tensor(self.labels[item])

    def __len__(self):
        return len(self.images)

1.2 Splitting the Dataset

import pandas as pd
import codecs

# Training-set labels
lines = codecs.open('dataset/train_label.csv').readlines()
train_label = pd.DataFrame({
    'image': ['dataset/train_image/' + x.strip().split(',')[0] for x in lines],
    'label': [x.strip().split(',')[1:] for x in lines],
})

# Binarize the labels (the original dataset has 7 label types: 6 defect classes plus normal; this competition only asks whether an image is defective)
train_label['new_label'] = train_label['label'].apply(lambda x: int('0' not in x))
train_label

1.3 Data Augmentation

  • References: 《数据增强 - AutoAugment 系列论文(1)》, 数据增强 - Cutout、Random Erasing、Mixup、Cutmix, mixup介绍, 《AUGMIX》
  • A PIL Image's size attribute returns (w, h), while transforms.Resize takes its size argument as (h, w).
  • The transforms for the training, validation, and test sets should be identical; otherwise performance degrades badly.
  • So far I have tried sharpening, Mixup, AugMix, AutoAugmentation, and different input sizes. transforms.Resize((352,176)) followed by transforms.CenterCrop([320,160]) works better than resize(224,224), because the source images are (1080, 2400) and the tall crop keeps the aspect ratio. Mixup, AugMix, and AutoAugmentation did not help much (a sketch of how they plug in follows after this list); sharpening gave a measurable boost.
  • I also tried doubling the input size, but EfficientNet ran out of GPU memory almost immediately, so I did not pursue it.
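
For reference, here is a minimal sketch of how AutoAugmentation and AugMix could plug into the same Resize/CenterCrop pipeline via torchvision's built-ins. This is my reconstruction, not the exact code from those runs; transforms.AugMix requires torchvision >= 0.13, and both apply to PIL images, so they go before ToTensor:

trfs_autoaug = transforms.Compose([
    transforms.Resize((352,176)),
    transforms.CenterCrop([320,160]),
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),  # AutoAugmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

trfs_augmix = transforms.Compose([
    transforms.Resize((352,176)),
    transforms.CenterCrop([320,160]),
    transforms.AugMix(),  # AugMix, applied to the PIL image before ToTensor
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])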
import cv2, os
def check_image(path):
    try:
        if os.path.exists(path):
            return True
        else:
            return False
    except:
        return False
# Keep only the training rows whose image file actually exists
train_is_valid = train_label['image'].apply(lambda x: check_image(x) )
train_label = train_label[train_is_valid]
print(len(train_label))

# Augmentation pipelines; trfs is the baseline augmentation.
trfs = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAdjustSharpness(sharpness_factor=2,p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
# trfs_sharp is the improved augmentation used below.
trfs_sharp = transforms.Compose([
    transforms.Resize((352,176)),
    transforms.CenterCrop([320,160]),
    transforms.RandomAdjustSharpness(sharpness_factor=2,p=1),    
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

from sklearn.model_selection import train_test_split
# Features go in x, labels in y
x = train_label['image'].values  # .values gives numpy arrays; without it the split returns Series that need converting to lists later
y = train_label['new_label'].values
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.15, random_state=0)
1969

1.4 Loading the DataLoaders

  • Setting num_workers=4 dramatically speeds up inference later.
  • I wrapped everything in a get_dataloader function parameterized by batch size and the augmentation transforms, because while experimenting I kept changing parameters and losing track of which ones were in use.
# Build the train / valid / full-train / test dataloaders
def get_dataloader(bs, transforms):

  global test_images  # predict() below reads this
  train_dataset = ImageSet(x_train, y_train, transform=transforms)
  train_all_dataset = ImageSet(x, y, transform=transforms)
  valid_dataset = ImageSet(x_valid, y_valid, transform=transforms)
  # Test-set dataset (labels are dummies)
  test_images = glob.glob('dataset/test_images/*')
  test_dataset = ImageSet(test_images, [0] * len(test_images), transforms)

  train_loader = DataLoader(train_dataset, batch_size=bs, shuffle=True, num_workers=4, pin_memory=True)
  valid_loader = DataLoader(valid_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)
  train_all_loader = DataLoader(train_all_dataset, batch_size=bs, num_workers=4, shuffle=True)
  test_loader = DataLoader(test_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)

  print((train_dataset[0][0].shape, len(train_loader), len(train_all_loader), len(test_loader)))
  return train_loader, valid_loader, train_all_loader, test_loader
(torch.Size([3, 320, 160]), 27, 31, 157)
# Check the raw image size
from PIL import Image

image_test=x_train[0]
image = Image.open(image_test)
arry_img=np.asarray(image)
image.size,arry_img.shape,type(image)
((1080, 2400), (2400, 1080, 4), PIL.PngImagePlugin.PngImageFile)

2. Model Training and Prediction

2.1 Training and Validation Function

  1. roc_auc_score expects the true labels and the predicted probabilities, not predicted labels. Note that if y_true contains only one class it raises "Only one class present in y_true. ROC AUC score is not defined in that case." (See the small example after this list.)
  2. Calling roc_auc_score(pred_all, label_all) raises "continuous format is not supported": the first argument must be the labels.
  3. You could save only the best-AUC model, but training on the sampled dataset overfits, and the best valid_auc checkpoint is not necessarily the best model, so I save a checkpoint every epoch instead.
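
A tiny sanity check of the argument order (illustrative values only):

from sklearn.metrics import roc_auc_score
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.8, 0.4, 0.3]        # probability of class 1, not argmax labels
print(roc_auc_score(y_true, y_prob)) # 1.0 -- (labels, probabilities), in that order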
# Training and validation loop
import time
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score, roc_auc_score
# Progress bar
from tqdm.auto import tqdm

def train_and_eval(train_loader,valid_loader=None,epoch=None,scheduler=None,save_name=None):
  best_auc=0.0 # every epoch gets saved below, so best_auc is effectively unused
  num_training_steps=len(train_loader)*epoch
  progress_bar=tqdm(range(num_training_steps))
  writer = SummaryWriter(log_dir='runs/'+save_name)
  for i in range(epoch):
    """训练模型"""

    start=time.time()
    model.train()
    print("***** Running training epoch {} *****".format(i+1))
    train_loss_sum,total_train_acc=0.0,0
    pred_all,label_all=[],[]
    for idx,(X,labels) in enumerate(train_loader):
      if isinstance(X, list):
        # handle models whose input is a list of tensors
        X = [x.to(device) for x in X]
      else:
        X = X.to(device)
      labels = labels.to(device)
      

      # compute the output and the loss
      pred=model(X)
      loss=criterion(pred,labels)
      loss.backward()

      optimizer.step()
      if scheduler is not None:
        scheduler.step()
      optimizer.zero_grad()
      progress_bar.update(1)

      train_loss_sum+=loss.item()
      pred=pred.clone().detach().cpu().numpy() # detach() copies without grad tracking; the original tensor is unchanged and still differentiable
      # acc needs argmax over the logits for predicted labels; AUC needs probability scores, not predicted labels
      predictions=np.argmax(pred,axis=-1) # predicted labels, used for acc
      labels=labels.to('cpu').numpy()

      label_all.extend(labels)
      pred_all.extend(pred[:,1]) 
      total_train_acc+=accuracy_score(predictions,labels)
    
    avg_train_acc=total_train_acc/len(train_loader)
    train_auc=roc_auc_score(label_all,pred_all)
    # Log the metrics to TensorBoard
    writer.add_scalar(tag="loss/train", scalar_value=train_loss_sum,
                          global_step=i*len(train_loader)+idx)
    writer.add_scalar(tag="acc/train", scalar_value=avg_train_acc,
                          global_step=i*len(train_loader)+idx)
    writer.add_scalar(tag="auc/train", scalar_value=train_auc,
                          global_step=i*len(train_loader)+idx)
    if i%1==0: # print results once per epoch
      print("Epoch {:03d} | Step {:03d}/{:03d} | Loss {:.4f} | train_acc {:.4f} | train_auc {:.4f} | \
    Time {:.4f} | lr = {} \n".format(i+1,idx+1,len(train_loader),train_loss_sum/(idx+1),
                          avg_train_acc,train_auc,time.time()-start,optimizer.state_dict()['param_groups'][0]['lr']))

    if valid_loader is not None:  # run validation when a loader is provided
      model.eval()
      pred_all,label_all=[],[]
      total_eval_loss,total_eval_accuracy=0,0
      for (X,labels) in valid_loader:
        with torch.no_grad():  # only this block needs no gradients
          if isinstance(X, list):
          # handle models whose input is a list of tensors
            X = [x.to(device) for x in X]
          else:
            X = X.to(device)
          labels = labels.to(device)
          pred=model(X)
        
        loss=criterion(pred,labels) # compute the loss and accuracy
        total_eval_loss+=loss.item()

        pred=pred.clone().detach().cpu().numpy() # detach() copies without grad tracking
        predictions=np.argmax(pred,axis=-1)
        labels=labels.to('cpu').numpy()

        label_all.extend(labels)
        pred_all.extend(pred[:,1])
        total_eval_accuracy+=accuracy_score(predictions,labels)

      avg_val_acc=total_eval_accuracy/len(valid_loader)
      val_auc=roc_auc_score(label_all,pred_all) 
      writer.add_scalar(tag="loss/valid", scalar_value=total_eval_loss,
                          global_step=i*len(valid_loader)+idx)
      writer.add_scalar(tag="acc/valid", scalar_value=avg_val_acc,
                          global_step=i*len(valid_loader)+idx)
      writer.add_scalar(tag="auc/valid", scalar_value=val_auc,
                          global_step=i*len(valid_loader)+idx)
      torch.save(model.state_dict(),'model/'+'%s'%save_name+'_'+'%d'%i)
      
      print("val_accuracy:%.4f" % (avg_val_acc),'\t',"val_auc:%.4f" % (val_auc))
      print("val_loss: %.4f"%(total_eval_loss/len(valid_loader)),' \t',"time costed={}s \n".format(round(time.time()-start,5)))
      print("-------------------------------")
    else:
      # save the model even when there is no validation set
      best_auc=train_auc
      torch.save(model.state_dict(),'model/'+'%s'%save_name+'_'+'%d'%i)

2.2 The predict Function (setting num_workers matters)

Submissions are CSV files encoded as UTF-8 without BOM, in the following format:

  • imagename, defect_prob
  • imagename is the test image's filename; defect_prob is the probability that the image is defective. The two fields are separated by an English comma.
  • At first I did not set num_workers on the DataLoader, and swin_s still had not finished predicting the test set after 50 minutes, which was infuriating. I downloaded the model to Kaggle to predict there, but Kaggle ran torch 1.12 with an older torchvision that had no Swin model, and no amount of upgrading would install one. Uploading it to Kaggle as a dataset and pulling it into Colab without mounting Drive would not download either.
  • Finally I realized it came down to num_workers: with num_workers=4, swin_s can finish prediction in as little as six and a half minutes. (Colab seems to hand out different GPUs at different times; a quick timing sketch follows below.)
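
As a quick illustrative check (assuming a test_dataset built as inside get_dataloader above; not something run in this notebook), you can time a bare pass over the loader for different num_workers values:

import time
from torch.utils.data import DataLoader

for nw in (0, 2, 4):
    loader = DataLoader(test_dataset, batch_size=64, shuffle=False,
                        num_workers=nw, pin_memory=True)
    start = time.time()
    for batch, _ in loader:
        pass  # just iterate, to measure data-loading throughput alone
    print('num_workers=%d: %.1fs' % (nw, time.time() - start))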
def predict(model, model_path=None):
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  if model_path is not None:
    model.load_state_dict(torch.load(model_path))
  model = model.to(device)

  model.eval()
  to_prob = nn.Softmax(dim=1)
  with torch.no_grad():
    imagenames, probs = list(), list()
    for batch_idx, batch in enumerate(test_loader):
      image, _ = batch
      image = image.to(device)
      pred = model(image)
      prob = to_prob(pred)
      prob = list(prob.data.cpu().numpy())
      probs += prob
  print(probs[0], len(probs))

  with open('dataset/submission.csv', 'w', newline='', encoding='utf8') as fp:
    writer = csv.writer(fp)
    writer.writerow(['imagename', 'defect_prob'])
    for imagename, prob in zip(test_images, probs):
      imagename = os.path.basename(imagename)
      writer.writerow([imagename, str(prob[1])])

3. Experimenting with Models and Optimizers

3.1 resnet18

3.1.1 Baseline score 0.676 (resnet18 + trfs, bs=32)

# Load a pretrained resnet18
train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=32,transforms=trfs)

model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=3,save_name='resnet')
  0%|          | 0/159 [00:00
# Reload the checkpoint from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('resnet_01'))
model = model.to(device)
# Retrain once on the full training data, reusing train_all_loader from get_dataloader
optimizer = optim.SGD(model.parameters(), lr=0.0002)
train_and_eval(train_all_loader,epoch=1,save_name='resnet_all')
  0%|          | 0/53 [00:00
predict(model,model_path=None)

3.1.2 With trfs_sharp augmentation, score 0.71

# Load a pretrained resnet18
set_seeds(2022)
train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=32,transforms=trfs_sharp)

model = torchvision.models.resnet18(pretrained=True)
# weights='DEFAULT'
model.fc = torch.nn.Linear(512,2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=3) # note: not passed to train_and_eval below, so the lr stays constant (matching the logs)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=6,save_name='resnet18')
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.3774 | train_acc 0.8617 | train_auc 0.4651 |             Time 144.9953 | lr = 0.001 

val_accuracy:0.8625 	 val_auc:0.6024
val_loss: 0.3489  	 time costed=169.10428s 

-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 053/053 | Loss 0.3024 | train_acc 0.8844 | train_auc 0.6599 |             Time 124.3967 | lr = 0.001 

val_accuracy:0.8688 	 val_auc:0.7516
val_loss: 0.3118  	 time costed=145.55964s 

-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 053/053 | Loss 0.2708 | train_acc 0.8956 | train_auc 0.7791 |             Time 124.2991 | lr = 0.001 

val_accuracy:0.8812 	 val_auc:0.8182
val_loss: 0.2830  	 time costed=146.05037s 

-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 053/053 | Loss 0.2555 | train_acc 0.9026 | train_auc 0.8296 |             Time 123.9550 | lr = 0.001 

val_accuracy:0.8969 	 val_auc:0.8488
val_loss: 0.2573  	 time costed=145.67704s 

-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 053/053 | Loss 0.2417 | train_acc 0.9171 | train_auc 0.8541 |             Time 122.8209 | lr = 0.001 

val_accuracy:0.9250 	 val_auc:0.8596
val_loss: 0.2418  	 time costed=144.79443s 

-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 053/053 | Loss 0.2208 | train_acc 0.9325 | train_auc 0.8818 |             Time 123.2759 | lr = 0.001 

val_accuracy:0.9219 	 val_auc:0.8661
val_loss: 0.2326  	 time costed=144.71817s 

-------------------------------
# Reload the checkpoint from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('model/resnet18_5'))
model = model.to(device)
# Retrain on the full training data
optimizer = optim.SGD(model.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer,start_factor=0.001) # defined but not passed to train_and_eval, so unused here
train_and_eval(train_all_loader,epoch=3,save_name='resnet18_all')
***** Running training epoch 1 *****
Epoch 001 | Step 062/062 | Loss 0.2153 | train_acc 0.9311 | train_auc 0.8909 |             Time 148.2866 | lr = 0.0001 

***** Running training epoch 2 *****
Epoch 002 | Step 062/062 | Loss 0.2130 | train_acc 0.9325 | train_auc 0.8965 |             Time 147.7210 | lr = 0.0001 

***** Running training epoch 3 *****
Epoch 003 | Step 062/062 | Loss 0.2126 | train_acc 0.9311 | train_auc 0.8991 |             Time 146.8041 | lr = 0.0001 
predict(model,model_path=None)

3.2 SwinTransformer

3.2.1 With trfs augmentation, score 0.6813

torchvision provides three SwinTransformer variants: T (Tiny), S (Small), and B (Base). Their pretrained weights are listed in the "Table of all available classification weights".

  • In short, load weights with weights='DEFAULT' or weights='IMAGENET1K_V1'.
  • The classification head is the head layer, a torch.nn.Linear; for the S and T variants it maps 768 input features to 1000 classes.
  • SGD optimizer, bs=64, lr=0.001 with cosine annealing; training 10 epochs took 21 min.
  • train_acc 0.8824, train_auc 0.7777 | val_accuracy: 0.8669, val_auc: 0.8997
  • Finally trained 2 epochs on the full training set; score 0.6813.
train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=64,transforms=trfs)

model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head=torch.nn.Linear(768,2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU
lr,weight_decay=0.001,0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=5)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=10,scheduler=scheduler,save_name='swin_s')
  0%|          | 0/270 [00:00<?, ?it/s]


***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5020 | train_acc 0.8003 | train_auc 0.5437 |     Time 756.2816 | lr = 0.00034549150281252655 

val_accuracy:0.8588 	 val_auc:0.6714
val_loss: 0.3851  	 time costed=888.98904s 

-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3553 | train_acc 0.8825 | train_auc 0.6178 |     Time 120.7405 | lr = 9.549150281252699e-05 

val_accuracy:0.8588 	 val_auc:0.7586
val_loss: 0.3732  	 time costed=140.11592s 

-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3430 | train_acc 0.8825 | train_auc 0.6562 |     Time 120.7618 | lr = 0.0009045084971874806 

val_accuracy:0.8588 	 val_auc:0.8103
val_loss: 0.3599  	 time costed=140.08969s 

-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3325 | train_acc 0.8825 | train_auc 0.6738 |     Time 120.9380 | lr = 0.0006545084971874633 

val_accuracy:0.8588 	 val_auc:0.7986
val_loss: 0.3569  	 time costed=140.3202s 

-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3288 | train_acc 0.8830 | train_auc 0.6897 |     Time 120.4576 | lr = 0.0 

val_accuracy:0.8588 	 val_auc:0.8407
val_loss: 0.3447  	 time costed=139.84589s 

-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3240 | train_acc 0.8789 | train_auc 0.7275 |     Time 120.7112 | lr = 0.0006545084971874866 

val_accuracy:0.8588 	 val_auc:0.8521
val_loss: 0.3297  	 time costed=140.0113s 

-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3157 | train_acc 0.8836 | train_auc 0.7355 |     Time 120.6098 | lr = 0.0009045084971875055 

val_accuracy:0.8588 	 val_auc:0.8899
val_loss: 0.3175  	 time costed=139.8843s 

-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3179 | train_acc 0.8765 | train_auc 0.7576 |     Time 120.4713 | lr = 9.549150281252627e-05 

val_accuracy:0.8588 	 val_auc:0.8959
val_loss: 0.3099  	 time costed=139.96295s 

-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3025 | train_acc 0.8848 | train_auc 0.7701 |     Time 120.4666 | lr = 0.00034549150281254536 

val_accuracy:0.8637 	 val_auc:0.8945
val_loss: 0.2947  	 time costed=139.81461s 

-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3001 | train_acc 0.8824 | train_auc 0.7777 |     Time 120.6441 | lr = 0.000999999999999998 

val_accuracy:0.8669 	 val_auc:0.8997
val_loss: 0.2868  	 time costed=139.93732s 

-------------------------------

Launch TensorBoard to inspect training:

%load_ext tensorboard
%tensorboard --logdir runs/swin_s
Output hidden; open in https://colab.research.google.com to view.
# Finally train 2 epochs on the full dataset
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head=torch.nn.Linear(768,2)
model.load_state_dict(torch.load('model/swin_s_9'))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.001) # re-created here; the original cell reused the old optimizer, which still pointed at the previous model's parameters
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer,start_factor=0.0002)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_all_loader,scheduler=scheduler,epoch=2,save_name='swin_s_all')
  0%|          | 0/62 [00:00<?, ?it/s]


***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.2886 | train_acc 0.8860 | train_auc 0.8052 |     Time 142.2524 | lr = 0.0009999999999999974 

***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.2897 | train_acc 0.8865 | train_auc 0.7930 |     Time 141.9387 | lr = 0.0009999999999999974 
 predict(model,model_path=None)   

3.2.2 With trfs_sharp augmentation, score 0.731

train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=64,transforms=trfs_sharp)
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head=torch.nn.Linear(768,2)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU
lr,weight_decay=0.001,0.0003 # at lr=0.001 the validation accuracy stays flat
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=5)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=10,scheduler=scheduler,save_name='swins_sharp')
Downloading: "https://download.pytorch.org/models/swin_s-5e29d889.pth" to /root/.cache/torch/hub/checkpoints/swin_s-5e29d889.pth



  0%|          | 0.00/190M [00:00<?, ?B/s]



  0%|          | 0/270 [00:00<?, ?it/s]


***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5272 | train_acc 0.7702 | train_auc 0.4723 |     Time 149.8585 | lr = 0.00034549150281252655 

val_accuracy:0.8588 	 val_auc:0.5290
val_loss: 0.4006  	 time costed=187.8727s 

-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3652 | train_acc 0.8825 | train_auc 0.5693 |     Time 56.3001 | lr = 9.549150281252699e-05 

val_accuracy:0.8588 	 val_auc:0.5995
val_loss: 0.3884  	 time costed=68.73339s 

-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3348 | train_acc 0.8860 | train_auc 0.6288 |     Time 56.2766 | lr = 0.0009045084971874806 

val_accuracy:0.8588 	 val_auc:0.6558
val_loss: 0.3918  	 time costed=68.7855s 

-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3279 | train_acc 0.8860 | train_auc 0.6313 |     Time 56.7157 | lr = 0.0006545084971874633 

val_accuracy:0.8588 	 val_auc:0.6815
val_loss: 0.3914  	 time costed=69.05862s 

-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3363 | train_acc 0.8825 | train_auc 0.6533 |     Time 56.2564 | lr = 0.0 

val_accuracy:0.8588 	 val_auc:0.7486
val_loss: 0.3646  	 time costed=68.72446s 

-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3415 | train_acc 0.8754 | train_auc 0.7024 |     Time 56.3478 | lr = 0.0006545084971874866 

val_accuracy:0.8588 	 val_auc:0.7658
val_loss: 0.3583  	 time costed=69.10426s 

-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3127 | train_acc 0.8860 | train_auc 0.7012 |     Time 56.2839 | lr = 0.0009045084971875055 

val_accuracy:0.8588 	 val_auc:0.7843
val_loss: 0.3557  	 time costed=68.70668s 

-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3155 | train_acc 0.8836 | train_auc 0.7328 |     Time 56.3986 | lr = 9.549150281252627e-05 

val_accuracy:0.8588 	 val_auc:0.8015
val_loss: 0.3494  	 time costed=68.8231s 

-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3045 | train_acc 0.8866 | train_auc 0.7426 |     Time 55.9236 | lr = 0.00034549150281254536 

val_accuracy:0.8588 	 val_auc:0.8112
val_loss: 0.3492  	 time costed=68.71339s 

-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3055 | train_acc 0.8830 | train_auc 0.7635 |     Time 55.7483 | lr = 0.000999999999999998 

val_accuracy:0.8588 	 val_auc:0.8031
val_loss: 0.3388  	 time costed=68.4414s 

-------------------------------
%load_ext tensorboard
%tensorboard --logdir runs/swins_sharp

Finally, train 3 epochs on the full dataset:


model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head=torch.nn.Linear(768,2)
model.load_state_dict(torch.load('model/swins_sharp_9'))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

optimizer = optim.SGD(model.parameters(),lr=0.0003)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer,start_factor=0.0003)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_all_loader,scheduler=scheduler,epoch=3,save_name='swins_sharp_all')
  0%|          | 0/93 [00:00<?, ?it/s]


***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.3156 | train_acc 0.8787 | train_auc 0.7499 |     Time 120.0241 | lr = 0.00030000000000000003 

***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.3086 | train_acc 0.8803 | train_auc 0.7477 |     Time 63.1228 | lr = 0.00030000000000000003 

***** Running training epoch 3 *****
Epoch 003 | Step 031/031 | Loss 0.2997 | train_acc 0.8816 | train_auc 0.7771 |     Time 62.7066 | lr = 0.00030000000000000003 

Predict on the test set and generate the submission:

model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head=torch.nn.Linear(768,2)

predict(model,model_path='model/swins_sharp_all_2') # 14min
[0.90404904 0.09595092] 10000

3.3 EfficientNetV2

It did not perform well and ran out of GPU memory easily; likely the hyperparameters were not tuned properly. (One memory-saving option is sketched below.)
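
As an aside, a minimal sketch of one standard workaround for the memory pressure (not something I ran here): mixed-precision training with torch.cuda.amp, which roughly halves activation memory:

scaler = torch.cuda.amp.GradScaler()
for X, labels in train_loader:
    X, labels = X.to(device), labels.to(device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in float16 where safe
        pred = model(X)
        loss = criterion(pred, labels)
    scaler.scale(loss).backward()    # scale the loss to avoid float16 gradient underflow
    scaler.step(optimizer)
    scaler.update()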

train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=64,transforms=trfs_sharp)
model = torchvision.models.efficientnet_v2_s(weights='DEFAULT')
model.classifier[1]=torch.nn.Linear(1280,2)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

lr,weight_decay=0.001,0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=5)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=10,scheduler=scheduler,save_name='effnet2')

Downloading: "https://download.pytorch.org/models/efficientnet_v2_s-dd5fe13b.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_v2_s-dd5fe13b.pth

  0%|          | 0.00/82.7M [00:00<?, ?B/s]

  0%|          | 0/270 [00:00<?, ?it/s]

***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5934 | train_acc 0.7841 | train_auc 0.5029 |     Time 152.3739 | lr = 0.00034549150281252655 

val_accuracy:0.8019 	 val_auc:0.5505
val_loss: 0.5903  	 time costed=185.25558s 

-------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.5329 | train_acc 0.8437 | train_auc 0.5208 |     Time 54.4941 | lr = 9.549150281252699e-05 

val_accuracy:0.8300 	 val_auc:0.5115
val_loss: 0.5477  	 time costed=64.94097s 

-------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.5050 | train_acc 0.8656 | train_auc 0.4964 |     Time 49.6710 | lr = 0.0009045084971874806 

val_accuracy:0.8400 	 val_auc:0.6024
val_loss: 0.5248  	 time costed=60.15345s 

-------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.4718 | train_acc 0.8773 | train_auc 0.5031 |     Time 49.8339 | lr = 0.0006545084971874633 

val_accuracy:0.8588 	 val_auc:0.5236
val_loss: 0.5017  	 time costed=60.28123s 

-------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.4481 | train_acc 0.8790 | train_auc 0.5428 |     Time 49.7957 | lr = 0.0 

val_accuracy:0.8588 	 val_auc:0.4952
val_loss: 0.4818  	 time costed=60.31998s 

-------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.4234 | train_acc 0.8848 | train_auc 0.5676 |     Time 50.4792 | lr = 0.0006545084971874866 

val_accuracy:0.8619 	 val_auc:0.5764
val_loss: 0.4550  	 time costed=61.06234s 

-------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.4178 | train_acc 0.8796 | train_auc 0.5479 |     Time 49.9712 | lr = 0.0009045084971875055 

val_accuracy:0.8588 	 val_auc:0.5897
val_loss: 0.4512  	 time costed=60.4179s 

-------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.4050 | train_acc 0.8789 | train_auc 0.5802 |     Time 49.6602 | lr = 9.549150281252627e-05 

val_accuracy:0.8538 	 val_auc:0.5747
val_loss: 0.4505  	 time costed=60.17702s 

-------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.4062 | train_acc 0.8783 | train_auc 0.5692 |     Time 49.7288 | lr = 0.00034549150281254536 

val_accuracy:0.8588 	 val_auc:0.6225
val_loss: 0.4247  	 time costed=60.10822s 

-------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3873 | train_acc 0.8807 | train_auc 0.5912 |     Time 49.3028 | lr = 0.000999999999999998 

val_accuracy:0.8588 	 val_auc:0.5934
val_loss: 0.4345  	 time costed=59.68535s 

-------------------------------
#%load_ext tensorboard
%tensorboard --logdir runs/effnet2

3.4 convnext

I never got this one tuned properly, so I am not including training results.

train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=64,transforms=trfs_sharp)
# Load a pretrained convnext
model = torchvision.models.convnext.convnext_small(pretrained=True)
model.classifier[2]=torch.nn.Linear(768,2)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=5)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=10,scheduler=scheduler,save_name='convnext') # 21min

3.5 ViT

ViT uses absolute position embeddings, so when transfer learning:

  • the input image size must match the pretrained model's (e.g. the 224x224 used for ImageNet), or
  • you train from scratch without the pretrained weights.

At first I missed this, and training kept failing with AssertionError: Wrong image height! (A sketch of the pretrained route follows below.)
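
For reference, a minimal sketch of that pretrained route (an assumption of how it would look, not the run below): keep the 224x224 input that vit_b_16 was trained with:

trfs_vit = transforms.Compose([
    transforms.Resize((256,256)),
    transforms.CenterCrop(224),  # ViT-B/16 expects 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

model = torchvision.models.vit_b_16(weights='DEFAULT')  # pretrained weights
model.heads.head = torch.nn.Linear(768,2)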

train_loader,valid_loader,train_all_loader,test_loader=get_dataloader(bs=64,transforms=trfs_sharp)

model=torchvision.models.vit_b_16() # no pretrained weights: train from scratch
model.heads.head=torch.nn.Linear(768,2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device) # move to GPU

lr,weight_decay=0.001,0.0003 # with lr=0.001 the validation acc stayed flat
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,T_max=5)
# Loss function
criterion= nn.CrossEntropyLoss()
train_and_eval(train_loader,valid_loader,epoch=10,scheduler=scheduler,save_name='VITB')

4. Ideas for Improving the Score

  • Data augmentation: random crop, random noise, flips.

  • A stronger model: larger input image sizes, better final accuracy.

  • Exploit the unlabeled data by pseudo-labeling (see the sketch after this list):

    • select roughly 3000 high-confidence samples
    • retrain on those 3000 + the training set
  • timm and torchvision weights are different, so the hyperparameters need re-tuning when switching between them.

  • Training on the full dataset with a stronger model and a longer schedule should push AUC past 0.8.

  • Pretrained weights vs training from scratch: accuracy still differs, and pretrained is better.
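
A rough sketch of the pseudo-labeling idea (illustrative only: it reuses model, device, test_loader, test_images, x, and y from the sections above, and 3000 is just the count suggested in the list):

import numpy as np

model.eval()
probs = []
with torch.no_grad():
    for image, _ in test_loader:
        pred = model(image.to(device))
        probs.append(nn.Softmax(dim=1)(pred).cpu().numpy())
probs = np.concatenate(probs)

conf = probs.max(axis=1)               # confidence of the predicted class
keep = np.argsort(-conf)[:3000]        # ~3000 most confident test images
pseudo_x = np.array(test_images)[keep]
pseudo_y = probs.argmax(axis=1)[keep]  # pseudo-labels

x_aug = np.concatenate([x, pseudo_x])  # pseudo-labeled + original training data
y_aug = np.concatenate([y, pseudo_y])
# then rebuild an ImageSet / DataLoader from (x_aug, y_aug) and retrain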
